Saturday, July 7, 2018

World Cup in Russia | Fall of Data Science

This world cup has turned out to be the death bed for Data Science and Machine Learning algorithms. Before the world cup I had seen at least two prediction reports from well-established sources, one from UBS and the other from Goldman Sachs, both concluding that it would be Germany yet again.

Today I saw another prediction, made by Sports Illustrated (image by the side) after the conclusion of the group stages. This one predicted a Brazil - Spain final.

As we know, Germany was thrown out of the world cup in the group stage itself. Spain was eliminated by Russia in the Round of 16, while today saw the last of Brazil at the hands of Belgium. Thus the 3 teams most favored by the algorithms to win the world cup are gone. And the way it is going so far, the trophy could well land in the hands of a first-time winner. In fact, before signing off on the predictions, the data scientists should have taken into account that no team, apart from Italy in the 1930s and Brazil in 1958 and 1962, has defended the world cup successfully.

The big financial organizations use predictive algorithms to predict the market and adjust their trades accordingly. But how far does it make sense to build algorithms to predict the outcome of the football world cup?

As we know, we are required to feed or train the algorithms with data; the more data we feed the algorithm, the better the output should be. However, in football many fallacies get introduced when operating under such an assumption. Historically 3 teams, Brazil, Germany and Italy, have dominated the competition; they have won it 13 times between them. All other countries combined have won the trophy 7 times. Thus any prediction based on past winners would always end up favoring these 3 countries; Italy didn't qualify this time, so the pointer moves to Brazil or Germany.
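To make the bias concrete, here is a minimal sketch of a deliberately naive predictor that scores each team purely by its share of past titles. It is not how UBS, Goldman Sachs or Sports Illustrated built their models; the function name `naive_title_share` and the idea of ranking by title share alone are assumptions made only for illustration. The list of winners covers the 20 tournaments from 1930 to 2014, with West Germany's titles counted under Germany.

```python
from collections import Counter

# Winners of the 20 World Cups from 1930 to 2014, in chronological order.
# West Germany's titles are counted under "Germany".
WINNERS = [
    "Uruguay", "Italy", "Italy", "Uruguay", "Germany",
    "Brazil", "Brazil", "England", "Brazil", "Germany",
    "Argentina", "Italy", "Argentina", "Germany", "Brazil",
    "France", "Brazil", "Italy", "Spain", "Germany",
]

# Past winners that qualified for Russia 2018 (Italy did not).
QUALIFIED_2018 = {
    "Brazil", "Germany", "Uruguay", "Argentina",
    "England", "France", "Spain",
}


def naive_title_share(winners, qualified):
    """Hypothetical predictor: score = share of past titles among qualified teams."""
    counts = Counter(team for team in winners if team in qualified)
    total = sum(counts.values())
    return {team: n / total for team, n in counts.items()}


title_counts = Counter(WINNERS)
big_three = sum(title_counts[t] for t in ("Brazil", "Germany", "Italy"))
others = len(WINNERS) - big_three

shares = naive_title_share(WINNERS, QUALIFIED_2018)
favorites = sorted(shares, key=shares.get, reverse=True)[:2]

if __name__ == "__main__":
    print(big_three, others)   # 13 7
    print(favorites)           # ['Brazil', 'Germany']
```

With history as the only input, such a model can hardly pick anyone except Brazil or Germany, whatever the current squads look like.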

However, the football world cup is a once-in-four-years event. I am not sure how the algorithms take care of team performance in the intervening years. And even if they do, most of the contests in the years between (apart from the quadrennial Copa or Euro) are friendlies or inconsequential, whereas the passion and commitment on display at a world cup are much higher because the stakes are higher too.

The FIFA ranking is another indicator which could be used by the algorithms. That again is a misleading one. Else how do you justify lowly ranked Russia beating Spain, or Japan giving Belgium a run for their spot in the quarterfinals? The errors inherent in the FIFA ranking creep into the predictive algorithm and lead to flawed predictions.

Out of the 8 teams competing in the quarterfinals, only 3 are in the FIFA ranking top 10, whereas we also have Russia, ranked around 70.

Predictive algorithms cannot be expected to predict the outcome of an event reliably based only on historical data of team performance. There are many other factors which impact performance and which may not have a historical precedent. It could be the coach getting replaced at the last moment, as for Spain, or a key player getting injured, as for Egypt. Even the home advantage factor should be counted, as is evident from the run of the Russian team in this world cup; Russians themselves consider this team to be the weakest one in many years.

Sometimes conditions such as weather and altitude play a crucial role, as becomes visible in the South American qualifiers played high in the Andes. To be more accurate in prediction, algorithms could have taken into account the performance of individual players at club level and how many players of a particular national team also play their club football together. For example, Germany has tasted success in the past because many of the German national team members play their club football together at Bayern Munich. In the case of Spain it would be Barcelona and Real Madrid.

Predicting could be really tough when you see previous champions and runners-up like Italy and the Netherlands failing to make it through the qualifiers. And also when you consider hard-to-explain patterns, like a South American country having won the competition in Europe only once (Brazil in 1958).

So the simple route of data - algorithm - result may not be the correct path to predict the outcome of a competition like the world cup. It's more complex than that. There is a lot of passion and pride involved, parameters which no formula would be able to take as inputs. This world cup has, for the time being, trashed their predictions. Let's enjoy the game and leave all the number crunching for less enjoyable things.

P.S: Here is another article on the failure of AI in this World Cup.

https://medium.com/@bakhtiyari/artificial-intelligence-failed-in-world-cup-2018-6af10602206a
