Political Data Science

Posts

Showing posts with the label "Sports".

Group Games Predictions

And the graph for the group games:

New Graphs Soccer Prediction

Here are some updated graphs of the random forest prediction. The lines show which teams will win more than 50 per cent (33 per cent) of their matches. Of course, this involves a lot of randomness. As we have seen, even Spain does not win every match... and if you lose the wrong match...
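One possible reading of the two reference lines can be sketched in base R: compute each team's share of predicted wins and compare it with 0.5 and 1/3. The data frame `pred`, its columns, and its values are hypothetical and only illustrate the idea; the real graphs may have been built differently.

```r
# Hypothetical predicted outcomes ("w", "d", "l") from each team's point of view.
pred <- data.frame(
  team    = rep(c("Spain", "Germany", "Chile"), each = 4),
  outcome = c("w", "w", "w", "d",   # Spain
              "w", "w", "l", "d",   # Germany
              "w", "l", "l", "l"),  # Chile
  stringsAsFactors = FALSE
)

# Share of matches each team is predicted to win.
win_share <- tapply(pred$outcome == "w", pred$team, mean)

# Teams above the two reference lines (50 per cent and 33 per cent).
above_half  <- names(win_share)[win_share > 0.5]
above_third <- names(win_share)[win_share > 1/3]
```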

Rank-Order of Soccer World Cup Predictions

In this post I start presenting some simple graphs that visualize the results of the random forest prediction. Here we see the mean winning probability of all teams in increasing order:
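A ranking like this can be sketched in a few lines of base R. The vector `mean_p` and its values are made up for illustration and are not the model's actual output.

```r
# Hypothetical mean winning probabilities per team (illustrative values only).
mean_p <- c(Spain = 0.60, Chile = 0.15, Germany = 0.45)

# sort() orders increasingly by default and keeps the team names attached.
ranked <- sort(mean_p)
```

Passing `ranked` to `dotchart()` or `barplot()` would then draw the teams in increasing order.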

RandomForest to predict the Soccer World Cup

I now present the R code for the random forest prediction (the group variable was added here):

df=read.csv("WorldCup2014Test.csv")
WC=read.csv("WorldCup2014.csv")
#I am setting all NAs to 0. This might be a bad idea, but it works.
df[is.na(df)]=0
#We want to run a randomforest as classifier
#First, we code a response variable Y with "w" (win), "d" (draw)
#and "l" (loss)
df$Y=ifelse(df[,4]-df[,3]>0,"w",ifelse(df[,4]-df[,3]==0,"d","l"))
df$Y=as.factor(df$Y)
library(randomForest)
#We want to auto tune the random forest: This requires
#response and predictors to be in a matrix
y=as.matrix(df$Y)
x=as.matrix(df[,-c(1,2,3,4,5,6,76)])
rf.tune = tuneRF(x=x,y=as.factor(y), type="pob", doBest=T)
#The random forest includes a confusion matrix
rf.tune$confusion
#The model is poor in draws, but quite good in predicting wins
#Let's predict the World Cup!
WC[is.na(WC)]=0
xte=as.m...
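The response coding at the start of the script can be tried on toy data. In this minimal sketch the data frame `toy` and the column names `goals_against` and `goals_for` are assumptions standing in for columns 3 and 4 of the real file, whose layout is not shown here.

```r
# Hypothetical toy data; the column names are assumptions, not the real CSV layout.
toy <- data.frame(goals_against = c(0, 1, 2),
                  goals_for     = c(2, 1, 0))

# Goal difference: positive means a win, zero a draw, negative a loss.
goal_diff <- toy$goals_for - toy$goals_against

# Same nested ifelse() as in the script, with explicit factor levels.
toy$Y <- factor(ifelse(goal_diff > 0, "w", ifelse(goal_diff == 0, "d", "l")),
                levels = c("w", "d", "l"))

# Class counts, useful for spotting how rare draws are in the training data.
class_counts <- table(toy$Y)
```

Fixing the factor levels explicitly keeps the class order stable even if one outcome happens to be missing from the data.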

Spain wins the World Cup… probably

For every possible pairing of two teams, I calculated the probability of each team winning. The model is a tuned random forest that classifies each match as “win”, “draw”, or “loss”. The test data is (more or less) the whole codecentric dataset. The first plot shows the mean winning probability for each team and also the variance in the predictions. For the second plot, only the games of the group phase were used, so it shows the group winners.
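The per-team summary behind such a plot can be sketched in base R. The data frame `pred`, its columns, and its probabilities are hypothetical; in practice the `p_win` column would come from the fitted model.

```r
# Hypothetical pairwise predictions: one row per (team, opponent) pairing
# with the predicted probability that `team` wins.
pred <- data.frame(
  team     = c("Spain", "Spain", "Germany", "Germany", "Chile", "Chile"),
  opponent = c("Germany", "Chile", "Spain", "Chile", "Spain", "Germany"),
  p_win    = c(0.50, 0.70, 0.30, 0.60, 0.10, 0.20),
  stringsAsFactors = FALSE
)

# Mean and standard deviation of the winning probability per team.
mean_p <- tapply(pred$p_win, pred$team, mean)
sd_p   <- tapply(pred$p_win, pred$team, sd)

summary_by_team <- data.frame(team = names(mean_p),
                              mean = as.vector(mean_p),
                              sd   = as.vector(sd_p))
```

Restricting `pred` to pairings within the same group before summarising would give the group-phase version of the same table.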

DataMining the Soccer World Cup 2014

Predicting the results of sporting events is always fun, and an enormous challenge. This is especially true of a tournament like the soccer World Cup, where the competing teams very rarely play against each other. In soccer, luck is one of the most important factors (which means most of the available data is noise…), especially in a knockout tournament… So what could be more hopeless, and more fun, than using data mining to predict the results? The codecentric guys did an excellent job in providing data on past tournaments. But matchday is coming soon, and we need a dataset for the current tournament to apply a model to. And this is what I built. The R script creates a new dataset with all possible combinations of teams playing against each other in the 2014 World Cup, and it reconstructs the current values of the codecentric features for these matches. So, if you build a model on the codecentric data, you can easily apply it to this new dataset to predict new values. In the following posts, I will demons...
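Enumerating all pairings is straightforward in base R. The sketch below assumes unordered pairs and uses a hypothetical four-team subset; the actual script may order the teams differently and adds the codecentric features, which are not reproduced here.

```r
# Hypothetical subset of teams; the real tournament has 32 participants.
teams <- c("Spain", "Germany", "Brazil", "Argentina")

# combn() returns one column per unordered pair; transpose to one row per match.
pairs <- t(combn(teams, 2))
matches <- data.frame(team_a = pairs[, 1],
                      team_b = pairs[, 2],
                      stringsAsFactors = FALSE)

# n teams give n * (n - 1) / 2 pairings.
n_matches <- nrow(matches)
```

With four teams this yields 6 pairings; with all 32 teams the same code would yield 496.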
