I now present the R-code for the random forest prediction (the group-variable was added here):

df=read.csv("WorldCup2014Test.csv")
WC=read.csv("WorldCup2014.csv")
#I am setting all NAs to 0. This might be a bad idea, but it works.
df[is.na(df)]=0
#We want to run a randomforest as classifier
#First, we code a response variable Y with "w" (win), "d" (draw)
#and "l" (loss)
df$Y=ifelse(df[,4]-df[,3]>0,"w",ifelse(df[,4]-df[,3]==0,"d","l"))
df$Y=as.factor(df$Y)
library(randomForest)
#We want to auto tune the random forest: This requires
#response and predictors to be in a matrix
y=as.matrix(df$Y)
x=as.matrix(df[,-c(1,2,3,4,5,6,76)])
rf.tune = tuneRF(x=x,y=as.factor(y), type="pob", doBest=T)
#The random forest includes a confusion matrix
rf.tune$confusion
#The model is poor in draws, but quite good in predicting wins
#Let's predict the World Cup!
WC[is.na(WC)]=0
xte=as.m...