Here is the promised example of k-means clustering. It is based on Ledolter, Johannes. 2013. Business Analytics and Data Mining with R. Hoboken, New Jersey: Wiley.

df <- read.csv("http://www.biz.uiowa.edu/faculty/jledolter/DataMining/protein.csv")
View(df)

# What is the right number of clusters?
# One idea is to look at the sum of squared errors (SSE) for
# each possible number of clusters.
# SSE for a single cluster, i.e. the total sum of squares; this becomes wss[1]:
wss <- (nrow(df) - 1) * sum(apply(df[, -1], 2, var))
# Within-cluster SSE for 2 to 20 clusters:
for (i in 2:20) wss[i] <- sum(kmeans(df[, -1], centers = i)$withinss)
plot(1:20, wss, type = "b",
     xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")

# The "elbow" now indicates the optimal number of clusters. The
# idea is that at a certain point additional clusters do not
# reduce the SSE much ...
# In this case, we might decide to take 5 clusters.
# But ...
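Before getting to that "but", here is a minimal sketch of what fitting the suggested 5-cluster solution could look like. This step is not part of the walk-through above; set.seed() and nstart = 25 are assumptions added here so that the random initialization is reproducible and reasonably stable.

# Sketch: fit k-means with the 5 clusters suggested by the elbow
# (set.seed() and nstart are additions for reproducibility, not from the original example)
set.seed(1)
fit <- kmeans(df[, -1], centers = 5, nstart = 25)
# Which rows fall into which cluster? The first column is assumed to hold
# the row labels, consistent with the df[, -1] indexing above.
split(df[, 1], fit$cluster)
# Total within-cluster sum of squares of this solution
fit$tot.withinss

The value of fit$tot.withinss should roughly match the fifth point of the elbow plot above (it can differ slightly, since the loop used the default nstart = 1), and split() simply lists the row labels belonging to each cluster.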