Considering how long it took just to load the data from the Kaggle competition, actually working with it is kinda scary.
Therefore, I'm going to speed up my system by using GPU computing in R. Modern graphics cards are very powerful and can, in principle, work like a high-performance cluster.
Following this brilliant tutorial by Chi Yau, I managed to get CUDA running on my Ubuntu machine. It took some time, especially the "make" command at the end. But, as you will see shortly, it is absolutely worth the effort.
The second part of the tutorial shows how to install rpud (don't try to install it directly from RStudio; follow these steps instead). Again, it took some time, especially until I realized that the type="source" parameter has to be added when installing the package.
Finally, everything is working, and following Chi Yau's example I calculated a distance matrix for datasets with a huge number of vectors.
The gain in speed is unbelievable!
And here is the code for the test:
```r
library(rpud)

# Generate a matrix of `num` random vectors (one per row) of dimension `dim`
test.data <- function(dim, num, seed = 17) {
  set.seed(seed)
  matrix(rnorm(dim * num), nrow = num)
}

sizes <- seq(500, 5000, by = 500)  # number of vectors in each test

# CPU: base R dist()
ti <- c()
for (i in sizes) {
  m <- test.data(100, i)
  ti <- c(ti, system.time(dist(m))["elapsed"])
}

# GPU: rpuDist() from rpud
tiCuda <- c()
for (i in sizes) {
  m <- test.data(100, i)
  tiCuda <- c(tiCuda, system.time(rpuDist(m))["elapsed"])
}

# Row 1 = CPU timings, row 2 = GPU timings, drawn side by side per size
barplot(matrix(c(ti, tiCuda), nrow = 2, byrow = TRUE), beside = TRUE,
        main = "CPU vs GPU: distance matrix",
        xlab = "number of vectors", ylab = "seconds",
        col = c("darkgray", "darkorange3"),
        names.arg = sizes,
        legend.text = c("CPU: dist()", "GPU: rpuDist()"),
        args.legend = list(x = "topleft"))
```
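If you wonder why this test gets slow so quickly: a distance matrix holds the Euclidean distance between every pair of rows, so for 5000 vectors that is 5000 * 4999 / 2 = 12,497,500 pairs. The sketch below builds the same matrix in plain base R from matrix products and checks it against dist(). Note that euclid_dist is a hypothetical helper written for illustration only; it is not part of rpud, and I don't claim rpuDist works this way internally. It just shows that the job boils down to dense, highly parallel arithmetic, which is the kind of work graphics cards are good at.

```r
# Hypothetical helper: Euclidean distance matrix between the rows of m,
# using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * <x, y>
euclid_dist <- function(m) {
  sq <- rowSums(m^2)                             # squared norm of each row
  d2 <- outer(sq, sq, "+") - 2 * tcrossprod(m)   # tcrossprod(m) = m %*% t(m)
  d2[d2 < 0] <- 0                                # clamp tiny negatives from rounding
  diag(d2) <- 0                                  # a vector's distance to itself is 0
  sqrt(d2)
}

set.seed(17)
m <- matrix(rnorm(100 * 50), nrow = 50)   # 50 vectors of dimension 100
d_manual <- euclid_dist(m)
d_base <- as.matrix(dist(m))              # dist() stores only the lower triangle
all.equal(d_manual, d_base, check.attributes = FALSE)
```

Comparing with check.attributes = FALSE ignores the row and column names that as.matrix() adds to the dist() result.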