Political Data Science

Posts

Showing posts from December 2014.

Using ngrams with #RTextTools

This little example shows a workaround for a bug in RTextTools. Using ngramLength would lead to an error. But we can use the RWeka library and tm to build the trigram document-term matrix, then go back to RTextTools:

    library(RTextTools)
    library(RWeka)
    library(tm)

    texts <- c("This is the first document.",
               "Is this a text?",
               "This is the second file.",
               "This is the third text.",
               "File is not this.")

    # Tokenizer that emits word trigrams only (min = max = 3)
    TrigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))

    # VCorpus rather than Corpus: recent tm versions ignore custom
    # tokenizers on the SimpleCorpus that Corpus() returns
    dtm <- DocumentTermMatrix(VCorpus(VectorSource(texts)),
                              control = list(weighting = weightTf,
                                             tokenize = TrigramTokenizer))
    as.matrix(dtm)

    # Labels: is the document about a "text"?
    isText <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

    # Documents 1-3 are used for training, 4-5 for testing
    container <- create_container(dtm, isText, virgin = FALSE,
                                  trainSize = 1:3, testSize = 4:5)
    models <- train_models(container, algorithm = c("SVM", ...
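To see what the trigram tokenizer contributes before the matrix is built, here is a minimal base-R sketch of word trigram extraction. The helper word_ngrams is hypothetical (it is not part of RWeka, tm, or RTextTools) and only approximates NGramTokenizer by splitting on whitespace and punctuation, so its output may differ from Weka's in edge cases.

```r
# Hypothetical helper, not part of RWeka or tm: split a string on
# whitespace/punctuation and join every run of n consecutive words.
word_ngrams <- function(text, n = 3) {
  words <- strsplit(text, "[[:space:][:punct:]]+")[[1]]
  words <- words[nzchar(words)]            # drop empty pieces
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1) # start index of each n-gram
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Trigrams of the first example document
print(word_ngrams("This is the first document."))
```

Each such trigram becomes one column of the document-term matrix, which is why short documents contribute only a handful of features.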

#Hipster haircuts and #Bluecard correlate

I just came across the following relationship: according to Google Correlate, search queries in Germany for the terms "EU blue card" and "hipster frisuren" (hipster haircuts) are very strongly correlated. Do highly qualified migrants in Germany head straight to the hairdresser? More likely a nice spurious correlation...

My favorite #statisticsjoke