This little example shows a workaround for a bug in RTextTools.
Using the ngramLength argument leads to an error. As a workaround, we can build the document-term matrix with the RWeka and tm libraries and then go back to RTextTools:

library(RTextTools)

texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")

library(RWeka)
library(tm)

# Tokenizer that returns word trigrams only (min = max = 3)
TrigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))

# Build the document-term matrix with tm, using the trigram tokenizer
dtm <- DocumentTermMatrix(Corpus(VectorSource(texts)),
                          control = list(weighting = weightTf,
                                         tokenize = TrigramTokenizer))
as.matrix(dtm)

# Labels for the five documents
isText <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Back to RTextTools: documents 1-3 for training, 4-5 for testing
container <- create_container(dtm, isText, virgin = FALSE,
                              trainSize = 1:3, testSize = 4:5)
models <- train_models(container, algorithms = c("SVM", "BOOSTING"))
classify_models(container, models)
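To see what kind of terms end up as columns in the matrix, here is a minimal base-R sketch of word trigram extraction. The helper word_trigrams is hypothetical and not part of RWeka or tm; it only illustrates the idea, and the exact tokens produced by NGramTokenizer may differ in case and punctuation handling.

```r
# Hypothetical helper (not part of RWeka or tm): word trigrams in base R.
word_trigrams <- function(x) {
  # Lowercase, strip punctuation, and split on whitespace.
  cleaned <- tolower(gsub("[[:punct:]]", "", x))
  words <- strsplit(cleaned, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  # Fewer than three words means there is no trigram to build.
  if (length(words) < 3) return(character(0))
  # Paste every run of three consecutive words together.
  starts <- seq_len(length(words) - 2)
  vapply(starts,
         function(i) paste(words[i:(i + 2)], collapse = " "),
         character(1))
}

word_trigrams("This is the first document.")
# "this is the" "is the first" "the first document"
```

Each distinct trigram across all five texts becomes one column of the document-term matrix, which is why the matrix printed by as.matrix(dtm) is wide and sparse even for such a small corpus.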
This came from an old question on Stack Overflow, which I have just answered.