Creating datasets, stemming, and stop-word removal
Tokenize the data after removing stop words and stemming.
For each dataset (not each file), count the number of times each token appears. Do not count all tokens. Create an ARFF (WEKA format) file for each dataset; the attributes will be the tokens and the values will be their counts.
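The pipeline requested above (tokenize, drop stop words, stem, count per dataset, write ARFF) can be sketched as follows. This is a minimal illustration, not the required implementation: the stop-word list and the suffix stemmer here are hand-rolled stand-ins, where a real solution would typically use NLTK's stopwords corpus and PorterStemmer.

```python
from collections import Counter
import re

# Assumption: a tiny illustrative stop-word list; a real job would use
# a full list such as NLTK's stopwords corpus.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def crude_stem(token):
    # Naive suffix stripping; stands in for a real stemmer like Porter's.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def token_counts(text):
    # Tokenize, drop stop words, stem, then count for the whole dataset.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

def to_arff(relation, counts):
    # One NUMERIC attribute per token, one data row of counts
    # (WEKA ARFF format: @RELATION / @ATTRIBUTE / @DATA).
    attrs = sorted(counts)
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {a} NUMERIC" for a in attrs]
    lines += ["", "@DATA", ",".join(str(counts[a]) for a in attrs)]
    return "\n".join(lines)

counts = token_counts("The cats are chasing the mice and the mice are running")
print(to_arff("dataset1", counts))
```

One ARFF file would be produced per dataset by running `token_counts` over the concatenated text of all files in that dataset, then writing the result of `to_arff` to disk.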
6 freelancers are bidding an average of $47 for this job
I have been working as a software developer for more than one and a half years, doing Python scripting, and I have good knowledge of algorithms and data structures.