data mining - Sentiment analysis training set


I am using NLTK in Python for sentiment analysis, and my data has 200,000 reviews. To use a Naive Bayes classifier, I need a labeled training set. Since my data is not labeled, I manually labeled 100 reviews as positive or negative. I don't think this is the right way to do it. I have heard that I need to use 20% of the data as a training set to train the classifier, and then apply it to the remaining 80% of the data.

Is there a better way to generate a training set for a Naive Bayes classifier? Thank you for your help, and please let me know if my question is not clear.

We have had great success using 100-200 training samples (depending on the specific classification) to classify hundreds of thousands of paragraphs with a high degree of accuracy.

We did hand-filter the randomly selected samples to ensure they were not too similar to each other (and therefore represented different ways to express the concept). We used RapidMiner for classification rather than NLTK, but I expect the algorithms are similar.
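We did the filtering by hand, but a rough automated pre-filter can cut down the number of near-duplicates you have to read. The sketch below is only an illustration, not what we ran: the function names, the whitespace tokenization, the Jaccard word-overlap measure, and the 0.5 threshold are all assumptions you would want to tune on your own reviews.

```python
import random


def token_set(text):
    """Lowercase a review and split it on whitespace into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sample_diverse(reviews, k, max_similarity=0.5, seed=0):
    """Pick up to k random reviews, skipping any that overlap too heavily
    with a review that was already picked.

    max_similarity is an assumed cutoff, not a recommended value.
    """
    rng = random.Random(seed)
    shuffled = list(reviews)
    rng.shuffle(shuffled)

    chosen, chosen_tokens = [], []
    for review in shuffled:
        tokens = token_set(review)
        if all(jaccard(tokens, seen) <= max_similarity for seen in chosen_tokens):
            chosen.append(review)
            chosen_tokens.append(tokens)
        if len(chosen) == k:
            break
    return chosen
```

You would still read and label the selected reviews yourself; this only narrows the pool so that the reviews you label are less likely to repeat the same wording.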

Train your classifier on your 100 reviews, then run it against a set of 100 random reviews that are not in the training set. Check the accuracy, and add more reviews to the training set if the accuracy is not where you want it to be.
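In NLTK, that train-then-check loop could look roughly like the sketch below. It assumes you have also hand-labeled the held-out reviews, since accuracy cannot be measured without labels. The bag-of-words feature function, the whitespace tokenization, and the tiny example reviews are placeholders made up for illustration; `nltk.NaiveBayesClassifier.train` and `nltk.classify.accuracy` are the NLTK calls doing the actual work.

```python
import nltk


def review_features(text):
    """Bag-of-words features: one True entry per distinct lowercase word.
    Whitespace splitting is an assumption made to keep the sketch small."""
    return {word: True for word in set(text.lower().split())}


def train_and_evaluate(train_reviews, test_reviews):
    """Train Naive Bayes on (text, label) pairs and score it on held-out pairs."""
    train_set = [(review_features(text), label) for text, label in train_reviews]
    test_set = [(review_features(text), label) for text, label in test_reviews]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    accuracy = nltk.classify.accuracy(classifier, test_set)
    return classifier, accuracy


# Made-up stand-ins for your hand-labeled training reviews.
labeled_train = [
    ("great product love it", "pos"),
    ("excellent quality works great", "pos"),
    ("love this really good", "pos"),
    ("terrible waste of money", "neg"),
    ("bad quality broke quickly", "neg"),
    ("awful do not buy", "neg"),
]

# Made-up stand-ins for hand-labeled reviews that are not in the training set.
labeled_test = [
    ("really great quality", "pos"),
    ("terrible and awful", "neg"),
]

classifier, accuracy = train_and_evaluate(labeled_train, labeled_test)
print("held-out accuracy:", accuracy)
```

If the held-out accuracy is too low, label more reviews, add them to `labeled_train`, and run it again. Keep the held-out reviews out of the training set so the accuracy figure stays honest.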

