Friday, April 11, 2014

Week 12: Reading Notes

IIR Chapter 13
Text classification and Naive Bayes
1. The text classification problem
TRAINMULTINOMIALNB(C, D)
  V ← EXTRACTVOCABULARY(D)
  N ← COUNTDOCS(D)
  for each c ∈ C
  do Nc ← COUNTDOCSINCLASS(D, c)
     prior[c] ← Nc / N
     text_c ← CONCATENATETEXTOFALLDOCSINCLASS(D, c)
     for each t ∈ V
     do T_ct ← COUNTTOKENSOFTERM(text_c, t)
     for each t ∈ V
     do condprob[t][c] ← (T_ct + 1) / Σ_t′ (T_ct′ + 1)
  return V, prior, condprob
APPLYMULTINOMIALNB(C, V, prior, condprob, d)
  W ← EXTRACTTOKENSFROMDOC(V, d)
  for each c ∈ C
  do score[c] ← log prior[c]
     for each t ∈ W
     do score[c] += log condprob[t][c]
  return argmax_{c ∈ C} score[c]
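The training and application routines above can be sketched directly in Python. This is a minimal sketch with add-one (Laplace) smoothing; the function and variable names are mine, not from the chapter:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(classes, docs):
    """Train a multinomial Naive Bayes model with add-one smoothing.

    docs: list of (class_label, token_list) pairs.
    Returns (vocabulary, prior, condprob).
    """
    vocab = {t for _, tokens in docs for t in tokens}
    n = len(docs)
    prior, condprob = {}, defaultdict(dict)
    for c in classes:
        class_docs = [tokens for label, tokens in docs if label == c]
        prior[c] = len(class_docs) / n
        # Concatenate all text of class c and count term frequencies T_ct.
        tct = Counter(t for tokens in class_docs for t in tokens)
        # Laplace-smoothed denominator: sum over the vocabulary of (T_ct' + 1).
        denom = sum(tct[t] + 1 for t in vocab)
        for t in vocab:
            condprob[t][c] = (tct[t] + 1) / denom
    return vocab, prior, condprob

def apply_multinomial_nb(classes, vocab, prior, condprob, doc_tokens):
    """Return the class maximizing the log posterior for doc_tokens."""
    scores = {}
    for c in classes:
        scores[c] = math.log(prior[c])
        for t in doc_tokens:
            if t in vocab:  # out-of-vocabulary terms are ignored
                scores[c] += math.log(condprob[t][c])
    return max(scores, key=scores.get)

# Toy corpus in the spirit of the chapter's China/Japan example.
docs = [
    ("c", "Chinese Beijing Chinese".split()),
    ("c", "Chinese Chinese Shanghai".split()),
    ("c", "Chinese Macao".split()),
    ("j", "Tokyo Japan Chinese".split()),
]
vocab, prior, condprob = train_multinomial_nb(["c", "j"], docs)
label = apply_multinomial_nb(["c", "j"], vocab, prior, condprob,
                             "Chinese Chinese Chinese Tokyo Japan".split())
# → "c": three occurrences of "Chinese" outweigh "Tokyo" and "Japan"
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, which is why the pseudocode works in log space as well.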

2. Relation to multinomial unigram language model
3. The Bernoulli model
4. Properties of Naive Bayes
4.1 A variant of the multinomial model
5. Feature selection: the process of selecting a subset of the terms occurring
in the training set and using only this subset as features in text classification.
5.1 Mutual information
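Mutual information measures how much a term's presence or absence tells us about a document's class, computed from the 2×2 contingency table of document counts. A minimal sketch (the function name and argument names are mine):

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information between a term and a class from document counts.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class that lack the term
    n00: docs outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    n1_ = n11 + n10  # docs containing the term
    n0_ = n01 + n00  # docs lacking the term
    n_1 = n11 + n01  # docs in the class
    n_0 = n10 + n00  # docs outside the class
    mi = 0.0
    # Sum N_ij/N * log2(N * N_ij / (row total * column total)) over the table;
    # zero-count cells contribute nothing (the limit of x*log x as x -> 0).
    for nij, row, col in ((n11, n1_, n_1), (n10, n1_, n_0),
                          (n01, n0_, n_1), (n00, n0_, n_0)):
        if nij:
            mi += nij / n * math.log2(n * nij / (row * col))
    return mi

# A term that appears in exactly the class's documents carries one full bit:
mutual_information(5, 0, 0, 5)  # → 1.0
# A term distributed independently of the class carries none:
mutual_information(2, 2, 2, 2)  # → 0.0
```

For feature selection, each term is scored this way against each class and only the top-scoring terms are kept as features.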


