I'm working on an implementation of A Naive Bayes Classifier. Programming Collective Intelligence introduces this subject by describing Bayes Theorem as:
Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)
As well as a specific example relevant to document classification:
Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)
I was hoping someone could explain to me the notation used here, what do Pr(A | B) and Pr(A) mean? It looks like some sort of function but then what does the pipe mean, etc? (I am a little lost)
Thanks in advance.