Hi, I have a doubt. I need to calculate the term frequency of a term in a document. What I did was simply count the number of times the term appears in that document: if the term appeared, say, 138 times, I took the tf value as 138. Am I doing this right? I read somewhere that term frequency (tf) = term count / number of words in the document. If this is true, then how do I calculate the number of words in a document? Is there some regex for it?

Please do reply. Thank you.

A: 

In most regular expression implementations there is the notion of a word boundary, \b. So a regex that would match one word could look like this: \b(\w+)\b.

Basically, what the regex says is: match a word boundary, then at least one word character (\w+), and then a word boundary again. The enclosing parentheses simply add the matched word to a group so that you can extract it later. This is probably not necessary in your case, so you can remove them if you like.
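As a rough sketch of how that pattern could be used to get both numbers from the question (the raw count and the count divided by the total number of words), here is a small Python example. The function name term_frequency, the sample text, and the choice to lowercase everything are my own assumptions for illustration. The pattern itself uses only \b and \w, which .NET's Regex class also supports.

```python
import re

# One "word" = a run of word characters between word boundaries.
WORD_RE = re.compile(r"\b\w+\b")

def term_frequency(term, text):
    """Return (raw_count, total_words, normalized_tf) for term in text.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    words = [w.lower() for w in WORD_RE.findall(text)]
    total = len(words)
    count = words.count(term.lower())
    # Guard against an empty document to avoid dividing by zero.
    tf = count / total if total else 0.0
    return count, total, tf

text = "The cat sat on the mat. The mat was flat."
print(term_frequency("the", text))  # (3, 10, 0.3)
```

Keep in mind that \w also matches digits and underscores, and that an apostrophe counts as a boundary, so "don't" is counted as two words ("don" and "t"). Whether that matters depends on your documents.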

I hope that helps you a bit.

klausbyskov
Thanks, guys, really appreciated. I'm using C#.
jaskirat
A: 

You don't mention what language or program you're using. Most text editors will tell you how many words are in the document. On Unix you can use the 'wc -w filename' command.
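If you would rather do the same thing in code, a whitespace split gives roughly what 'wc -w' reports. This is only a sketch in Python, and the function name count_words_whitespace and the sample string are made up for illustration. Note that it can give a different total than a \w+ regex, because punctuation stays attached to the words.

```python
import re

def count_words_whitespace(text):
    """Count whitespace-separated tokens, roughly like 'wc -w'."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())

sample = "don't stop -- it's well-known"
print(count_words_whitespace(sample))             # 5
# For comparison, a \w+ regex breaks on apostrophes and hyphens.
print(len(re.findall(r"\b\w+\b", sample)))        # 7
```

Whichever definition of "word" you pick, use the same one for both the term count and the total, otherwise the ratio will not be consistent.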

Winter