I have a MySQL database table with a couple thousand rows. The table is setup like so:
id | text
The id
column is an auto-incrementing integer, and the text
column is a 200-character varchar.
Say I have the following rows:
3 | I think I'll have duck tonight
4 | Maybe the chicken will be alright
5 | I have a pet duck now, awesome!
6 | I love duck
Then the list I'm wanting to generate might be something like:
- 3 occurrences of 'duck'
- 3 occurrences of 'I'
- 2 occurrences of 'have'
- 1 occurrences of 'chicken'
- .etc .etc
Plus, I'll probably want to maintain a list of substrings to ignore from the list, like 'I', 'will' and 'have. It's important to note that I do not know what people will post.
I do not have a list of words that I want to monitor, I just want to find the most common substrings. I'll then filter out any erroneous substrings that are not interesting from the list manually by editing the query.
Can anyone suggest the best way to do this? Thanks everyone!