I have to do a final project for my computational linguistics class. We've been using OCaml the entire time, but I'm also familiar with Java. We've studied morphology, FSMs, collecting parse trees, CYK parsing, tries, pushdown automata, regular expressions, formal language theory, some semantics, etc.
Here are some ideas I've come up with. Do you have anything you think would be cool?
A script that scans Facebook threads for obnoxious* comments and silently hides them with JS (this would be run with the user's consent, obviously)
An analysis of a piece of writing using semantics, syntax, punctuation usage, and other metrics, to try to "fingerprint" the author. It could be used to determine whether two works were likely written by the same author. Or someone could put in a bunch of writing they've done over time and get a sense of how their style has changed.
A chat bot (less interesting/original)
I may be permitted to use pre-existing libraries to do this. Do any exist for OCaml? Without a library/toolkit, the above three ideas are probably infeasible, unless I limit them to a very specific domain.
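To give a sense of what a very limited version of the fingerprinting idea might look like without any library, here is a rough sketch using only the OCaml standard library. It compares two texts by how often a handful of common function words appear. The word list, the names (`function_words`, `profile`, `similarity`), and the choice of cosine similarity are all my own placeholder assumptions, not an established method.

```ocaml
(* Hypothetical sketch: compare two texts by the relative frequency of a
   few common function words. The word list and the cosine measure are
   illustrative assumptions only. *)

(* Illustrative feature set; a real project would need a much longer list. *)
let function_words = ["the"; "of"; "and"; "to"; "a"; "in"; "that"; "it"]

(* Lowercase the text and split it into maximal runs of ASCII letters. *)
let tokenize (text : string) : string list =
  let buf = Buffer.create 16 in
  let tokens = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      tokens := Buffer.contents buf :: !tokens;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      let c = Char.lowercase_ascii c in
      if c >= 'a' && c <= 'z' then Buffer.add_char buf c else flush ())
    text;
  flush ();
  List.rev !tokens

(* Relative frequency of each function word among all tokens. *)
let profile (text : string) : float list =
  let tokens = tokenize text in
  let total = float_of_int (max 1 (List.length tokens)) in
  List.map
    (fun w ->
      let n = List.length (List.filter (fun t -> t = w) tokens) in
      float_of_int n /. total)
    function_words

(* Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. *)
let cosine (xs : float list) (ys : float list) : float =
  let dot = List.fold_left2 (fun acc x y -> acc +. x *. y) 0.0 xs ys in
  let norm v = sqrt (List.fold_left (fun acc x -> acc +. x *. x) 0.0 v) in
  let d = norm xs *. norm ys in
  if d = 0.0 then 0.0 else dot /. d

(* Similarity score between two texts under this toy feature set. *)
let similarity (a : string) (b : string) : float =
  cosine (profile a) (profile b)
```

Calling `similarity text1 text2` gives a score between 0.0 and 1.0. With so few features it says very little about authorship, which is partly why I suspect the full idea needs a toolkit or a narrow domain.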
Lower level ideas:
Operations on finite state machines: minimizing, composing transducers, proving that an FSM is minimal (i.e., has the fewest possible states). I am very interested in graph theory, so any overlap with FSMs could be a good avenue to explore. (What else can I do with FSMs?)
Something cool with regex?
Something cool with CYK?
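To make the FSM idea a bit more concrete, here is a minimal sketch of how checking minimality might look, using partition refinement (Moore's algorithm) on a deterministic machine. The `dfa` record and the names `equivalence_classes` and `is_minimal` are hypothetical, and the sketch assumes a total transition function and that every state is reachable from the start state.

```ocaml
(* Hypothetical DFA representation, assumed for illustration. *)
type dfa = {
  n_states : int;               (* states are 0 .. n_states - 1 *)
  alphabet : char list;
  delta : int -> char -> int;   (* total transition function *)
  accepting : int -> bool;
}

(* Moore-style partition refinement. Returns (k, cls), where k is the number
   of equivalence classes and cls.(s) is the class of state s. *)
let equivalence_classes (m : dfa) : int * int array =
  (* Initial partition: accepting vs. non-accepting states. *)
  let init = Array.init m.n_states (fun s -> if m.accepting s then 1 else 0) in
  let count_classes a =
    List.length (List.sort_uniq compare (Array.to_list a)) in
  let rec refine cls k =
    (* A state's signature is its current class plus the classes of its
       successors on each symbol; equal signatures stay together. *)
    let signature s =
      (cls.(s), List.map (fun c -> cls.(m.delta s c)) m.alphabet) in
    let table = Hashtbl.create 16 in
    let next = Array.make m.n_states 0 in
    for s = 0 to m.n_states - 1 do
      let sg = signature s in
      let id =
        (match Hashtbl.find_opt table sg with
         | Some id -> id
         | None ->
           let id = Hashtbl.length table in
           Hashtbl.add table sg id;
           id)
      in
      next.(s) <- id
    done;
    let k' = Hashtbl.length table in
    (* Refinement only ever splits classes, so an unchanged count means the
       partition is stable. *)
    if k' = k then (k, next) else refine next k'
  in
  refine init (count_classes init)

(* Assuming all states are reachable, the DFA is minimal exactly when no two
   distinct states are equivalent. *)
let is_minimal (m : dfa) : bool =
  fst (equivalence_classes m) = m.n_states
```

Merging the states in each class would give the minimized machine. The reachability assumption matters: unreachable states would have to be pruned first (a graph search from the start state), which is where the graph theory overlap might come in.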
Does anyone else have any cool ideas?
*"Obnoxious" is defined as following certain patterns typical of junior high schoolers. The vagueness of this term is not an issue; for the project I could define it however I want and target that.