I'm looking for syntactic examples or common techniques for doing regular-expression-style transformations on words instead of characters, in a procedural language.

For example, to trace copying, one would want to create a document with similar meaning but with different word choices.

I'd like to be able to concisely define these possible transformations that I can apply to a text stream.

E.g. "fast noun" to "rapid noun", but "go fast." wouldn't get transformed (no noun afterwards).
Or: "Alice will sing song" to "song will be sung by Alice".

I'd expect this to be done in grammatical checkers, such as detecting passive voice.

A C# implementation for this sort of language processing would be really neat, but I think the bulk of any effort is coming up with the right rules. Keeping the rules clear and understandable seems like a place to begin.

A: 

A good place to start would be SIL's CARLAStudio, its "Computer Assisted Related Language Adaptation" suite, or alternatively SIL's Adapt It. SIL has a huge range of linguistic analysis software, which is the direction you appear to be going. It's certainly a big jump from regular expressions, which don't care about meaning, to something that can handle linguistic analysis.

boost
I suspect I phrased the question wrong, and I'll try to understand where I went wrong. I expected that I'd write the rule-set myself; I'm looking for examples of rule-sets and whether there's a standard way of defining them.
Procedural Throwback
A: 

If you want something more robust for natural language parsing/transforming, you could try the C# port of OpenNLP.

CVertex
+2  A: 

One good place to start researching would be WordNet. It's a dictionary of semantics that groups words together by similar meaning and also records the relationships between words in useful ways.

There are a bunch of software projects leveraging the WordNet corpus; one of them may be what you need.

Bevan
Thanks for a starting point. Can you recommend any projects that change the text based on a human-entered rule?
Procedural Throwback
+2  A: 

You could try Jason Rennie's WordNet::QueryData, a Perl module (distributed as WordNet-QueryData-1.47).

boost
A: 

I am not aware of any existing syntax for the kind of English language processing you describe. You would need to create your own DSL using one of the toolsets out there (such as WordNet).
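To give a rough idea of what such a home-grown DSL might look like, here is a minimal Python sketch of the "fast noun" to "rapid noun" rule from the question. Everything in it is an assumption made for illustration: the `LEXICON` table is a toy stand-in for a real part-of-speech tagger, and the `Rule` class, the `<NOUN>` tag syntax, and the helper names are hypothetical, not part of WordNet or any existing library. A pattern element is either a literal word or a `<TAG>`; a replacement element is either a literal word or an integer index into the matched tokens.

```python
import re
from dataclasses import dataclass

# Toy lexicon (assumption); a real system would use a part-of-speech tagger.
LEXICON = {"car": "NOUN", "train": "NOUN", "song": "NOUN",
           "go": "VERB", "fast": "ADJ"}

def tokenize(text):
    # Split into words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def tag(token):
    # Look the word up; punctuation gets its own tag, unknown words get UNK.
    default = "UNK" if token[0].isalnum() else "PUNCT"
    return LEXICON.get(token.lower(), default)

@dataclass
class Rule:
    pattern: list      # e.g. ["fast", "<NOUN>"]
    replacement: list  # e.g. ["rapid", 1]; ints index into the matched tokens

def element_matches(element, token):
    # "<TAG>" matches by part of speech, anything else matches the word itself.
    if element.startswith("<") and element.endswith(">"):
        return tag(token) == element[1:-1]
    return token.lower() == element

def apply_rules(tokens, rules):
    out, i = [], 0
    while i < len(tokens):
        for rule in rules:
            window = tokens[i:i + len(rule.pattern)]
            if len(window) == len(rule.pattern) and all(
                element_matches(e, t) for e, t in zip(rule.pattern, window)
            ):
                # Emit the replacement, copying matched tokens where indexed.
                out.extend(window[r] if isinstance(r, int) else r
                           for r in rule.replacement)
                i += len(window)
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

def detokenize(tokens):
    # Join with spaces, then re-attach punctuation to the preceding word.
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))

RULES = [Rule(["fast", "<NOUN>"], ["rapid", 1])]

def transform(text):
    return detokenize(apply_rules(tokenize(text), RULES))
```

With this rule set, `transform("a fast car")` rewrites "fast" because a noun follows it, while `transform("go fast.")` is left alone. Reordering rules such as the passive-voice example would need more than this (verb inflection, for a start), which is where a proper linguistic toolkit comes in.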

Myrddin Emrys
A: 

If you aren't tied to a particular language, Haskell has Aarne Ranta's Grammatical Framework:

http://www.grammaticalframework.org/

which is explicitly designed to generate parsers, etc. for natural language processing of this sort.

Edward Kmett