Hello. Say I have the base form of a word and a tag from the Penn Treebank tag set. How can I get the conjugated form? For example, given "do" and "VBN", how can I get "done"?

I think this is already implemented in some NLP library, so I'd rather not reinvent the wheel. Does something like that exist?

+1  A: 

What you want here is a sparse two-dimensional lookup table holding the answers, indexed by the base form as one key and the Penn Treebank tag (CC, TO, VBD, and so on) as the other.
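As a rough illustration of that two-key table, here is a minimal sketch using nested maps. The class name `ConjugationTable` and its methods are hypothetical, and the three entries are typed in by hand purely as sample data; this is not taken from any existing library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-key lookup table: (base form, Penn Treebank tag) -> inflected form. */
class ConjugationTable {
    // Outer key: base form. Inner key: Penn Treebank tag.
    // Pairs that were never entered are simply absent, which keeps the table sparse.
    private final Map<String, Map<String, String>> forms = new HashMap<>();

    /** Stores one hand-entered answer. */
    public void put(String base, String tag, String inflected) {
        forms.computeIfAbsent(base, k -> new HashMap<>()).put(tag, inflected);
    }

    /** Returns the stored form, or null when no entry exists for this pair. */
    public String lookup(String base, String tag) {
        Map<String, String> byTag = forms.get(base);
        return byTag == null ? null : byTag.get(tag);
    }

    public static void main(String[] args) {
        ConjugationTable table = new ConjugationTable();
        // Sample entries only; a real dataset would be far larger.
        table.put("do", "VBD", "did");
        table.put("do", "VBN", "done");
        table.put("do", "VBZ", "does");
        System.out.println(table.lookup("do", "VBN")); // prints "done"
    }
}
```

Returning null for a missing pair is just one choice; a caller could equally fall back to a regular-inflection rule when nothing has been entered.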

pbr
Is there a library with such functionality? It feels like someone must have done this already.
roddik
It's just a dataset; it doesn't need much of an interface. Making a library would be pretty easy: there's not much to do other than getters and setters, plus the data itself with all the right answers. Those right answers do have to be typed in at least once, by someone motivated by a sufficient need for a working solution. Many things aren't libraries yet; this is how new ones start. If you take the first step of entering the sparse array dataset and share it, others can turn it into a "library" for everyone to use in the future. This is how open source projects are born. -pbr
pbr
A: 

If you have a class:

public class Treebank {
    public String conjugate(String base, String formTag) { ... }

    ...
}

Then:

String conjugated = treebank.conjugate(base, formTag);

If you don't have the Treebank class it might look a bit like this:

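import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

public class Treebank {
    // base form -> (Penn Treebank tag -> conjugated form)
    private Map<String, Map<String, String>> m_map = new HashMap<String, Map<String, String>>();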

    public Treebank() {
        populate();
    }

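    public String conjugate(String base, String formTag) {
        // Two-step lookup: first by base form, then by tag.
        Map<String, String> entry = m_map.get(base);
        return (entry == null) ? null : entry.get(formTag);
    }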

    private void populate() {
        InputStream istream = openDataFile();

        try {
            for (Record record = readRecord(istream); record != null; record = readRecord(istream)) {

                // Add the entry
                Map<String, String> entry = m_map.get(record.base);

                if (entry == null)
                    entry = new HashMap<String, String>();

                entry.put(record.formTag, record.conjugatedForm);
                m_map.put(record.base, entry);
            }
        }
        finally {
            closeDataFile(istream);
        }
    }

    // Data management - to be implemented.
    private InputStream openDataFile()                     { ... }
    private Record      readRecord(InputStream istream)    { ... }
    private void        closeDataFile(InputStream istream) { ... }

    private static class Record {
        String base;
        String formTag;
        String conjugatedForm;
    }
}

A better solution might involve a database instead of a data file. I would also refactor the data access code into a Data Access Object.
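To make the Data Access Object idea a little more concrete, here is a minimal sketch under some assumptions: `ConjugationDao`, `InMemoryConjugationDao` and `DaoBackedTreebank` are hypothetical names, and the in-memory implementation merely stands in for whatever database-backed class you would actually write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical DAO: hides where the conjugation data lives (file, database, memory). */
interface ConjugationDao {
    Optional<String> find(String base, String formTag);
}

/** In-memory stand-in; a database-backed class could implement the same interface. */
class InMemoryConjugationDao implements ConjugationDao {
    private final Map<String, String> rows = new HashMap<>();

    public void add(String base, String formTag, String conjugatedForm) {
        rows.put(key(base, formTag), conjugatedForm);
    }

    @Override
    public Optional<String> find(String base, String formTag) {
        return Optional.ofNullable(rows.get(key(base, formTag)));
    }

    // Composite key; assumes neither part contains a tab character.
    private static String key(String base, String formTag) {
        return base + "\t" + formTag;
    }
}

/** The lookup class now depends only on the DAO interface, not on the storage. */
class DaoBackedTreebank {
    private final ConjugationDao dao;

    public DaoBackedTreebank(ConjugationDao dao) {
        this.dao = dao;
    }

    /** Returns the conjugated form, or null when the DAO has no matching row. */
    public String conjugate(String base, String formTag) {
        return dao.find(base, formTag).orElse(null);
    }
}
```

With this split, moving from a data file to a database should only mean writing another `ConjugationDao` implementation; the `conjugate` call sites stay the same.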

richj
+1  A: 

SimpleNLG does this. For example, getting the superlative (widest) of the adjective (wide) is as easy as:

String superlative = new Adjective("wide").getSuperlative();

Of course, it handles irregular forms as well.

roddik