I'm interfacing with WordNet, and some of the terms I'd like to classify (various proper names) are capitalised in the database, but the input I get may not be capitalised properly. My initial idea is to write a predicate that generates the possible capitalisations of an input, but I'm not sure how to go about it.

Does anyone have an idea how to go about this, or even better, a more efficient way to achieve what I would like to do?

+1  A: 

It depends on which Prolog implementation you're using, but there may be library predicates you can use.

For example, from the SWI-Prolog reference manual:

4.22.1 Case conversion

There is nothing in the Prolog standard for converting case in textual data. The SWI-Prolog predicates code_type/2 and char_type/2 can be used to test and convert individual characters. We have started some additional support:

downcase_atom(+AnyCase, -LowerCase)

Converts the characters of AnyCase into lowercase as char_type/2 does (i.e. based on the defined locale if Prolog provides locale support on the hosting platform) and unifies the lowercase atom with LowerCase.

upcase_atom(+AnyCase, -UpperCase)

Converts, similar to downcase_atom/2, an atom to upper-case.

Since this just downcases whatever's passed to it, you can easily write a simple predicate to sanitise every input before doing any analysis.
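As a minimal sketch of that idea, assuming SWI-Prolog: the word/1 facts below are a hypothetical stand-in for the WordNet data (the real database will have its own predicates), and lookup_ignoring_case/2 is a name made up for illustration. Both the input and each stored word are lowercased before being compared.

```prolog
% Hypothetical stand-in for the WordNet database; word/1 is an assumption
% made for this example, not a real WordNet predicate.
word('Paris').
word(dog).
word('New York').

% lookup_ignoring_case(+Input, -Word)
% Word is a stored word whose lowercase form equals the lowercase form
% of Input. This scans every word/1 fact, so it is linear in the size
% of the database.
lookup_ignoring_case(Input, Word) :-
    downcase_atom(Input, Key),
    word(Word),
    downcase_atom(Word, Key).
```

With these facts, the query `?- lookup_ignoring_case('PARIS', W).` binds W to 'Paris'. Because every stored word has to be lowercased and compared, this cannot use Prolog's clause indexing on the stored atom.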

ire_and_curses
I'm using SWI-Prolog (I was going to mention that, but forgot), so that's a great help, thanks! It's a bit slow unfortunately, due to having to search the whole database for words that are identical to the search term lowercased, rather than just being able to do a lookup. But it works, so cheers again!
arnsholt
@arnsholt: Glad it's useful. The obvious alternative, as you suggest, would be to try two lookups: the first with the input as provided, and a second with the first letter capitalised if the first finds no match. You can use char_type/2 to do the test and generate the conversion (see the manual page for the details). I would expect two direct lookups to be more efficient than matching against the whole database. Something to try, anyway.
ire_and_curses
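The two-lookup suggestion could be sketched roughly as follows, again assuming SWI-Prolog. The word/1 facts, capitalise_first/2 and lookup/2 are hypothetical names used only for illustration. The sketch relies on the char_type/2 property lower(Upper), which holds when the character is a lowercase letter whose uppercase form is Upper.

```prolog
% Hypothetical stand-in for the WordNet database (an assumption for
% this example only).
word('Paris').
word(dog).

% capitalise_first(+Atom, -Capitalised)
% Upper-case the first character of Atom and leave the rest untouched.
% If the first character is not a lowercase letter it is kept as is.
% Fails for the empty atom.
capitalise_first(Atom, Capitalised) :-
    atom_chars(Atom, [First|Rest]),
    (   char_type(First, lower(Upper))
    ->  true
    ;   Upper = First
    ),
    atom_chars(Capitalised, [Upper|Rest]).

% lookup(+Input, -Word)
% First try the input exactly as given; only if that fails, retry with
% the first letter capitalised. Both attempts are direct calls to
% word/1 with a bound argument, not a scan over all words.
lookup(Input, Input) :-
    word(Input),
    !.
lookup(Input, Word) :-
    capitalise_first(Input, Word),
    word(Word).
```

Here `?- lookup(paris, W).` binds W to 'Paris', and `?- lookup(dog, W).` binds W to dog. The sketch only covers a lowercase-versus-capitalised first letter: input such as 'PARIS', or a multi-word name with later capitals, would still need one of the lowercasing approaches above.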