ansaurus

Question

Answer 1

+4 A:

OpenCyc is a computer-usable database of real-world concepts and meanings. From their web site:

OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine. OpenCyc can be used as the basis of a wide variety of intelligent applications

Beware, though, that it's an enormously complex reasoning engine -- real-world facts were never simple. Documentation is quite sparse and the learning curve is steep.

intgr 2009-11-17 14:24:26

Thank you, I've never heard of it before; I'll look into it. My question was more about the "how" than the "which tool".

dassouki 2009-11-17 14:29:22

Answer 2

A:

It should be fairly straightforward. You can use straight synonyms in addition to a series of words to define each word. The word order in the definition is sometimes important. Each word can have multiple definitions, of course.

You can develop a rating system to see which definitions are the closest match to the input, then display the top 3 or 4 words.
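A minimal sketch of such a rating system, assuming each word is described by a small set of defining words and that synonyms are folded onto one canonical word before matching. The SYNONYMS and DEFINING_WORDS tables and the function names are invented for illustration, and the rating (the fraction of a word's defining words that appear in the input) is only one possible choice:

```python
import heapq

# Hypothetical synonym table: each synonym maps to one canonical word.
SYNONYMS = {"crimson": "red", "scarlet": "red", "tasty": "edible"}

# Hypothetical data: each word with the set of words that define it.
DEFINING_WORDS = {
    "strawberry": {"red", "fruit", "edible", "small"},
    "tomato": {"red", "fruit", "edible"},
    "snow": {"white", "cold"},
}

def canonical(word):
    """Lower-case a word and replace it with its canonical synonym, if any."""
    word = word.lower()
    return SYNONYMS.get(word, word)

def top_candidates(description, n=3):
    """Return up to n words, best rated first, that share a word with the input."""
    query = {canonical(w) for w in description.split()}
    rated = (
        (len(query & defining) / len(defining), word)
        for word, defining in DEFINING_WORDS.items()
    )
    return [word for score, word in heapq.nlargest(n, rated) if score > 0]

print(top_candidates("crimson tasty fruit"))
```

This version ignores word order, which the answer notes is sometimes important; accounting for it would need a richer rating than set overlap.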

xpda 2009-11-17 14:25:27

So the hard part, in your opinion, is building the keyword engine?

dassouki 2009-11-17 14:26:59

That would be really hard if you had to do it by hand. However, if you have access (and rights) to a dictionary, you can read it into a database and use that. That will take the most design work, though.

xpda 2009-11-17 19:55:11

Answer 3

A:

This sounds like a job for Prolog.

leppie 2009-11-17 14:26:10

I would say it should be done with a computer ;)

Janusz 2009-11-17 14:30:45

The question is more about the "how" than the "which tool". Thanks for the input, though.

dassouki 2009-11-17 14:31:38

The tool would be a start....

leppie 2009-11-17 15:03:50

Answer 4

+1 A:

First, there must be some way of associating concepts (like 'snow') with particular words.

So rather than simply storing a wordlist, you would also need to store concepts or properties like "red", "fruit", and "edible" as well as the keywords themselves, and model relationships between them.

At a simple level, you could have two tables (they don't have to be database tables): a list of keywords, and a list of concepts/properties/adjectives. You then model the relationship with a third table that stores the mapping from keyword to adjective.

So if you have:

keywords:

0001  aardvark
....
0050  strawberry
....
0072  tomato
....
0120  zoo

and concepts:

0001  big
0002  small
0003  fruit
0004  vegetable
0005  mineral
0006  metal
....
0250  black
0251  blue
0252  red
....
0570  edible

you would need a mapping containing:

0050 -> 0003
0050 -> 0252
0050 -> 0570
0072 -> 0003
0072 -> 0252
0072 -> 0570

You may like to think of this as modelling an "is" relationship: 0050 (a strawberry) "is" 0003 (fruit), and "is" 0252 (red), and "is" 0570 (edible).
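As a rough sketch, the three tables and the "is" lookup could be held in memory like this. The IDs and names are copied from the example above; the variable and function names are made up for illustration, and a real word list would more likely live in a database:

```python
from collections import defaultdict

# Excerpts of the keyword and concept tables from the example above.
KEYWORDS = {1: "aardvark", 50: "strawberry", 72: "tomato", 120: "zoo"}
CONCEPTS = {1: "big", 2: "small", 3: "fruit", 4: "vegetable",
            251: "blue", 252: "red", 570: "edible"}

# The mapping table: (keyword id, concept id) pairs, read as keyword "is" concept.
IS_A = [(50, 3), (50, 252), (50, 570), (72, 3), (72, 252), (72, 570)]

def keywords_for(concept_names):
    """Return the keywords that "are" every one of the named concepts."""
    name_to_id = {name: cid for cid, name in CONCEPTS.items()}
    ids = [name_to_id.get(name) for name in concept_names]
    if not ids or None in ids:
        return []  # no input, or a concept that is not in the table
    wanted = set(ids)
    concepts_of = defaultdict(set)
    for keyword_id, concept_id in IS_A:
        concepts_of[keyword_id].add(concept_id)
    return sorted(KEYWORDS[k] for k, cs in concepts_of.items() if wanted <= cs)

print(keywords_for(["red", "fruit"]))
```

Requiring every concept to match is a deliberately strict choice; ranking keywords by how many concepts they share with the input would be a more forgiving alternative.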

Nick Dixon 2009-11-17 14:36:12

Thank you. So the hard part is building those relationships, and in a way your suggestion seems like building a search engine. One idea I had is to use antonyms as well: if you're looking for a "sad" word, it would immediately rule out all the words related to the antonym of "sad".

dassouki 2009-11-17 14:40:14

Antonyms carry their own challenges to some extent, as a word can have antonyms that aren't related to each other, e.g. bitter and sour are both antonyms of sweet. Similarly, sad and mad are both antonyms of happy, but what relation they have to each other is another question.

JB King 2009-11-17 14:47:34

Answer 5

+3 A:

Any approach would basically involve having a normalized database. Here is a basic example of what your database structure might look like:

// terms
+-------------------+
| id | name         |
| 1  | tomatoes     |
| 2  | strawberries |
| 3  | peaches      |
| 4  | plums        |
+-------------------+

// descriptions
+-------------------+
| id | name         |
| 1  | red          |
| 2  | edible       |
| 3  | fruit        |
| 4  | purple       |
| 5  | orange       |
+-------------------+

// connections
+-------------------------+
| terms_id | descript_id  |
| 1        | 1            |
| 1        | 2            |
| 1        | 3            |
| 2        | 1            |
| 2        | 2            |
| 2        | 3            |
| 3        | 1            |
| 3        | 2            |
| 3        | 5            |
| 4        | 1            |
| 4        | 2            |
| 4        | 4            |
+-------------------------+

This would be a fairly basic setup, but it should give you an idea of how many-to-many relationships using a look-up table work within databases.

Your application would have to break the input string apart and normalize it, for example by stripping suffixes from the user's words. The script would then query the connections table and return the results.
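Here is one way the query step might look, sketched with Python's built-in sqlite3 module and the example rows from the tables above. The function names are invented for illustration, normalization is reduced to lower-casing (suffix stripping is left out), and ranking terms by the number of matching descriptions is an assumption rather than part of the schema:

```python
import sqlite3

def build_db():
    """Create an in-memory database holding the example rows from the tables above."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE descriptions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE connections (
            terms_id INTEGER REFERENCES terms(id),
            descript_id INTEGER REFERENCES descriptions(id)
        );
    """)
    db.executemany("INSERT INTO terms VALUES (?, ?)",
                   [(1, "tomatoes"), (2, "strawberries"), (3, "peaches"), (4, "plums")])
    db.executemany("INSERT INTO descriptions VALUES (?, ?)",
                   [(1, "red"), (2, "edible"), (3, "fruit"), (4, "purple"), (5, "orange")])
    db.executemany("INSERT INTO connections VALUES (?, ?)",
                   [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3),
                    (3, 1), (3, 2), (3, 5), (4, 1), (4, 2), (4, 4)])
    return db

def lookup(db, user_input):
    """Return (term, hits) rows, ordered by how many input words describe the term."""
    words = sorted({w.lower() for w in user_input.split()})
    if not words:
        return []
    marks = ", ".join("?" for _ in words)  # one placeholder per input word
    query = f"""
        SELECT t.name, COUNT(*) AS hits
        FROM connections AS c
        JOIN terms AS t ON t.id = c.terms_id
        JOIN descriptions AS d ON d.id = c.descript_id
        WHERE d.name IN ({marks})
        GROUP BY t.id, t.name
        ORDER BY hits DESC, t.name
    """
    return db.execute(query, words).fetchall()

db = build_db()
print(lookup(db, "Red FRUIT"))
```

Only the placeholders are interpolated into the SQL text; the user's words are passed as bound parameters, so the input cannot alter the query.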

evolve 2009-11-17 14:38:10

As I've said before, the daunting task is building a keyword engine for 50k+ words.

dassouki 2009-11-17 14:45:59

A social method might be best: have users offer keywords, and then have moderators confirm them.

evolve 2009-11-17 14:49:02

Yeah, it's all about social media. The problem with that is that I'd be riding on the dream that people will actually use the app.

dassouki 2009-11-17 15:12:05

You have to use it first; you'll end up contributing a lot of the data to start. You gotta start somewhere or you won't get anywhere.

evolve 2009-11-17 15:30:41

Answer 6

+1 A:

How will your engine know that

"An incredibly versatile ingredient, essential for any fridge chiller drawer. Whether used for salads, soups, sauces or just raw in sandwiches, make sure they are firm and a rich red colour when purchased",
"mildly acid red or yellow pulpy fruit eaten as a vegetable", and
"an American musician who is known for being the lead singer/drummer for the alternative rock band Sound of Urchin"

all map to the same original word? Natural language definitions are unstructured; you can't store them in a normalized database. You can attempt to structure them by reducing them to an ontology, like Princeton's WordNet, but creating and using ontologies is an extremely difficult problem, the topic of PhD theses and well-funded advanced research.

Dustin Getz 2009-11-17 14:56:58

That makes sense, but I guess the sentences you mentioned, although valid, fall a bit outside my scope. The same could be said of vague explanations such as "big blue thing" (sky, sea, the monster from "Monsters vs. Aliens").

dassouki 2009-11-17 15:09:22

Answer 7

+3 A:

To answer the "how" part of your question, you could utilize human computation: There are hordes of bored teenagers with iPhones around the globe, so create a silly game whose byproduct is filling your database with facts -- to harness their brainpower for your purposes.

Sounds like an awkward concept at first, but look at this lecture on Human Computation for an example.

intgr 2009-11-17 14:59:13

you're a genius

dassouki 2009-11-17 15:04:24

This is highly dependent on teenagers who spell badly AND on the hope that lots of people will download and use the app.

dassouki 2009-11-17 15:10:37

Both of these issues are addressed in that presentation. "bad spelling teenagers" -- build the game such that the goal is validating others' factoids. "people will download and use the app" -- create a web-based game.

intgr 2009-11-17 15:15:21

Have you ever heard of 20q.net? This is a perfect example of getting the masses to populate your database.

NickLarsen 2009-11-20 14:58:18

Answer 8

A:

What about using a dictionary and performing a full-text search over the definitions (after removing linking words and articles, like 'and', 'or'...), then returning the word with the best score (the highest number of matching words, or maybe a more complicated scoring method)?
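A small sketch of that idea, assuming the dictionary is already loaded as a mapping from each word to its list of definitions. The tomato definition is the one quoted in an earlier answer; the other entries, the stop-word list, and the function names are made up for illustration:

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "or", "the", "to"}

# Hypothetical dictionary: each word maps to one or more definitions.
DEFINITIONS = {
    "tomato": ["mildly acid red or yellow pulpy fruit eaten as a vegetable"],
    "strawberry": ["sweet fleshy red fruit"],
    "sky": ["the big blue expanse above the earth"],
}

def tokens(text):
    """Lower-case the text and return its set of words, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def best_matches(description, limit=3):
    """Score each word by its best-matching definition and return the top few."""
    query = tokens(description)
    scores = {}
    for word, definitions in DEFINITIONS.items():
        score = max(len(query & tokens(d)) for d in definitions)
        if score:
            scores[word] = score
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]

print(best_matches("a red fruit eaten as a vegetable"))
```

Scoring each word by its best single definition means a word with several meanings can be reached from any one of them, while still appearing only once in the results.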

Adrien Plisson 2009-11-17 15:41:27

That sounds great, but two different descriptions could lead to the same word.

dassouki 2009-11-17 16:07:03

Yes, but there are a lot of words with multiple meanings, so you will always have multiple definitions that lead to the same word...

Adrien Plisson 2009-11-17 16:12:31

Building a reverse language dictionary
