As I understand it, the Freebase taxonomy generally boils down to this hierarchy:

Domain Category > Domain > Type > Topic

I have an application that receives input and does a bit of natural language processing that spits out a bunch of terms--some useful and some not. In an initial effort to systematically "decide" whether a term is useful, my thought is to "test" it against Freebase by assuming it's a topic and seeing whether Freebase has the term classified under at least one type.

So what I'm trying to do now is, given a topic, find its type IDs (and names, ideally). If none are returned, that tells me something about the so-called topic. If one or more types are returned, then I not only have some measure of the term's usefulness, but also an ability to overlay the Freebase taxonomy and give folks a different method of accessing it (via that tree metaphor).

For example, I might receive "Politics", "Political organization", "administration", "photo", "MSN", etc. from the NLP engine. What kind of MQL query can tell me which type(s) are connected to those topics, if any?

Thanks for your help.

UPDATE

I just had one of those grandiose head slap moments. I stepped away from the query I'd been tinkering with for a while and when I got back, I saw the error of my ways. I was trying to make this way too difficult and, as always, the simple solution that I couldn't see was exactly what I needed to see:

[{
  "id": null,
  "name": "Politics",
  "type": [{"id": null, "name": null }]
}]
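For anyone who wants to generate that query for each term coming out of the NLP step, here is a minimal Python sketch. It only builds the query text; it does not contact Freebase, and the helper name build_type_query is my own, not part of any Freebase library.

```python
import json


def build_type_query(term):
    """Build the MQL query above for one term (hypothetical helper).

    None serializes to JSON null, which is how MQL marks a value
    that should be filled in by the result.
    """
    return [{
        "id": None,
        "name": term,
        "type": [{"id": None, "name": None}],
    }]


# Terms taken from the example list in the question.
terms = ["Politics", "Political organization", "administration", "photo", "MSN"]

# One serialized query per term, ready to be sent to whatever
# MQL read endpoint is in use.
queries = {term: json.dumps(build_type_query(term)) for term in terms}

print(queries["Politics"])
```

How the serialized query is submitted is left out on purpose, since that depends on the client or endpoint being used.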

This leads me to a slightly different question, though. What I get back is multiple topics, one of which is /en/politics and a bunch of others whose IDs begin with /m/. I understand that the Freebase system is complex, but I'm a long way from understanding that complexity. For this kind of exercise, am I most likely to want the /en/ topic?

A: 

In general, the /en/ topics are more notable than /m/ topics. The /m/ IDs are automatically assigned to any new topic that gets added to Freebase, but the /en/ keys have to be added manually or semi-automatically by the community. So far, most of the /en/ keys come from Wikipedia (which has its own notability requirements), but they can come from anywhere.
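If you want to apply that preference in code, something along these lines might work. It is only a sketch: the function names are made up, and the sample results below use placeholder /m/ and type IDs rather than real Freebase data. It assumes the results have the same shape as your query (a list of objects with "id", "name", and "type").

```python
def prefer_en_topic(results):
    """Return the first topic whose id is under /en/, else the first
    topic of any kind, else None when there are no results."""
    for topic in results:
        if (topic.get("id") or "").startswith("/en/"):
            return topic
    return results[0] if results else None


def type_ids(topic):
    """List the type ids attached to one topic result."""
    return [t["id"] for t in topic.get("type", [])]


# Placeholder results shaped like the query; the /m/ and type ids
# here are invented for illustration only.
sample_results = [
    {"id": "/m/placeholder1", "name": "Politics",
     "type": [{"id": "/example/type_a", "name": "Type A"}]},
    {"id": "/en/politics", "name": "Politics",
     "type": [{"id": "/example/type_b", "name": "Type B"},
              {"id": "/example/type_c", "name": "Type C"}]},
]

best = prefer_en_topic(sample_results)
if best is not None:
    print(best["id"], type_ids(best))
```

An empty result list gives None, which matches your idea of treating "no types returned" as a signal that the term is not a useful topic.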

Here is a list of some of the other popular namespaces that are used in Freebase.

Also, since you mentioned using NLP to match topics from text to Freebase, you might be interested in reading about the experimental Reconciliation API. This is how you would find the "best match" for a topic given the contextual clues available in your data.

narphorium
Thanks, this is great. The Reconciliation API looks cool, but may not be a fit for me right now since it seems to require a lot more (Freebase-centric) information than I'll have to send it. Even if I got back multiple responses, I wouldn't be able to vet them at a systems level.
Rob Wilkerson
I understand; it's certainly a problem that many people have. Maybe the Search API (www.freebase.com/docs/web_services/search) would be sufficient in your case. It lets you pass in a topic name and gives each result a numerical score estimating how closely it matches by comparing it against Freebase data as well as Wikipedia blurbs.
narphorium