tags:

views:

52

answers:

4

Im looking to build a topic map to catagorize content.

For example the Topic 'Art' may have sub categories of 'Art History', 'Painting', 'Sculpture' etc etc.

I've crawled a few online resources, but I've hit a problem related to how I wish to use the heirarchy.

I've got a lot of content that I wish to index by topic. So to give the above example, if a user searches for 'Art' then they will not only get anything that mentions 'Art', but also anything that mentions 'Painting', even if it doesnt mention 'Art'. Fair enough.

But if, in another part of my heirarchy, I have 'House Maintenance', for example, then that might also have a subtopic of 'Painting'.

But then if a user searches for 'Art', my engine will say 'well, Painting is a sub category of 'Art', so I'll include this peice of content thats all about the best colour to paint your bathroom walls....

Has anyone come across this problem before? I've tried googling, but without knowing the exact terminology its hard to make headway....

EDIT: More succinctly, 'Painting' is a subtopic of 'Art', but if something is about 'Painting' then it doesnt neecssarily follow that its about 'Art', since 'Art' is not the only parent of 'Painting'.

A: 

I don't know of a specific name for that, but I don't think it should really be a problem, either. All it calls for is that Art/Painting and House Maintenance/Painting are understood as separate entities. Someone searching for "art" gets subcategories of Art, so gets Art/Painting. Someone searching for "house maintenance" gets subcategories of House Maintenance, so gets House Maintenance/Painting. Someone searching for "painting" gets Art/Painting and House Maintenance/Painting, which is appropriate.

chaos
The problem is that my content has no context - i only have lumps of text. So if a piece of text mentions 'Painting' then should it go in the Arts/Painting Node, or in the HM/Painting node? Or both?
Visage
Oh, I see. I didn't understand that you were talking about automatic categorization. That's the term I'd suggest you google, then.
chaos
A: 

Since you want to process House/Painting and Art/Painting differently, then it seems like you'll need two distinct entries for Painting (one for each meaning). Which one you associate a given 'lump of text' with could be based on context clues from the text itself, if your text processor is powerful enough.

For example, whenever you have a conflict like this, look in the text - do you see other words there? Like 'sink', 'wall', 'hard wood', or 'windows'? Or do you see other terms like 'Monet', 'impressionism', 'canvas', and 'gallery'? That'll allow you to automate the decision, and should be fairly accurate. The only snag is that this presumes you have a fairly healthy dictionary of 'related terms' lying around somewhere.

On the user-end, when Painting is selected, you'd simply have to either merge all the results together, or present the user an option to select which parent topic they want to be viewing results from.

Chris
A: 

Information Architecture for the World Wide Web would give you a good start on organizing information... it's a good read, but might not be so technically detailed.

Tony Collen
A: 

If the Topic Map you are creating is built on Topic Maps technology, then subjectIdentifiers can be used to distinguish between two Topics with the same name (both named "Painting") that actually represent two different Subjects (Painting as an Art form, and Painting in the sense of home renovation).

If someone queries about Art and you drill down to Painting, then you can return only those entries related to 'Painting as an Art form' because those Painting entries are no longer thrown together on one heap.

dafmetal