I am trying to insert an XML file of the following kind:

<thing>
    <name>one</name>
    <type>metal</type>
    <type>round</type>
</thing>
<thing>
    <name>two</name>
    <type>round</type>
</thing>
<thing>
    <name>three</name>
    <type>metal</type>
    <type>round</type>
</thing>

into an SQL database. There are lots of <thing> elements, and each has one or more <type> elements. Although there are many things, only a few distinct combinations of <type> elements occur among them. So I have created four tables: thing, with columns id, pattern_id, and name; pattern; type_pattern, with columns pattern_id and type_id; and type, with an id column and a txt column for the word, such as metal or round. As I parse the XML file, I want to assign each thing to the pattern (that is, the set of types) that it matches. For example, things one and three above match the pattern consisting of the metal and round types, while thing two matches a different pattern consisting of only the round type. So the database tables for the above might look like

thing

 id   pattern_id name
 1    1          one
 2    2          two
 3    1          three

type_pattern

 pattern_id type_id
 1          1
 1          2
 2          2

type

 id  txt
 1   metal
 2   round     

The point is that I do not want a table linking each thing directly to its types; I want the tables thing, type_pattern, and type.

My question is, given a list of types, how should I write an SQL query to get the pattern id?

Or am I going about this the wrong way?

A: 

I would consider

thing as an entity
and
type as an entity (I don't know whether it really is an entity or a value object).

Accordingly, you would have a thing table and a type table:

thing table:
id | name

type table:
id | txt

and then a many-to-many table, say

thingTypes:
thingId | typeId

When you want all the types of a thing, just query thingTypes for thingId = "the specific thing id". Conversely, you can query thingTypes for a specific typeId and get back all the thingIds that reference that type.
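
For illustration, here is a minimal sketch of that layout and the two lookups, using Python's built-in sqlite3 module and the sample data from the question. The helper names types_of and things_with are made up for this example, and the primary key on thingTypes is an assumption rather than part of the suggestion above.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE type  (id INTEGER PRIMARY KEY, txt  TEXT NOT NULL UNIQUE);
    -- Many-to-many link table; the composite key is an assumption.
    CREATE TABLE thingTypes (
        thingId INTEGER NOT NULL REFERENCES thing(id),
        typeId  INTEGER NOT NULL REFERENCES type(id),
        PRIMARY KEY (thingId, typeId)
    );
    INSERT INTO thing VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO type  VALUES (1, 'metal'), (2, 'round');
    INSERT INTO thingTypes VALUES (1, 1), (1, 2), (2, 2), (3, 1), (3, 2);
""")

def types_of(thing_id):
    """Return the type words attached to one thing (hypothetical helper)."""
    rows = conn.execute(
        "SELECT t.txt FROM thingTypes tt JOIN type t ON t.id = tt.typeId "
        "WHERE tt.thingId = ? ORDER BY t.txt",
        (thing_id,),
    )
    return [r[0] for r in rows]

def things_with(type_id):
    """Return the ids of all things that reference one type (hypothetical helper)."""
    rows = conn.execute(
        "SELECT thingId FROM thingTypes WHERE typeId = ? ORDER BY thingId",
        (type_id,),
    )
    return [r[0] for r in rows]
```

With this layout every thing stores its own rows in thingTypes, so repeated combinations of types are not shared between things.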

hacktick
There are hundreds of thousands of `thing` elements, but they all have one of only a few different patterns of possible selections of `type` elements, which is why I have a separate table for `pattern`.
Kinopiko
+1  A: 

You need to count the number of types in the list before executing:

select pattern_id
from type_pattern
where type_id in (...list of types...)
group by pattern_id
having count(*) = #of types in the list
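
As a rough sketch of how the placeholders might be filled in from code, the following uses Python's sqlite3 module with the sample data from the question (pattern 1 = metal and round, pattern 2 = round only). The function name patterns_containing is made up for this example; the IN list is built from ? markers so the type ids are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type_pattern (
        pattern_id INTEGER NOT NULL,
        type_id    INTEGER NOT NULL,
        PRIMARY KEY (pattern_id, type_id)
    );
    -- Sample data: pattern 1 = {metal(1), round(2)}, pattern 2 = {round(2)}
    INSERT INTO type_pattern VALUES (1, 1), (1, 2), (2, 2);
""")

def patterns_containing(type_ids):
    """Return ids of patterns that contain every type in type_ids."""
    type_ids = sorted(set(type_ids))          # count each type once
    marks = ",".join("?" * len(type_ids))     # e.g. "?,?" for two types
    sql = (
        "SELECT pattern_id FROM type_pattern "
        f"WHERE type_id IN ({marks}) "
        "GROUP BY pattern_id "
        "HAVING COUNT(*) = ? "
        "ORDER BY pattern_id"
    )
    # The last parameter is the number of types in the list.
    return [r[0] for r in conn.execute(sql, (*type_ids, len(type_ids)))]
```

Note that patterns_containing([2]) returns both patterns here, because pattern 1 contains round as well as metal: the query only checks that every listed type is present in the pattern.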

Alternatively, you could sort the result by the number of matched types, so you can see which pattern matches completely and which patterns are close matches:

select pattern_id, count(*) matches
from type_pattern
where type_id in (...list of types...)
group by pattern_id
order by 2 desc

Update: If you do not want patterns that also contain types outside the list, you can constrain the query like this:

select pattern_id
from type_pattern t1
where type_id in (...list of types...) and not exists ( 
    select 1 
    from type_pattern t2 
    where t2.pattern_id = t1.pattern_id
    and t2.type_id not in (...list of types...))
group by pattern_id
having count(*) = #of types in the list
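
A minimal sketch of this exact-match lookup driven from Python's sqlite3 module, assuming the sample data from the question (pattern 1 = metal and round, pattern 2 = round only). The function name exact_pattern is made up for this example. Because the type list appears twice in the query, the ids are bound twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type_pattern (
        pattern_id INTEGER NOT NULL,
        type_id    INTEGER NOT NULL,
        PRIMARY KEY (pattern_id, type_id)
    );
    -- Sample data: pattern 1 = {metal(1), round(2)}, pattern 2 = {round(2)}
    INSERT INTO type_pattern VALUES (1, 1), (1, 2), (2, 2);
""")

def exact_pattern(type_ids):
    """Return the id of the pattern whose types are exactly type_ids, or None."""
    type_ids = sorted(set(type_ids))
    marks = ",".join("?" * len(type_ids))
    sql = f"""
        SELECT pattern_id
        FROM type_pattern t1
        WHERE type_id IN ({marks})
          AND NOT EXISTS (
              -- reject patterns that have any type outside the list
              SELECT 1
              FROM type_pattern t2
              WHERE t2.pattern_id = t1.pattern_id
                AND t2.type_id NOT IN ({marks}))
        GROUP BY pattern_id
        HAVING COUNT(*) = ?
    """
    params = (*type_ids, *type_ids, len(type_ids))
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None
```

A None result would mean that no existing pattern has exactly this set of types, which is the point where a parser could decide to create a new pattern.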

Or, if you still want to rank the patterns, here's a query that shows how many "hits" and "misses" each pattern has, and ranks them by "hits":

select pattern_id, 
    sum(case when type_id in (...list of types...) then 1 else 0 end) matches, 
    sum(case when type_id in (...list of types...) then 0 else 1 end) extras 
from type_pattern
group by pattern_id
order by 2 desc
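
Here is a small sketch of the ranking query run from Python's sqlite3 module against the sample data from the question. The function name rank_patterns is made up for this example, and the extra tie-breakers in ORDER BY (fewest extras first, then pattern id) are my own addition so that the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type_pattern (
        pattern_id INTEGER NOT NULL,
        type_id    INTEGER NOT NULL,
        PRIMARY KEY (pattern_id, type_id)
    );
    -- Sample data: pattern 1 = {metal(1), round(2)}, pattern 2 = {round(2)}
    INSERT INTO type_pattern VALUES (1, 1), (1, 2), (2, 2);
""")

def rank_patterns(type_ids):
    """Return (pattern_id, matches, extras) rows, best match first."""
    type_ids = sorted(set(type_ids))
    marks = ",".join("?" * len(type_ids))
    sql = f"""
        SELECT pattern_id,
               SUM(CASE WHEN type_id IN ({marks}) THEN 1 ELSE 0 END) AS matches,
               SUM(CASE WHEN type_id IN ({marks}) THEN 0 ELSE 1 END) AS extras
        FROM type_pattern
        GROUP BY pattern_id
        -- tie-breakers on extras and pattern_id are an addition for this sketch
        ORDER BY matches DESC, extras ASC, pattern_id
    """
    return conn.execute(sql, (*type_ids, *type_ids)).fetchall()
```

A pattern is an exact match when matches equals the number of types in the list and extras is 0.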
Milan Babuškov
This doesn't completely solve the problem, since it also returns patterns that contain additional types not in the list, so more than one pattern can match.
Kinopiko
@Kinopiko: you're right, I updated the answer.
Milan Babuškov