views:

54

answers:

2
"question_id": 58640
"tags": ["polls", "fun", "quotes"]
"title": "Great programming quotes"
"question_id": 184618
"tags": ["polls", "fun", "comment"]
"title": "What is the best comment in source code you have ever encountered?"
"question_id": 3734102
"tags": ["c++", "linux", "exit-code"]
"title": "Why cant' I return bigger values from main function ?"
"question_id": 2349378
"tags": ["communication", "terminology", "vocabulary"]
"title": "New programming jargon you coined?"
"question_id": 3723817
"tags": ["open-source", "project-management", "failure", "fail"]
"title": "How to make an open source project fail"
"question_id": 3699150
"tags": ["testing", "interview-questions", "job-interview"]
"title": "Interview question please help"

This is just a text extract some questions that I got using the SO API.

To make this query-able, I want to use SQLite to store the data.

How should I store the tags column?

Since the limit here on SO is five tags, I can use five columns tag1, tag2 ..., but I think there would be something more elegant that can be done. Something that scales to any number of tags being there, and can also handle basic queries like

select title from table where tag has "c++" and "boost" but not "c"
+2  A: 

Hello, This is a many to many relationship : questions have multiple tags, tags can appear in multiple questions. This means you have to create three tables, one for the questions, one for the tags, and one for the links between these tables. The resulting query would look like this:

SELECT title FROM question
       INNER JOIN question_tag_link USING (question_id)
       INNER JOIN tag USING (tag_id)
            WHERE tag_name IN('c++', 'boost')
              AND NOT EXISTS(
           SELECT * FROM tag t1
            WHERE t1.tag_name = 'c'
              AND t1.question_id = question.question_id);

Not so simple, but I think it is the price to pay if you don't want to be limited. If there are less than 64 different tags, you could use the SET field type, but you would loose very much flexibility (hard to add a new tag).

greg0ire
The query won't exclude questions that are tagged with 'c' - it needs a separate `and not exists (...` condition.
Mark Bannister
@Mark Bannister : indeed! I fixed my code, but I am not 100% sure whether the question_id = question_id condition will work or not. You have enough rep, feel free to edit my post.
greg0ire
@gregDire: I have added the `question` table name to the `= question_id` field condition, and the t1 table alias, in the NOT EXISTS clause - otherwise, it looks fine to me.
Mark Bannister
@Mark Bannister: it looks much more correct like this, thank you!
greg0ire
+1  A: 

alt text

select distinct a.QuestionTitle
from
(
select q.QuestionID, QuestionTitle, TagName 
from QuestionTags as x
join Question     as q on q.QuestionID = x.QuestionID 
join Tag          as t on t.TagID      = x.TagID 
where TagName in ('c++', 'boost')
) as a
left join
(
select q.QuestionID, QuestionTitle, TagName 
from QuestionTags as x
join Question     as q on q.QuestionID = x.QuestionID 
join Tag          as t on t.TagID      = x.TagID 
where TagName = 'c'
) as b on b.QuestionID = a.QuestionID
where b.QuestionTitle is null
order by a.QuestionTitle ;
Damir Sudarevic
The query won't exclude questions that are tagged with 'c' - it needs a separate `and not exists (...` condition.
Mark Bannister
@Mark, thank you. Corrected.
Damir Sudarevic