views:

164

answers:

9

I have a table structure like this (vertical design). I can have unlimited number of attributes (eg: city, phone etc.) for each user.

Table: tbl_UserAttributes

┌────────┬───────────┬────────────┐
| UserID │ FieldName │ Value      |
├────────┼───────────┼────────────┤
│ 341    │ city      │ MyCity1    │
│ 772    │ phone     │ 1234567890 │
│ 033    │ city      │ MyCity2    │
│ 044    │ sex       │ M          │
│ 772    │ firstname │ MyName     │
│ ---    │ ---       │ ---        │
└────────┴───────────┴────────────┘

I have to implement a search feature which should output rows which we apply query like this for a horizontally designed table:

SELECT 
    FieldName 
FROM 
    tbl_UserAttributes 
WHERE 
    city='%Mumbai%' AND 
    sex='M' AND ...

Please dont ask me to change the database design.

UPDATE: At present, I have a JOIN solution in place which is very slow and it hangs the server some times. Any alternate methods?

+1  A: 

To those who offer help this is the classic case of EAV (Entity Attribute Value). It is highly NOT recommended when designing applications.

JonH
Yep, it is not reccomended. Unfortunately the system is in place for a long time and I cannot change.@JonH, can you give me an example or url?
NinethSense
Kind of sad, because it could be sped up some by making a separate table mapping field names to IDs, so at least the second column of the main table could be easy-to-match ints.
Mike D.
+2  A: 

Seen this before. Instead of searching for things that match city and sex and whatever, count how many attributes match your search query. If this count is equal to the number of attributes in your search query, it is one of your results.

Brian
+7  A: 

EAV table is a good thing as long as you don't need to search for multiple values at once in which case it becomes a bad thing.

You cannot index several values at once because they are located in different records.

In an SQL Server table you could create an indexed view over multiple values and use it for the searches.

In Oracle, you could cluster the table by UserID which would keep all records with the same UserID within one data page which would use an index on the most selective value and quickly scan for the other values.

In PostgreSQL, you could store all value in a single array and index it with a GIN index.

In MySQL, you can do neither of this.

Here's a query which will return the values:

SELECT  *
FROM    tbl_UserAttributes tcity
JOIN    tbl_UserAttributes tsex
ON      tsex.userid = tcity.userid
WHERE   tcity.fieldname = 'city'
        AND tcity.value LIKE '%Mumbai%'
        AND tsex.fieldname = 'sex'
        AND tsex.value = 'M'

but don't expect it to be very fast.

Update:

If you needed to have the exact matching, you could create a composite index on (fieldname, value, userid), put the most selective fieldname into the first table and use STRAIGHT_JOIN to force the order:

SELECT  *
FROM    tbl_UserAttributes tcity
STRAIGHT_JOIN
        tbl_UserAttributes tsex
ON      tsex.userid = tcity.userid
WHERE   tcity.fieldname = 'city'
        AND tcity.value = 'Mumbai'
        AND tsex.fieldname = 'sex'
        AND tsex.value = 'M'

However, this won't help with your current query, since you are looking for a wildcard match in which case the indexes are not very helpful. And your second table won't benefit much from the index unless you are querying a maternity hospital database.

Still it will save you some time since the index scan can be used instead of a table scan.

Quassnoi
Actually, I already have exactly this query in place but it is damn slow and some times the MySql Server gets hung. Looking for alternate options.
NinethSense
+1  A: 

You will have to join on the same table, on userid = userid, obviously with one side of the join being "where fieldname = 'city' and value = 'houston'" and the other side being "where fieldname = 'sex' and value = 'M'". Hope you don't want to have too many different fields to search at the same time!

Quassnoi beat me by 30 seconds.

JeeBee
Definitely there are too many different fields. I am developing an advanced search screen.
NinethSense
Can you enumerate all the possible search field names up front? Could you run a script every hour/minute to copy data from this crunky table into a new horizontal table that your search will work with in order to get the user ID?
JeeBee
+1  A: 

Maybe it's not recommended but I understand where it is coming from. In exchange for flexibility you get some pain in the neck querying.

To query for two attributes you need a join.

select a1.userid from tbl_UserAttributes a1, tbl_UserAttributes a2 where
a1.userid=a2.userid 
and a1.FieldName='city' and a1.Value='Mumbai'
and a2.FieldName='sex' and a2.Value='M'

It'll get unwieldy pretty soon.

Upd:

As Brian says, you are better off counting the number of matches.

select userid, count(*) from tbl_UserAttributes 
where (FieldName='city' and a1.Value='Mumbai')
or (FieldName='sex' and a2.Value='M')
group by userid
having count(*)=2

This should work much faster

yu_sha
Need 1 join per search field. Ugly in reality, and SLOW.
C. Ross
Actually, I already have exactly this query in place but it is damn slow and some times the MySql Server gets hung. Looking for alternate options.
NinethSense
Actually, Brian suggested a smart thing. I'll update my response
yu_sha
The other benefit of counting the matches is that you can replace your "where (FieldName='city' and a1.Value='Mumbai')" type clauses with a check against a table. So, you could make a function that takes in something like "city,Mumbai,sex,M" and returns a table... Makes for a more generic query.
Brian
+1  A: 
select ua.userID
from tblUserAttributes ua
INNER JOIN tblUserAttributes ua2
ON ua.userID = ua2.userID
and ua2.firstname = 'john'
INNER JOIN tblUserAttributes ua3
ON ua.userID = ua3.userID
and ua3.lastname = 'smith'
where ua.sex = 'M'
Gabe Roffman
+1  A: 

If the Inner join method mentioned here before is not acceptable all I can suggest is that you aggregate your table into a horizontal format every X minutes and then use the new table as a base for your searching.

Martin Beeby
+2  A: 

Are there a fixed set of FieldNames?

If there are can I suggest setting up a view to turn it horizontal and make it easy to query from. In SQL Server 2005 it would be something like:

SELECT *
FROM
(SELECT [UserID], [FieldName], [Value]
FROM [tbl_UserAttributes] ) ps
PIVOT
(
MAX([Value])
FOR [FieldName] IN
( [City], [Phone], [sex], [firstname])
) AS pvt

This should make it horizontal although all required [FieldName] values need to be in the IN () section to pull out a field for each. Also using the Max means if you have multiple values for the same FieldName it will pull out the Max one.

PeteT
I wonder what will be the MySQL equivalent :)
NinethSense
If they knew the entire domain of field names, makes me wonder why they went with this design then...
CSharpAtl
I did say if they knew, remember some people are very poor at database design, this looks like it to me. Also they may not need absolutely ever field to search, specifying the IN () set for the needed fields for all the searches would at least get them a working solution.
PeteT
This is an approach we've tried with similar crappy databases. Asks the users to pick a subset of fields that would be quickly searchable, others would take forever. Sounds like his platform can't handle the "forever" columns, so it'll be an all or none type deal.
Stephanie Page
+1  A: 

I think rather than trying to do it as one slow query, I'd do it as several quick queries, one for each condition. Maybe something like this.

CREATE TEMPORARY table search_results (User_id,score)
  SELECT User_id, 1 FROM tbl_UserAttributes
    WHERE FieldName ='blah' and FieldValue='x'; //should put an index on search_results.User_id

UPDATE search_results s JOIN tbl_UserAttributes u USING (User_id)
SET s.score=s.score+1 WHERE u.FieldName ='foo' and FieldValue='y';

repeat UPDATE for however many conditions.

SELECT User_id FROM search_results WHERE score= 'number of conditions'.

The above SELECT can be joined to tbl_UserAttributes to output whichever fieldnames you need.

dnagirl
There will be way too many quick queries which having been run together will take much more time than a one slow query.
Quassnoi