views:

174

answers:

2

Lets say I have a FirstName > MiddleName > LastName hierarchy (~10k rows, for sake of the question). This means you could have "John > Mary-Anne > Eddy" or "Eddy > John > Jacob" row. The point being that the hierarchy makes little sense and is very foreign to the user (unlike, say, a Country > State > City structure).

Because its so unstructured and confused, I want to provide the user with an auto complete input box. As they type, it should search for possible substring matches, and when they "root" their search string at a level, it will then restrict the results to below that level.

Now, because there are lots of people named "John", it makes little sense that if they type "John" they only get back results like

  • John > Allen > Alexander
  • John > Allen > Burschawitz
  • John > Allen ... repeat 100 times ...

Because they'll never see the unique row "Jason > John > Smith".

Instead, they should get back something like ("*" is just an arbitrary indicator to the user of "hey, lots more rows below this exist"):

  • John > Allen > *
  • Jason > John > Smith
  • Mike > John > *
  • Mary > Elena > Johnason

If they type "John > Al" then the results would be limited to anything under "John >", but should be grouped similarly to above.

I hope the explanation is clear. The requirements are a bit loose. Just reasonable ones so that a person can search through the tree and find what they are after.

Right now, I have some interesting SQL that looks for the search term in the row, figures out its position, does some substring'ing, group bys, and order by's to get the above results, but its not performing well enough.

I'm trying to solve this problem on a typical LAMP stack (except with Oracle). Its not shared hosting, so I do have full control over the server. The data changes small amounts every few weeks, and the search results can stay stale for a reasonable amount of time (e.g, a cron that updates the search index isn't out of the question).

A: 

I have no idea about Oracle but I am sure that you should be able to create a query similar to the following:

SELECT FirstName + ' > ' + MiddleName + ' > ' + LastName AS FullName
FROM WHATEVER_TABLE
WHERE FirstName + ' > ' + MiddleName + ' > ' + LastName LIKE 'SOME_USER_INPUT%'
ORDER BY FirstName, MiddleName, LastName

Notes:

MySQL provides a construct called "CONCAT_WS" that allows you to concatenate arbitrary number of columns with a separator. See if you have something similar available.

NULL values in any of the columns might result in the entire concatenated string to be null. Again check your database reference manual to see if you have something to tackle this. An construct such as "IFNULL" will help when used as follows:

IFNULL( FirstName, '' ) + ' > ' + IFNULL( MiddleName, '' ) + ' > ' + IFNULL( LastName, '' ) AS FullName

You can experiment with optimization in many ways, starting by indexing the three columns. Alternatively, you can add another column to your table that stores the concatenated string with separator, you can use that column for searching instead. Indexing that column will be very efficient but adds the overhead of updating the column each time any of the columns are changed.

You might be able to create computed columns in the table as well. Oracle allows that i think. If so there will be no need to keep all four columns in sync.

Salman A
A: 

Argh. Sorry I wasn't able to describe my problem. Anyways, here's the solution I came up with.

Basically, create a second table from the 3-column table that contains all the distinct values for each successive level of the hierarchy, as well as a column to indicate the depth of that row in the hierarchy.

E.g. From mytable(A, B, C), create search_t(A, B, C, level)

So, with "One > Two > Three", you create 3 rows (A, B, C, level):

  • "One", null, null, 1
  • "One", "Two", null, 2
  • "One", "Two", "Three", 3

When searching, you can constrain the level by picking a value for level and providing values for the upper-level columns:

WHERE A='One' and level > 1 and (B like '%t%' or C like '%t')

It can be a bit simplified and generic if you create a search_str column and perform the LIKE matching against that instead.

WHERE A='One' and level > 1 and search_str like '%t%'

In retrospect, this would probably have been more apparent if the data were already in an adjacency-list model.

Richard Levasseur