Using MySQL and PHP.

I am using MATCH AGAINST clauses already.

It is working fine against individual tables. For example, if I want to search the shops table, there is no problem.

What I want is to be able to search and DISPLAY results from different tables on a single results page.

E.g., if I type "chocolate clothes",

I may get 4 results as follows:

Shop1 result

ShopItem1 result

ShopItem2 result

Shop2 result

And of course, the most relevant results should be ranked first.

I have quite a few questions, design-wise as well as implementation-wise:

1) Should I change my design? I am thinking of having a separate "search results" table that will contain data from both the SHOPS and SHOPPRODUCTS tables. However, that means I have some data duplication.

2) Should I keep my current design? If so, then how on earth can I get the search results sorted by relevance across 2 different tables?

I saw that rottentomatoes organises its search results into different groups. However, we prefer the search results not to be restricted by type, especially since, with paging, that would be even more difficult to navigate UI-wise.

http://www.rottentomatoes.com/search/full%5Fsearch.php?search=girl

Or is that actually the best way out?

I hope that someone can give me guidance on this kind of thing, especially if you have experience generating search results across what would seem like multiple tables.

Since it was requested, here are the table structures:

CREATE TABLE `shopitems` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `ShopID` int(10) unsigned NOT NULL,
  `ImageID` int(10) unsigned NOT NULL,
  `name` varchar(100) NOT NULL,
  `description` varchar(255) NOT NULL,
  `pricing` varchar(45) NOT NULL,
  `datetime_created` datetime NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=31 DEFAULT CHARSET=utf8;

/*Table structure for table `shops` */

DROP TABLE IF EXISTS `shops`;

CREATE TABLE `shops` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(100) default NULL,
  `description` text,
  `keywords` text,
  `url` varchar(255) default '',

  `owner_id` varchar(255) default NULL,
  `datetime_created` datetime default NULL,
  `created_by` varchar(255) default NULL,
  `datetime_modified` datetime default NULL,
  `modified_by` varchar(255) default NULL,

  `overall_rating_avg` decimal(4,2) default '0.00',


  PRIMARY KEY  (`id`),
  FULLTEXT KEY `url` (`url`),
  FULLTEXT KEY `TitleDescFullText` (`keywords`,`title`,`description`,`url`)
) ENGINE=MyISAM AUTO_INCREMENT=3051 DEFAULT CHARSET=utf8;

I intend to search through the description and name columns of the shop products table (`shopitems`).

But as you can see, that has not been implemented yet: `shopitems` has no FULLTEXT index.

The search for shops, however, is already up and running.

A: 

If you show the table structures so we can understand the difference between the two, it will be much easier to give an answer.

[EDIT]

After your edit I understand your question even less. It looks like there is nothing to search for in that shops table. Isn't this just a super-table with no real product content?

This would mean that you can already search through your whole product db and that you just have to display the shop name in the search result by joining the two tables.

[/EDIT]
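A minimal sketch of this suggestion follows: search the products and join in the shop name for display. Python's built-in sqlite3 stands in for MySQL here. SQLite has no MATCH ... AGAINST, so `relevance()` is a hypothetical term-counting function registered with `create_function`. In MySQL you would instead use MATCH against a FULLTEXT index on `shopitems(name, description)`. The sample rows are made up.

```python
import sqlite3

def relevance(query, *texts):
    """Hypothetical stand-in for MATCH ... AGAINST: counts query-term hits."""
    words = " ".join(t or "" for t in texts).lower().split()
    return sum(words.count(term) for term in query.lower().split())

conn = sqlite3.connect(":memory:")
conn.create_function("relevance", -1, relevance)  # -1: any number of args
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, ShopID INTEGER,
                            name TEXT, description TEXT);
    INSERT INTO shops VALUES (1, 'Sweet Corner', 'chocolate and candy');
    INSERT INTO shopitems VALUES (10, 1, 'dark chocolate bar', 'bitter chocolate');
    INSERT INTO shopitems VALUES (11, 1, 'gift box', 'assorted sweets');
""")

def search_items_with_shop(conn, query):
    """Rank matching shop items and join in the owning shop's title."""
    sql = """
        SELECT si.id, si.name, s.title AS shop_title,
               relevance(?, si.name, si.description) AS score
        FROM shopitems si
        JOIN shops s ON s.id = si.ShopID
        WHERE relevance(?, si.name, si.description) > 0
        ORDER BY score DESC
    """
    return conn.execute(sql, (query, query)).fetchall()

print(search_items_with_shop(conn, "chocolate"))
# [(10, 'dark chocolate bar', 'Sweet Corner', 2)]
```

This ranks products only. The shop's own description is never searched, and that is the gap the asker points out in the comments.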

Would it be a possibility for you to create a view that holds data from both tables and search this view instead of the tables?

tharkun
I am not sure about the view. I am using shared hosting on site5. They already don't allow me to use stored procedures, so I am not sure about views.
keisimone
Hang on, does full-text search even work with a VIEW?
keisimone
I would assume so.
tharkun
OK, here is the thing: we want to display SHOPS AND SHOP PRODUCTS, kind of like rottentomatoes, where you may search through actors or movies. You assumed a VIEW would work with full-text search? I tried searching the MySQL documentation, and there is nothing there to suggest that it can or it cannot. I am leaning towards cannot. I tried to create a full-text index on a view that is an exact replica of the shops table, but I could not. Have you done any full-text search before? Have you personally tried to search a VIEW? I am curious enough to try all sorts of possibilities.
keisimone
I use full-text search but haven't tried it with views. Maybe you're right that it doesn't work, since MySQL views are not good friends with indexing. But still: is there anything to search for in the shops table?
tharkun
Yes, we are storing data on shops AND their products. We are not an e-commerce store, so that is necessary. Consider the rottentomatoes example, where they store data on movies AND actors. We are searching in the description, url and title columns of the shops table.
keisimone
OK, thanks for the clarification!
tharkun
A: 

I suggest the first option. Redundancy isn't always evil.

So I would make a table like this:

CREATE TABLE search_results
(
   ...
   `searchable_shop_info` VARCHAR(32),
   `searchable_shopitem_info` TEXT,
   FULLTEXT KEY `searchable` (`searchable_shop_info`, `searchable_shopitem_info`)
) Engine=MyISAM;

Then you can still use SELECT * FROM search_results WHERE MATCH (searchable_shop_info, searchable_shopitem_info) AGAINST ('search query string');
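Here is a minimal sketch of the search-table idea. sqlite3 again stands in for MySQL, and a hypothetical `relevance()` function stands in for MATCH ... AGAINST. The layout differs slightly from the answer's two-column table. It assumes one row per shop or item, with a `source` column recording the origin, so the same id coming from both tables does not collide. Column names beyond those in the thread are illustrative.

```python
import sqlite3

def relevance(query, *texts):
    # Hypothetical stand-in for MATCH ... AGAINST (simple term counting).
    words = " ".join(t or "" for t in texts).lower().split()
    return sum(words.count(term) for term in query.lower().split())

conn = sqlite3.connect(":memory:")
conn.create_function("relevance", -1, relevance)
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, ShopID INTEGER,
                            name TEXT, description TEXT);
    -- Denormalized copy: one row per searchable thing, tagged by source table.
    CREATE TABLE search_results (
        source TEXT NOT NULL,          -- 'shop' or 'shopitem'
        source_id INTEGER NOT NULL,
        title TEXT,
        body TEXT,
        PRIMARY KEY (source, source_id)
    );
    INSERT INTO shops VALUES (1, 'Chocolate Heaven', 'chocolate chocolate treats');
    INSERT INTO shops VALUES (2, 'Threads', 'clothes for all');
    INSERT INTO shopitems VALUES (1, 2, 'chocolate scarf', 'brown wool clothes');
    -- Duplicate the searchable text from both tables into the search table.
    INSERT INTO search_results SELECT 'shop', id, title, description FROM shops;
    INSERT INTO search_results SELECT 'shopitem', id, name, description FROM shopitems;
""")

def search(query):
    """One query over one table; shops and items are ranked together."""
    return conn.execute("""
        SELECT source, source_id, title, relevance(?, title, body) AS score
        FROM search_results
        WHERE relevance(?, title, body) > 0
        ORDER BY score DESC, source, source_id
    """, (query, query)).fetchall()

print(search("chocolate clothes"))
```

The price of this design is the one raised in the comments: every write to `shops` or `shopitems` must also be mirrored into `search_results`.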

Ifju
May I ask why you recommend that over the other options?
keisimone
A: 
iceangel89
I am curious. That means I need to run that query EVERY TIME for any search input, because different input would have different relevance scores. Maybe I am a bit lacking in understanding, but I suspect your method is resource-intensive. Please enlighten me. I am willing to consider all possibilities.
keisimone
Thanks for your efforts, but you have not done full-text search, so I don't think you see the problem. I am quite sure you cannot do a full-text search on a VIEW.
keisimone
Hmm, OK. I don't know how you will maintain the results table, but I guess triggers will be an option.
iceangel89
I don't think you quite understand the purpose of my search_results table. Its rows are just clones of the data in the shops and shop products tables. The bad thing is that when I update shops OR shop products, I have to update BOTH that table and the search_results table. The good thing is that it is easier to search a single table than 2 tables and then display the results accordingly.
keisimone
I mean you might be able to use triggers to update the results table when the shops or products tables are updated.
iceangel89
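The trigger idea is sketched below in SQLite syntax, for the `shops` table only. `shopitems` would need the same three triggers. MySQL triggers are written as `CREATE TRIGGER ... AFTER INSERT ON shops FOR EACH ROW ...`, and in the mysql client they usually need a changed `DELIMITER`. The asker's host already disallows stored procedures, so check whether it lets you create triggers at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE search_results (
        source TEXT NOT NULL, source_id INTEGER NOT NULL,
        title TEXT, body TEXT,
        PRIMARY KEY (source, source_id)
    );
    -- The database keeps the copy in sync instead of application code.
    CREATE TRIGGER shops_ai AFTER INSERT ON shops BEGIN
        INSERT INTO search_results VALUES ('shop', NEW.id, NEW.title, NEW.description);
    END;
    CREATE TRIGGER shops_au AFTER UPDATE ON shops BEGIN
        UPDATE search_results SET title = NEW.title, body = NEW.description
        WHERE source = 'shop' AND source_id = NEW.id;
    END;
    CREATE TRIGGER shops_ad AFTER DELETE ON shops BEGIN
        DELETE FROM search_results WHERE source = 'shop' AND source_id = OLD.id;
    END;
""")

conn.execute("INSERT INTO shops VALUES (1, 'Old Name', 'old text')")
conn.execute("UPDATE shops SET description = 'now selling chocolate' WHERE id = 1")
after_update = conn.execute("SELECT * FROM search_results").fetchall()

conn.execute("DELETE FROM shops WHERE id = 1")
after_delete = conn.execute("SELECT * FROM search_results").fetchall()

print(after_update)  # [('shop', 1, 'Old Name', 'now selling chocolate')]
print(after_delete)  # []
```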
"oh and i am also not really familiar with full text search so i dunno if this method will affect anything"then thanks for trying. but i rather you didn't supply your answer then.
keisimone
A: 

If I understand your questions right, the answer is very simple:

  1. Don't change the design. It's perfectly fine. That's how it's supposed to be.
  2. Do a joined query like this:
SELECT * FROM shops
LEFT OUTER JOIN shopitems ON (shopitems.shopid = shops.id)
WHERE 
    MATCH (shops.title, shops.description, shops.keywords,
           shopitems.name, shopitems.description) 
    AGAINST ('whatever text')
Slawa
1) You understand wrongly. 2) The query doesn't even work at all, let alone for the purposes of my question.
keisimone
+1  A: 

I am not sure I understood correctly, but here are my 2 cents.

From what I can see, the problem is that you have 2 tables with very different layouts, so I will assume that you want to base the fulltext search on these fields:

  • for shops: title, description and keywords
  • for shopitems: name and description

Solution 1: Layout consistency -- does not use index...

If you could somehow change the name of your columns for shopitems, it would immediately get much simpler.

Select id From
(Select id, text1, text2, text3 From table1
 UNION
 Select id, text1, text2, text3 From table2) As t
Where MATCH(text1, text2, text3) AGAINST('keyword1 keyword2 keyword3')

However I can understand that it would be impractical to change everything that already exists. Note that with aliasing, adding a third (dummy) text column to shopitems could do the trick though.
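The aliasing trick from Solution 1 is sketched below. sqlite3 stands in for MySQL and `LIKE` stands in for MATCH ... AGAINST, both purely for illustration. The constant `''` plays the role of the dummy third text column for `shopitems`. Because the filter runs on the combined derived table, neither base table's index can help, which is the cost the heading warns about. `UNION ALL` skips the duplicate-elimination pass that plain `UNION` would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT,
                        description TEXT, keywords TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO shops VALUES (1, 'Sweet Corner', 'candy shop', 'chocolate');
    INSERT INTO shopitems VALUES (5, 'chocolate bar', 'dark');
""")

# Alias both tables to one layout (text1, text2, text3). shopitems has no
# third text column, so a constant empty string stands in as the dummy.
UNIFIED = """
    SELECT id, title AS text1, description AS text2, keywords AS text3 FROM shops
    UNION ALL
    SELECT id, name, description, '' FROM shopitems
"""

def search_unified(term):
    # Filtering happens on the combined rows, so no table index is usable.
    pattern = f"%{term}%"
    sql = f"""SELECT id, text1 FROM ({UNIFIED}) AS t
              WHERE text1 LIKE ? OR text2 LIKE ? OR text3 LIKE ?
              ORDER BY id"""
    return conn.execute(sql, (pattern, pattern, pattern)).fetchall()

print(search_unified("chocolate"))  # [(1, 'Sweet Corner'), (5, 'chocolate bar')]
```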

Solution 2: Post-treatment

I should remark that the relevance value computed by MATCH ... AGAINST can actually be returned (and thus used). Therefore you can build a temporary (derived) table with this value! Note that if you wish to return 'title' and 'description', both columns should have the same type in each SELECT so that they can be dealt with in a uniform manner...

-- The second MATCH assumes a FULLTEXT KEY on shopitems (name, description), which does not exist yet.
Select id, title, description From
(
 Select id, title, description, MATCH(keywords, title, description, url) AGAINST('dummy') As score
        From shops
        Where MATCH(keywords, title, description, url) AGAINST('dummy')
 UNION
 Select id, name As title, description, MATCH(name, description) AGAINST('dummy') As score
        From shopitems
        Where MATCH(name, description) AGAINST('dummy')
) As t
ORDER BY score DESC

I have no idea of the performance of this query though, I wonder if mysql will optimize away the double call to MATCH / AGAINST in each of the Selects (I hope it does).

The catch is that my query is merely a demonstration. The downside of using aliasing is that you no longer know which table each row comes from.

Anyway, I hope it helped you.
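Solution 2's idea is sketched below: each branch of the UNION computes its own score, and the outer query sorts everything by that score. sqlite3 stands in for MySQL, and `relevance()` is a hypothetical term counter in place of MATCH ... AGAINST. One caveat applies in MySQL. Each MATCH is scored against its own table's word statistics, so scores from two different FULLTEXT indexes are not strictly on the same scale. Treat the merged ranking as approximate.

```python
import sqlite3

def relevance(query, *texts):
    # Hypothetical stand-in for MATCH ... AGAINST: counts query-term hits.
    words = " ".join(t or "" for t in texts).lower().split()
    return sum(words.count(term) for term in query.lower().split())

conn = sqlite3.connect(":memory:")
conn.create_function("relevance", -1, relevance)
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO shops VALUES (1, 'Cocoa Co', 'chocolate');
    INSERT INTO shopitems VALUES (7, 'chocolate chocolate', 'chocolate fudge');
    INSERT INTO shopitems VALUES (8, 'socks', 'wool clothes');
""")

def ranked_search(query):
    # Each branch filters and scores its own table (in MySQL, via that
    # table's FULLTEXT index); the outer query then sorts all scores together.
    sql = """
        SELECT title, score FROM (
            SELECT title, relevance(?, title, description) AS score
            FROM shops WHERE relevance(?, title, description) > 0
            UNION ALL
            SELECT name AS title, relevance(?, name, description) AS score
            FROM shopitems WHERE relevance(?, name, description) > 0
        ) AS t
        ORDER BY score DESC, title
    """
    return conn.execute(sql, (query,) * 4).fetchall()

print(ranked_search("chocolate clothes"))
# [('chocolate chocolate', 3), ('Cocoa Co', 1), ('socks', 1)]
```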

Matthieu M.
Thanks. I think your answer at least made more sense than the other answers, so I will at least give you an upvote. The other answers are, I feel, shoot-from-the-hip style. Disappointing.
keisimone
Both your solutions have an id collision issue, but that can be resolved by adding another field to each table and storing the table name in that field for all its rows. However, that also means that when I display my results on a webpage, I have to retrieve all the associated info again, since I only have the id.
keisimone
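The sketch below shows the id-collision fix and the second lookup it forces. Instead of adding a stored field to each table, it uses a literal (`'shop' AS kind`) in each UNION branch, which tags every row with its source without altering the tables. Details are then fetched with one `IN (...)` query per table rather than one query per row. sqlite3 stands in for MySQL; the data and names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, name TEXT, pricing TEXT);
    INSERT INTO shops VALUES (3, 'Threads', 'threads.example');
    INSERT INTO shopitems VALUES (3, 'wool scarf', '12.00');
""")

# Step 1: the search. A literal per branch records the source table, so
# id 3 from shops and id 3 from shopitems stay distinct. (Ranking omitted.)
hits = conn.execute("""
    SELECT 'shop' AS kind, id FROM shops
    UNION ALL
    SELECT 'shopitem', id FROM shopitems
    ORDER BY kind, id
""").fetchall()

# Step 2: fetch display columns with one IN (...) query per table.
DETAIL_SQL = {
    "shop": "SELECT id, title, url FROM shops WHERE id IN ({})",
    "shopitem": "SELECT id, name, pricing FROM shopitems WHERE id IN ({})",
}

def fetch_details(hits):
    details = {}
    for kind, sql in DETAIL_SQL.items():
        ids = [row_id for k, row_id in hits if k == kind]
        if not ids:
            continue
        placeholders = ", ".join("?" * len(ids))
        for row in conn.execute(sql.format(placeholders), ids):
            details[(kind, row[0])] = row[1:]
    # Return rows in the same order as the search hits.
    return [(kind, row_id, details[(kind, row_id)]) for kind, row_id in hits]

print(fetch_details(hits))
```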
Yes, the double retrieval is annoying, which is why I suggested trying to make the table layouts more similar if possible. Note that in the second solution you could retrieve more information (title, description) and smooth over the differences by aliasing. I can try to come up with a more complete solution if you tell me which columns you need from each of your tables and which alterations you are ready to make to your table structures.
Matthieu M.
A: 

I would go for the UNION. That is the purpose of the statement.

Teo
+2  A: 

Here are a few "rules of the game" that you must keep in mind for solving this problem. You probably know these already, but stating them clearly may help confirm for other readers.

  • All indexes in MySQL can reference only columns in a single base table. You can't make a fulltext index that indexes across multiple tables.
  • You can't define indexes for views, only base tables.
  • A MATCH() query against a fulltext index must match against all the columns in the fulltext index, in the order declared in the index.

I would create a third table to store the content you want to index. No need to store this content redundantly -- store it solely in the third table. This borrows a concept of a "common superclass" from object-oriented design (insofar as we can apply it to RDBMS design).

CREATE TABLE Searchable (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `title` varchar(100) default NULL,
  `description` text,
  `keywords` text,
  `url` varchar(255) default '',
  FULLTEXT KEY `TitleDescFullText` (`keywords`,`title`,`description`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `shopitems` (
  `id` INT UNSIGNED NOT NULL,
  `ShopID` INT UNSIGNED NOT NULL,
  `ImageID` INT UNSIGNED NOT NULL,
  `pricing` varchar(45) NOT NULL,
  `datetime_created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `shops` (
  `id` INT UNSIGNED NOT NULL,
  `owner_id` varchar(255) default NULL,
  `datetime_created` datetime default NULL,
  `created_by` varchar(255) default NULL,
  `datetime_modified` datetime default NULL,
  `modified_by` varchar(255) default NULL,
  `overall_rating_avg` decimal(4,2) default '0.00',
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Notice the only table with an auto-increment key is now Searchable. The tables shops and shopitems use a key with a compatible data type, but not auto-increment. So you must create a row in Searchable to generate the id value, before you can create the corresponding row in either shops or shopitems.
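The insert order described above is sketched below: write the Searchable row first, take its generated id, then write the shops or shopitems row with that same id. sqlite3's `cursor.lastrowid` plays the part of MySQL's `LAST_INSERT_ID()` (PHP's mysqli exposes this as `insert_id`). Here SQLite wraps both inserts in one transaction. With MyISAM there are no transactions, so a failure between the two inserts could leave an orphan Searchable row. Columns are trimmed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Searchable (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             title TEXT, description TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY REFERENCES Searchable(id),
                        owner_id TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY REFERENCES Searchable(id),
                            ShopID INTEGER, pricing TEXT);
""")

def create_shop(title, description, owner_id):
    with conn:  # commit both inserts together, or roll both back
        cur = conn.execute(
            "INSERT INTO Searchable (title, description) VALUES (?, ?)",
            (title, description))
        new_id = cur.lastrowid  # the id generated by Searchable
        conn.execute("INSERT INTO shops (id, owner_id) VALUES (?, ?)",
                     (new_id, owner_id))
    return new_id

def create_shopitem(shop_id, name, description, pricing):
    with conn:
        cur = conn.execute(
            "INSERT INTO Searchable (title, description) VALUES (?, ?)",
            (name, description))
        new_id = cur.lastrowid
        conn.execute(
            "INSERT INTO shopitems (id, ShopID, pricing) VALUES (?, ?, ?)",
            (new_id, shop_id, pricing))
    return new_id

shop_id = create_shop("Threads", "clothes for all", "u1")
item_id = create_shopitem(shop_id, "wool scarf", "warm clothes", "12.00")
print(shop_id, item_id)  # 1 2
```

Because every id comes from the same sequence, a shop and an item can never share an id.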

I've added FOREIGN KEY declarations for illustration purposes, even though MyISAM will silently ignore these constraints (and you already know that you must use MyISAM to have support for fulltext indexing).

Now you can search the textual content of both shops and shopitems in a single query, using a single fulltext index:

SELECT S.*, sh.*, si.*,
  MATCH(keywords, title, description, url) AGAINST('dummy') As score
FROM Searchable S
LEFT OUTER JOIN shops sh ON (S.id = sh.id)
LEFT OUTER JOIN shopitems si ON (S.id = si.id)
WHERE MATCH(keywords, title, description, url) AGAINST('dummy')
ORDER BY score DESC;

Of course, for a given row in Searchable only one table should match, either shops or shopitems, and these tables have different columns. So either sh.* or si.* will be NULL in the result. It's up to you to format the output in your application.
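Because only one of the two LEFT JOINs matches each row, the application can branch on which side is non-NULL. A minimal sketch with sqlite3 follows. It uses a trimmed column list and names columns explicitly instead of using `*`, aliasing the join keys so the two `id` columns stay distinguishable. The `render()` helper and its output format are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # access columns by name
conn.executescript("""
    CREATE TABLE Searchable (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, owner_id TEXT);
    CREATE TABLE shopitems (id INTEGER PRIMARY KEY, ShopID INTEGER, pricing TEXT);
    INSERT INTO Searchable VALUES (1, 'Threads'), (2, 'wool scarf');
    INSERT INTO shops VALUES (1, 'u1');
    INSERT INTO shopitems VALUES (2, 1, '12.00');
""")

def render(row):
    # Exactly one LEFT JOIN should match; the other side's columns are NULL.
    if row["shop_id"] is not None:
        return f"Shop: {row['title']} (owner {row['owner_id']})"
    if row["item_id"] is not None:
        return f"Item: {row['title']} at {row['pricing']}"
    return f"Unclassified: {row['title']}"

rows = conn.execute("""
    SELECT S.id, S.title, sh.id AS shop_id, sh.owner_id,
           si.id AS item_id, si.pricing
    FROM Searchable S
    LEFT OUTER JOIN shops sh ON (S.id = sh.id)
    LEFT OUTER JOIN shopitems si ON (S.id = si.id)
    ORDER BY S.id
""").fetchall()
lines = [render(r) for r in rows]
print(lines)  # ['Shop: Threads (owner u1)', 'Item: wool scarf at 12.00']
```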


A couple of other answers have suggested using Sphinx Search. This is another technology that complements MySQL and adds more sophisticated full-text search capability. It has great performance for queries, so some people have gotten pretty enchanted with it.

But creating indexes and especially adding to an index incrementally is expensive. In fact, updating a Sphinx Search index is so costly that the recommended solution is to create one index for older, archived data, and another smaller index for recent data that is more likely to be updated. Then every search has to run two queries, against the two separate indexes. And if your data doesn't naturally lend itself to the pattern of older data being unchanging, then you may not be able to take advantage of this trick anyway.


Re your comment: Here's an excerpt from the Sphinx Search documentation about live updates to an index:

There's a frequent situation when the total dataset is too big to be reindexed from scratch often, but the amount of new records is rather small. Example: a forum with a 1,000,000 archived posts, but only 1,000 new posts per day.

In this case, "live" (almost real time) index updates could be implemented using so called "main+delta" scheme.

The idea is that since it's costly to update a Sphinx Search index, their solution is to make the index you update as small as possible. In their example, that index holds only the most recent forum posts, whereas the larger history of archived forum posts never changes, so you build a second, larger index for that collection once. Of course, if you want to do a search, you have to query both indexes.

Periodically, say once a week, the "recent" forum messages would become considered "archived" and you'd have to merge the current index for recent posts to the archived index, and start the smaller index over. They do make the point that merging two Sphinx Search indexes is more efficient than reindexing after an update to the data.
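The toy illustration below shows why main+delta means two lookups per search. It has a rarely rebuilt main index, a small, frequently rebuilt delta index, and a search that queries both and merges the hits. This is plain Python dictionaries, not Sphinx's API. It also assumes that a document re-indexed in the delta should hide its stale entry in main; how Sphinx itself handles that case is not covered here.

```python
from collections import defaultdict

def build_index(docs):
    """Build a tiny inverted index: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(term, main_index, delta_index, delta_ids):
    # Two lookups per search. Main-index hits are dropped for documents
    # that were re-indexed in the delta, since main holds their stale text.
    term = term.lower()
    main_hits = main_index.get(term, set()) - delta_ids
    delta_hits = delta_index.get(term, set())
    return sorted(main_hits | delta_hits)

archived = {1: "old chocolate post", 2: "old clothes post"}
recent = {2: "edited post about hats", 3: "new chocolate post"}
main = build_index(archived)   # large, rebuilt rarely (e.g. weekly)
delta = build_index(recent)    # small, rebuilt often

print(search("chocolate", main, delta, set(recent)))  # [1, 3]
print(search("clothes", main, delta, set(recent)))    # []
```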

But my point is that not every data set naturally falls into the pattern of having an archived set of data that never changes, versus recent data that updates frequently.

Take your database for example: You have shops and shopitems. How can you separate these into rows that never change, versus new rows? Any shops or products in the catalog should be permitted to update their description. But since that'd require rebuilding the entire Sphinx Search index every time you make a change, it becomes a very expensive operation. Perhaps you'd queue up changes and apply them in a batch, rebuilding the index once a week. But try explaining to the shop vendors why a minor change to their shop description won't take effect until Sunday night.

Bill Karwin
I don't usually advise using a * selector in the result of the query. It might seem a good idea on the spur of the moment but it usually hampers forward compatibility with the software that is supposed to deal with the result.
Matthieu M.
@Matthieu M: Yes, I agree, I use the wildcard only in ad hoc queries and examples for StackOverflow. I don't use the wildcard for production code. But this issue is orthogonal to the fulltext search question.
Bill Karwin
Hi Bill, thanks for your answer. It is very clear and illuminating. I have some questions about Sphinx Search, though. "In fact, updating a Sphinx Search index is so costly that the recommended solution is to create one index for older, archived data, and another smaller index for recent data that is more likely to be updated. Then every search has to run two queries, against the two separate indexes. And if your data doesn't naturally lend itself to the pattern of older data being unchanging, then you may not be able to take advantage of this trick anyway." Can you elaborate on this part?
keisimone
Wow, thanks Bill! And I thought the only problem I had with Sphinx was that it's not usable on shared hosting over at site5. I know I am not facing this issue yet, but what if, after some time, I have scaling issues? What should I consider so that my full-text searching is good EVEN for a single base table like shops?
keisimone
I've been thinking that more small-to-medium size web sites should use an external search solution, such as Google Custom Search (http://www.google.com/cse/) or Yahoo Build Your Own Search Service (http://developer.yahoo.com/search/boss/) instead of trying to implement full search in-house. Let someone else maintain the iron for scalable search! All you have to worry about then is making your site SEO-friendly.
Bill Karwin
This is the first time I am hearing about Google Custom Search and Yahoo BOSS. Sorry to bother you, but I need to minimise time spent experimenting with yet more stuff I am unfamiliar with. Don't they just index webpages? How can such a service index the data in my own database? I am really confused by this.
keisimone
Have you ever used Google with the "`site:`" syntax so that results only include URLs matching the site pattern you specify? So what if you could include a search textbox on your web page that invokes Google, limits the search to your site, and perhaps even other pre-chosen parameters that you want? And then you can integrate the search results into your page as well. That's the idea. It also depends on your site being indexed by Google (or Yahoo if you use BOSS instead), and that works better if you design your site so that each shop and shopitem has its own page, with a "pretty" URL.
Bill Karwin
That's the concept. There are tutorials at each of the sites I linked to, that can explain how to use them better than I can. They have example code, etc.
Bill Karwin
I see, thanks. Currently I am displaying the search results like this: a picture of the item, followed by a short description, followed by some links to certain actions like "review it", etc. Will these custom searches enable me to display the search results in my own format?
keisimone
Bill Karwin
Thanks, Bill. You have been most helpful. You have earned a lot of goodwill from me, and not just my points.
keisimone
Thanks, I'm glad to help. By coincidence this week I'm developing a presentation about text-search solutions for the PostgreSQL Conference West in Seattle. I'll publish the slides afterwards at http://slideshare.net/billkarwin. BTW, I found a nice Javascript wrapper for using Yahoo BOSS: http://icant.co.uk/sandbox/yboss/
Bill Karwin
A: 

I would go with your first alternative, creating a separate search table.

We have done this once when we needed to search for data across several SOA systems.

The benefits of this approach are:

  • faster response to search requests
  • more control over the organization of search results

The drawbacks are:

  • slower time to save data, since it must be written in two places
  • extra space used to store data
Shiraz Bhaiji