ansaurus

Question

Sphinx without using an auto_increment id

Answer 1

+2 A:

Sphinx only requires ids to be unique integers; it doesn't care whether they are auto-incremented or not, so you can roll your own logic. For example, generate integer hashes from your string keys.

stereofrog 2009-10-29 16:15:06
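
A minimal sketch of the hashing idea, not taken from the thread: the function names `doc_id_from_key` and `build_id_map` are hypothetical, and SHA-256 truncated to 64 bits is just one possible choice of hash. It assumes a Sphinx build with 64-bit document ids. Because truncated hashes can still collide, the sketch checks for collisions at build time rather than assuming they never happen.

```python
import hashlib


def doc_id_from_key(key: str) -> int:
    """Map a string key to a non-zero unsigned 64-bit integer (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    # Sphinx document ids must be non-zero, so remap the (very unlikely) zero.
    return value or 1


def build_id_map(keys):
    """Return {doc_id: key}, raising if two different keys hash to the same id."""
    seen = {}
    for key in keys:
        doc_id = doc_id_from_key(key)
        if doc_id in seen and seen[doc_id] != key:
            raise ValueError(f"id collision between {seen[doc_id]!r} and {key!r}")
        seen[doc_id] = key
    return seen


if __name__ == "__main__":
    for doc_id, key in build_id_map(["AB12", "XK9930Q", "7731ZZTOP4411"]).items():
        print(doc_id, key)
```

The returned map is also what the application would need to translate a matched document id back into the original key.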

I'm a bit worried about having colliding ids with that approach - or maybe I read you wrong?

squeeks 2009-10-29 16:26:07

Yes, the worry is totally justified, because you never know when hashes are going to collide. However, with "only" 2 million rows and 64-bit ids you have enough space to play around, e.g. think about using hash+timestamp or hash+user_id. It really depends on your application.

stereofrog 2009-10-29 16:42:26

Would an idea be to use unixtime + microtime at time of insert? I could then use that as the time of insertion as well as document id, two birds with one stone.

squeeks 2009-10-29 16:49:19
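
A rough sketch of a microsecond-timestamp id, under the assumption that all inserts go through a single process; `next_timestamp_id` is a hypothetical name. Two inserts can land in the same microsecond, so the sketch bumps the value by one when that happens. With several writers, uniqueness would have to be enforced elsewhere, for example by a unique key in the database.

```python
import threading
import time

_lock = threading.Lock()
_last_id = 0


def next_timestamp_id() -> int:
    """Return microseconds since the Unix epoch, strictly increasing per process."""
    global _last_id
    with _lock:
        now = time.time_ns() // 1_000  # nanoseconds -> microseconds
        # If the clock has not advanced since the last call, step forward by one.
        _last_id = max(now, _last_id + 1)
        return _last_id


if __name__ == "__main__":
    print(next_timestamp_id())
    print(next_timestamp_id())
```

Note that once an id has been bumped it no longer equals the true insertion time, which is one way the "two birds" use of the same value can go wrong.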

Yes, as a primary key this would be perfect. However, I'd like to warn you against the "two birds" approach: it usually causes more problems than it seems to solve. But that's another story.

stereofrog 2009-10-29 17:02:19

By the way, reading your other comment: if your product codes are purely alphanumeric (i.e. only a-z0-9), the simplest option would be to treat them as base-36 integers and simply convert to/from decimal when reading/writing the db.

stereofrog 2009-10-29 17:06:03
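
The base-36 conversion can be sketched as below; `code_to_id` and `id_to_code` are hypothetical helper names. Three caveats are assumptions worth checking against real data: the mapping folds case, it drops leading zeros unless the original width is stored, and since 36^13 is larger than 2^64, some 13-character codes will not fit in an unsigned 64-bit id (codes of 12 characters or fewer always fit).

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
MAX_ID = 2**64 - 1  # largest unsigned 64-bit document id


def code_to_id(code: str) -> int:
    """Interpret an alphanumeric code as a base-36 integer."""
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"not a purely alphanumeric code: {code!r}")
    value = int(code.lower(), 36)
    if value == 0 or value > MAX_ID:
        raise ValueError(f"code does not map to a valid 64-bit id: {code!r}")
    return value


def id_to_code(doc_id: int, width: int = 0) -> str:
    """Convert an id back to base 36, left-padding with zeros up to `width`."""
    chars = []
    while doc_id:
        doc_id, remainder = divmod(doc_id, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars)).rjust(width, "0")


if __name__ == "__main__":
    doc_id = code_to_id("ab12cd")
    print(doc_id, id_to_code(doc_id))
```

The appeal of this approach is that no lookup table is needed: the document id returned by a search converts straight back into the product code.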

I think that would be a good idea worth trying. Cheers.

squeeks 2009-10-29 17:42:34

Answer 2

+1 A:

Sphinx doesn't depend on auto-increment; it just needs unique integer document ids. Maybe you can add a surrogate unique integer id to the tables to work with Sphinx. Integer lookups are also generally faster than alphanumeric ones. BTW, how long is your alphanumeric product code? Any samples?

Sabeen Malik 2009-10-29 16:38:17

They vary from 4 to 13 characters in length.

squeeks 2009-10-29 16:41:16

Answer 3

+3 A:

Sure, that's easy to work around. If you need to make up your own IDs just for Sphinx and you don't want them to collide, you can do something like this in your sphinx.conf (example code for MySQL):

source products {

  # Use a variable to store a throwaway ID value
  sql_query_pre = SELECT @id := 0 

  # Keep incrementing the throwaway ID.
  # "code" is present twice because Sphinx does not full-text index attributes
  sql_query = SELECT @id := @id + 1, code AS code_attr, code, description FROM products

  # Return the code so that your app will know which records were matched
  # this will only work in Sphinx 0.9.10 and higher!
  sql_attr_string = code_attr  
}

The only problem is that you still need a way to know which records were matched by your search. Sphinx will return the id (which is now meaningless) plus any columns that you mark as "attributes".

Sphinx 0.9.10 and above will be able to return your product code to you as part of the search results because it has string attributes support.

0.9.10 is not an official release yet, but it is looking great. It looks like Zawodny is running it over at Craigslist, so I wouldn't be too nervous about relying on this feature.

casey 2009-10-30 14:37:56

Answer 4

A:

I think it's possible to generate an XML stream from your data and then create the IDs in software (Ruby, Java, PHP).

Take a look at http://github.com/burke/mongosphinx

chris 2010-05-13 23:25:56
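
One way to read this suggestion is Sphinx's xmlpipe2 source type, where an external program prints the documents as XML and assigns the ids itself. The sketch below is an assumption about what was meant, not code from the linked project: `xmlpipe2_stream` is a hypothetical name, the `code` and `description` fields mirror the example in Answer 3, and ids are simply the row position, so they only stay stable if the rows always arrive in the same order.

```python
from xml.sax.saxutils import escape


def xmlpipe2_stream(rows):
    """Yield xmlpipe2 lines for (code, description) rows, numbering documents from 1."""
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield "<sphinx:docset>"
    yield "<sphinx:schema>"
    yield '<sphinx:field name="code"/>'
    yield '<sphinx:field name="description"/>'
    yield "</sphinx:schema>"
    for doc_id, (code, description) in enumerate(rows, start=1):
        yield f'<sphinx:document id="{doc_id}">'
        # Escape &, < and > so the stream stays well-formed XML.
        yield f"<code>{escape(code)}</code>"
        yield f"<description>{escape(description)}</description>"
        yield "</sphinx:document>"
    yield "</sphinx:docset>"


if __name__ == "__main__":
    sample = [("AB12", "Small widget"), ("XK9930Q", "Nuts & bolts")]
    print("\n".join(xmlpipe2_stream(sample)))
```

A script like this would be wired in through a source of type xmlpipe2 whose command runs the script and reads its standard output.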
