tags:

views:

412

answers:

3

I have an index that stores birth-dates, and I would like to search for anybody whose birth-date is within X days of a certain month/day. For example, I'd like to know if anybody's birthday is coming up within a certain number of days, regardless of what year they were born. How would I perform this query this using Solr? (on the "birthdate" field)

As a follow-up, assuming this query is executed very often, should I be indexing something other than the birth-date? Such as just the month-day pair? What is the most efficient way to do such a query (from the query and indexing standpoint)?

+2  A: 

If a day/month pair is tricky (I don't know whether it is or not) why not have a field of "their birthday in 1980" (whether they were alive then or not). Then you just need to do the search against 1980. This is effectively a day/month pair, but stored in a type you can use easily.

Note that 1980 is a leap year, which is why I chose it - otherwise those with a birthday of February 29th could be hard to represent.

Alternatively, a "day/month" pair in the form of an integer:

(100 * month) + day

would give you a simple representation which would be easy to search and index. I've usually found that storing data in a single field is simpler than using two fields. Then again, I've never used Solr...

EDIT: I've had another idea. It's a bit balmy, but even so...

Store the birth date in a format which is effectively month, day, year. I don't know if Solr could easily do it in MM/dd/yyyy format and then do a lexicographic order search, but the alternative is

(100000 * month) + (1000 * dayOfMonth) + (year - 1900)

(This is assuming you don't need it to store birth dates earlier than 1900. I'm sure you can tailor it.)

You can still recover the original birth date, but the ordering will be in birthday order, with the oldest person first for any particular date.

It does mean it's hard to sort people by their actual age though. I don't know if that's an issue for you.

Anyway, as I said it's a bit off-the-wall, but it might help :)

Jon Skeet
A day/month pair is not tricky, it just means "yet another field in the schema" that has to be present for all records. This is what I'm trying to avoid.Setting a date of a birthday in a particular year is almost the same as a month/day pair, it just has a CONSTANT year (instead of "no year").
DarylBeattie
A: 

You could store the birthday as a number from 1 to 366. Then search that value. The advantage is that you can then search with day ranges quite easily. The disadvantage is that you can't easily use this field for finding people whose birthday is this month.

Mr. Shiny and New
+2  A: 

You need to remember that Solr uses Lucene, and that as of now - everything is stored and indexed as a string.

Range query as is won't work because the dates are usually internally indexed as YYYYMMDD

Having a seperate field in the index that just stores MMDD strings would be easily searchable. Or if you don't want an extra field, and are willing to index the dates differently, rearrange the order when indexing so that birthdates are indexed MMDDYYY

Then you can construct rangequeries, because everything you need to match against is in the front of the string, and lucene matches lexiographically

(A rangequery that was ba -> bc would match BAt, BAseball, but not BEcause.)

Indexing like this is a onetime fixed cost, and doesnt destroy anything other than internal arrangement chronologically. If that's a problem, use two fields, disk space is cheap!)

Max
Okay great, thanks. It's a shame nobody has a solution for doing this search directly on the birth-date field...I have messed around with adding my own FieldType of "BirthdayField" that indexes like it's an IntField, but queries like it's a DateField... Lotsa work.
DarylBeattie
There are probably ways to search directly on this field, but not efficiently. Due to the fact you'd need a lot of additional logic, and would not be able to get complete answers in a single query.
Max
Yes, it was the "directly on this field" question that I really wanted answered, instead everybody provided answers for the "follow-up" question that I asked. And I already had a solution for the follow-up (along the lines of what you suggested).
DarylBeattie
Solr offers date functionality above and beyond that offered by lucene alone, so your statements about date format are misleading: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html Regardless, you're correct -- the simplest solution is definitely storing a MMDD-type string.
Frank Farmer