I would like to do the equivalent of this SQL but with Solr as my data store.

SELECT
   DISTINCT txt
FROM
   my_table;

What syntax would force Solr to only give me distinct values?

http://localhost:8983/solr/select?q=txt:?????&fl=txt

EDIT: So faceted searching seems to fit, but as I investigated it, I realized I had only detailed half of the problem.

My SQL query should have read...

SELECT
   DISTINCT SUBSTR(txt,0,3)
FROM
   my_table;

Any possibility of this with Solr?

A: 

Take a look at faceted search.

Tim Mahy
+5  A: 

Faceting would get you a result set that contains the distinct values for a field.

E.g.

http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=on&facet.field=txt

You should get something back like this:

<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="txt">
        <int name="value">100</int>
        <int name="value1">80</int>
        <int name="value2">5</int>
        <int name="value3">2</int>
        <int name="value4">1</int>
  </lst>
 </lst>
</lst>
</response>
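
For anyone scripting this, here is a minimal Python sketch of the same request and of pulling the distinct values out of a response shaped like the one above. The helper names (facet_url, distinct_values) and the SAMPLE string are made up for illustration, and the code assumes the XML response format shown here. It does not contact Solr; fetch the URL with whatever HTTP client you prefer.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def facet_url(base, field):
    # Same parameters as the URL above: match everything, return no rows,
    # and ask for facet counts on one field.
    params = {"q": "*:*", "rows": 0, "facet": "on", "facet.field": field}
    return base + "?" + urlencode(params)

def distinct_values(xml_text, field):
    # Walk response -> facet_counts -> facet_fields -> <field> and collect
    # each (value, count) pair in the order Solr returned them.
    root = ET.fromstring(xml_text)
    path = ("./lst[@name='facet_counts']/lst[@name='facet_fields']"
            "/lst[@name='%s']/int" % field)
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]

# Hypothetical sample, trimmed from the response shown above.
SAMPLE = """<response>
<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="txt">
   <int name="value">100</int>
   <int name="value1">80</int>
  </lst>
 </lst>
</lst>
</response>"""

if __name__ == "__main__":
    print(facet_url("http://localhost:8983/solr/select", "txt"))
    for value, count in distinct_values(SAMPLE, "txt"):
        print(value, count)
```

The names in the returned pairs are the distinct values; the counts are how many documents carry each one.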

Check out the wiki for more information. Faceting is a really cool part of Solr. Enjoy :)

http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields

Note: Faceting shows the indexed value, i.e. the value after all the analysis filters have been applied. One way around this is to use copyField to create a separate facet version of the txt field. That way your results will show the original value.

Hope that helps. There is lots of documentation on faceting available on the wiki. I also wrote some up with screenshots, which you can check out here:

http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

CraftyFella
+2  A: 

I would store the substring in a different field (let's call it txt_substring), then facet on txt_substring as CraftyFella showed.

Normally I'd use the n-gram tokenizer, but I don't think you can facet on that.
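
As a rough sketch of what I mean, the substring can be computed on the client before the documents are sent to Solr. The Python below is only an illustration: with_substring and distinct_substrings are made-up helper names, the documents are plain dicts, and the actual indexing call is left out. Note that Python slices are 0-based, so txt[:3] takes the first three characters.

```python
def with_substring(doc, length=3):
    # Return a copy of the document with an extra txt_substring field
    # holding the first `length` characters of txt.
    enriched = dict(doc)
    enriched["txt_substring"] = doc["txt"][:length]
    return enriched

def distinct_substrings(docs):
    # Local stand-in for what faceting on txt_substring would list:
    # the sorted set of distinct prefixes.
    return sorted({d["txt_substring"] for d in docs})

if __name__ == "__main__":
    docs = [{"txt": "apple"}, {"txt": "apricot"}, {"txt": "appendix"}]
    enriched = [with_substring(d) for d in docs]
    print(distinct_substrings(enriched))
```

Once the enriched documents are indexed, faceting on txt_substring gives the distinct prefixes along with their counts.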

Mauricio Scheffer