I have a Solr schema that has a "url" field:

    <fieldType name="url" class="solr.TextField"
        positionIncrementGap="100">
    </fieldType>

    <fields>
        <field name="id" type="string" stored="true" indexed="true"/>
        <field name="url" type="url" stored="true" indexed="false"/>
        <field name="chunkNum" type="long" stored="true" indexed="false"/>
        <field name="origScore" type="float" stored="true" indexed="true"/>
        <field name="concept" type="string" stored="true" indexed="true"/>
        <field name="text" type="text" stored="true" indexed="true"
            required="true"/>
        <field name="title" type="text" stored="true" indexed="true"/>
        <field name="origDoctype" type="string" stored="true" indexed="true"/>
        <field name="keywords" type="string" stored="true" indexed="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>text</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>

I can add SolrInputDocuments with all the fields and query them back using the text field and/or with a filter query on "concept". But when I try to query a specific url, I don't get any results. My code looks like:

    SolrQuery query = new SolrQuery();
    query.setQuery("url:" + ClientUtils.escapeQueryChars(url));
    //query.setQuery("*:*");
    //query.addFilterQuery("url:" + ClientUtils.escapeQueryChars(url));

    List<Chunk> retCode = null;

    try
    {
        QueryResponse resp = solrServer.query(query);
        SolrDocumentList docs = resp.getResults();
        retCode = new ArrayList<Chunk>(docs.size());
        for (SolrDocument doc : docs)
        {
            LOG.debug("got doc " + doc);
            Chunk chunk = new Chunk(doc);
            retCode.add(chunk);
        }
    }
    catch (SolrServerException e)
    {
        LOG.error("caught a server exception", e);
    }
    return retCode;

I've tried with and without ClientUtils.escapeQueryChars, and I've tried both a plain "url:" query and a filter query on url. I never get anything back. Any hints?

A:

What's the actual type of "url"? In your schema.xml you should have a set of "fieldType" elements that list the Solr backing classes and filters making up each data type.

For the "url" fieldType, the attribute to look at is "class". E.g. the most basic free-text type has class="solr.TextField". You might be using a type that has some wacky filters on it, so that Lucene/Solr ends up indexing your data differently from what you would expect.

Download Luke and look at your index visually:

http://www.getopt.org/luke/

It will help you "look" at your data. Like I said, maybe it's stored differently from what you expect.

Cody Caughlan
Oh right, I left that part out. The "url" type is just a clone of the text type with the analyzer stuff stripped out. I've also tried making it a clone of the string type.
Paul Tomblin
A: 

Dammit, another stupid one on my part: Thanks to Cody's suggestion of using Luke, I discovered this inconvenient part of the schema:

    <field name="url" type="url" stored="true" indexed="false"/>

Changing that to indexed="true" fixed the problem.
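For anyone hitting the same thing, here is a minimal sketch of a check that would have caught it without opening Luke. It parses a schema fragment with the JDK's own XML parser and lists every field explicitly declared indexed="false". The class name UnindexedFieldCheck, the method findUnindexedFields, and the trimmed-down schema string are all made up for illustration; this is not part of Solr or SolrJ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Solr API: flags fields that cannot be searched.
public class UnindexedFieldCheck {

    /**
     * Returns the names of all <field> elements whose "indexed" attribute
     * is explicitly "false". Fields that omit the attribute are not reported,
     * since their effective value depends on the fieldType defaults.
     */
    public static List<String> findUnindexedFields(String schemaXml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(schemaXml)));

        // Matches <field> only; <fieldType> has a different tag name.
        NodeList fields = doc.getElementsByTagName("field");
        List<String> unindexed = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if ("false".equals(field.getAttribute("indexed"))) {
                unindexed.add(field.getAttribute("name"));
            }
        }
        return unindexed;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed copy of the schema from the question, wrapped in a root
        // element so that it parses as a standalone document.
        String schema =
                "<schema><fields>"
              + "<field name=\"id\" type=\"string\" stored=\"true\" indexed=\"true\"/>"
              + "<field name=\"url\" type=\"url\" stored=\"true\" indexed=\"false\"/>"
              + "<field name=\"chunkNum\" type=\"long\" stored=\"true\" indexed=\"false\"/>"
              + "</fields></schema>";

        System.out.println(findUnindexedFields(schema)); // prints [url, chunkNum]
    }
}
```

Note that documents added while the field was still indexed="false" have no index entries for it, so they most likely need to be re-added after the schema change before a "url:" query can match them.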

Paul Tomblin
Indeed you cannot search on an un-indexed field :)
jeje