I use Solr to search for documents. When I run the query "id:*", I get a query parser exception saying the query cannot be parsed because * or ? is not allowed as the first character.

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery

type Status report

message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery

description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).

Is there a patch that makes this work with just a bare *? Or is such a query too costly to run?

A: 

I'm assuming with id:* you're just trying to match all documents, right?

I've never used Solr, but in my Lucene experience we add a hidden field to every document at ingest time, holding a string constant that is the same for every record. When we need to return every record, we search for that constant in that field.

If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.

Edit: to actually answer the question, I've never heard of a patch that makes this work, and I would be surprised if it could be made to work reasonably well. See this question for a reason why unconstrained PrefixQuery instances can cause a problem.

Ryan Ahearn
A: 

Actually, I have been using a workaround for this. I prepend a fixed character to each id, e.g. A1, A2, etc.

With such values in the field, it is possible to search using the query id:A*

But I would love to know whether a true solution exists.

cnu
A: 

Lucene doesn't allow you to start a WildcardQuery with an asterisk by default, because a leading wildcard forces it to scan every term in the field, which is very slow on large indexes.

If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.

If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.

Joe Shaw
+1  A: 
id:[a* TO z*] id:[0* TO 9*] etc.

I just tried this in lukeall on my index and it worked, so it should also work in Solr, which uses the standard query parser. I don't actually use Solr.

In plain Lucene there is a good reason never to query for every document: to run any query you must first open an IndexReader on the index directory. You can therefore skip the query entirely and use the IndexReader methods numDocs() to get a count of all the documents, and document(int n) to retrieve any of them.

dlamblin
+4  A: 

If you want all documents, do a query on *:*

If you want all documents with a certain field (e.g. id) try id:[* TO *]
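As a rough illustration, here is how those two queries could be sent to Solr over HTTP. This is only a sketch: the base URL (host, port, and path) and the helper name build_select_url are assumptions for the example, not something from the question. The point is that characters such as *, :, [ and ] need to be percent-encoded in the q parameter.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust host, port, and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10):
    """Return a Solr select URL with the query string percent-encoded."""
    params = {"q": query, "rows": rows}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Match every document in the index.
all_docs_url = build_select_url("*:*")

# Match every document that has a value in the id field.
has_id_url = build_select_url("id:[* TO *]")

print(all_docs_url)
print(has_id_url)
```

The script only builds and prints the URLs; fetching them (for example with urllib.request.urlopen) requires a running Solr instance.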

Daniel Papasian
A: 

If you are just trying to get all documents, Solr does support the *:* query. It's the only case I know of where Solr will let you begin a query with an *. You've probably seen it as the default query on the Solr admin page.

If you are trying to do a more specific query with an * as the first character, say id:*456, then one of the best ways I've seen is to index that field twice: once normally (field name: id), and once with all the characters reversed (field name: reverse_id). You can then get the effect of id:*456 by sending the query reverse_id:654* instead. Hope that makes sense.
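A minimal sketch of the reversed-field idea, written client-side in Python. The helper names add_reversed_id and rewrite_leading_wildcard are made up for this example, and the sketch assumes you control both the documents being indexed and the query string sent to Solr.

```python
def add_reversed_id(doc):
    """Return a copy of the document with a reverse_id field added for indexing."""
    enriched = dict(doc)
    enriched["reverse_id"] = doc["id"][::-1]
    return enriched


def rewrite_leading_wildcard(field, pattern):
    """Turn a suffix pattern such as *456 into a prefix query on the reversed field."""
    has_other_wildcards = "*" in pattern[1:] or "?" in pattern
    if len(pattern) < 2 or not pattern.startswith("*") or has_other_wildcards:
        raise ValueError("expected a single leading '*' followed by literal text")
    suffix = pattern[1:]
    # Reverse the literal part and move the wildcard to the end.
    return "reverse_" + field + ":" + suffix[::-1] + "*"


# Index time: store the id both normally and reversed.
doc = add_reversed_id({"id": "123456"})

# Query time: id:*456 becomes a cheap prefix query.
query = rewrite_leading_wildcard("id", "*456")

print(doc["reverse_id"])
print(query)
```

The rewritten query is an ordinary prefix query, so it avoids the leading-wildcard restriction at the cost of storing the field twice.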

You can also search the Solr user group mailing list at http://www.mail-archive.com/[email protected]/ where questions like this come up quite often.

mbaird