views:

186

answers:

1

Firslty, I'm aware of some similar questions along the lines of this one, but I think this situation is different enough to warrant its own question.

I'm running a Solr index, through a jetty install on a LAMP server. I currently use the simplexml_load_file function to bring in the search results and then parse them trough a couple of functions. I was happy with this process until I started running into a fundamental problem.

The field names don't get passed through the simplexml function. For example, this result;

<doc>
  <float name="score">0.73325396</float>
  <str name="add1">Ravensbridge Drive</str>
  <str name="comments">0</str>
  <str name="company">Stratstone Lotus Leicester</str>
  <str name="feed_id"/>
  <str name="id">1711765</str>
  <str name="pcode">LE4 0BX</str>
  <str name="psearch">LE4</str>
  <str name="rating">0</str>
</doc>

Will look like this in the simplexml object;

 [doc] => Array
 (
   [0] => SimpleXMLElement Object
   (
     [float] => 0.73325396
     [str] => Array
     (
       [0] => Ravensbridge Drive
       [1] => 0
       [2] => Stratstone Lotus Leicester
       [3] => SimpleXMLElement Object
       (
         [@attributes] => Array
         (
           [name] => feed_id
         )
       )
       [4] => 1711765
       [5] => LE4 0BX
       [6] => LE4
       [7] => 0
     )
   )

When a full dataset is found, there is 11 bits of data stored in the array, but when some are missing, data moves around and my parser comes unstuck.

So, I've looked at libraries/classes to do it properly. Namely, the two main ones; Apache Solr and solr-php-client but both seem over complicated, with little amount of actual real world examples, and neither look like they support different solr cores, of which I use several.

Whats the best thing to use? I've got pretty stuck here now, any help would be MASSIVELY appreciated.

Thanks!

+1  A: 

Definitely, use one of the existing clients. As for the multiple core support, it's as simple as creating an instance of the client for each instance of Solr.

The Solr extension is much more powerful while still quite intuitive to use. Here there are a couple of sample code snippets that make a basic query and get the results back using both libraries:

PHP Solr extension

<?php
$options = array
(
    'hostname' => 'localhost',
    'port'     => '8080',
    'path'     => '/solr'
);

$client = new SolrClient($options);

$query = new SolrQuery();
$query->setQuery('fox');
$query->setStart(0);
$query->setRows(50);
// specify which fields do we want to retrieve
$query->addField('id')->addField('title_t')->addField('source_t');

$res = $client->query($query)->getResponse();

// how does he response look like?
var_dump($res);
/*
object(SolrObject)[4]
  public 'responseHeader' => 
    object(SolrObject)[5]
      public 'status' => int 0
      public 'QTime' => int 0
      public 'params' => 
        object(SolrObject)[6]
          public 'fl' => string 'id,title_t,source_t' (length=19)
          public 'indent' => string 'on' (length=2)
          public 'start' => string '0' (length=1)
          public 'q' => string 'fox' (length=3)
          public 'wt' => string 'xml' (length=3)
          public 'rows' => string '50' (length=2)
          public 'version' => string '2.2' (length=3)
  public 'response' => 
    object(SolrObject)[7]
      public 'numFound' => int 39
      public 'start' => int 0
      public 'docs' => 
        array
          0 => 
            object(SolrObject)[8]
              ...
          1 => 
            object(SolrObject)[9]
              ...
          2 => 
            object(SolrObject)[10]
              ...
          (...)
*/
// how does a document look like?
var_dump($res->reponse->docs[0]);
/*
object(SolrObject)[8]
  public 'id' => int 11408
  public 'source_t' => string 'CBD News Headlines' (length=18)
  public 'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
*/

solr-php-client (official example of use)

require_once 'library/SolrPhpClient/Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('localhost', '8080', '/solr');

if (!$solr->ping()) {
    exit('Solr service not responding.');
}

$offset = 0;
$limit = 50;

$query = 'fox';
$res = $solr->search($query, $offset, $limit);

// how does he response look like?
var_dump($res->response);

/*
object(stdClass)[6]
  public 'numFound' => int 39
  public 'start' => int 0
  public 'docs' => 
    array
      0 => 
        object(Apache_Solr_Document)[46]
          protected '_documentBoost' => boolean false
          protected '_fields' => 
            array
              ...
          protected '_fieldBoosts' => 
            array
              ...
      1 => 
        object(Apache_Solr_Document)[47]
          protected '_documentBoost' => boolean false
          protected '_fields' => 
            array
              ...
          protected '_fieldBoosts' => 
            array
              ...
     (...)
*/

// how does a document look like?
var_dump($res->response->doc[0]);

/*
object(Apache_Solr_Document)[46]
  protected '_documentBoost' => boolean false
  protected '_fields' => 
    array
      'publicationTime_i' => int 1257724800
      'publicationDate_t' => string 'Mon, 9 Nov 2009' (length=15)
      'url_s' => string 'http://news.mongabay.com/2009/1108-hance_corlett.html' (length=53)
      'language_s' => string 'EN' (length=2)
      'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
      'text' => string 'A large flying fox eats a fruit ingesting its seeds.' (length=52)
      'id' => int 11408
      'relevance_i' => int 27
      'source_t' => string 'CBD News Headlines' (length=18)
  protected '_fieldBoosts' => 
    array
      'publicationTime_i' => boolean false
      'publicationDate_t' => boolean false
      'url_s' => boolean false
      'language_s' => boolean false
      'title_t' => boolean false
      'text' => boolean false
      'id' => boolean false
      'relevance_i' => boolean false
      'source_t' => boolean false
*/
nuqqsa
Thanks for the reply nuqqsa! Am I right in thinking the path var in the options array would be the path to the core? The solr files are currently outside of my root at the minute, so I'll have to move them inside I guess? Theres is a few more q's I have but I'll try and get my head round the documentation first, thanks again!
thebluefox
These examples make a call to Solr's REST interface at `http://localhost:8080/solr`, where /solr (path) is the path relative to the domain. This has to be adapted to wherever your instance of Solr API is located. What do you mean with that the files are outside of the root? And yes, http://lucene.apache.org/solr/tutorial.html#Getting+Started is a good start point! :)
nuqqsa