Hi, I am trying to implement a dictionary-type service. I send a request to dict.org from PHP, using cURL with the dict protocol. This is my code (which works on its own and may be helpful for future readers):

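// Send a dict protocol request to dict.org for the word "hello".
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "dict://dict.org/define:(hello):english:exact");
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$definition = curl_exec($ch);
curl_close($ch);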

echo $definition;

The server returns the definition, as expected, along with several headers (that I do not need). The response looks something like this:

220 miranda.org dictd 1.9.15/rf on Linux 2.6.26-2-686 <auth.mime> <[email protected]>
250 ok
150 3 definitions retrieved
151 "Hello" gcide "The Collaborative International Dictionary of English v.0.48"
Hello \Hel*lo"\, interj. & n.
   An exclamation used as a greeting, to call attention, as an
   exclamation of surprise, or to encourage one. This variant of
   {Halloo} and {Holloo} has become the dominant form. In the
   United States, it is the most common greeting used in
   answering a telephone.
   [1913 Webster +PJC]
(... some content removed)

.
250 ok [d/m/c = 3/0/162; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]

I was wondering:

a) Is there a way to tell curl (or an option in the dict protocol) not to return all that extra information (e.g. 250 ok [d/m/c = 3/0/162; 0.000r...])?

b) You probably noticed that the dict response is not formatted in the most user-friendly way. Does anybody know of an existing PHP library that would let me display it more nicely? Otherwise I'd have to code my own.

c) If this is not the way most dictionary websites retrieve their definitions, how do they do it? In my understanding the most comprehensive dictionary database is the one at dict.org (which supports the dict protocol and is where I am sending my cURL request).

Thank you!

+1  A: 

Before I start, let me state that I don't know the specifics of the dict protocol.

I doubt that you'll be able to create a request that only delivers the text. The information you wish to discard looks like status information and is therefore useful.

The way I'd handle this is as follows:

  1. Read the curl response data into an array so that each line is a separate entry in the array. You could use explode() and split at the newline character (\n) to do this.
  2. Iterate over the array, e.g. foreach ($response as $responseLine) {}
  3. Perform a regex match (or some other form of pattern matching) on $responseLine to find the definition. It looks like the actual text consists of the only lines that don't start with a three-digit number.
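
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not a full dict protocol parser: the function name extractDefinitionText is made up for this example, and it assumes that every status line starts with a three-digit code followed by a space and that each definition ends with a line containing a single ".". A definition line that happens to begin with a three-digit number and a space would be dropped by mistake.

```php
<?php
// Hypothetical helper (not part of any library): keeps only the lines
// that look like definition text.
function extractDefinitionText(string $response): string
{
    // Step 1: split into lines; accept both \n and \r\n line endings.
    $lines = preg_split('/\r?\n/', $response);

    $kept = [];
    // Step 2: iterate over the lines.
    foreach ($lines as $responseLine) {
        // Step 3: skip status lines such as "250 ok" or "151 ...".
        // Assumption: status lines start with three digits and a space.
        if (preg_match('/^\d{3} /', $responseLine)) {
            continue;
        }
        // Skip the lone "." that marks the end of a definition.
        if ($responseLine === '.') {
            continue;
        }
        $kept[] = $responseLine;
    }

    // Join the remaining lines and trim surrounding blank space.
    return trim(implode("\n", $kept));
}
```

With the code from the question, this would be called as extractDefinitionText($definition) once curl_exec() has returned a string.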

You may want to check what character set the dict protocol uses. I haven't mentioned any error handling, but that should be straightforward.
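
For the error handling, one possible sketch is shown below. The function name findDictError is hypothetical, and the check rests on an assumption: that the server reports failures with status codes beginning with 4 or 5 (for example a "552 no match" line when the word is not found). It also treats a false or empty result from curl_exec() as a failure. Definition text that happens to start with such a number would trigger a false alarm, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: returns null when the response looks successful,
// otherwise a short description of the first problem found.
function findDictError($response): ?string
{
    // curl_exec() returns false when the transfer itself fails.
    if ($response === false || $response === '') {
        return 'no response received';
    }

    foreach (preg_split('/\r?\n/', $response) as $line) {
        // Assumption: codes starting with 4 or 5 signal a failure,
        // e.g. "552 no match".
        if (preg_match('/^[45]\d{2} /', $line)) {
            return $line;
        }
    }

    return null;
}
```

A caller could run findDictError($definition) right after curl_exec() and only go on to parse the definition when the result is null.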

Benedict Cohen
Thank you! Do you happen to know what the <auth.mime> does?
yuval
that's exactly how I went about doing that. Thanks!
yuval