For a particular PHP script I'm working on, I'm going to use a system call to the Lynx web browser to convert HTML to plain text and capture the output:

$text = `lynx -dump stackoverflow.com`;

/*
#[1]Stack Overflow [2]RSS

[3]login | [4]about | [5]faq
____________________________
[6]logo homepage
  * [7]Questions
  * [8]Tags
  * [9]Users
  * [10]Badges
*/

What I'd like to do, however, is fall back gracefully onto a different method in case Lynx isn't available on the server. How do you check whether a program exists in your PATH? Oh, and it needs to work on both Windows and Linux... :p

I'm not writing an SO screen scraper, don't worry...

+3  A: 

On Linux I would use which:

which lynx

No idea about Windows. You could probably check for the error message you get when calling a nonexistent app; I expect it is a fairly constant one.
(Hope it isn't the blue screen, or you will have to find some way to color-pick the screen :-D )
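
A minimal sketch of the `which` idea in PHP, relying on the lookup tool's exit status instead of parsing an error message. The function name `command_exists` is hypothetical, and it assumes `which` is installed on Unix-like systems and `where` on Windows. Some older `which` implementations do not return a useful exit status, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: returns true if $program can be found on the PATH.
function command_exists(string $program): bool
{
    // PHP_OS starts with "WIN" on Windows (e.g. "WINNT"); "Darwin" does not match at offset 0.
    $finder = (stripos(PHP_OS, 'WIN') === 0) ? 'where' : 'which';

    $output = [];
    $status = 1;
    // 2>&1 keeps the "not found" message off the script's own stderr.
    exec($finder . ' ' . escapeshellarg($program) . ' 2>&1', $output, $status);

    // Both tools are expected to exit with 0 only when the program was found.
    return $status === 0;
}
```

Usage would look something like `if (command_exists('lynx')) { ... } else { /* fallback */ }`.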

Itay Moav
You can install 'which' on Windows as well.
gacrux
A: 

One possible solution would be to use the popen function. Attempt to open a pipe from lynx; if that fails, you can use your fallback method. Be aware that popen can return a valid handle even when the command is not found, so also check the exit status returned by pclose. Take a look at the PHP popen function documentation for details and implementation examples.

I just noticed that this needs to work on Windows too, so I'll have to use my own fallback and say that your mileage will vary. If you want to ensure that the lynx utility is available, I would recommend that you make sure it's there and that you (and your script) know where it is. There's nothing wrong with a configuration file pointing to the locations of the prerequisites for script execution.
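
A rough sketch of the configuration-file approach, assuming a hypothetical setting that holds the full path to lynx. The function name `lynx_usable` and the example path are illustrative and not taken from any real configuration.

```php
<?php
// Hypothetical helper: checks that a configured lynx path points at a runnable file.
function lynx_usable(?string $lynxPath): bool
{
    // A missing setting means "not installed here".
    if ($lynxPath === null || $lynxPath === '') {
        return false;
    }
    // is_file() rules out directories; is_executable() checks the permission bits
    // (on Windows it mostly reflects the file type).
    return is_file($lynxPath) && is_executable($lynxPath);
}

// Example value only; in practice this would be read from the config file.
$lynxPath = '/usr/bin/lynx';

if (lynx_usable($lynxPath)) {
    $cmd = escapeshellarg($lynxPath) . ' -dump ' . escapeshellarg('stackoverflow.com');
    // $text = shell_exec($cmd);
} else {
    // Use the fallback method here.
}
```

The check never touches the PATH, so it behaves the same on Windows and Linux as long as the configured path is correct.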

Gary Chambers
A: 

Why not use the curl_* functions, fopen, or even fsockopen?

Actually, for a page dump, file_get_contents is enough (allow_url_fopen must be enabled in your PHP config).

Read the respective manuals on php.net to get more information.
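
A minimal sketch of what a pure-PHP fallback might look like, pairing file_get_contents with strip_tags. The helper name `html_to_text` is hypothetical, and the result is far cruder than Lynx's formatted dump: no link numbering, no list layout.

```php
<?php
// Hypothetical helper: crude HTML-to-text conversion using only built-in functions.
function html_to_text(string $html): string
{
    // strip_tags() keeps the contents of <script> and <style>, so remove those blocks first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;

    // Remove the remaining tags, then turn entities such as &amp; back into characters.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/', ' ', $text) ?? $text;

    return trim($text);
}
```

With allow_url_fopen enabled, a call could look like `$text = html_to_text(file_get_contents('http://stackoverflow.com/'));`, with a check for a `false` return from file_get_contents before converting.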

Jet
Lynx converts HTML pages to nicely formatted plain text. file_get_contents() doesn't.
gnud
Lynx on Windows... heh, good luck. I think strip_tags will help you.
Jet
http://home.pacific.net.sg/~kennethkwok/lynx/
nickf
Well, if you need exactly Lynx, then use which under *nix; no idea about Windows... maybe google registry access, but I'm not sure you'll find anything mature enough.
Jet