I would like to submit a form to a CGI script locally (w3c-markup-validator), but going through curl and Apache is too slow. I need to call this CGI script more than 5,000 times from another script, and it currently takes more than one hour.

How can I pass the form data directly to the CGI script? (I currently upload a file with curl.)

Edit: It turned out to be too complicated and time-consuming for what I needed, so I simply waited an hour and a half each time I needed to test my generated XHTML files. In the end I didn't test any of the answers below, so the question will remain open.

+4  A: 

Depending on the details of the script you might be able to create a fake CGI environment using HTTP::Request::AsCGI and then sourcing the CGI script with the "do" operator. But when it comes to speed and maintainability your best bet would be to factor the important part of the script's work into its own module, and rewrite the CGI as a client of that module. That way you don't have to invoke it as a CGI -- the batch job you're talking about now would be just another program using the same module to do the same work, but without CGI or the webserver environment getting in the way.

hobbs
Thanks for your help! I'll try HTTP::Request::AsCGI. It's not my script; it's the w3c-markup-validator check Perl script, and I didn't find any module that does this. Since I'm not a Perl guru, I don't think I'll refactor the W3C's work.
Pierre Guilbert
+1  A: 

CGI is a pretty simple API. All it does is read data either from an environment variable (QUERY_STRING, for GET requests) or from stdin (for POST requests). So all you need to do is set up the environment and call the script. See the docs for details.
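As a rough illustration of "set up the environment and call the script", here is a minimal Python sketch. The helper name `run_cgi` and the set of variables it fills in are assumptions made for the example; a real script such as the validator may expect more variables than these, and a file upload needs a multipart body rather than the simple form-encoded default used here.

```python
import os
import re
import subprocess
import sys


def run_cgi(argv, body=b"", content_type="application/x-www-form-urlencoded",
            method="POST", query_string="", script_name="/check"):
    """Run a CGI program directly, without a web server.

    Returns (headers, document), where headers is the raw header block
    the script printed and document is everything after the blank line.
    """
    env = dict(os.environ)
    env.update({
        # Variables a CGI program usually looks at (CGI/1.1, RFC 3875).
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "QUERY_STRING": query_string,      # GET data travels here
        "CONTENT_TYPE": content_type,      # describes the POST body
        "CONTENT_LENGTH": str(len(body)),  # how many bytes to read from stdin
    })
    # POST data is fed to the script on stdin.
    proc = subprocess.run(argv, input=body, env=env,
                          stdout=subprocess.PIPE, check=True)
    # The response is a header block, a blank line, then the document.
    parts = re.split(rb"\r?\n\r?\n", proc.stdout, maxsplit=1)
    headers = parts[0].decode("latin-1")
    document = parts[1] if len(parts) > 1 else b""
    return headers, document
```

For the validator, `argv` would be the path to its check script on your machine. Note that this still starts one process per document, so it removes the HTTP round trip but not the script's start-up cost.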

Aaron Digulla
It's not as simple as you seem to think it is. In the case of file upload you would also need to set up the mime encodings. It doesn't just read the file directly from STDIN.
Kinopiko
Thanks for the advice. If I don't succeed with any of your answers, I'll maybe use tidy.
Pierre Guilbert
+1 You're right; you must properly encode the file. But that's not too hard, either: the MIME header is pretty much static, the length comes from an environment variable, so all that's left is the encoding of the data.
Aaron Digulla
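To make the encoding step from this exchange concrete, here is a hedged Python sketch of building a multipart/form-data body by hand. The function name `encode_multipart` is made up for the example; the field names `output` and `uploaded_file` used in the note below are the ones from the asker's curl command. The returned content type is what would go into CONTENT_TYPE, and the body length into CONTENT_LENGTH.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="text/html"):
    """Return (content_type, body) for a multipart/form-data POST.

    fields is a dict of plain form values; the file is sent as one
    extra part carrying its own Content-Type header.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates part headers from part data
            str(value).encode("utf-8"),
        ]
    lines += [
        delimiter,
        (f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"').encode("utf-8"),
        f"Content-Type: {file_type}".encode("ascii"),
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter ends the body
        b"",
    ]
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body
```

Calling `encode_multipart({"output": "soap12"}, "uploaded_file", "page.html", data)` gives a body roughly equivalent to what `curl -F` sends. The sketch does not escape quotes in names or filenames, which is fine for generated file names but not for arbitrary input.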
PS: I will not use tidy; it doesn't validate as well as w3c-markup-validator.
Pierre Guilbert
+2  A: 

OK, I looked at the source code for this thing and it is not easy to extract the validation stuff from all the rest. So, here is what I would do.

First, ditch curl. Starting a new process for each file you want to validate is not a good idea. You are going to need to write a driver script that takes a list of URLs and submits them to your local server running on localhost. In fact, you might later want to parallelize this because there will normally be a bunch of httpd processes alive anyway. But I am getting ahead of myself.

This script can use LWP because all you are doing is submitting some data to the CGI script on localhost and storing/processing results. You do not need full WWW::Mechanize functionality.

As for the validator CGI script, you should configure that as a mod_perl registry script. Make sure you preload all necessary libraries.

This should boost documents processed per second from 1.3 to something more palatable.
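The driver idea can be sketched as follows. This is Python's `http.client` standing in for LWP, not the LWP API itself, and `post_many` is a name invented for the example. It assumes each request body has already been encoded as multipart/form-data and only shows the part that matters for speed: one process and one connection for the whole batch. The default URL path is the one from the asker's setup.

```python
import http.client


def post_many(payloads, host="localhost", port=80,
              url="/w3c-markup-validator/check", timeout=60):
    """POST each (content_type, body) pair over one HTTP connection.

    payloads is an iterable of (content_type, body_bytes) tuples.
    Returns a list of (status, response_bytes), one per payload.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for content_type, body in payloads:
            # Content-Length is filled in automatically for a bytes body.
            conn.request("POST", url, body=body,
                         headers={"Content-Type": content_type})
            response = conn.getresponse()
            # The response must be read completely before the same
            # connection can carry the next request.
            results.append((response.status, response.read()))
    finally:
        conn.close()
    return results
```

If the server keeps the connection alive, every document after the first skips both process start-up and TCP setup; if it closes the connection, `http.client` reconnects on the next request. Running several such loops in parallel would be the next step once a single loop works.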

Sinan Ünür
I was using curl like this: `curl -F output=soap12 -F "uploaded_file=@/path;type=text/html" http://localhost/w3c-markup-validator/check` and you're right, it was not a good idea. How does the driver script work? Do I hand all the files to curl through LWP? Thanks!
Pierre Guilbert
You submit the file and any other options to the CGI script using LWP. There is no need for `curl`. Of course, you could use `libcurl` as well. See http://search.cpan.org/perldoc/WWW::Curl
Sinan Ünür
A: 

If the script uses CGI.pm, you can run it from the command line by supplying the '-debug' switch (to CGI.pm, in the use statement.) That will then allow you to send the post variables on stdin. You may have to tweak the script a little to make this work.

Sean McMillan
The w3c-markup-validator Perl script doesn't use CGI.pm.
Pierre Guilbert