I'm trying to fetch text from another URL using cURL. The text sits in an otherwise blank HTML document whose data is dynamic (not static), so there are no HTML tags to filter. This is what I've got so far:

$c = curl_init('http://url.com/dataid='.$_POST['username']);
// curl_setopt() takes the cURL handle as its first argument
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FRESH_CONNECT, true);

$html = curl_exec($c);

if (curl_error($c))
die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);

This works perfectly; however, at the end of the dynamic HTML document there is some unwanted text, "#endofscript" (without the quotation marks). This gets fetched too, so what can be done to avoid grabbing it? I've tried looking at "strpos" and similar functions, but I'm unsure how to integrate them with cURL.

Any help would be appreciated. :)

EDIT: The code I'm currently using:

<?php

$homepage = file_get_contents('http://stackoverflow.com/');

$result = substr($homepage, 0, -12);

echo $result;

?>
+1  A: 

Why not simply use:

<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>

http://php.net/manual/en/function.file-get-contents.php

GOsha
This is a fine solution if the code only runs on your server and your server supports `file_get_contents`. But if this code is to be distributed (in a WordPress plugin, for example), you cannot rely on server settings permitting the use of `file_get_contents`. At a minimum, there should be a fallback to `curl` in that case.
kingjeffrey
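To illustrate the fallback mentioned in the comment above, here is a minimal sketch; it was not posted in the thread. The function name `fetch_text` is hypothetical. It assumes the cURL extension may or may not be loaded, and it checks the `allow_url_fopen` setting before trying `file_get_contents`.

```php
<?php
// Hypothetical helper (not from the thread): fetch the body of $url,
// preferring file_get_contents() and falling back to cURL.
// Returns null when neither method succeeds.
function fetch_text(string $url): ?string
{
    // file_get_contents() can only open URLs when allow_url_fopen is enabled.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $body = @file_get_contents($url);
        if ($body !== false) {
            return $body;
        }
    }

    // Fall back to cURL if the extension is available.
    if (function_exists('curl_init')) {
        $c = curl_init($url);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        $body = curl_exec($c);
        if ($body !== false) {
            return $body;
        }
    }

    return null;
}
```

A caller would check for `null` before trimming or echoing the result.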
Thank you for the answer and the comment, I appreciate them. No, the code won't be distributed and will stay on one server. I've now applied that code to the web page rather than using cURL; however, it gives the same result. I need to find a way so that the text "#endofscript" doesn't get displayed. Kind regards. :)
AUllah1
It was just the simplest way to do it. :) You are welcome!
GOsha
A: 

Thank you all for your help, I can't say how much I appreciate it! Using the script given by GOsha, I managed to modify it so that it removes the end text. The code I used is below:

<?php

$homepage = file_get_contents('http://url.com/dataid='.$_POST['username']);

$rest = substr($homepage, 0, -12);
echo $rest;

?>

This has now been answered. Thank you all, I am very thankful for all your responses. :)

AUllah1
+1  A: 

You could use preg_replace() to remove all lines starting with a "#", for example:

$res = preg_replace('/^#.*$[\\r\\n]*/m','',$dat);

or just

'/#endofscript$/'

to match the thingie at the end.

substr/str_replace/some other string-functions will work as well.


Some example code showing how to implement the substr and preg_replace methods:

<pre><?php

$dat = 'Lorem ipsum dolor sit amet,
        consectetur adipisicing 
        elit #endofscript';

// either
$res = $dat; // default: leave the text unchanged if the marker is absent
if (substr($dat,-12) == '#endofscript')
    $res = substr($dat,0,-12);

var_dump($res);

// or
$res = preg_replace('/#endofscript$/','',$dat);
var_dump($res);

?></pre>
Kuchen
Thanks for the reply Kuchen, I'd like to use this method as I noticed substr not only removed the #endofscript text, but all the last several letters grabbed (So, if #endofscript wasn't displayed, it still erased text). How would I go about applying your method within the script? Especially as my content is fetched/grabbed. Also, the content which I fetch/grab is all one line, therefore the first option can't be used. Once again, I appreciate your response. :)
AUllah1
You could check if (substr($homepage,-12) == '#endofscript') before using substr to actually remove it; that might be faster than the regex. Other than that, just use the preg_replace line with the 2nd expression, where $dat is your $homepage. :-)
Kuchen
Hey Kuchen, again thanks for the response. I like the idea of checking before using the substr feature, but how do I add that feature within my script? I've attempted it but failed, sorry, I'm still learning a couple of things as I go along. And using the preg_replace, how do I use this too? After adding it within the script, I added "echo $res;" which didn't seem to do the trick, it still displayed the text "#endofscript". Thank you for your reply. :)
AUllah1
Added some example code to the answer that should help you out
Kuchen
Thank you for the assistance Kuchen! I really appreciate it, the script now fully works. :)
AUllah1
+1  A: 

Since you're saying that this bad text might be appended to the output, you could use something like this code (wrap it in a function to make it easier to reuse):

<?php
define("bad_text", "#endofscript");

$feed_text = "here is some text#endofscript";
$bExist = false;
if(strlen($feed_text) >= strlen(constant("bad_text"))) // compare lengths, not a length against the string itself
{
    $end_of_text = substr($feed_text, strlen($feed_text) - strlen(constant("bad_text")));
    $bExist = strcmp($end_of_text, constant("bad_text")) == 0;
}

if($bExist)
    $final_text = substr($feed_text, 0, strlen($feed_text) - strlen(constant("bad_text")));
else
    $final_text = $feed_text;

echo $final_text;
?>
Poni
Hi Poni, I'm very thankful for your reply and think that your code is pretty fascinating. However, the content which I use is grabbed/fetched, so I don't think this code will work for it; I tried applying it to the grabbed content without success. Do you think there is a workaround for this, or am I simply doing it wrong? Thank you for your reply and time. :)
AUllah1
Thanks for the nice feedback! What do you mean by "grabbed/fetched"? And you said the feed is text, or is it binary? Also, just so you know: each time the above code calls strlen() we waste CPU cycles, so you'd better call it once and store the result in "$feed_len". Just a quick optimization.
Poni
Thank you for the response, it's appreciated. :) When I state "grabbed/fetched" I mean that the text has been transferred onto my web site using "file_get_contents" (php function) and that the text isn't manually entered. Although the text is still text and not binary. After using "file_get_contents" to get text, your php script doesn't seem to remove the "#endofscript" text which is fetched. Again, thank you for your effort and your reply! :)
AUllah1
The only reason I can think of is that my script above relies on "#endofscript" being the very last characters in the whole text. I did it this way because it would be inefficient to pass through all the text and search for the last instance of this string. If that's not the case, and "#endofscript" might not be the last string in the text, then yes, you need to use another function to get the offset of its last occurrence. I'm sure you can find the PHP function to do so (:
Poni
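As a rough sketch of the "another function" idea in the comment above: `strrpos` returns the offset of the last occurrence of a string, so the text can be cut at that point. The helper name `cut_at_last_marker` is made up for illustration, and note that this approach also discards anything that follows the marker.

```php
<?php
// Hypothetical helper (not from the thread): cut $text at the last
// occurrence of $marker. Text without the marker is returned unchanged.
function cut_at_last_marker(string $text, string $marker = '#endofscript'): string
{
    $pos = strrpos($text, $marker); // offset of the last occurrence, or false
    if ($pos === false) {
        return $text;
    }
    return substr($text, 0, $pos);  // keep everything before the marker
}
```

Unlike a fixed `substr($text, 0, -12)`, this leaves the text alone when the marker is missing.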
Hey Poni, sorry for any hassle I'm causing you, and again, I appreciate your response. The text "#endofscript" is indeed the last text displayed once "file_get_contents" is used. For "$feed_text", I've set "$feed_text = $homepage;" so that the fetched content is the feed text; however, this doesn't seem to work and remove the "#endofscript" text. The script does work when it's simple text, such as "here is some text#endofscript". Sorry for the hassle, and I do really appreciate your support.
AUllah1
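One possible explanation, which nothing in this thread confirms, is that the fetched text has a trailing newline or other whitespace after "#endofscript", so its last 12 characters are not exactly the marker. Under that assumption, a sketch like the following trims trailing whitespace before comparing. The function name is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): remove a trailing $marker
// even when whitespace or newlines follow it. Text that does not end
// with the marker is returned untouched.
function strip_marker_ignoring_whitespace(string $text, string $marker = '#endofscript'): string
{
    $trimmed = rtrim($text);          // drop trailing spaces, tabs, and newlines
    $len = strlen($marker);
    if ($len > 0 && substr($trimmed, -$len) === $marker) {
        return substr($trimmed, 0, -$len);
    }
    return $text;
}
```

With this, `$feed_text = strip_marker_ignoring_whitespace($homepage);` would replace the fixed-length comparison.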
You're most welcome, no hassle at all (: I suggest you put it in a tidy code example and edit your original post to include that code, so I can copy/paste and run it. Let's take http://stackoverflow.com/ as the target page to be fetched, and let's remove its last string, "</html>". Once I can run it myself, I'm sure I'll be able to help you solve it.
Poni
Thank you for the assistance. I've now updated my initial/first post with the code I'm using to grab the content and remove the last 12 letters. However, ignore the substr as I'm trying to remove the text "#endofscript" if displayed. Again, appreciate your response.
AUllah1
It works for me. I copied and pasted the code, ran it, and got the whole content minus the last 12 chars. Are you having a problem? Maybe you should check your PHP configuration or something?
Poni
It works, but it removes the last 12 characters even if the tag </html> isn't displayed. Nonetheless, I appreciate your response and help! The script is now fully working. :) Thank you!
AUllah1
Good, I'm guessing that all you had to do was compare the last 12 chars to "#endofscript" and only then remove them if needed. Happy to be of help, and I'd appreciate it if you'd choose an answer for this question (:
Poni