I am creating a web app that accepts input of news items (title, article, url). It has a page news.php which creates a summary of all news items inputted for specified dates, like so:

News
4/25/2010

Title 1
[URL 1]
Article 1

Title 2
[URL 2]
Article 2

and so on...

I have two other pages, preview.php and send.php, both of which load news.php through file_get_contents().

Everything works fine except when the URL contains spaces. During Preview, the urls get opened (FF: spaces are spaces, Chrome: spaces are %20). However, during Send, when received as emails, the urls don't get opened, because the spaces are converted into + signs.

For example:

  1. Preview in FF: http://www.example.com/this is the link.html
  2. Preview in Chrome: http://www.example.com/this%20is%20the%20link.html
  3. Viewed as email in both browsers: http://www.example.com/this+is+the+link.html

Only #3 doesn't work (link doesn't get opened).

Why are the spaces in the urls correct (spaces or %20) when previewed, but incorrect (+) when received in the emails, when in fact, the same page is generated by the same news.php?

Any help appreciated :)


EDIT:

preview.php:

$HTML_version = file_get_contents('news.php');
echo $HTML_version;

send.php:

$HTML_version = file_get_contents('news.php');
$body = "$notice_text

--$mime_boundary
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

$TEXT_version

--$mime_boundary
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

$HTML_version

--$mime_boundary--";
//some other code here to send the email

news.php:

<a href="<?php echo $url ?>">attachment</a>
//the $url there contains spaces
+2  A: 

The + signs are a legacy substitute for space or %20, so they should work fine. Since they don't, I would recommend that you manually convert all spaces in the URL to %20. That should fix the problem.

Jakob Kruse
i'll try this out. i did add some edits above though, could you check it out?
Obay
thanks man, your suggestion worked! however, i was thinking, i also need to manually convert other characters into their % counterparts, right? which other characters, like the space, get converted into something like %something? do you know what these are called? i'd like to google a list of all these characters...
Obay
If your URL includes parameters then those should be encoded using rawurlencode: http://www.php.net/manual/en/function.rawurlencode.php. Don't use this on your entire URL though. With that you should be pretty safe just encoding space characters.
Jakob Kruse
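
To illustrate the comment above, here is a minimal sketch of encoding only the parameter value rather than the whole URL. The base URL, the search.php script and the q parameter are made up for the example.

```php
<?php
// Hypothetical base URL and parameter value, for illustration only.
$base  = 'http://www.example.com/search.php';
$query = 'news & events';

// Encode only the parameter value; the scheme, host and path stay untouched.
$good = $base . '?q=' . rawurlencode($query);

// Encoding the entire URL also escapes ':' and '/', so it is no longer a usable link.
$bad = rawurlencode($base);

echo $good, "\n"; // http://www.example.com/search.php?q=news%20%26%20events
echo $bad, "\n";  // http%3A%2F%2Fwww.example.com%2Fsearch.php
```
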
+1  A: 

What mail client are you using? Mail clients' handling of HTML is subject to extreme limitations and bugs.

<a href="<?php echo $url ?>">attachment</a>
//the $url there contains spaces

URLs don't contain spaces, by definition. If you include a space in a link in HTML:

<a href="x y.z">

the browser will typically fix your error, by encoding to x%20y.z. However this is not a standardised behaviour and you should not rely on it. It is, I suppose, possible that some dodgy mail client could be misguidedly ‘fixing’ it up to x+y.z instead, which wouldn't work because + in the path part of a URL does not mean a space.

Use rawurlencode() for URL-encoding. With this function, spaces are converted to %20, which is appropriate for URL path-parts and query strings. PHP's misleadingly-named urlencode() function encodes to + instead, which is only appropriate in form data in the query string.
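
A quick side-by-side of the two functions, using the file name from the question (the variable names are just for illustration):

```php
<?php
$name = 'this is the link';

$path_safe = rawurlencode($name); // spaces become %20
$form_only = urlencode($name);    // spaces become +

echo $path_safe, "\n"; // this%20is%20the%20link
echo $form_only, "\n"; // this+is+the+link
```
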

You also need to use htmlspecialchars() any time you output a string to HTML.

<?php
$name = 'this is the link';
$url = 'http://www.example.com/'.rawurlencode($name).'.html';
?>

<a href="<?php echo htmlspecialchars($url); ?>">link</a>
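
The htmlspecialchars() step matters most once a URL carries more than one query parameter. A small sketch, assuming a hypothetical view.php URL that is not from the question:

```php
<?php
// Hypothetical URL with two query parameters.
$url = 'http://www.example.com/view.php?id=1&lang=en';

// htmlspecialchars() turns & into &amp; so the attribute value is valid HTML.
$attr = htmlspecialchars($url);

echo '<a href="' . $attr . '">link</a>', "\n";
// <a href="http://www.example.com/view.php?id=1&amp;lang=en">link</a>
```
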
bobince
very helpful dude! thanks. i tested in Gmail and Lotus Notes, and they behave the same. anyway, i don't think i can use rawurlencode() since the urls vary for each news item (e.g. no common base url). i think i'll just manually str_replace(' ', '%20', $url). As the other commenter pointed out below, converting the spaces should be enough. do you think so too? or do i also have to worry about other characters?
Obay
There are plenty of other characters that are invalid in URIs, for example `"<>{}|\^[]`, the backtick, control characters and all non-ASCII characters. Whether you have to worry about them depends on where your input is coming from. If someone's sending you “URLs” that have spaces in them it sounds like they're just randomly sticking strings together without knowing what they're doing, so, yeah, you might get other characters to worry about.
bobince
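
If the URLs arrive whole and there is no common base to build from, one possible approach is to percent-encode only the bytes that can never appear literally in a URL and leave everything else alone. The helper below is a sketch, not a validator: encode_invalid_url_chars() is a made-up name, it assumes any % already in the input starts a valid escape, and it leaves every reserved character (including square brackets) untouched.

```php
<?php
// Hypothetical helper: percent-encode bytes that are not allowed in a URL,
// keeping unreserved characters, reserved delimiters and existing %XX escapes.
function encode_invalid_url_chars(string $url): string
{
    return preg_replace_callback(
        // Anything outside this set is encoded. There is no /u flag, so
        // multi-byte characters are handled one byte at a time.
        '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

echo encode_invalid_url_chars('http://www.example.com/this is the link.html'), "\n";
// http://www.example.com/this%20is%20the%20link.html
```

Because % is left alone, a URL that is already encoded passes through unchanged, so running the helper twice should not double-encode anything.
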