Hi, does anybody have a clue why HTML code downloaded via Wifi differs from the content of the same URL fetched over Edge/3G? I noticed it when using componentsSeparatedByString:, because the output from Wifi has many more lines than the output from 3G. Analyzing the result, I could see that only a few line breaks were detected over 3G. Here is the code:

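NSError *error = nil;
NSString *htmlCode = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.any.url"]
                                              encoding:NSUTF8StringEncoding
                                                 error:&error];
// componentsSeparatedByString: returns a new array, so no separate alloc/init is needed.
NSArray *htmlCodeByLines = [htmlCode componentsSeparatedByString:@"\n"];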

fg

A: 

Since you are fetching this over HTTP, it could have something to do with what the intervening layers are putting in your client info (for example, the User-Agent header). Servers often send back different results based on what they think your capabilities are. This is particularly true for mobile browsers.

I'd suggest getting out a packet sniffer and taking a look at the traffic. Wireshark on a laptop connected with a dumb hub is best, but that would be tricky w/ wireless/3G. You could instead run it on the target (server) system. In particular, look for differences in the HTTP packets you are sending out for the two different configurations.

On the other hand, you could also check whether the contents of the incoming packets look the same. If they do, that would tell you that something on your end is stripping stuff. Not likely, I'd think, but perhaps possible.
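
If a packet sniffer is not an option, a rough alternative is to log what the client itself sees on each network. The following is only a minimal sketch: the helper names MakeProbeRequest and LogResponseDetails and the User-Agent value are made up for illustration, and NSURLSession is a newer API than the one used in the question. Run it once on Wifi and once on Edge/3G and compare the logged headers and body length.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a request with a fixed User-Agent so the server
// cannot vary its output based on that header. The value is a placeholder.
static NSMutableURLRequest *MakeProbeRequest(NSURL *url) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"MyTestClient/1.0" forHTTPHeaderField:@"User-Agent"];
    return request;
}

// Hypothetical helper: fetches the URL and logs status, headers and body size.
static void LogResponseDetails(NSURL *url) {
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithRequest:MakeProbeRequest(url)
                                        completionHandler:^(NSData *data,
                                                            NSURLResponse *response,
                                                            NSError *error) {
            if (error != nil) {
                NSLog(@"Request failed: %@", error);
                return;
            }
            if ([response isKindOfClass:[NSHTTPURLResponse class]]) {
                NSHTTPURLResponse *http = (NSHTTPURLResponse *)response;
                NSLog(@"Status: %ld", (long)http.statusCode);
                // Headers such as Via or Content-Encoding can hint at a proxy.
                NSLog(@"Headers: %@", http.allHeaderFields);
            }
            NSLog(@"Body length: %lu bytes", (unsigned long)data.length);
        }];
    [task resume];
}
```

This only shows the response side as the app receives it; it cannot show how a proxy rewrote the outgoing request, which still needs a capture on the server.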

T.E.D.
A: 

Does your Wifi network include an HTTP proxy?

Nelson
More likely the carrier network has a transparent HTTP proxy that's doing data compression to give the appearance of a faster download.
AlBlue
A: 

No, the Wifi has no proxy. Anyhow, I don't have any problems with Wifi, only with Edge/3G. I really didn't expect difficulties like that. By now I think the problems are caused by my provider. Unfortunately, I'm not able to sniff the packets to compare Wifi against Edge/3G.

A: 

Why are you assuming that the HTML's newlines have any relevance at all to what you're trying to do with it? There's no real significance to newlines in HTML.

Sites that optimise for mobile browsers will usually strip out all insignificant whitespace (including newlines) to speed up the download. As well as browser sniffing, the server may generate different results depending on which IP address you connect from, and on the 3G network there may be transparent HTTP proxies in the way (for http requests) that do the translation without you knowing about them.

In summary, you shouldn't attach any significance to the existence of newline characters in an HTML page, nor should you expect that there are no transparent proxies between you and the site if you're going over a mobile network. (If you can view the site over HTTPS, you will probably find that the transparent proxy does not rewrite any data, for obvious reasons.)
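
One way to check whether whitespace stripping is really the only difference is to compare the two downloads with all whitespace removed. This is just a sketch under that assumption; DiffersOnlyInWhitespace and StripWhitespace are hypothetical helper names, not part of Foundation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every whitespace and newline character.
// Only meant for comparison, since it also removes spaces inside text.
static NSString *StripWhitespace(NSString *text) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [[text componentsSeparatedByCharactersInSet:ws] componentsJoinedByString:@""];
}

// Hypothetical helper: YES if the two strings are equal once whitespace is ignored.
static BOOL DiffersOnlyInWhitespace(NSString *first, NSString *second) {
    return [StripWhitespace(first) isEqualToString:StripWhitespace(second)];
}
```

If this returns YES for the Wifi and 3G versions of the same page, the proxy (or server) is only dropping whitespace; if it returns NO, the markup itself is being changed as well.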

AlBlue
A: 

Thanks for that hint, AlBlue. In the meantime I'm trying to use NSScanner. I guess the provider strips the HTML code when it's sent to a mobile device.
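
For what it's worth, here is a rough sketch of what the NSScanner approach might look like: it pulls out the text between two markers and does not care whether line breaks were stripped. SubstringsBetween is a hypothetical helper name, and the markers are whatever delimits the data on the page in question. It is not a real HTML parser and will not cope with nested or malformed markup.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring found between startMarker and endMarker.
static NSArray<NSString *> *SubstringsBetween(NSString *html,
                                              NSString *startMarker,
                                              NSString *endMarker) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // By default NSScanner skips whitespace and newlines before each scan;
    // turn that off so the extracted text comes back exactly as it appears.
    scanner.charactersToBeSkipped = nil;

    while (!scanner.isAtEnd) {
        // Move to the next start marker; stop if there is none left.
        [scanner scanUpToString:startMarker intoString:NULL];
        if (![scanner scanString:startMarker intoString:NULL]) {
            break;
        }
        // Collect everything up to the end marker (may be empty).
        NSString *content = nil;
        [scanner scanUpToString:endMarker intoString:&content];
        if (![scanner scanString:endMarker intoString:NULL]) {
            break; // start marker without a matching end marker
        }
        [results addObject:(content != nil ? content : @"")];
    }
    return results;
}
```

With this, both @"<li>one</li>\n<li>two</li>" and @"<li>one</li><li>two</li>" yield the same two entries when called with @"<li>" and @"</li>", so the result no longer depends on which network delivered the page.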