After connecting to a socket and capturing the response with .read(), how do I parse the input stream and read it line by line?

The data comes back without any CRLF line breaks:

<html><head><title>Apache Tomcat/6.0.16 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /index.html</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/index.html</u></p><p><b>description</b> <u>The requested resource (/index.html) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.22</h3></body></html>
A: 

Use an HTML parser. Beautiful Soup seems to be a popular one.

danben
To the downmodder: do you care to explain your vote?
danben
A: 

You have to parse the HTML. Python has several ways of parsing HTML; one of them is the built-in HTMLParser module. Another, and probably better, way is the third-party BeautifulSoup module.
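
As a rough illustration of the built-in route, here is a minimal sketch that pulls the visible text out of a response like the one in the question, one chunk per text node. It assumes Python 3, where the HTMLParser module lives at html.parser, and it assumes the response has already been decoded to a str. The names TextExtractor and extract_text_lines are made up for this example, and the sample string is a shortened version of the Tomcat error page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <style> and <script> contents."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # text pieces in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text_lines(html):
    """Return the non-empty text nodes of an HTML string as a list."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Shortened stand-in for the response shown in the question (assumption).
sample = (
    "<html><head><title>Apache Tomcat/6.0.16 - Error report</title>"
    "<style><!--H1 {color:white;}--></style></head>"
    "<body><h1>HTTP Status 404 - /index.html</h1>"
    "<p><b>type</b> Status report</p></body></html>"
)
lines = extract_text_lines(sample)
for line in lines:
    print(line)
```

If the data read from the socket is bytes, it would need something like data.decode("utf-8", errors="replace") before being fed to the parser; the right encoding depends on the server and is not known from the question.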

Many other issues dealing with HTML processing are explained in this nice article. You can also read the relevant chapter of the (free online) Dive into Python book.

Eli Bendersky
I wonder why this was downmodded?
Eli Bendersky