questions about parsing | ansaurus

parsing

In Python, I need to store one element of the source of an HTML page as a string. How can I do this?

So far I have managed to write some code that should print the source of the page. The problem is, it doesn't. I tried it with another web site, and it printed it out fine, so I used wget on the page "http://www.whitepages.com/carrier_lookup?carrier=other&number_0=2165138899&response=1" which should download the page for me. It g...

Parser as main file?

I am trying to write a big system that inputs data from a text file, and has a parser file. So, do I have to write a main file that would call the parser, and if so, how would I call the parser file, just code it like this? Parser parser = new Parser(); If not, what would be my options???? Thank you for your help :) FOR the parser h...

parsing specific values in JSON for ajax

I was given this code earlier but am having a hard time parsing the correct data. I have the following JSON { flavors: [{"image":"images/bbtv.jpg", "desc":"BioBusiness.TV", "id":"1"},{"image":"images/grow.jpg", "desc":"Grow Staffing", "id":"2"}]} and I want to only show id:1 or id:2. I have the following code for Ajax $.ajax({ ...

How do I print a line following a line containing certain text in a saved file in Python?

I have written a Python program to find the carrier of a cell phone given the number. It downloads the source of http://www.whitepages.com/carrier_lookup?carrier=other&number_0=1112223333&response=1 (where 1112223333 is the phone number to lookup) and saves this as carrier.html. In the source, the carrier is in the line after the...
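
A minimal sketch of one way to get the line that follows a matching line, assuming the page has already been saved as carrier.html as the question describes. The text that marks the preceding line is cut off in the excerpt, so the `marker` argument is left to the caller, and `line_after` is a hypothetical helper name.

```python
def line_after(lines, marker):
    """Return the line directly after the first line containing marker, or None."""
    it = iter(lines)
    for line in it:
        if marker in line:
            # Pull the following line from the same iterator;
            # the default of None covers a marker on the last line.
            following = next(it, None)
            return following.strip() if following is not None else None
    return None
```

Because a file object iterates line by line, it can be passed in directly, for example `with open("carrier.html") as f: print(line_after(f, marker))`, which avoids loading the whole page into memory.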

Problem (un-)greedy RegExp

Consider the following Strings: 1: cccbbb 2: cccaaabbb What I would like to end up with are matches like this: 1: Array ( [1] => [2] => bbb ) 2: Array ( [1] => aaa [2] => bbb ) How can I match both in one RegExp? Here's my try: #(aaa)?(.*)$# I have tried many variants of greedy and ungreedy modifications but it doe...
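
The attempt in the question is unanchored, so the engine matches at position 0, where `(aaa)?` matches nothing and `(.*)` swallows the rest. One possible fix, sketched here in Python rather than PHP, is to anchor the pattern and put a lazy prefix in front of the optional group. It assumes the second group is always the literal `bbb` at the end of the string, which the truncated excerpt does not confirm. `split_groups` is an illustrative name.

```python
import re

# Lazy prefix, optional "aaa" group, then a literal "bbb" anchored at the end.
PATTERN = re.compile(r"^.*?(aaa)?(bbb)$")


def split_groups(text):
    """Return (group 1, group 2) for text, or None if it does not match."""
    match = PATTERN.match(text)
    return match.groups() if match else None
```

Python reports a group that did not participate as `None`, whereas the desired PHP output in the question shows an empty entry for `[1]`.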

string-manipulation

Parsing of badly formatted HTML in PHP

In my code I convert some styled xls document to html using openoffice. I then parse the tables using xml_parser_create. The problem is that openoffice creates old-school html with unclosed <BR> and <HR> tags, it doesn't create doctypes and doesn't quote attributes <TABLE WIDTH=4>. The php parsers I know of don't like this, and yield xml ...

j2me reading html differs between WTK and device

I have built a mobile application in J2ME and it reads data from a website. In WTK (wireless toolkit) everything works now, but when I test the same app on my mobile (nokia) device, it behaves differently: It gives another type of html back: it doesn't show a <hr> tag, but a <hr/> tag. There is a possibility that the remote website ...

Parse XML string in VB.NET

I am trying to parse a particular attribute/value pair from XML in VB.NET. The XML is originally a string that looks like XML but it needs to be converted to an XML-like datatype or structure before I can parse it. How can I convert this string into XML, and then parse the info that I need? EDIT: Dim doc As XDocument = XDocument.Pa...

Excel VBA - Parse Server Logs

heya, We have a small project that involves automatically parsing some server/mail logs (among other things). Now, Plan A was to just hack together some Python and Django to do this properly grins, but I got veto-ed and the solution has to be pure-Excel, as it's believed that will be more portable. 1. Importing tab-separated file Our ...

python parsing url after string

I want to extract a string from a url (link). That string is in a <h3></h3> tag. link = http://www.test.com/page.html Content of link: <h3>Text here</h3> What would be an elegant way to first get the content/source code of page.html and then extract the link? Thanks! ...
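
A minimal sketch of one approach that uses only the Python standard library: download the page with `urllib.request` and collect the text inside each `<h3>` with `html.parser`. The names `H3Collector`, `h3_texts` and `fetch_h3_texts` are illustrative, and decoding the response as UTF-8 is an assumption about the page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class H3Collector(HTMLParser):
    """Collect the text content of every <h3> element."""

    def __init__(self):
        super().__init__()
        self._in_h3 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.headings.append("")  # start a new heading

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        if self._in_h3:
            self.headings[-1] += data  # also picks up text in nested tags


def h3_texts(html):
    """Return the stripped text of each <h3> in an HTML string."""
    parser = H3Collector()
    parser.feed(html)
    parser.close()
    return [heading.strip() for heading in parser.headings]


def fetch_h3_texts(url):
    """Download url and return its <h3> texts (UTF-8 decoding is an assumption)."""
    with urlopen(url) as response:
        return h3_texts(response.read().decode("utf-8", errors="replace"))
```

Keeping the parsing in `h3_texts` separate from the download makes it possible to try the parser on a literal string such as `"<h3>Text here</h3>"` before pointing it at a live page.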

Read XHTML with XDocument?

How to read XHTML with XDocument, without downloading DTD. How to resolve the DTD references? No, you can't just say: settings.ProhibitDtd = false; settings.XmlResolver = null; as given in some previous answer, which is absolutely wrong. What about the entities then:   Also I am not interested in HTMLAgility pack, again wrong an...

Java: Parse Australian Street Addresses

Looking for a quick and dirty way to parse Australian street addresses into their parts: 3A/45 Jindabyne Rd, Oakleigh, VIC 3166 should split into: "3A", 45, "Jindabyne Rd", "Oakleigh", "VIC", 3166 Suburb names can have multiple words, as can street names. See: http://stackoverflow.com/questions/1739746/parse-a-steet-address-into-compon...

Ideal Java library for cleaning html, and escaping malformed fragments

I've got some HTML files that need to be parsed and cleaned, and they occasionally have content with special characters like <, >, ", etc. which have not been properly escaped. I have tried running the files through jTidy, but the best I can get it to do is just omit the content it sees as malformed html. Is there a different library th...

Edit JSON-Parser to parse geoJSON?

Hey, I want to use geoJSON-formatted data in my iPhone app. There is a JSON parser but no geoJSON parser. Can anyone please help me? How do I have to edit the JSON parser to get geoJSON parsing to work? Is there any geoJSON parser for Objective-C out there? Thanks a lot. ...

Pcrepp - Perl Regular Expression syntax to match host name

Possible Duplicate: The Hostname Regex I'm trying to use pcrepp (PCRE) to extract the hostname from a URL. PCRE regular expression syntax is the same as Perl 5 regular expression syntax. For example: url = "http://www.pandora.com/#/volume/73"; // the match will be "http://www.pandora.com/". I can't find the correct syntax of the regex for ...

Strip Down A String at Colon

Hi all, I have a textarea where a user copies and pastes the entire message: Time(UTC): 2010-02-27T21:58:20.74Z Filesize : 9549920 bytes IP Address: 192.168.1.100 IP Port: 59807 Using PHP, how can I automate this and parse this down to 4 separate variables, like so: <?php $time = 2010-02-27T21:58:20.74Z; $filesize = 9549920; $i...
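
A minimal sketch of the idea, written in Python for illustration although the question asks about PHP: split each line at the first colon only, so that the colons inside the timestamp stay in the value. It assumes the pasted message has one `Label: value` pair per line. `parse_fields` is a hypothetical helper name.

```python
def parse_fields(message):
    """Map each 'Label: value' line to a dict entry, splitting at the first colon only."""
    fields = {}
    for line in message.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            fields[label.strip()] = value.strip()
    return fields
```

In PHP the same split can be expressed with `explode(':', $line, 2)`, where the limit of 2 keeps everything after the first colon together. A value such as `9549920 bytes` still needs its unit removed before it can be used as a number.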

Error about invalid XML characters on Java

Parsing an xml file on Java I get the error: An invalid XML character (Unicode: 0x0) was found in the element content of the document. The xml comes from a webservice. The problem is that I get the error only when the webservice is running on localhost (windows+tomcat), but not when the webservice is online (linux+tomcat). How can I ...

Parsing variable record lengths in Preon

I'm trying to use Preon to parse binary files, which are structured as a sequence of variable length records. For each record, there's a number which specifies the record length (in bytes). Here's a simplified version of what I'm trying to do: package test.preon; import nl.flotsam.preon.annotation.BoundList; import nl.flotsam.preon....

JSON Parser error

Hello, I've added the JSON parser to my project and try to parse a JSON string. On most strings it works as it should, but sometimes it doesn't. My first thought was that the JSON string is not well formed, but I've checked it with several JSON validators and they all say it's correct. I additionally checked the string for some line brea...

Merging HTML files

I want to merge one HTML file into another. Not just include it, but merge. Example master.html: <!DOCTYPE html> <html> <head> <title>My cat</title> </head> <body> <h1>My cat is awesome!</h1> </body> </html> _index.html: <!DOCTYPE html> <html> <body> <p><img src="cat.jpg"/></p> </body> </html> Now I merge ...
