parsing

email parsing system

I am building a system for automatically parsing incoming emails and populating a database from them. Initially there will only be 10-20 expected formats coming in, but long term there is the possibility of thousands of different formats. The way I see it, I need to: identify the format of the email (e.g. regex on the subject line), parse the email wi...
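
The "identify the format, then parse" idea from this excerpt could be sketched roughly as follows. This is only an illustration in Python: the subject patterns, the `parse_order` and `parse_invoice` functions, and the body layouts are all hypothetical, not taken from the question.

```python
import re
from email import message_from_string


def parse_order(body):
    # Hypothetical body layout: a line such as "Order: A123".
    match = re.search(r"Order:\s*(\w+)", body)
    return {"type": "order", "order_id": match.group(1) if match else None}


def parse_invoice(body):
    # Hypothetical body layout: a line such as "Invoice: INV42".
    match = re.search(r"Invoice:\s*(\w+)", body)
    return {"type": "invoice", "invoice_id": match.group(1) if match else None}


# Registry of (subject pattern, parser) pairs; a new format is one new entry.
FORMATS = [
    (re.compile(r"^New order", re.IGNORECASE), parse_order),
    (re.compile(r"^Invoice", re.IGNORECASE), parse_invoice),
]


def parse_email(raw):
    """Pick a parser by subject line; return None for an unknown format."""
    msg = message_from_string(raw)
    subject = msg.get("Subject", "")
    body = msg.get_payload()  # a plain string for non-multipart messages
    for pattern, parser in FORMATS:
        if pattern.search(subject):
            return parser(body)
    return None
```

Keeping the formats in a table rather than in branching code is one way to cope with the list growing from tens to thousands of entries; at that scale the table would more likely live in the database than in source code.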

Command-line web browser that outputs the DOM

I'm looking for a way to process a web page and its associated JavaScript from the command line, so that the resulting DOM model can be output. The purpose of this is to identify forms within the page without doing any nasty HTML (and JavaScript) parsing with regular expressions. Are there any command-line tools that will do this? So ...

Most optimal way to parse querystring within a string in C#

I have a querystring-like value stored in a plain string. I started splitting the string to get the values out, but then wondered whether I could write this in one line instead. Could you please advise if there is a better way to do this? I am trying to read "123" and "abc" as with Request.QueryString, but from a normal string. protect...
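
For comparison, here is how the same task looks with a library parser instead of manual splitting. The question is about C#, where `System.Web.HttpUtility.ParseQueryString` plays a similar role; the sketch below is in Python, and the input string `"id=123&name=abc"` is an assumption, since the original string is cut off in the excerpt.

```python
from urllib.parse import parse_qs


def query_values(query):
    # parse_qs maps each key to a list of values; keep only the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical input; the real string is not shown in the excerpt.
values = query_values("id=123&name=abc")
print(values["id"], values["name"])
```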

Regex to extract elements by class name

Greetings! I have some HTML that may or may not be valid. If the HTML is invalid, a best attempt can be made, and any errors that arise are acceptable (i.e., grouping too much because some tag isn't closed correctly). In this HTML are a variety of elements, some of which may have a class (call it "findme"). These elements are of varying ...
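
As an alternative to a regex, a tolerant event-based parser can pick out start tags by class even when the markup is not well formed. The sketch below uses Python's standard `html.parser`; the `ClassFinder` and `find_by_class` names are made up for illustration, and it only reports the matching start tags and their attributes, not the element contents.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collects (tag, attributes) for every start tag carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may be missing or hold several names.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append((tag, attributes))


def find_by_class(html, class_name):
    finder = ClassFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.matches
```

Because the parser never requires tags to be balanced, an unclosed `<p>` or `<div>` does not stop it from reporting later matches.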

Linux languages/tools specific for log parsing

Is there such a thing? Maybe like Microsoft's LogParser? I know there's sed/awk, but I'm curious if there are any specific tools or even programming languages. PS: I'm not sure this belongs here or on SF. ...

Is it possible to check PHP file syntax from PHP?

I dynamically load PHP class files with autoload. Those files could be missing or corrupted for some reason. Autoload will successfully report missing files, so application logic can handle that. But if the files are corrupted, the whole process halts with a blank screen for the user and "PHP Parse error: syntax error" in er...

Parse Italian Date with Ruby

I would like to use Date.parse, but it doesn't work with Italian month names! Date.parse "26 agosto 1991" => Sun, 26 Jul 2009 Is there any alternative? ...
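
One simple workaround, sketched here in Python rather than Ruby, is to look the month name up in a table before building the date. The `parse_italian_date` helper is hypothetical and assumes the "day month year" layout shown in the excerpt; the same lookup idea carries over to Ruby.

```python
from datetime import date

# Italian month names mapped to month numbers.
ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}


def parse_italian_date(text):
    # Assumes exactly three whitespace-separated fields: day, month name, year.
    day, month_name, year = text.split()
    return date(int(year), ITALIAN_MONTHS[month_name.lower()], int(day))


print(parse_italian_date("26 agosto 1991"))
```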

Are there any good tutorials that describe how to use ANTLR to parse boolean search strings

I need to parse a boolean search string of keywords and operators into a SQL query to be executed from C# code. I think something like ANTLR is what I need for this task, but I'm not sure how to do it. Are there any good tutorials on how to do this? Or maybe I need a different tool? An example of what I mean is below. The only operator...

ROME API to Parse RSS/ATOM

Hey folks. I'm trying to parse RSS/Atom feeds with the ROME library and need some help. I am new to Java, so I am not in tune with many of its intricacies. Two things. Does ROME automatically use its modules to handle different feeds as it comes across them, or do I have to ask it to use them? If so, any direction on this? How do I get to ...

nginx: parse *.cgi as php

Is it possible to parse *.cgi files as php, and how? location ~ \.php$ { fastcgi_pass 127.0.0.1:9000; fastcgi_index index.php; fastcgi_buffer_size 128k; fastcgi_buffers 4 256k; fastcgi_param SCRIPT_FILENAME /srv/htdocs$fastcgi_script_name; include /etc/nginx/fastcgi_params; } ...

PHP and XML: The cost of parsing a large XML file every page request.

What is the cost of parsing a large XML file using PHP on every page request? I would like to implement custom tags in HTML. <?xml version="1.0"?> <html> <head> <title>The Title</title> </head> <body> <textbox name="txtUsername" /> </body> </html> After I load this XML file in PHP, I search for the cus...

xml parsing in iPhone... question about getting other tags with same names

Let me try to explain as clearly as possible what exactly I mean with this question. Let's say we use this example http://www.iphonesdkarticles.com/2008/11/parsing-xml-files.html and the XML, instead of looking like this <Books> <Book id="1"> <title>Circumference</title> <author>Nicholas Nicastro</author> <summary>Eratosthenes and the A...

Manually parse string as XAML Attribute

How does the XAML parser convert the string "Red" in Foreground="Red" to a SolidColorBrush? Although I know the types have System.ComponentModel.TypeConverter defined, I doubt that the WPF XAML parser actually always uses those to convert the string to the brush. Are there any XAML APIs apart from XamlReader.Load (which wants a valid xml...

Parse JSON in C#

I'm trying to parse some JSON data from the Google AJAX Search API. I have this URL and I'd like to break it down so that the results are displayed. I've currently written this code, but I'm pretty lost with regard to what to do next, although there are a number of examples out there with simplified JSON strings. Being new to C# and .NET...

Parsing PHP/JavaScript document structure in Delphi

Hi, I need to parse the structure of PHP & JavaScript documents to get info about their functions & parameters, classes & methods, variables, and so on ... I'm wondering if there is any solution for doing that (no regular expressions) ... I've heard about something called "lexing" however I was unable to find any examples eve...

Javascript date parsing bug - fails for dates in June (??)

I have some JavaScript which parses an ISO-8601 date. For some reason, it is failing for dates in June. But dates in July and May work fine, which doesn't make sense to me. I'm hoping a fresh set of eyes will help, because I can't see what I'm doing wrong here. Function definition (with bug) function parseISO8601(timestamp) { var ...

Break a CSS file into an array with PHP

I want to break a CSS file into an array with PHP. Ex: #selector{ display:block; width:100px; } #selector a{ float:left; text-decoration:none; } Into a PHP array... array(2) { ["#selector"] => array(2) { [0] => array(1) { ["display"] => string(5) "block" } [1] => array(1) { ["width"] => string(5) "100px" ...
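
For flat stylesheets like the one in this excerpt, a small regex-based splitter may be enough. The sketch below is in Python rather than PHP, and `css_to_dict` is a made-up helper. It returns one mapping of property to value per selector, which is slightly flatter than the array shape shown in the question, and it deliberately ignores comments, nested blocks such as `@media`, and semicolons inside strings or `url()` values.

```python
import re

# One rule: a selector followed by a brace-delimited declaration block.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def css_to_dict(css):
    rules = {}
    for selector, body in RULE.findall(css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                # Split on the first colon only, so values may contain colons.
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


css = "#selector{ display:block; width:100px; } #selector a{ float:left; text-decoration:none; }"
print(css_to_dict(css))
```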

Identifying a Page's Primary Content

Given an HTML page that is a text-heavy article, I would like to identify and parse out the primary content. Using http://www.fivethirtyeight.com/2009/08/chavismo-obama-and-monroe-doctrine.html as an example, I want to identify div#post-4438372351887392855, which contains the title and article. I know nothing can be perfect or work 100...

Parse XML iLO response file with Perl

I have the following XML file generated by my HP iLO server. Do you have any examples of how I can parse it? See the example XML file below. I would like to extract fan speeds and temperatures from it. <?xml version="1.0"?> <GET_EMBEDDED_HEALTH_DATA> <FANS> <FAN> <LABEL VALUE = "Fan 1"/> <ZONE VALUE = "System"/> <ST...
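
Since each reading in this file appears to sit in a `VALUE` attribute of a child element, a generic walk over the `FAN` elements may be all that is needed. The sketch below is in Python rather than Perl. The sample contains only the elements visible in the excerpt, closed by hand; the real file is truncated here and presumably has more children, which this loop would pick up without naming them.

```python
import xml.etree.ElementTree as ET

# Only the elements visible in the excerpt; the closing tags are assumed.
SAMPLE = """<GET_EMBEDDED_HEALTH_DATA>
  <FANS>
    <FAN>
      <LABEL VALUE = "Fan 1"/>
      <ZONE VALUE = "System"/>
    </FAN>
  </FANS>
</GET_EMBEDDED_HEALTH_DATA>"""


def fan_readings(xml_text):
    root = ET.fromstring(xml_text)
    fans = []
    for fan in root.iter("FAN"):
        # Map each child tag to its VALUE attribute, whatever the tag is.
        fans.append({child.tag: child.get("VALUE") for child in fan})
    return fans


print(fan_readings(SAMPLE))
```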

Is there a good html parser like HtmlAgilityPack (.NET) for Python?

I'm looking for a good HTML parser like HtmlAgilityPack (open-source .NET project: http://www.codeplex.com/htmlagilitypack), but for use with Python. Does anyone know of one? ...