parsing

Trying to write a program / library like LogParser - How does it work internally?

LogParser isn't open source and I need this functionality for an open source project I'm working on. I'd like to write a library that allows me to query huge (mostly IIS) log files, preferably with LINQ. Do you have any links that could help me? How does a program like LogParser work so fast? How does it handle memory limitations? ...

Python: How to convert Markdown-formatted text to plain text

I need to convert Markdown text to plain text to display a summary on my website. I want the code in Python. ...

What would be the regular expression to retrieve the number in front of the % symbol in a string?

I'm using a math parser that uses % as the mod symbol. I'd like to use it for the percent symbol instead to allow users to type "25%*10" or "25% of 10" and receive the answer "2.5" (they could type anything). I'd then use the regex asked for to get the "25%" (could be in any part of the string) and do a simple calculation (25 / 100) an...
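A minimal sketch of one way to approach this, written in Python because the excerpt does not name a language. The function names `find_percents` and `replace_percents` are hypothetical, and the sketch assumes the percent sign follows the number with no space in between. It only finds the tokens and rewrites them as a division by 100; evaluating the result, or treating "of" as multiplication, is left to the math parser.

```python
import re

# A number (integer or decimal) immediately followed by a percent sign.
# Assumption: no whitespace is allowed between the number and '%'.
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)%")


def find_percents(expr):
    """Return the numeric part of every 'N%' token, as strings."""
    return PERCENT_RE.findall(expr)


def replace_percents(expr):
    """Rewrite every 'N%' as '(N/100)' so a parser without a percent
    operator can evaluate the expression."""
    return PERCENT_RE.sub(r"(\1/100)", expr)


if __name__ == "__main__":
    print(find_percents("25%*10"))
    print(replace_percents("25%*10"))
```

Because the match can occur anywhere in the string, the same two functions work for input such as "25% of 10" or "10 + 12.5%".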

How to comment out calls to a specific API in Java source code

I want to comment out all calls to an API (java.util.logging, in my case) in my codebase. Is there a good library to accomplish this easily? I tried Eclipse ASTParser, but that is tied to Eclipse. I am now struggling with PMD's parser. I haven't yet looked at it, but can Jackpot do this? Any other suggestions? ...

Is the ANTLR parser generator best for a C++ app with constrained memory?

I'm looking for a good parser generator that I can use to read a custom text-file format in our large commercial app. Currently this particular file format is read with a handmade recursive parser but the format has grown and complexified to the point where that approach has become unmanageable. It seems like the ultimate solution would...

Can you provide an example of parsing HTML with your favorite parser?

This question is a lazy way of collecting examples of parsing HTML with a variety of languages and parsing libraries. Individual comments will be linked to in answers to questions about how to parse HTML with regexes as a way of showing the right way to do things (similar to how I use Can you provide some examples of why it is hard to p...

Is there a Perl module for parsing columnar text?

Let's say I have a tab-delimited text file that contains data arranged in columns (with headers). It is possible that different columns may be "stacked" into a "worksheet"-like arrangement, i.e. there is some divider (that may or may not be known ahead of time) that allows different columns to be arranged vertically. Is there a Perl m...

Extracting data from Enterprise Architect model

I'm trying to programmatically extract information from an Enterprise Architect model (saved in an XMI file) - I need it to generate some reports, but I don't want to go so far as to create an EA add-in. Is there a C# XMI parser library anywhere? I could of course generate XMI parsing code from its XML schema, but that would be my secon...

C#: How to parse arbitrary strings into expression trees?

In a project that I'm working on I have to work with a rather weird data source. I can give it a "query" and it will return me a DataTable. But the query is not a traditional string. It's more like... a set of method calls that define the criteria that I want. Something along these lines: var tbl = MySource.GetObject("TheTable"); tbl.Ad...

How can I parse a string that contains wildcards and character classes in Ruby?

I would like to write a script that takes one argument that might look like this: abc(ag)de*. Here a, b, c are literal characters, (ag) means "an 'a' or a 'g'", and * means any one letter or number. I want the script to create an Array of all the possible strings the input could represent. (The purpose is to check if they're available domain...
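The expansion described above can be sketched as a Cartesian product over per-position character choices. This is an illustration in Python, not Ruby, and the function name `expand` is hypothetical. It assumes that "any one letter or number" means one lowercase ASCII letter or digit, and that parenthesised groups are not nested.

```python
import itertools
import string

# Assumption: '*' stands for exactly one lowercase letter or digit.
WILDCARD = string.ascii_lowercase + string.digits


def expand(pattern):
    """Return every string the pattern can represent."""
    choices = []  # one string of candidate characters per output position
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":
            # Character class: everything up to the matching ')'.
            close = pattern.index(")", i)  # ValueError if unbalanced
            choices.append(pattern[i + 1:close])
            i = close + 1
        elif ch == "*":
            choices.append(WILDCARD)
            i += 1
        else:
            choices.append(ch)  # literal character
            i += 1
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    results = expand("abc(ag)de*")
    print(len(results), results[:3])
```

The result grows multiplicatively with each class or wildcard (2 x 36 = 72 strings for the example), so for patterns with many wildcards it may be better to iterate over `itertools.product` lazily than to build the whole list.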

Parsing CSV file with un-escaped quotation marks and commas in .NET

I was wondering if someone could help me. I have to parse data from a csv file and put this into a db table. An example of the data is as follows: "first field", "second , field", "third " Field " ", "fourth field" As you can see there are quotation marks and commas embedded in the fields. I was using ADO.NET but it had issues with t...

Is it possible to split the file contents using a custom pattern?

Hello, is it possible to split the contents of a file into parts that match a specific pattern? This is what I want to achieve: read the file using file_get_contents, then read only the contents between similar commented areas. I am not sure how complicated that is, but basically if I am parsing a large HTML file and want only to display to the br...

Using Gecko/Firefox or WebKit for HTML parsing in Python

I am using BeautifulSoup and urllib2 for downloading HTML pages and parsing them. The problem is with malformed HTML pages. Though BeautifulSoup is good at handling malformed HTML, it's still not as good as Firefox. Considering that Firefox or WebKit are more up to date and resilient at handling HTML, I think it's ideal to use them to construct ...

Using C# and XDocument/XElement to parse a Soap Response

Here is a sample SOAP response from my SuperDuperService: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <soap:Body> <MyResponse xmlns="http://mycrazyservice.com/SuperDuperService"> <Result>3234...

Parsing Large Text Files in Real-time (Java)

Hi all, I'm interested in parsing a fairly large text file in Java (1.6.x) and was wondering what approach(es) would be considered best practice? The file will probably be about 1Mb in size, and will consist of thousands of entries along the lines of; Entry { property1=value1 property2=value2 ... } etc. My first instinc...
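For a file of this size, reading the whole text and extracting the blocks with two regular expressions is one simple option. The sketch below is in Python rather than Java, and the function name `parse_entries` is hypothetical. It assumes that entries are not nested, that property names are word characters, and that values contain no whitespace or closing brace; the excerpt does not confirm any of these.

```python
import re

# One 'Entry { ... }' block; the body is everything up to the first '}'.
ENTRY_RE = re.compile(r"Entry\s*\{([^}]*)\}")
# One 'name=value' pair; assumes values contain no whitespace.
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def parse_entries(text):
    """Return one dict of property -> value per Entry block."""
    return [dict(PAIR_RE.findall(body)) for body in ENTRY_RE.findall(text)]


if __name__ == "__main__":
    sample = """
    Entry {
        property1=value1
        property2=value2
    }
    Entry { property1=other }
    """
    for entry in parse_entries(sample):
        print(entry)
```

If values can contain spaces or braces, a line-oriented reader that tracks whether it is inside an entry would be a safer choice than these patterns.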

CSS parser + XHTML generator, advice needed

Guys, I need to develop a tool which would meet the following requirements. Input: an XHTML document with CSS rules within the head section. Output: an XHTML document with the CSS rules computed into tag attributes. The best way to illustrate the behavior I want is as follows. Example input: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...

Parsing commands from user input

This question is intended to be a discussion of people's personal opinions in handling user input. This portion of the project that I am working on handles user input in a manner similar to an IRC chat. For instance, there are set commands and whatnot, for chatting, executing actions, etc. Now, I have several options to choose from ...

Parse dynamic function from string

I'm a beginner in C++ and I'm struggling to parse a dynamic function from a string like char* func = "app.exe /path:\"@FileExists('filepath', @FileDelete('filepath'), @MsgBox('file not found','error',1))\""; I want to parse @FileExists('filepath', @FileDelete('filepath'), @MsgBox('file not found','error',1)). Can you help me? Sorry for my en...

BugHunt - JavaScript parsing error?

Hello there, people of the web! Maybe one of you will spot what's amiss in the following function: it is a JavaScript function used on a textbox client-side onblur event to validate that the entered text represents a correctly formatted and valid date. I tried a few times, and it worked, but it seems I should've tried all dates! it will ...

What is the recommended .NET practice for interacting with MIME emails on the latest Windows OSes?

We are using CDO interop (cdont.dll) in our current project for parsing incoming MIME mails, but are facing some bugs with Cyrillic code page conversions. While looking for any MS-supported replacements, we noticed that all available message parsing DLLs are marked with "Do not use" in MSDNLib (CDO, CDOex, CDOnt obviously, but inetcomm.dll for "...