parsing

Parsing C++ preprocessor #if statements

I have a C/C++ source file with conditional compilation. Before I ship it to customers I want to remove most of the #if statements, so that my customers do not need to worry about passing the right -D options to the compiler. I have this implemented and working in Python, but it only handles #ifdef and #ifndef statements properly. I n...

What is the fastest way to parse a line in Delphi?

I have a huge file that I must parse line by line. Speed is of the essence. Example of a line: Token-1 Here-is-the-Next-Token Last-Token-on-Line, with one marker at the Current Position (just before "Here-is-the-Next-Token") and another at the Position after GetToken (just past it). GetToken is called, returning "Here-is-the-Next-Token" and sets the Cur...
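A minimal Python sketch of the GetToken behavior described above. The question is about Delphi, so this only illustrates the semantics, not Delphi-level performance. get_token is a hypothetical name, and the sketch assumes tokens are separated by one or more spaces.

```python
from typing import Tuple

def get_token(line: str, pos: int) -> Tuple[str, int]:
    """Return (token, new_pos) for the next space-delimited token at or after pos.

    new_pos is the index just past the token; ("", len(line)) means no token is left.
    """
    n = len(line)
    # Skip any spaces in front of the token.
    while pos < n and line[pos] == " ":
        pos += 1
    # The token runs up to the next space, or to the end of the line.
    end = line.find(" ", pos)
    if end == -1:
        end = n
    return line[pos:end], end
```

On the example line, get_token(line, 7) returns ("Here-is-the-Next-Token", 30), and passing 30 back in yields the last token.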

Get the subdomain from a URL

Getting the subdomain from a URL sounds easy at first. http://www.domain.example Scan for the first period then return whatever came after the "http://" ... Then you remember http://super.duper.domain.example Oh. So then you think, okay, find the last period, go back a word and get everything before! Then you remember http://su...
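A minimal Python sketch of a suffix-based approach, which avoids guessing from the first or last period. It assumes a hypothetical, hardcoded PUBLIC_SUFFIXES set standing in for the Public Suffix List (publicsuffix.org); get_subdomain is an illustrative name, and a real implementation would load the full list instead.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"example", "com", "co.uk"}

def get_subdomain(url: str) -> str:
    """Return the subdomain part of the URL's host, or "" if there is none."""
    host = (urlsplit(url).hostname or "").lower()  # hostname drops any port
    labels = host.split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # One label before the suffix is the registered domain;
            # everything before that is the subdomain.
            return ".".join(labels[: max(i - 1, 0)])
    return ""
```

For http://www.domain.example and http://super.duper.domain.example this returns "www" and "super.duper" respectively.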

Canonical way to parse the command line into arguments in plain C Windows API

In a Windows program, what is the canonical way to parse the command line obtained from GetCommandLine into multiple arguments, similar to the argv array in Unix? It seems that CommandLineToArgvW does this for a Unicode command line, but I can't find a non-Unicode equivalent. Should I be using Unicode or not? If not, how do I parse th...

VB6: Separating Tab-Delimited Text

I have a file with several thousand rows and several columns separated by tabs. What I'd like to do is loop through each line individually, drop the columns into an array so that I can place them in another application individually, then move on to the next line. Unfortunately I got about as far as this: Open mytextfile.txt For Input As #Fi...
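A minimal Python sketch of the per-line loop described above (the question is about VB6, so this only shows the logic). split_tab_rows is a hypothetical helper that yields each line's columns as a list, so one row can be handed on before the next is read. It assumes lines end in "\n" or "\r\n" and that tabs never occur inside a field.

```python
from typing import Iterable, Iterator, List

def split_tab_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated columns of each line, one line at a time."""
    for line in lines:
        # Strip only the line terminator so empty trailing columns are kept.
        yield line.rstrip("\r\n").split("\t")
```

Because it accepts any iterable of lines, an open file object such as open("mytextfile.txt", newline="") can be passed in directly.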

Robust, Mature HTML Parser for PHP

Are there any robust and mature HTML parsers available for PHP? A quick skim of PEAR didn't turn anything up (lots of classes for generating HTML, not so much for consuming it), and Google taught me that a lot of people have started and then abandoned a variety of parser projects. Not interested in XML parsers (unless they can consume non-...

HtmlAgilityPack Drops Option End Tags

I am using HtmlAgilityPack. I create an HtmlDocument and LoadHtml with the following string: <select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One</option><option value="2">Two</option></select> This does some unexpected things. First, it gives two parser errors, EndTagNotRequired. Second, the select node ha...

Python help - Parsing Packet Logs

I'm writing a simple program that's going to parse a logfile of a packet dump from Wireshark into a more readable form. I'm doing this with Python. Currently I'm stuck on this part:
for i in range(len(linelist)):
    if '### SERVER' in linelist[i]:
        #do server parsing stuff
        packet = linelist[i:find("\n\n", i, len(linelist))]
        linelist i...
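A minimal Python sketch of the server-packet loop, assuming (as the search for "\n\n" in the question suggests) that linelist is a list of lines and that each packet ends at the first blank line. Python has no builtin find function, so this version scans forward for the blank line instead. extract_packets and its marker parameter are hypothetical names.

```python
from typing import List

def extract_packets(linelist: List[str], marker: str = "### SERVER") -> List[List[str]]:
    """Collect every packet whose first line contains marker.

    A packet is assumed to run until the next blank line (or the end of input).
    """
    packets = []
    i = 0
    while i < len(linelist):
        if marker in linelist[i]:
            j = i + 1
            # Advance to the next blank (or whitespace-only) line.
            while j < len(linelist) and linelist[j].strip():
                j += 1
            packets.append(linelist[i:j])
            i = j  # resume scanning after this packet
        else:
            i += 1
    return packets
```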

LINQ to XML: parsing an XML file in which one node specifies the type of another node

Hello! Is it possible, using LINQ to XML, to use the string value of one node to determine the type of the value held in another node? For example: <node> <name>nodeName</name> <type>string</type> </node> <node> <name>0</name> <type>bool</type> </node> <node> <name>42</name> <type>int</type> </node> Thanks in advance ...
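The question is about LINQ to XML in C#; as a language-neutral illustration, here is a minimal Python sketch using xml.etree.ElementTree that converts each <name> according to its sibling <type>. It assumes the <node> elements are wrapped in a single root element (the fragment above has several top-level elements, which is not well-formed XML on its own) and that a bool is written as "0"/"1" or "true"/"false". CONVERTERS and typed_values are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List

# Hypothetical mapping from the text of <type> to a converter for <name>.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "int": int,
    # bool("0") is True in Python, so the text has to be interpreted explicitly.
    "bool": lambda s: s.strip().lower() in ("1", "true"),
}

def typed_values(xml_text: str) -> List[Any]:
    """Convert each <node>'s <name> text according to its <type> text."""
    root = ET.fromstring(xml_text)
    values = []
    for node in root.iter("node"):
        raw = node.findtext("name", default="")
        kind = node.findtext("type", default="string").strip()
        # An unknown type name raises KeyError rather than guessing.
        values.append(CONVERTERS[kind](raw))
    return values
```

Wrapping the example in <nodes>...</nodes> and calling typed_values on it yields a str, a bool and an int.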

SGML Parser in Plain C

I'm looking for an open-source SGML parser written in plain C. This is to parse bona-fide SGML, not malformed stuff. Any ideas? ...

Read (and write) RTF files with C++ / Qt

Hello, I am looking for a simple C++ library for tokenizing and parsing RTF (Rich Text Format) files. I am planning to edit them with Qt's QTextEdit. The more formatting is preserved, the better, but I actually plan to use only bold and italics. In Perl I would use RTF::Tokenizer. It would be nice if the module had some sort of ...

PHP YAML Parsers

Does anyone know of a good YAML Parser for PHP? If so, what are the pros and cons of this library? Update: Starting a bounty to get fresh input. What's the status of YAML parsers in 2010? Any new developments? ...

Parse C files

I am looking for a Windows-based library that can be used to parse a bunch of C files and list global and local variables. The global and local variables may be declared using typedef. The output (i.e. the list of global and local variables) can then be used for post-processing (e.g. replacing the variable names with a new name). Is such...

How can I parse relative dates with Perl?

I'd love to know if there is a module to parse "human formatted" dates in Perl. I mean things like "tomorrow", "Tuesday", "next week", "1 hour ago". My research with CPAN suggests that there is no such module, so how would you go about creating one? NLP is way over the top for this. ...
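A minimal Python sketch (the question asks about Perl) suggesting that a small table of patterns can cover the examples in the question without NLP. parse_relative is a hypothetical helper; it takes now explicitly so results are reproducible, and it assumes a bare weekday name means the next such day after today.

```python
import re
from datetime import datetime, timedelta

UNITS = {"minute": timedelta(minutes=1), "hour": timedelta(hours=1),
         "day": timedelta(days=1), "week": timedelta(weeks=1)}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def parse_relative(text: str, now: datetime) -> datetime:
    """Resolve a few human-formatted date expressions relative to now."""
    s = text.strip().lower()
    if s == "today":
        return now
    if s == "tomorrow":
        return now + UNITS["day"]
    if s == "yesterday":
        return now - UNITS["day"]
    if s == "next week":
        return now + UNITS["week"]
    if s in WEEKDAYS:
        # Assumption: a bare weekday means its next occurrence, 1 to 7 days ahead.
        ahead = (WEEKDAYS.index(s) - now.weekday() - 1) % 7 + 1
        return now + timedelta(days=ahead)
    m = re.fullmatch(r"(\d+)\s+(minute|hour|day|week)s?\s+ago", s)
    if m:
        return now - int(m.group(1)) * UNITS[m.group(2)]
    raise ValueError(f"unrecognized date expression: {text!r}")
```

The sketch keeps the current time of day for results such as "tomorrow" or "Tuesday"; truncating to midnight is left to the caller.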

How to make the keywords recognizable in simpleparse?

I've been trying to create a parser using simpleparse. I've defined the grammar like this:
<w> := [ \n]*
statement_list := statement,(w,statement)?
statement := "MOVE",w,word,w,"TO",w,(word,w)+
word := [A-Za-z],[A-Za-z0-9]*,([-]+,[A-Za-z0-9]+)*
Now if I try to parse a string
MOVE ABC-DEF TO ABC
MOVE DDD TO XXX
The second statement ...
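A likely cause is that MOVE itself matches the word rule, so (word,w)+ can swallow the start of the next statement (note also that (w,statement)? permits at most one additional statement). Below is a minimal Python sketch, not SimpleParse, of the usual remedy: a token is accepted as a word only if it is not a reserved keyword. It assumes tokens are whitespace-separated; KEYWORDS, is_word and parse are hypothetical names, and the equivalent change to the SimpleParse grammar itself is not shown.

```python
import re
from typing import List, Tuple

KEYWORDS = {"MOVE", "TO"}
# Same shape as the question's word rule: a letter, then letters/digits,
# optionally followed by hyphen-joined groups such as ABC-DEF.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-+[A-Za-z0-9]+)*")

def is_word(tok: str) -> bool:
    # The key change: a reserved keyword is never accepted as a plain word.
    return tok not in KEYWORDS and WORD.fullmatch(tok) is not None

def parse(text: str) -> List[Tuple[str, List[str]]]:
    """Parse 'MOVE word TO word...' statements into (source, targets) pairs."""
    toks = text.split()  # any whitespace, including newlines, separates tokens
    i, statements = 0, []
    while i < len(toks):
        if (i + 2 >= len(toks) or toks[i] != "MOVE"
                or not is_word(toks[i + 1]) or toks[i + 2] != "TO"):
            raise SyntaxError(f"expected 'MOVE word TO' at token {i}")
        source, i = toks[i + 1], i + 3
        targets = []
        # Collect targets until a token that is not a word, such as the next MOVE.
        while i < len(toks) and is_word(toks[i]):
            targets.append(toks[i])
            i += 1
        if not targets:
            raise SyntaxError(f"expected at least one word after TO at token {i}")
        statements.append((source, targets))
    return statements
```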

parse meta tags in Java

Hi, I have a collection of HTML documents for which I need to parse the contents of the <meta> tags in the <head> section. These are the only HTML tags whose values I'm interested in, i.e. I don't need to parse anything in the <body> section. I've attempted to parse these values using the XPath support provided by JDOM. However, this i...
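The question is about Java and JDOM; as an illustration of reading <meta> tags with an HTML parser rather than an XML one, here is a minimal Python sketch using the standard library's html.parser, which does not require well-formed markup. MetaCollector and meta_tags are hypothetical names, and the sketch assumes any <meta> that appears before <body> belongs to the <head>.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class MetaCollector(HTMLParser):
    """Collect the attributes of <meta> tags that appear before <body>."""

    def __init__(self) -> None:
        super().__init__()
        self.metas: List[Dict[str, Optional[str]]] = []
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. A self-closing
        # <meta ... /> also arrives here via the default handle_startendtag.
        if tag == "body":
            self.in_body = True
        elif tag == "meta" and not self.in_body:
            self.metas.append(dict(attrs))

def meta_tags(html_text: str) -> List[Dict[str, Optional[str]]]:
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas
```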

How do I parse a text file in C#?

How do I parse a text file in C#? ...

Microsoft Word Text Parser in C

Hi, I would like to know what procedure to adopt to parse and obtain the text content of Microsoft Word (.doc and .docx) documents. The programming language should be plain C (compiled with gcc). Are there any libraries that already do this job? As an extension: can I use the same procedure to parse text from Microsoft PowerPoint files also ...

Best way to handle mixed HTML and angle brackets in user input?

In a PHP application I am writing, I would like to let users enter a mix of HTML and text containing angle brackets, but when I display this text, I want the HTML tags to be rendered and the non-HTML angle brackets to be shown literally. For example, a user should be able to enter: <b> 5 > 3 = true</b> and, when it is displayed, the user should see: 5 > 3 = ...
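A minimal Python sketch of one common approach (the question is about PHP): escape the whole input, then restore a small whitelist of tags. ALLOWED_TAGS and render_user_text are hypothetical names; only attribute-free tags are restored, and tag balance is not checked, so this is an illustration rather than a complete sanitizer.

```python
import html
import re

# Hypothetical whitelist: only these tags, without attributes, are rendered.
ALLOWED_TAGS = ("b", "i", "em", "strong")
_ALLOWED = re.compile(r"&lt;(/?)(" + "|".join(ALLOWED_TAGS) + r")&gt;", re.IGNORECASE)

def render_user_text(text: str) -> str:
    # Escape everything first, so stray "<", ">" and "&" are shown literally...
    escaped = html.escape(text, quote=False)
    # ...then turn the escaped whitelisted tags back into real tags.
    return _ALLOWED.sub(lambda m: f"<{m.group(1)}{m.group(2).lower()}>", escaped)
```

For the example input this returns <b> 5 &gt; 3 = true</b>, which a browser displays as a bold "5 > 3 = true".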

parse http response header from wget

I'm trying to extract a line from wget's result but am having trouble with it. This is my wget call:
$ wget -SO- -T 1 -t 1 http://myurl.com:15000/myhtml.html
--18:24:12-- http://xxx.xxxx.xxxx:15000/myhtml.html
           => `-'
Resolving xxx.xxxx.xxxx... xxx.xxxx.xxxx
Connecting to xxx.xxxx.xxxx|xxx.xxxx.xxxx|:15000... connected.
HTTP re...
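A minimal Python sketch for pulling the status line and headers out of wget -S output once it has been captured as text. It assumes the text comes from wget's standard error (where the -S server response is logged, while -O- sends the body to standard output) and that the response lines are indented; parse_wget_headers is a hypothetical name.

```python
from typing import Dict, Tuple

def parse_wget_headers(output: str) -> Tuple[str, Dict[str, str]]:
    """Return (status_line, headers) from captured `wget -S` log text.

    Header names are lowercased. If several responses appear (for example
    after a redirect), only the last one is kept.
    """
    status, headers = "", {}
    for raw in output.splitlines():
        if not raw.startswith(" "):
            continue  # assumption: wget's own progress lines are not indented
        line = raw.strip()
        if line.startswith("HTTP/"):
            status, headers = line, {}  # start of a new response block
        elif status and ":" in line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return status, headers
```

The log text could come from, for example, subprocess.run(["wget", "-S", "-O-", url], capture_output=True, text=True).stderr.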