parsing

How can I match "/*" in a regular expression?

Hello all. $stuff = "d:/learning/perl/tmp.txt"; open STUFF, $stuff or die "Cannot open $stuff for read :$!"; while (<STUFF>) { my($line) = $_; # Good practice to always strip the trailing chomp($line); my @values = split(' ', $line); foreach my $val (@values) { if ($val == 1){ print "1 found"; ...

xml parsing in vb.net

I have an xml formatted document that looks like this: <?xml version="1.0" encoding="windows-1250"?> < Recipe> < Entry name="Stuffed Red Cabbage" ethnicity="Slavic" /> < Cook_Time Hrs="1" Mins="30" /> < Ingredients> < Cabbage Amount="1" Measurement="head" /> < Egg Amount="1" Measurement="unit" /> ...

How to filter data from a file using Python?

Hi all, I'm trying to filter certain data from an HTML file. For example, the HTML file is as follows: <tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]">software_0.1-0.log</td><td align="right">17-Nov-2009 13:46 </td><td align="right">186K</td></tr> I need to extract the software_0.1-0 part as well as the 17-Nov-2009 par...

Best way to add missing <p> tags to text in HTML while disregarding other tags?

I'm currently writing a function for parsing some HTML and adding tags where necessary. Basically I have a piece of HTML like this: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse feugiat, nunc at vestibulum egestas. <script type="c"> #include &lt;stdio.h&gt; #define debug(var) printf(#var &quot; = %d\n&qu...

Parsing a comma-delimited std::string

If I have a std::string containing a comma-separated list of numbers, what's the simplest way to parse out the numbers and put them in an integer array? I don't want to generalise this out into parsing anything else. Just a simple string of comma separated integer numbers such as "1,1,1,1,2,1,1,1,0". ...
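A minimal sketch of the split-and-convert approach the question describes, written in Python rather than the question's C++ only to keep the illustration short. The function name `parse_int_list` is hypothetical, and the input is assumed to contain nothing but integers separated by commas.

```python
def parse_int_list(text):
    """Split a comma-separated string of integers into a list of ints."""
    # int() tolerates surrounding whitespace and raises ValueError
    # for any field that is not an integer (including an empty field).
    return [int(field) for field in text.split(",")]


if __name__ == "__main__":
    print(parse_int_list("1,1,1,1,2,1,1,1,0"))
```

Letting malformed fields raise an error, rather than skipping them silently, fits the stated goal of handling only this one simple format.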

Parsing SVG path in objective-c

Hi, I need to parse some paths of an SVG file; they are simple lines. When retrieving the data I end up with this string: m 0,666.6479 254.28571,0 According to the SVG specification, m denotes a new current point; the following two numbers are its position, and the later pairs are positions relative to the first one. So that would create ...
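A rough sketch of how the string above could be turned into absolute points, shown in Python rather than Objective-C for brevity. The function name `relative_moveto_points` is hypothetical, and the code assumes only the form shown in the question: a single leading `m` followed by whitespace-separated `x,y` pairs. In SVG path data, each pair after the first is relative to the current point, which for a two-pair path is the same as being relative to the first point.

```python
def relative_moveto_points(path_data):
    """Convert 'm x,y dx,dy ...' path data into a list of absolute (x, y) points."""
    tokens = path_data.split()
    if not tokens or tokens[0] != "m":
        raise ValueError("expected path data starting with 'm'")
    points = []
    x = y = 0.0  # a leading relative moveto starts from the origin
    for pair in tokens[1:]:
        dx_text, dy_text = pair.split(",")
        # Each pair is an offset from the current point.
        x += float(dx_text)
        y += float(dy_text)
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(relative_moveto_points("m 0,666.6479 254.28571,0"))
```

For the string in the question this yields a start point at (0, 666.6479) and an end point at (254.28571, 666.6479), which is a horizontal line. Other path commands and separator styles are not handled here.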

php lib for parsing html to DOM hierarchy tree

I need a PHP library to parse HTML content into a DOM tree like this: html |--head | |---title--title_content | |---meta--meta_content |--body | |---div | | |--div--div_content .. etc and also repair or clean up invalid HTML. It's not only for HTML but even for any XML-style markup language. Basically a parent-...

BBCode to HTML transformation rules

Background: I have written a very simple BBCode parser in C# which transforms BBCode to HTML. Currently it supports only the [b], [i] and [u] tags. I know that BBCode is always considered valid, regardless of what the user has typed. I cannot find a strict specification of how to transform BBCode to HTML. Question: Does standard "BBCode to HTML...

Most efficient method to parse small, specific arguments

I have a command line application that needs to support arguments of the following form: all: return everything search: return the first match to search all*search: return everything matching search X*search: return the first X matches to search search#Y: return the Yth match to search Where search can be either a single keywor...

Losing whitespace around escaped symbols in CDATA using Expat XML parser in C++

I'm using XML to send project information between applications. One of the pieces of information is the project description. So I have: <ProjectDescription>Test &amp; spaces around&amp;some &amp; amps!</ProjectDescription> Or: "Test & spaces around&some & amps!" <-- GOOD! When I then use Expat to parse it, my data handler gets ju...

pulling webpages from an adult site -- how to get past the site agreement?

I'm trying to parse a bunch of webpages from an adult website using Ruby: require 'hpricot' require 'open-uri' doc = Hpricot(open('random page on an adult website')) However, what I end up getting instead is that initial 'Site Agreement' page making sure that you're 18+, etc. How do I get past the Site Agreement and pull the webpag...

Extract Integer Part in String

What is the best way to extract the integer part of a string like Hello123? How do you get the 123 part? You can sort of hack it using Java's Scanner; is there a better way? ...
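One common approach is a regular expression that picks out the first run of digits. The sketch below uses Python rather than Java to keep it short; `extract_int` is a hypothetical name, and "integer part" is assumed to mean the first contiguous group of decimal digits, with no sign handling.

```python
import re


def extract_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


if __name__ == "__main__":
    print(extract_int("Hello123"))
```

The same pattern, `\d+`, should carry over to other regex engines, including Java's `java.util.regex`.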

Loading an hpricot element with a chunk of html

Is there a way to load a chunk of HTML into an Hpricot::Doc object? I am trying to parse various chunks of HTML within custom tags from a page. So if I have: <foo> <b>here is some stuff</b> <table> <tr> <td>one</td> <td>two</td> </tr> <tr> <td>three</td> <td><four</td> </tr> </table> </foo...

Keyword Matching in Pyparsing: non-greedy slurping of tokens

Pythonistas: Suppose you want to parse the following string using Pyparsing: 'ABC_123_SPEED_X 123' where ABC_123 is an identifier, SPEED_X is a parameter, and 123 is a value. I thought of the following BNF using Pyparsing: Identifier = Word( alphanums + '_' ) Parameter = Keyword('SPEED_X') or Keyword('SPEED_Y') or Keyword('SPEED_Z') ...

How can I make SimpleDateFormat.parse() fail when month is greater than 12?

I'm using java.text.SimpleDateFormat to parse strings of the form "yyyyMMdd". If I try to parse a string with a month greater than 12, instead of failing, it rolls over to the next year. Full runnable repro: import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; public class ParseDateTest { pub...

split svnversion output in bash

I have this function, which works fine, but I would like to rewrite it in bash. The problem is that I know too little about what's available in bash. #!/usr/bin/python def parse_svnversion(value): """split the output of svnversion into its three components given a string that looks like the output of the command svnversion, ...

how to allow a user to upload a spreadsheet in asp.net mvc

I want a user to have a file picker and then choose a spreadsheet, which will then be parsed by a controller action. Are there any examples of how to do this? ...

Trouble Scraping Web Page With Malformed Content

I have written C# code which uses the HtmlAgilityPack library to scrape a page located at: World's Largest Urban Areas (Page 2). Unfortunately the page consists of malformed content. I'm at an impasse on how to scrape this page. The current code I have (appearing below) freezes on parsing the HTML: HtmlNodeCollection ...

XML Parser to read xml tags from word file C#

Hello there, I have some Word template (dot/dotx) files that contain XML tags along with plain text. At run time, I need to replace the XML tags with their respective mail merge fields. So I need to parse the document for these XML tags and replace them with merge fields. I was using Regex to find and replace these XML tags. But I was s...

Practical consequences of formal grammar power?

Every undergraduate Intro to Compilers course reviews the commonly-implemented subsets of context-free grammars: LL(k), SLR(k), LALR(k), LR(k). We are also taught that for any given k, each of those grammars is a subset of the next. What I've never seen is an explanation of what sorts of programming language syntactic features might req...