parsing

Writing a BibTeX parser

How should I start writing a parser for BibTeX files? As an initial design, I see the following steps: list the grammar, build a tokenizer, and parse the token stream against the grammar. We also need an error-reporting mechanism, so that users uploading BibTeX files can see the line numbers of the errors in their files. I am looking for commu...
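The steps above (grammar, tokenizer, parser, line-numbered errors) can be sketched as a small recursive-descent parser. This is a minimal sketch that handles only a simplified, assumed grammar: `entry := '@' name '{' key (',' name '=' value)* ','? '}'`, where a value is `{...}` (with nested braces), `"..."` or a bare word. It does not handle `@string` macros, `#` concatenation or `@comment`. For brevity it folds the tokenizer into the parser and tracks the current line as it consumes characters. With a separate tokenizer, each token would carry its own line number instead. All class and method names are hypothetical.

```python
class BibTeXError(Exception):
    """Syntax error that carries the 1-based line number where it was detected."""

    def __init__(self, message, line):
        super().__init__(f"line {line}: {message}")
        self.line = line


class BibTeXParser:
    """Recursive-descent parser for a simplified BibTeX subset (hypothetical design)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1  # incremented every time a newline is consumed

    def error(self, message):
        raise BibTeXError(message, self.line)

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self):
        ch = self.peek()
        if ch == "":
            self.error("unexpected end of input")
        if ch == "\n":
            self.line += 1
        self.pos += 1
        return ch

    def skip_ws(self):
        while self.peek() and self.peek().isspace():
            self.advance()

    def expect(self, ch):
        self.skip_ws()
        if self.peek() != ch:
            self.error(f"expected {ch!r}, found {self.peek()!r}")
        self.advance()

    def name(self):
        # Entry types, citation keys, field names and bare values such as 1984.
        self.skip_ws()
        start = self.pos
        while self.peek() and (self.peek().isalnum() or self.peek() in "_-:."):
            self.advance()
        if start == self.pos:
            self.error(f"expected a name, found {self.peek()!r}")
        return self.text[start:self.pos]

    def value(self):
        self.skip_ws()
        if self.peek() == '"':
            self.advance()
            start = self.pos
            while self.peek() != '"':
                self.advance()  # raises at end of input
            end = self.pos
            self.advance()  # closing quote
            return self.text[start:end]
        if self.peek() == "{":
            self.advance()
            start, depth = self.pos, 1
            while depth:
                ch = self.advance()  # raises at end of input
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
            return self.text[start:self.pos - 1]  # drop the outer closing brace
        return self.name()

    def entry(self):
        self.expect("@")
        entry_type = self.name().lower()
        self.expect("{")
        key = self.name()
        fields = {}
        self.skip_ws()
        while self.peek() == ",":
            self.advance()
            self.skip_ws()
            if self.peek() == "}":  # tolerate a trailing comma
                break
            field = self.name().lower()
            self.expect("=")
            fields[field] = self.value()
            self.skip_ws()
        self.expect("}")
        return {"type": entry_type, "key": key, "fields": fields}

    def parse(self):
        entries = []
        self.skip_ws()
        while self.peek():
            entries.append(self.entry())
            self.skip_ws()
        return entries
```

Because every error goes through `error()`, the message always includes the line where parsing stopped. That is the information an uploader needs to locate the problem in their file.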

Google App Engine: parsing XML larger than 1 MB

Hi, I need to parse an XML file that is more than 1 MB in size. I know GAE can handle requests and responses of up to 10 MB, but since we need to use the SAX parser API, and the GAE API has a limit of 1 MB, is there any way we can parse a file larger than 1 MB? ...
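SAX parsers are usually incremental, so one general way around a per-call size limit is to feed the document in pieces rather than all at once. The sketch below uses Python's standard `xml.sax` incremental parser (`feed()` and `close()`) to show the idea. It assumes the file can be read or fetched in chunks smaller than the limit; whether GAE exposes the upload that way is an assumption here. `ElementCounter`, `iter_chunks` and `parse_chunks` are hypothetical names, and the chunk size is arbitrary.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name; stands in for real per-element processing."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def iter_chunks(stream, chunk_size=512 * 1024):
    """Yield successive byte chunks read from a binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def parse_chunks(chunks, handler):
    """Feed byte chunks to an incremental SAX parser; chunks may split tags."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # the parser keeps its state between calls
    parser.close()  # signals end of document and checks well-formedness
    return handler
```

A typical call would be `parse_chunks(iter_chunks(f), ElementCounter())` on an open binary file `f`. Memory use then depends on the chunk size and the handler, not on the total file size.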

XPath/DOM/term question

I'm trying to get my head around a problem, and I'm not sure whether there really is a solution, or one that's readily available. I'm trying to figure out if I can specify a "term" that appears in the source of a webpage and have a "black box" produce a fully qualified XPath for the DOM element that contains/has/uses the "term". In other words, if I h...
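One way to build such a "black box" is to walk the tree from the root, carrying the path so far, and record the path of every element whose own text or attribute values contain the term. The sketch below does this with Python's standard `xml.etree.ElementTree`. It assumes the page is well-formed XML/XHTML; real-world HTML would first need an HTML parser. Each step gets a positional predicate (`p[2]`) so the XPath is unambiguous. `xpaths_containing` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def xpaths_containing(root, term):
    """Return absolute XPaths of elements whose own text or attributes contain `term`."""
    results = []

    def walk(elem, path):
        # Text that belongs directly to elem: its .text plus the tails of its children.
        direct_text = (elem.text or "") + "".join(child.tail or "" for child in elem)
        if term in direct_text or any(term in v for v in elem.attrib.values()):
            results.append(path)
        # Number same-named siblings so every step gets a positional predicate.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return results
```

Checking only an element's direct text, rather than all descendant text, means ancestors such as `<body>` are not reported just because the term appears somewhere inside them.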

Perl RegEx to find the portion of the email address before the @

Hi, I have the following issue in Perl. I have a file that contains a list of email addresses as input. I would like to extract the part before the '@' from every email address (later I will store all of these strings in an array). For example, from an address such as abcdefgh@..., I would like to extract abcdefgh. My intention is to get o...
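A single anchored regex is enough to extract the part before the `@`: capture one or more characters that are neither `@` nor whitespace, followed by `@`. The sketch below uses Python's `re` for illustration. The same pattern works in Perl, for example `push @names, $1 if $line =~ /^([^@\s]+)@/;`. `local_parts` is a hypothetical helper name.

```python
import re

# Everything up to the first "@", excluding whitespace; anchored at line start.
LOCAL_PART = re.compile(r"^([^@\s]+)@")


def local_parts(lines):
    """Collect the part before '@' from each line that starts with an email address."""
    names = []
    for line in lines:
        match = LOCAL_PART.match(line.strip())
        if match:
            names.append(match.group(1))
    return names
```

Lines that do not contain an `@` are skipped rather than raising an error.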

Looking for open-source document-to-text parser recommendations (e.g. .pdf or .doc to text)

Formatting should be preserved. ...

How do I extract a name and value from a line of text while reading from a file in Java?

A file name will be passed in on standard input. I want to open the file, read it, and create some data based on its text. For example, if this is a line in the file: Hristo 3 ... then I want to create a Member() named Hristo with a value of 3. So I want to pull out a String for the name and an int for the value. The na...
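Splitting each line on whitespace and converting the second field to an integer covers the "Hristo 3" case. The sketch below shows this in Python, with a `Member` dataclass standing in for the question's Java `Member` class. `parse_members` and `load_members` are hypothetical names, and the sketch assumes exactly two whitespace-separated fields per non-blank line. In Java, `java.util.Scanner`'s `next()` and `nextInt()` follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Python stand-in for the question's Member class (name plus int value)."""
    name: str
    value: int


def parse_members(lines):
    """Turn lines like 'Hristo 3' into Member objects, reporting bad lines by number."""
    members = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'name value', got {line!r}")
        name, value = parts
        try:
            members.append(Member(name, int(value)))
        except ValueError:
            raise ValueError(f"line {lineno}: {value!r} is not an integer") from None
    return members


def load_members(path):
    with open(path, encoding="utf-8") as f:
        return parse_members(f)
```

With the file name arriving on standard input, the call would be `load_members(sys.stdin.readline().strip())`.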

Facebook Graph API: retrieving friends with JSON and C#

I'm working in C# with the Graph API and have been able to grab Facebook user profile information such as the ID, name and email, and then deserialize the JSON so that I can assign the values to labels. However, my problem is that when I go to grab the list of friends, or a list of anything for that matter, how do I go about deserializing this ...
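List responses usually differ from single-object responses because the items sit inside a wrapper object. The sketch below assumes that shape (`{"data": [...], "paging": {...}}`, which is how Graph API list results are typically returned) and uses Python's `json` module to read the `data` array into typed objects. `Friend` and `parse_friends` are hypothetical names. In C#, the analogous approach is usually a wrapper class whose `data` property is a list of friend objects.

```python
import json
from dataclasses import dataclass


@dataclass
class Friend:
    id: str
    name: str


def parse_friends(payload):
    """Deserialize a list response shaped like {"data": [{...}, ...], ...}."""
    document = json.loads(payload)
    return [Friend(id=item["id"], name=item["name"]) for item in document.get("data", [])]
```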

Convert carriage returns to <br> tags

Hi all, I'm stuck on a small problem with parsing carriage returns in a textarea. jQuery code: $.fn.escapeHtml = function() { this.each(function() { $(this).html( $(this).html() .replace(/"/g,"&quot;") .replace(/&/g,'&amp;') .replace(/</g,'&lt;') .rep...
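Two details matter when turning textarea input into HTML. First, `&` must be escaped before any other entity is introduced. In the jQuery snippet above, `"` becomes `&quot;` first and then `&` becomes `&amp;`, so every `&quot;` ends up as `&amp;quot;`. Second, the three line-ending styles (`\r\n`, `\r`, `\n`) should be normalised before converting them to `<br>`. The sketch below shows that order using Python's standard `html.escape`. `text_to_html` is a hypothetical name.

```python
import html


def text_to_html(text):
    """Escape text for HTML, then turn each line break into <br>."""
    escaped = html.escape(text, quote=True)  # escapes & first, then < > " '
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>")
```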

Adding HTML tags with Java based on a regex, keeping the matched data

Using Java, I am writing a script to add anchor links to an HTML bibliography, that is, to go from [1,2] to <a href="o100701.html#bib1">[1, 2]</a>. I think I have found the right regular expression: \[.*?\] What I am having trouble with is writing the code that will retain the values inside the expression while surrounding it with the link ta...
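Keeping the matched text while wrapping it is what a replacement callback or a whole-match back-reference is for. In Java, `$0` in the replacement string of `Matcher.replaceAll` refers to the whole match. The Python sketch below uses `re.sub` with a function, which also lets the link target be built from the first citation number. Two choices here are assumptions: the page name defaults to the question's `o100701.html`, and the anchor is taken from the first number in the brackets. The pattern is narrower than `\[.*?\]`, so bracketed text such as `[sic]` is left alone. `link_citations` is a hypothetical name.

```python
import re

# One or more comma-separated numbers inside square brackets, e.g. [1,2] or [7].
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def link_citations(html_text, page="o100701.html"):
    def wrap(match):
        first = match.group(1).split(",")[0].strip()
        # match.group(0) is the original bracketed text, kept verbatim.
        return f'<a href="{page}#bib{first}">{match.group(0)}</a>'

    return CITATION.sub(wrap, html_text)
```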

Parsing an XML file: options?

I'm developing a system to pick up XML attachments from emails via Exchange Web Services and enter them into a DB via a custom DAL object that I've created. I've managed to extract the XML attachment and have it ready as a stream... the question is how to parse this stream and populate a DAL object. I can create an XMLTextReader and...
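A common pattern is to load the stream into a tree and map each record element onto the data object with one small function. The sketch below uses Python's `xml.etree.ElementTree`. The element names (`order`, `id`, `customer`, `total`) and the `OrderRecord` class are hypothetical stand-ins for the real attachment schema and DAL object. In .NET, `XDocument.Load` on the stream or a forward-only `XmlReader` are the usual equivalents.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class OrderRecord:
    """Hypothetical stand-in for the custom DAL object."""
    order_id: int
    customer: str
    total: float


def record_from_element(node):
    # findtext returns the child's text, or `default` when the child is missing.
    return OrderRecord(
        order_id=int(node.findtext("id")),
        customer=node.findtext("customer", default=""),
        total=float(node.findtext("total", default="0")),
    )


def records_from_stream(stream):
    """Parse an XML attachment stream and map every <order> element to a record."""
    root = ET.parse(stream).getroot()
    return [record_from_element(node) for node in root.iter("order")]
```

Keeping the element-to-object mapping in one function makes it easy to test without a mailbox or a database.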

Parser in Ruby: #slice! inside #each_with_index = missing element

Let's say I want to separate certain combinations of elements from an array. For example: data = %w{ start before rgb 255 255 255 between hex FFFFFF after end } rgb, hex = [], [] data.each_with_index do |v,i| p [i,v] case v.downcase when 'rgb' then rgb = data.slice! i,4 when 'hex' then hex = data.slice! i,2 end end pp [r...
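The missing element comes from mutating the array while iterating over it. `slice!` removes elements at index `i`, and the next element slides into that slot, but `each_with_index` still moves on to `i + 1`. As a result, the element right after each removed group (here `between` and `after`) is never visited. One way around this is to leave the input untouched and advance the index manually by the size of each group. The sketch below does that in Python. `extract_groups` is a hypothetical name, and the group sizes (4 for `rgb`, 2 for `hex`) come from the question. The same `while` loop translates directly to Ruby.

```python
def extract_groups(data):
    """Split out 'rgb' (4 items) and 'hex' (2 items) groups without mutating data."""
    rgb, hex_, rest = [], [], []
    i = 0
    while i < len(data):
        word = data[i].lower()
        if word == "rgb":
            rgb = data[i:i + 4]
            i += 4  # jump past the whole group instead of deleting it
        elif word == "hex":
            hex_ = data[i:i + 2]
            i += 2
        else:
            rest.append(data[i])
            i += 1
    return rgb, hex_, rest
```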

Basic problem with yacc/lex

Hello, I have some problems with a very simple yacc/lex program. I may have forgotten some basic steps (it's been a long time since I've used these tools). In my lex program I define some basic patterns like: word [a-zA-Z][a-zA-Z]* %% ":" return(PV); {word} { yylval = yytext; printf("yylval = %s\n",yylva...

Does anyone know of a good CSV-to-NSArray parser for Objective-C?

I'm looking for an easy-to-use CSV parser for Objective-C to use on the iPhone. I'm also looking for other parsers, such as JSON, so maybe there is a conversion library somewhere. ...
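Whichever library you pick, the hard part of CSV is quoting. Fields wrapped in `"` may contain commas and newlines, and `""` inside them stands for a literal quote. The small state machine below, written in Python to show the logic, turns CSV text into a list of rows (an array of arrays) and is straightforward to port to Objective-C. It follows RFC 4180-style quoting, but it is a minimal sketch and not a complete implementation: for example, it does not handle custom delimiters. `parse_csv` is a hypothetical name.

```python
def parse_csv(text):
    """Split CSV text into rows of string fields (RFC 4180-style quoting)."""
    rows, row, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < len(text) and text[i + 1] == '"':
                field.append('"')  # "" inside a quoted field is a literal quote
                i += 1
            elif ch == '"':
                in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are literal inside quotes
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        elif ch != "\r":  # tolerate \r\n line endings
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if text and not text.endswith("\n"):  # last row had no trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```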

How do I search for and correct HTML tags and attributes?

In my application, I have to fix the closing of every <img> tag, as shown below. Instead of closing the <img> with >, it should close with />. Is there an easy way to search for all the <img> tags in this text and fix the >? (If a tag is already closed with />, no action is required.) Another question: if there is no "widt...
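For simple, well-behaved markup, one case-insensitive regex can normalise every `<img ...>` to `<img ... />` and leave already self-closed tags as they are. The sketch below uses Python's `re`. It assumes no attribute value contains a literal `>`. For arbitrary HTML, a real parser is the safer choice. `self_close_imgs` is a hypothetical name.

```python
import re

# <img, its attributes (lazily), optional whitespace and slash, then >.
IMG_TAG = re.compile(r"<(img)\b([^>]*?)\s*/?>", re.IGNORECASE)


def self_close_imgs(html_text):
    # Keep the tag name's original case and its attributes; always end with " />".
    return IMG_TAG.sub(r"<\1\2 />", html_text)
```

Because the optional `/` is part of the match, tags that already end in `/>` are rewritten to the same form, so running the function twice changes nothing.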

How do I remove the &#x2002;, &#x2014; and &#x2013; special characters from my XML files?

This is a sample of the XML file: <row tnote="0"> <entry namest="col2" nameend="col4" us="none" emph="bld"><blst> <li><text>Single, head of household, or qualifying widow(er)&#x2014;$55,000</text></li> <li><text>Married filing jointly&#x2014;$115,000</text></li> </blst></entry> <entry colname="col6" ldr="1" valign="middle">&#x2002;</entr...
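These are numeric character references for U+2002 EN SPACE, U+2014 EM DASH and U+2013 EN DASH. A parser turns them into those characters, so a robust cleanup handles both the references and the literal characters. The sketch below converts the three references to characters and then maps each character to a replacement. The replacement values are assumptions: deleting the em dash outright would glue "widow(er)" to "$55,000", so it maps to `-` here. Map a character to `""` to remove it completely. Decimal forms such as `&#8212;` are not covered. `clean_xml_text` is a hypothetical name.

```python
import re

# Hypothetical replacements; map a character to "" to delete it outright.
REPLACEMENTS = {
    "\u2002": " ",  # EN SPACE
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
}

# Hexadecimal character references for just these three code points.
CHAR_REF = re.compile(r"&#x(2002|2013|2014);")


def clean_xml_text(raw):
    # Turn the references into the characters themselves...
    text = CHAR_REF.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # ...then replace the characters, which also catches any stored literally.
    return text.translate(str.maketrans(REPLACEMENTS))
```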

Getting BeautifulSoup to catch tags in a case-insensitive way

I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag and some <meta> tags. But I want to catch them regardless of their case; I know that some sites write meta tags like this: <META>, and I want to be able to catch those. I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a non-case-sensit...
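One way to get case-insensitive matching is to normalise tag names to lower case before comparing them. As a swapped-in alternative to BeautifulSoup, the sketch below uses Python's standard `html.parser`, which documents that it passes tag and attribute names to its handlers in lower case, so `<META>` arrives as `meta`. Whether BeautifulSoup does the same may depend on its version and tree builder. `TagCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects (tag, attributes) for start tags whose names are in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {name.lower() for name in wanted}
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases `tag` and attribute names; values keep their case.
        # Self-closing tags like <meta ... /> also reach this method by default.
        if tag in self.wanted:
            self.found.append((tag, dict(attrs)))
```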

Positionally matching substrings in Python

How would you parse ['i386', 'x86_64'] out of a string like '-foo 23 -bar -arch i386 -arch x86_64 -isysroot / -fno-strict-aliasing -fPIC'? >>> my_arch_parse_function('-foo 23 -bar -arch i386 -arch x86_64 -isysroot / -fno-strict-aliasing -fPIC') ['i386', 'x86_64'] Can this be done using regex, or only using modules like PyParsin...
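A regex is enough here: match `-arch` at the start or after whitespace, then capture the following non-whitespace run. A token-based alternative splits the string with `shlex` and takes the token after each `-arch`. This version is easier to extend if quoting ever matters. The sketch below shows both. `my_arch_parse_function` is the question's name, and `arch_via_tokens` is a hypothetical one.

```python
import re
import shlex

# "-arch" at the start or after whitespace, then capture the next word.
ARCH_RE = re.compile(r"(?:^|\s)-arch\s+(\S+)")


def my_arch_parse_function(flags):
    return ARCH_RE.findall(flags)


def arch_via_tokens(flags):
    tokens = shlex.split(flags)
    # Pair each token with the one after it; keep values that follow "-arch".
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-arch"]
```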

How do I return an enum value from a string?

I'm trying to return a strongly typed enumeration value from a string. I'm sure there is a better way to do this. This just seems like way too much code for a simple thing like this: public static DeviceType DefaultDeviceType { get { var deviceTypeString = GetSetting("DefaultDeviceType"); ...
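Most enum types can already look up a member by name, so the conversion reduces to one lookup plus a fallback. In C#, `Enum.TryParse` (available since .NET 4) does this in one call. The Python sketch below shows the same pattern: normalise the setting string, look the member up by name, and fall back to a default when the string is missing or unknown. The `DeviceType` members and the choice of default are hypothetical.

```python
from enum import Enum


class DeviceType(Enum):
    # Hypothetical members; the real enum's values are not shown in the question.
    DESKTOP = "desktop"
    MOBILE = "mobile"
    TABLET = "tablet"


def parse_device_type(text, default=DeviceType.DESKTOP):
    """Map a setting string such as 'mobile' to a DeviceType, or return default."""
    if text is None:
        return default
    try:
        return DeviceType[text.strip().upper()]  # lookup by member name
    except KeyError:
        return default
```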

PHP HTML parsing: how can I get the charset value of a webpage with Simple HTML DOM Parser?

Hi, in PHP, how can I get the charset value of a webpage (utf-8, windows-255, etc.) with Simple HTML DOM Parser? Note: it has to be done with the HTML DOM parser from http://simplehtmldom.sourceforge.net Example 1: webpage charset input: <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> result: utf-8 Example 2: webpage ch...
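The charset can be declared in two ways: directly, as `<meta charset="...">`, or inside the `content` attribute of a `<meta http-equiv="Content-Type">` tag, as in Example 1. A DOM-based solution checks both. The sketch below shows that logic using Python's standard `html.parser` as a stand-in for Simple HTML DOM, so none of that library's API is implied. `CharsetFinder` and `find_charset` are hypothetical names. The first declaration found wins.

```python
import re
from html.parser import HTMLParser

# Pulls the value out of e.g. "text/html; charset=utf-8".
CHARSET_IN_CONTENT = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


class CharsetFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "charset" in attrs:  # <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
        elif attrs.get("http-equiv", "").lower() == "content-type":
            match = CHARSET_IN_CONTENT.search(attrs.get("content", ""))
            if match:
                self.charset = match.group(1).lower()


def find_charset(html_text):
    finder = CharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset
```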

Passing data structures from Java to Perl

Hi, I would like to pass some data structures from Java to Perl. In Perl, this should basically be a hash where the keys are strings and each value is either a string, a hash or a hash of hashes. Is there a way to dump data from Java that can be easily parsed by Perl? ...
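JSON maps directly onto this shape: JSON objects become Perl hashes and strings stay strings. Perl can read it with `decode_json` from `JSON::PP`, which has shipped with Perl since 5.14. Choosing JSON is a suggestion here, not something the question requires. The Python sketch below stands in for the Java producer. It checks that the data matches the described shape (string keys; values that are strings, hashes, or hashes of hashes) and serialises it. `dump_for_perl` is a hypothetical name, and the two-level nesting limit reflects an assumed reading of "hash of hashes".

```python
import json


def dump_for_perl(data):
    """Serialize {str: str | dict | dict of dicts} to JSON, rejecting other shapes."""

    def check(value, depth):
        if isinstance(value, str):
            return
        if isinstance(value, dict) and depth < 2:  # a hash, or a hash of hashes
            for key, inner in value.items():
                if not isinstance(key, str):
                    raise TypeError(f"non-string key: {key!r}")
                check(inner, depth + 1)
            return
        raise TypeError(f"unsupported value: {value!r}")

    for key, value in data.items():
        if not isinstance(key, str):
            raise TypeError(f"non-string key: {key!r}")
        check(value, 0)
    return json.dumps(data)
```

On the Perl side, `my $hash = decode_json($json_text);` then gives an ordinary hash reference.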