parsing

Reading EDI Formatted Files

I'm new to EDI, and I have a question. I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This is fine if every EDI used line breaks to separate entities, but I have found that many are single line files with any number of characters used as breaks. I have noticed ...
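For single-line files, a minimal sketch follows. It assumes the file is an X12 interchange, whose ISA header is fixed-width (106 characters including the segment terminator), so the delimiters can be read by position rather than by line breaks. The function names `detect_delimiters` and `split_segments` are hypothetical, chosen for illustration.

```python
ISA_LENGTH = 106  # fixed length of an X12 ISA segment, terminator included


def detect_delimiters(text: str) -> dict:
    """Read the X12 delimiters from the fixed-width ISA header."""
    if not text.startswith("ISA") or len(text) < ISA_LENGTH:
        raise ValueError("not an X12 interchange: ISA header missing or too short")
    return {
        "element": text[3],      # character right after the "ISA" tag
        "component": text[104],  # ISA16, the last data element
        "segment": text[105],    # terminator that closes the ISA segment
    }


def split_segments(text: str) -> list:
    """Split an interchange into segments, each a list of elements."""
    delims = detect_delimiters(text)
    segments = []
    for raw in text.split(delims["segment"]):
        raw = raw.strip("\r\n")  # tolerate optional line breaks after the terminator
        if raw:
            segments.append(raw.split(delims["element"]))
    return segments
```

Because the delimiters are taken from the header itself, the same code should handle files that use line breaks as terminators and files that are one long line.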

What is a suitable lexer generator that I can use to strip identifiers from many language source files?

I'm working on a group project for my university which is going to be used for plagiarism detection in Computer Science. My group is primarily working from the hashing/fingerprinting techniques described in this journal article: "Winnowing: Local Algorithms for Document Fingerprinting". This is very similar to how the MOSS plagiarism detect...

Parse Credit Card input from Magnetic Stripe

Does anyone know how to parse a credit card string input from a Magnetic Card Swiper? I tried a JavaScript parser but never got it to work. This is what the input looks like. %BNNNNNNNNNNNNNNNN^DOE/JOHN ^1210201901000101000100061000000?;NNNNNNNNNNNNNNNN=12102019010106111001? The N's are the credit card number. ...
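A minimal sketch of one way to split such a swipe into fields, written in Python rather than JavaScript. It assumes the reader emits the usual track 1 (format code B) and track 2 layouts, where the expiry is four digits in YYMM order followed by a three-digit service code. The name `parse_swipe` and the field names are hypothetical, and the card number in the usage note is a placeholder.

```python
import re

# Track 1 (format code B): %B<PAN>^<NAME>^<YYMM><service code><discretionary>?
TRACK1 = re.compile(
    r"%B(?P<pan>\d{1,19})\^(?P<name>[^^]{2,26})\^"
    r"(?P<expiry>\d{4})(?P<service>\d{3})(?P<discretionary>[^?]*)\?"
)
# Track 2: ;<PAN>=<YYMM><service code><discretionary>?
TRACK2 = re.compile(
    r";(?P<pan>\d{1,19})=(?P<expiry>\d{4})(?P<service>\d{3})(?P<discretionary>[^?]*)\?"
)


def parse_swipe(raw: str) -> dict:
    """Return the fields of track 1 if present, otherwise those of track 2."""
    match = TRACK1.search(raw) or TRACK2.search(raw)
    if match is None:
        raise ValueError("no track 1 or track 2 data found")
    fields = match.groupdict()
    expiry = fields.pop("expiry")
    fields["expiry_year"] = 2000 + int(expiry[:2])  # assumption: expiry is in 20xx
    fields["expiry_month"] = int(expiry[2:])
    if "name" in fields:  # only track 1 carries the cardholder name
        surname, _, given = fields.pop("name").partition("/")
        fields["surname"] = surname.strip()
        fields["given_name"] = given.strip()
    return fields
```

Called on the sample above with digits in place of the N's, for example `parse_swipe("%B4111111111111111^DOE/JOHN ^1210201901000101000100061000000?")`, this yields the card number, the surname and given name, and an expiry of month 10, year 2012.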

Filter/parse/modify emails and hrefs from html content in PHP4

I'm not validating emails. What I want to do is find (and then change) 3 separate types of "email" content in an (HTML) string: a plain email, e.g. [email protected]; a mailto href, e.g. <a href="mailto:[email protected]">[email protected]</a>; an aliased href, e.g. <a href="mailto:[email protected]">user's email</a>. I'm then going to transform each example ...

Code to parse user agent string?

As strange as I find this, I have not been able to find a good PHP function anywhere which will do an intelligent parse of a user agent string. I've been Googling for about 20 minutes now. I have the string already; I just need something that will chop it up and give me at least browser/ver/os. Know of a good snippet anywhere? ...

Are unescaped user names incompatible with BNF?

Hi all, I've got (proprietary) output from a piece of software that I need to parse. Sadly, there are unescaped user names and I'm scratching my head trying to work out whether or not I can describe the files I need to parse using a BNF (or EBNF or ABNF). The problem, oversimplified (it's really just an example), may look like this: (data) ::= <...

I'm interested in Programming Languages. What areas of programming are good for me?

I've always been interested in writing and designing programming languages. Of course, it's pretty difficult to find an employer that will let you write a programming language as part of your job. So I'm looking for the "next best thing". What fields of programming will let me get some experience solving some related problems? Or wha...

Can XML be parsed reliably using jQuery's $(responseXML) syntax?

I'm currently looking for an easy way to extract information from server XML responses using JavaScript. jQuery seems like a good candidate for this. When it comes to parsing XML with jQuery, I keep coming across code examples similar to the following snippet: function parseXml(responseXml) { $(responseXml).find('someSelector')......

I can never predict XMLReader behavior. Any tips on understanding?

It seems every time I use an XMLReader, I end up with a bunch of trial and error trying to figure out what I'm about to read versus what I'm reading versus what I just read. I always figure it out in the end, but I still, after using it numerous times, don't seem to have a firm grasp of what an XMLReader is actually doing when I call th...

Text Parser with PHP, like Instapaper

Hi, I'm trying to write a text parser with PHP, like Instapaper did. What I want to do is get a webpage and parse it in text-only mode. It's simple to get the webpage with cURL and strip HTML tags. But every webpage has some common areas, like header, navigation, sidebar, footer, banners etc. I only want to get the article in text mo...

Delphi Suggestions For Parsing Google SERP Results

What's the best way to parse the Google search results with Delphi? (The API will not work; it only allows 10 results.) I would prefer free options. ...

How to parse a folder path with spaces in C code

Hello, I'm using this simple C code: char * command = NULL; sprintf (command, "ls %s", folderpath); system(command); The problem is when the folder name has a space in it... I know that in Unix I need to add a "\", for example ls my\ folder\ name How can I get around this ? Thank you! ...

HTML/XML Parser for Java

Hello, What HTML parsers have the following features: fast; thread-safe; reliable and bug-free; parses HTML and XML; handles erroneous HTML; has a DOM implementation; supports HTML4, JavaScript, and CSS tags; relatively simple, object-oriented API? Which parser do you think is best? Thank you. ...

GLR parsing algorithm resources

Hi, I am writing a GLR parser generator and would like some advice on resources relating to this algorithm both on the internet and of the dead-tree variety (books for those unfamiliar with the geek-speak). I know Bison can generate GLR parsers, and given it's under the GPL I can examine its code, however it'd be nice to have a full des...

Parsing Word docs with Ruby

Is there a way to parse Word documents (doc and docx) with Ruby? I have a Linux server running a Rails website that requires this service. ...

How can I keep track of original character positions in a string across transformations?

I'm working on an anti-plagiarism project for my CS class. This involves detecting plagiarism in computer science courses (programming assignments), through a technique described in "Winnowing: Local Algorithms for Document Fingerprinting." Basically, I'm taking a group of programming assignments. Let's say one of the assignments looks lik...
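One common approach, sketched here under assumptions, is to carry a list of source offsets alongside the transformed text, so any match in the normalized string can be mapped back to the original file. The transformation shown (dropping whitespace and lowercasing) is only a stand-in for whatever normalization the project actually applies, and both function names are hypothetical.

```python
def normalize_with_positions(source: str):
    """Drop whitespace and lowercase, remembering where each kept character came from."""
    chars = []
    positions = []  # positions[i] is the index in `source` that produced chars[i]
    for index, ch in enumerate(source):
        if ch.isspace():
            continue
        for out in ch.lower():  # lower() can yield more than one character
            chars.append(out)
            positions.append(index)
    return "".join(chars), positions


def original_span(positions, start: int, end: int):
    """Map the half-open range [start, end) of the normalized text back to the source."""
    return positions[start], positions[end - 1] + 1
```

If several transformations are chained, each step can emit its own offset list and the lists can be composed, so a fingerprint found in the final text still resolves to a range in the original.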

Custom date format cannot be parsed. (Java)

I have to parse a custom date format in Java. It contains microseconds although Java doesn't provide support for microseconds. Because of that I filled the format with zeroes, but now I cannot parse date-strings with that format. Is there a simple workaround or must I handle microseconds on my own (with String functions)? @Test public ...

Will this 'algorithm' for nullable and first work (in a parser)?

Working through this for fun: http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/ Example calculation of nullable and first uses a fixed-point calculation. (see section 3.8) I'm doing things in Scheme and relying a lot on recursion. If you try to implement nullable or first via recursion, it should be clear you'll recur infinitely ...
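For comparison, here is a minimal sketch of the iterative fixed-point formulation, in Python rather than Scheme. It is not taken from the linked notes. The grammar encoding (a dict from nonterminal to a list of productions, each a tuple of symbols) and the name `nullable_and_first` are assumptions made for illustration.

```python
def nullable_and_first(grammar):
    """Compute nullable and FIRST for every nonterminal by fixed-point iteration.

    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    nullable = {nt: False for nt in grammar}
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until a full pass changes nothing
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                all_nullable = True
                for symbol in production:
                    if symbol in grammar:
                        new = first[symbol] - first[nt]
                        symbol_nullable = nullable[symbol]
                    else:
                        new = {symbol} - first[nt]
                        symbol_nullable = False
                    if new:
                        first[nt] |= new
                        changed = True
                    if not symbol_nullable:
                        all_nullable = False
                        break  # later symbols cannot contribute to FIRST
                if all_nullable and not nullable[nt]:
                    nullable[nt] = True  # every symbol (or none) can derive empty
                    changed = True
    return nullable, first
```

Because each pass only reads the current tables instead of recursing into other nonterminals, left-recursive rules such as `B -> B b` cannot cause infinite recursion. The loop stops because the sets only ever grow and are bounded by the grammar's symbols.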

SQL Server String parsing for special characters

I need a solution (T-SQL function / procedure) to parse a SQL Server varchar(max) and eliminate all special characters and accents. The output of this string will be transformed to a CSV file using an AWK script that breaks on special characters like '&', '%', '\' and on all accented characters, which turn into unknown characters on conversion...

C# reliable way to pattern match?

At the moment I am trying to match patterns such as: text text date1 date2. So I have regular expressions that do just that. However, the issue is that if, for example, users input data with more than one whitespace character, or put some of the text on a new line, the pattern does not get picked up because it doesn't exactly match the patter...
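A minimal sketch of a whitespace-tolerant pattern, in Python rather than C#. It assumes each "text" is a single word and each date looks like dd/mm/yyyy, neither of which the question states. The names `RECORD` and `find_records` are hypothetical.

```python
import re

# \s+ matches any run of spaces, tabs, or line breaks between the fields.
RECORD = re.compile(
    r"(?P<first>\w+)\s+(?P<second>\w+)\s+"
    r"(?P<date1>\d{1,2}/\d{1,2}/\d{4})\s+(?P<date2>\d{1,2}/\d{1,2}/\d{4})"
)


def find_records(text: str) -> list:
    """Return every 'text text date1 date2' record found in the input."""
    return [match.groupdict() for match in RECORD.finditer(text)]
```

The same idea carries over to .NET regular expressions, where `\s+` also matches line breaks, so the fields may be separated by any amount of whitespace.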