parsing

How can I use NLP to parse recipe ingredients?

I need to parse recipe ingredients into amount, measurement, item, and description as applicable to the line, such as 1 cup flour, the peel of 2 lemons and 1 cup packed brown sugar etc. What would be the best way of doing this? I am interested in using python for the project so I am assuming using the nltk is the best bet but I am open t...

Tool for parsing smtp logs that finds bounces

Our web application sends e-mails. We have lots of users, and we get lots of bounces. For example, a user changes companies and his company e-mail is no longer valid. To find bounces, I parse the SMTP log file with a log parser. Some bounces are great, like 550+#[email protected]. There is [email protected] in the bounce. But som...

Best JavaScript Date Parser & Formatter?

Since I've started to use jQuery, I have been doing a lot more JavaScript development. I need to parse different date formats and then display them in another format. Do you know of any good tool to do this? Which one would you recommend? ...

extracting a parenthesized Python expression from a string

I've been wondering about how hard it would be to write some Python code to search a string for the index of a substring of the form ${expr}, for example, where expr is meant to be a Python expression or something resembling one. Given such a thing, one could easily imagine going on to check the expression's syntax with compile(), evalu...
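
A minimal sketch of the idea described in this question, assuming that balanced-brace counting is an acceptable way to find the end of `${expr}`. The names `find_placeholders` and `is_valid_expression` are hypothetical, not from the question. The scan does not understand string literals, so a brace inside a quoted string within the expression would confuse it.

```python
def find_placeholders(text):
    """Yield (start, end, expr) for each ${...} span with balanced braces."""
    pos = 0
    while True:
        start = text.find("${", pos)
        if start == -1:
            return
        depth = 1          # we are inside the opening "{" of "${"
        i = start + 2
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:          # ran off the end without a matching "}"
            return
        # text[start:i] is the whole "${...}"; strip the "${" and "}"
        yield start, i, text[start + 2:i - 1]
        pos = i


def is_valid_expression(expr):
    """Check syntax only, using the built-in compile() in 'eval' mode."""
    try:
        compile(expr.strip(), "<placeholder>", "eval")
        return True
    except SyntaxError:
        return False
```

Each yielded tuple gives the index where `${` starts, the index just past the closing brace, and the raw expression text, which can then be passed to `is_valid_expression` before anything is evaluated.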

Converting a Google search query to a PostgreSQL "tsquery"

How can I convert a Google search query to something I can feed PostgreSQL's to_tsquery() ? If there's no existing library out there, how should I go about parsing a Google search query in a language like PHP? For example, I'd like to take the following Google-ish search query: ("used cars" OR "new cars") -ford -mistubishi And turn ...

Why are people using regexp for email and other complex validation?

There are a number of email regexp questions popping up here, and I'm honestly baffled why people are using these insanely obtuse matching expressions rather than a very simple parser that splits the email up into the name and domain tokens, and then validates those against the valid characters allowed for name (there's no further check ...
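
The approach the question describes (split into name and domain tokens, then check each against its allowed characters) might look roughly like the sketch below. It is a deliberately simplified illustration: the character sets and the function name `looks_like_email` are assumptions, quoted local parts and other corners of RFC 5322 are not handled, and requiring a dot in the domain is a design choice rather than a rule from the question.

```python
import string

# Assumed character sets: unquoted local-part characters and hostname characters.
LOCAL_CHARS = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")
DOMAIN_CHARS = set(string.ascii_letters + string.digits + "-.")


def looks_like_email(address):
    """Return True if address splits cleanly into a plausible name and domain."""
    # Split on the last "@" so the domain never contains one.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if not set(local) <= LOCAL_CHARS or not set(domain) <= DOMAIN_CHARS:
        return False
    # Reject leading, trailing, or doubled dots in the name part.
    if any(part == "" for part in local.split(".")):
        return False
    # Require at least two non-empty labels that do not start or end with "-".
    labels = domain.split(".")
    return len(labels) >= 2 and all(
        label and not label.startswith("-") and not label.endswith("-")
        for label in labels
    )
```

A check like this only says the address is well formed; whether the mailbox exists still has to be confirmed by actually sending mail.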

Gold Parsing System - What can it be used for in programming?

I have read the GOLD Homepage ( http://www.devincook.com/goldparser/ ) docs, FAQ and Wikipedia to find out what practical application there could possibly be for GOLD. I was thinking along the lines of having a programming language (easily) available to my systems such as ABAP on SAP or X++ on Axapta - but it doesn't look feasible to me,...

Parsing of nested tags in a file

Hi, I am wondering - What's the most effective way of parsing something like: {{HEADER}} Hello my name is {{NAME}} {{#CONTENT}} This is the content ... {{#PERSONS}} <p>My name is {{NAME}}.</p> {{/PERSONS}} {{/CONTENT}} {{FOOTER}} Of course this is intended to be somewhat of a templating system in the end, s...

How do I create a comma delimited string from an ArrayList?

I'm storing an ArrayList of Ids in a processing script that I want to spit out as a comma delimited list for output to the debug log. Is there a way I can get this easily without looping through things? EDIT: Thanks to Joel for pointing out the List(Of T) that is available in .net 2.0 and above. That makes things TONS easier if you have...

What's the best way to strip literal values out of SQL to correctly identify db workload?

Does anyone know of any code or tools that can strip literal values out of SQL statements? The reason for asking is I want to correctly judge the SQL workload in our database and I'm worried I might miss out on bad statements whose resource usage get masked because they are displayed as separate statements. When, in reality, they are p...

How do I find all cells with a particular attribute in BeautifulSoup?

Hi I am trying to develop a script to pull some data from a large number of html tables. One problem is that the number of rows that contain the information to create the column headings is indeterminate. I have discovered that the last row of the set of header rows has the attribute border-bottom for each cell with a value. Thus I de...

How can you use BeautifulSoup to get colindex numbers?

I had a problem a week or so ago. Since I think the solution was cool I am sharing it here while I am waiting for an answer to the question I posted earlier. I need to know the relative position for the column headings in a table so I know how to match the column heading up with the data in the rows below. I found some of my tables ha...

Parsing integers from a line

I am parsing an input text file. If I grab the input one line at a time using getline(), is there a way that I can search through the string to get an integer? I was thinking something similar to getNextInt() in Java. I know there has to be 2 numbers in that input line; however, these values will be separated by one or more white spa...

Trim whitespace from middle of string

I'm using the following regex to capture a fixed width "description" field that is always 50 characters long: (?.{50}) My problem is that the descriptions sometimes contain a lot of whitespace, e.g. "FLUID COMPRESSOR " Can somebody provide a regex that: Trims all whitespace off the end Collapses an...

Repairing wrong encoding in XML files

One of our providers sometimes sends XML feeds that are tagged as UTF-8 encoded documents but include byte sequences that are not valid UTF-8. This causes the parser to throw an exception and stop building the DOM object when these characters are encountered: DocumentBuilder.parse(ByteArrayInputStream bais) throws...

Simple C++ MIME parser

I want to digest a multipart response in C++ sent back from a PHP script. Anyone know of a very lightweight MIME parser which can do this for me? Regards Robert ...

How do I programmatically inspect an HTML document

I have a database full of small HTML documents and I need to programmatically insert several into, say, a PDF document with iText or a Word document with Aspose.Words. I need to preserve any formatting within the HTML documents (within reason; honouring <b> tags is a must, CSS like <span style="blah"> is a nice-to-have). Both iText and ...

Best way to turn an integer into a month name in C#?

Is there a best way to turn an integer into its month name in .NET? Obviously I can spin up a DateTime, convert it to a string, and parse the month name out of there. That just seems like a gigantic waste of time. ...

Parsing A Data Feed

I'm not the best at PHP and would be extremely grateful if somebody could help. Basically I need to parse each line of a datafeed and just get each bit of information between each "|" - then I can add it to a database. I think I can handle getting the information from between the "|"'s by using explode but I need a bit of help with parsi...

Is there a good date parser for Java?

Does anyone know a good date parser for different languages/locales? The built-in parser of Java (SimpleDateFormat) is very strict. A good parser should complete missing parts with the current date. For example, if I do not enter the year (only day and month), then the current year should be used. If the year is 08, then it should not parse 0008...