parsing

Regex to match attributes in HTML?

Hi, I have a txt file which actually is a html source of some webpage. Inside that txt file there are various strings preceded by a "title=" tag. e.g. <div id='UWTDivDomains_5_6_2_2' title='Connectivity Framework'> I am interested in getting the text Connectivity Framework to be extraced and written to a separate file. Like this...

Parsing Functions

I'm making a script parser in python and I'm a little stuck. I am not quite sure how to parse a line for all its functions (or even just one function at a time) and then search for a function with that name, and if it exists, execute that function short of writing a massive list if elif else block.... EDIT This is for my own scripting ...

What is the best way to parse Microsoft Office and PDF documents?

I'm developing a Desktop Search Engine using VB9 (VS2008) and Lucene.NET. The Indexer in Lucene.NET accepts only raw text data and it is not possible to directly extract raw text from a Microsoft Office (DOC, DOCX, PPT, PPTX) and PDF documents. What is the best way to extract raw text data from such files? ...

Concepts required in building an IDE/compiler

When it comes to making an IDE (e.g. SharpDevelop) or a compiler/language parser, what topics of computer science do I need to know? I don't expect a full list of in depth tutorials but just a list of topics which would benefit me in improving. Am I right in thinking a parser has some rules about the syntax/semantics of a language, and ...

Parse v. TryParse

What is the difference between parse and TryParse? int number = int.Parse(textBoxNumber.Text); // The Try-Parse Method int.TryParse(textBoxNumber.Text, out number); Is there some form of error-checking like a Try-Catch Block? ...

Implementing a "rules engine" in Python

I'm writing a log collection / analysis application in Python and I need to write a "rules engine" to match and act on log messages. It needs to feature: Regular expression matching for the message itself Arithmetic comparisons for message severity/priority Boolean operators I envision An example rule would probably be something lik...

Python: convert alphabetically spelled out numbers to numerics?

I'm looking for a library, service, or code suggestions to turn spelled out numbers and amounts (eg. "thirty five dollars and fifteen cents", "one point five") into numerics ($35.15, 1.5) . Suggestions? ...

In C# What is the best way to parse Large XML (Size of 1GB)...?

I am having a 1GB XML File and want to parse it.If i use XML Textreader or XMLDocument ,result is very slow and some times hangs ...Your answers are welcome ...

Java - Easier way to guarantee integer input through Scanner?

For a program I am writing, I need to ask a user for an integer between 1 and 8. I've tried multiple (cleaner) ways of doing this but none of them worked, so I'm left with this: int x = 0; while (x < 1 || x > 8) { System.out.print("Please enter integer (1-8): "); try { x = Integer.par...

Canonicalize NFL team names.

This is actually a machine learning classification problem but I imagine there's a perfectly good quick-and-dirty way to do it. I want to map a string describing an NFL team, like "San Francisco" or "49ers" or "San Francisco 49ers" or "SF forty-niners", to a canonical name for the team. (There are 32 NFL teams so it really just means f...

Parse the Contents of String Array C++

Hi, I've read a file, and stored its contents into a string array. I need to change some numerical values, interpreted as characters by the compiler. The file is of the format: ABCDE,EFGHIJ KLMNOPQRS,45.58867,122.59750 I've searched and studied but haven't got onto anything too comprehensive. If someone can please tell me how to do ...

Search a codebase for references to table names

I have a directory full of legacy code from a VB6 application. I have been asked to provide a list of all the tables that this application uses, so we can assign a special SQL Server username to it. What is the best way to scan a codebase for references to table names? Some ideas I've had: Search for the following keywords: "FROM", ...

Tool or language to count occurrances of errors in a log file

I am trying to determine the best way to parse a log file and get a count of all of the errors in it by type. Currently, I open the log in a text editor, strip out the date and thread ID, then sort the file. This puts all errors together by type, which I can then count (using the count function in the editor, not manually). I am looki...

Partial evaluation for parsing

I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at r...

Snippet for SAX parsing a user object in C++?

Can anyone share a snippet of code where they parsed a user defined object using SAX parser in C++. ...

How to determine whether a grammar is LL(1) LR(0) SLR(1)

Is there a simple way to determine wether a grammar is LL1, LR0, SLR1... just from looking on the grammar without doing any complex analysis? For instance: To decide wether a BNF Grammar is LL1 you have to calculate First and Follow sets first - which can be very time consuming in some cases. Has anybody got an idea how to do this fast...

How do I get from a type to the TryParse method?

My particular problem: I have a string which specifies an aribitrary type in a configuration class Config.numberType = "System.Foo"; where Foo is a type like Decimal or Double I use Type.GetType(Config.numberType) to return the corresponding type. How do I get from that type to being able to use, System.Foo.TryParse() ? Some furthe...

Parsing Integer to String C

How does one parse an integer to string(char* || char[]) in C? Is there an equivalent String parseInt(int) method from Java in C? ...

Parsing a file with column data in Python

I have a file that contains the symbol table details.Its in the form of rows and columns. I need to extract first and last column. How can I do that? ...

Adding a parameter to the URL with JavaScript

In a web application that makes use of AJAX calls, I need to submit a request but add a parameter to the end of the URL, for example: Original URL: http://server/myapp.php?id=10 Resulting URL: http://server/myapp.php?id=10&amp;enabled=true Looking for a JavaScript function which parses the URL looking at each parameter, the...