I would like to extract the structure of an HTML document, so the tags are more important than the content. Ideally, it would also cope reasonably well with badly-formed HTML.
Anyone know of a reliable and efficient parser?
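One possible direction, assuming Python is acceptable (the question does not name a language): the standard library's html.parser is tolerant of malformed markup and reports tags as it meets them. The sketch below records only the tag nesting. The names OutlineParser and html_outline are made up for illustration, and the recovery rule for unclosed tags (pop back to the nearest matching open tag) is an assumption, not what a browser does.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Collects an indented outline of tag names and ignores text content."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags
        self.outline = []  # one indented line per start tag

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * len(self.stack) + tag)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything up to the match.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def html_outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    # Note the unclosed <b> and the missing </html>.
    print("\n".join(html_outline("<html><body><p>Hi<br><b>bold</p><div>x</div></body>")))
```

Whether this counts as efficient enough depends on the document sizes involved; it is meant only to show the shape of a tag-only pass.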
...
I want to try to convert a string to a Guid, but I don't want to rely on catching exceptions:
for performance reasons - exceptions are expensive
for usability reasons - the debugger pops up
for design reasons - the expected is not exceptional
In other words the code:
public static Boolean TryStrToGuid(String s, out Guid value)
{
...
As a pet-project, I'd like to attempt to implement a basic language of my own design that can be used as a web-scripting language. It's trivial to run a C++ program as an Apache CGI, so the real work lies in how to parse an input file containing non-code (HTML/CSS markup) and server-side code.
In my undergrad compiler course, we used Fl...
Ometa is "a new object-oriented language for pattern matching." I've encountered pattern matching before in languages like Oz and in tools that parse grammars, like Lex/Yacc or Pyparsing. Despite looking at example code, reading discussions, and talking to a friend, I still am not able to get a real understanding of what makes Ometa special (or ...
I'm writing a C parser using PLY, and recently ran into a problem.
This code:
typedef int my_type;
my_type x;
is correct C code, because my_type is defined as a type before
being used as such. I handle it by filling a type symbol table in the
parser that gets used by the lexer to differentiate between types and
simple identifie...
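The idea of a type table shared between parser and lexer can be sketched without PLY, using only the standard library. This is a toy, not PLY's API: lex, scan, the token names and the tiny keyword set are all made up for illustration, and the "parser" here only recognises statements of the form typedef ... name;. The point it shows is that the lexer must run lazily, so that a name registered at the end of one statement is already known when the next identifier is scanned.

```python
import re

KEYWORDS = {"typedef", "int", "char"}   # deliberately tiny, for illustration only
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|;")


def lex(source, type_names):
    """Lazily yield (kind, text) pairs, consulting type_names at scan time."""
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text == ";":
            yield ("SEMI", text)
        elif text in KEYWORDS:
            yield (text.upper(), text)
        elif text in type_names:
            yield ("TYPEID", text)
        else:
            yield ("ID", text)


def scan(source):
    """Toy 'parser': registers the name declared by each typedef statement."""
    type_names = set()
    tokens, statement = [], []
    for token in lex(source, type_names):
        tokens.append(token)
        statement.append(token)
        if token[0] == "SEMI":
            if statement[0][0] == "TYPEDEF" and len(statement) >= 3:
                # The identifier just before ';' is the newly defined type.
                type_names.add(statement[-2][1])
            statement = []
    return tokens


if __name__ == "__main__":
    for kind, text in scan("typedef int my_type;\nmy_type x;"):
        print(kind, text)
```

In this sketch the second my_type comes out as TYPEID only because the table was updated before the generator produced the next token. A parser that reads a lookahead token before finishing the typedef would see it too early.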
What is the smartest way to design a math parser? What I mean is a function that takes a math string (like: "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago but it ended up being way too bloated and not very portable (or smart for that matter...). General ideas, pseudo code or real code is appreciat...
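One common approach is a small recursive-descent evaluator with one function per precedence level. The sketch below is in Python rather than VB6 and is only a starting point: it assumes the four basic operators, unary minus, parentheses and decimal numbers, and the names tokenize, Parser and evaluate are illustrative.

```python
import re


def tokenize(text):
    """Split the input into floats and single-character operator tokens."""
    tokens = []
    for tok in re.findall(r"\d+\.?\d*|\.\d+|\S", text):
        if tok in ("+", "-", "*", "/", "(", ")"):
            tokens.append(tok)
        elif tok[0].isdigit() or tok[0] == ".":
            tokens.append(float(tok))
        else:
            raise ValueError(f"unexpected character: {tok!r}")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*   (the loop keeps it left-associative)
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '-' factor | '(' expr ')'
        tok = self.advance()
        if isinstance(tok, float):
            return tok
        if tok == "-":
            return -self.factor()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        raise ValueError(f"unexpected token: {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token: {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2 + 3 / 2 + (2 * 5)"))  # 13.5
```

Adding an operator means adding a token and, usually, one more precedence-level method, which tends to keep this design from bloating.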
How would one write a regular expression to use in Python to split paragraphs?
A paragraph is defined by 2 line breaks (\n). But there can be any amount of spaces/tabs together with the line breaks, and it should still be considered a paragraph break.
I am using python so the solution can use python's regular expression syntax which is ex...
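A minimal sketch of one way to read that definition: a paragraph break is two or more line breaks, each optionally preceded by spaces or tabs, with optional spaces or tabs after the last one. The function name split_paragraphs is made up, and treating three or more line breaks as a single break is an assumption.

```python
import re

# Two or more "\n", each optionally preceded by spaces/tabs, plus trailing spaces/tabs.
PARAGRAPH_BREAK = re.compile(r"(?:[ \t]*\n){2,}[ \t]*")


def split_paragraphs(text):
    """Return the non-empty paragraphs of text."""
    return [part for part in PARAGRAPH_BREAK.split(text.strip()) if part]


if __name__ == "__main__":
    sample = "first line\nstill first\n \t\n  second paragraph\n\n\nthird"
    print(split_paragraphs(sample))
```

A single line break (with or without surrounding spaces) stays inside its paragraph, because the pattern needs at least two.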
My quick search reveals the reference implementation (http://stax.codehaus.org), the Woodstox implementation (http://woodstox.codehaus.org), and Sun's SJSXP implementation (https://sjsxp.dev.java.net/).
Please comment on the relative merits of these, and fill me in on any other implementations I should consider.
...
I'm writing a sitemap producer in Object Pascal and need a good function or library to emulate PHP's parse_url function.
Does anyone know of any good ones?
...
I've used ConfigParser for quite a while for simple configs. One thing that's bugged me for a long time is the DEFAULT section. I'm not really sure what's an appropriate use. I've read the documentation, but I would really like to see some clever examples of its use and how it affects other sections in the file (something that really ill...
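A small illustration of how DEFAULT interacts with other sections, written against Python 3's configparser module (the Python 2 module was named ConfigParser). The section names, keys and values are invented for the example.

```python
import configparser

CONFIG_TEXT = """
[DEFAULT]
base_dir = /srv/app
log_level = INFO

[web]
static_dir = %(base_dir)s/static

[worker]
log_level = DEBUG
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# DEFAULT is not listed as a regular section.
print(parser.sections())

# Every section falls back to DEFAULT for keys it does not define itself.
print(parser["web"]["log_level"])

# DEFAULT values are also available for %(name)s interpolation.
print(parser["web"]["static_dir"])

# A section can override a default.
print(parser["worker"]["log_level"])
```

So one plausible use is shared settings (paths, log levels) that most sections inherit and a few override.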
I need to parse strings like "2008-09-03T20:56:35.450686Z" into Python's datetime.
I have found only strptime in the Python 2.5 standard library, but it is not very convenient.
Which is the best way to do that?
Update:
It seems that python-dateutil works very well. I have found this solution:
d1 = '2008-09-03T20:56:35.450686Z'
d2 = date...
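For comparison, a standard-library-only sketch for this exact format. It relies on the %f (microseconds) directive, which is not available in Python 2.5, so it assumes a newer Python. It also assumes the trailing Z always means UTC and that the fractional part is always present. The function name parse_utc_timestamp is made up.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS.ffffffZ' into a timezone-aware datetime."""
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_utc_timestamp("2008-09-03T20:56:35.450686Z"))
```

Strings in other ISO 8601 variants (no fraction, numeric offsets) would need a different format string or a more general parser.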
Hello,
The following simple "calculator expression" grammar (BNF) can be easily parsed with a trivial recursive-descent parser, which is predictive LL(1):
<expr> := <term> + <term>
| <term> - <term>
| <term>
<term> := <factor> * <factor>
| <factor> / <factor>
<fa...
I'm writing a Perl script and I've come to a point where I need to parse a Java source file line by line checking for references to a fully qualified Java class name. I know the class I'm looking for up front; also the fully qualified name of the source file that is being searched (based on its path).
For example find all valid referen...
I am currently running into a problem where an element is coming back from my XML file with a single quote in it. This causes xml_parse to break it up into multiple chunks. For example: Get Wired, You're Hired!
is then interpreted as 'Get Wired, You' being one object, the single quote being a second, and 're Hired!' as a third.
What I w...
Seeking a method to:
Take whitespace separated tokens in a String; return a suggested Word
i.e.:
Google Search can take "fonetic wrd nterpreterr",
and at the top of the results page it shows "Did you mean: phonetic word interpreter"
A solution in any of the C* languages or Java would be preferred.
Are there any existing Open Libraries which...
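As a rough illustration of the idea (in Python rather than a C-family language or Java, and nothing like what a search engine actually does), each token can be matched against a known word list by string similarity. The sketch uses difflib from the standard library. The vocabulary, the 0.6 cutoff and the name suggest are assumptions made for the example; a real system would need a large dictionary and word frequencies.

```python
import difflib

# A tiny, hypothetical vocabulary; a real one would come from a dictionary file.
VOCABULARY = ["phonetic", "word", "interpreter", "parser", "token"]


def suggest(phrase, vocabulary=VOCABULARY, cutoff=0.6):
    """Replace each whitespace-separated token with its closest known word."""
    corrected = []
    for token in phrase.split():
        matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        # Keep the token unchanged when nothing in the vocabulary is close enough.
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


if __name__ == "__main__":
    print("Did you mean:", suggest("fonetic wrd nterpreterr"))
```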
Examples:
"1" yes
"-1" yes
"- 3" no
"1.2" yes
"1.2.3" no
"7e4" no (though in some cases you may want to allow scientific notation)
".123" yes
"123." yes
"." no
"-.5" yes
"007" yes
"00" yes
...
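One regular expression that agrees with every example in the list above, sketched in Python. The language of the original question is not shown, and the rules beyond the listed examples are assumptions: ASCII digits only, an optional leading minus sign, no leading plus sign, no surrounding whitespace. The name is_number is illustrative.

```python
import re

# Optional "-", then either digits with an optional fractional part ("1", "1.2", "123.")
# or a leading dot followed by at least one digit (".123").
NUMBER_RE = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def is_number(text):
    """True when the whole string is a plain decimal number."""
    return NUMBER_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1", "-1", "- 3", "1.2", "1.2.3", "7e4", ".123", "123.", ".", "-.5", "007", "00"]:
        print(repr(sample), "yes" if is_number(sample) else "no")
```

Using fullmatch rather than match with a trailing $ avoids accepting a string that ends in a newline.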
The reason I want to do this is to make it easy to parse out instructions that are emailed to a bot, the kind of thing majordomo might do to parse commands like subscribing and unsubscribing. It turns out there are a lot of crazy formats and things to deal with, like quoted text, distinguishing between header and body, etc.
A perl modu...
I have a problem which requires me to parse several log files from a remote machine.
There are a few complications:
1) The file may be in use
2) The files can be quite large (100mb+)
3) Each entry may be multi-line
To solve the in-use issue, I need to copy it first. I'm currently copying it directly from the remote machine to the local ...
If I call PHP's parse_ini_file("foo.ini"), in what paths does it look for foo.ini?
The include path? The function's documentation doesn't mention it.
...
I am working on a form widget for users to enter a time of day into a text input (for a calendar application). Using JavaScript (we are using jQuery FWIW), I want to find the best way to parse the text that the user enters into a JavaScript Date() object so I can easily perform comparisons and other things on it.
I tried the parse() met...