In another thread I was persuaded to use an HTML parser instead of regexps for HTML parsing. I thought of using libxml (it has an HTML parser built in), but failed to find any useful tutorial. I also found this site, which says it should do fine even with severely broken HTML.
Could you give me some examples of HTML parsing wit...
Apparently Excel 4.0 files are still in use, and I have to read them in Java.
Neither POI nor jExcelAPI, as great as they are, can parse them. I can't find anything on this format, especially for Java. Any help? Thank you.
...
I've long wondered why there don't seem to be any parsers for, say, BNF, that behave like regexps in various libraries.
Sure, there are tools like ANTLR, Yacc and many others that generate code which, in turn, can parse a CFG, but there doesn't seem to be a library that can do that without the intermediate step.
I'm interest...
This question is about the PHP parsing engine.
When I include a file multiple times in a single runtime, does PHP tokenize it every time or does it keep a cache and just run the compiled code on subsequent inclusions?
EDIT: More details: I am not using an external caching mechanism and I am dealing with the same file being included mul...
Inside a string element, the XML parser will get confused if it finds any of the following characters:
'
"
<
>
&
(e.g. let's say the name of a company has been retrieved from a database field, and it looks like this: "Smith & Sons")
The question is - how can you design your XSD to ignore these characters if found within an element?
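For what it's worth, a schema is only applied once the document is already well-formed XML, so an XSD cannot make the parser skip over a bare `&` or `<`. The usual fix is to escape the value when the document is written. A minimal Python sketch using only the standard library (the `company` element name is made up for illustration):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

company = "Smith & Sons"  # value as it comes out of the database

# escape() replaces &, < and > by default. The extra mapping also covers
# the two quote characters, which only matter inside attribute values.
escaped = escape(company, {'"': "&quot;", "'": "&apos;"})

# "company" is a hypothetical element name used only for this example.
xml_text = "<company>" + escaped + "</company>"

# A conforming parser turns the entities back into the original characters.
parsed = ET.fromstring(xml_text).text
```

If the XML is produced by a library rather than by string concatenation, the library normally does this escaping itself.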
...
I'm using this program to display a list of all html tags in a given file:
#include <cstdio>
#include <libxml/HTMLparser.h>
#include <libxml/tree.h>
#include <iostream>
#include <cstring>
using namespace std;
static void
print_element_names(htmlNodePtr a_node)
{
htmlNodePtr cur_node = NULL;
for (cur_node = a_node; cur_node!=N...
What type of Python objects should I use to parse files with a specific syntax? Also, what sort of loop should I use to work through the file? Should one pass be sufficient? Two, three?
...
Hi
I'm writing BNF for JavaScript which will be used to generate a lexer and a parser for the language. However, I'd like some ideas on how to design the for-loop. Here is the simplified version of my current BNF:
[...]
VarDecl. Statement ::= "var" Identifier "=" Expr ";"
ForLoop. Statement ::= "for" "(" Expr ";" Expr ";" Expr ")"
[......
So, it seems like Happy is a robust replacement for yacc in Haskell. Is there an equally robust lexer generator to replace lex/flex?
...
I need to do some HTML parsing with Python. After some research, lxml seems to be my best choice, but I am having a hard time finding examples that help with what I am trying to do, which is why I am here. I need to scrape a page for all of its viewable text: strip out all tags and JavaScript. I need it to leave me with what text is vi...
The documentation lists the tags that are allowed/removed by default:
http://www.feedparser.org/docs/html-sanitization.html
But it doesn't say anything about how you can specify which additional tags you want removed.
Is there a way to do this using Universal Feed Parser or do you have to do further processing using your own regex and...
SimplePie lets you merge feeds together:
http://simplepie.org/wiki/tutorial/sort_multiple_feeds_by_time_and_date
Is there anything like this in the Python world? The Universal Feed Parser documentation doesn't say anything about merging multiple feeds together.
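One way to approach this without library support is to pool the entries of every feed and sort them by date yourself. A rough sketch, assuming each entry is a dict carrying a `published_parsed` time tuple (that key name and the sample data are assumptions for illustration, modeled on parsed-date fields; adjust to whatever your parser actually returns):

```python
import time

def merge_entries(*feeds):
    """Merge several lists of entries into one list, newest first.

    Each entry is assumed to be a dict whose "published_parsed" value is a
    time.struct_time (or any comparable 9-tuple).
    """
    all_entries = [entry for feed in feeds for entry in feed]
    return sorted(all_entries, key=lambda e: e["published_parsed"], reverse=True)

def _day(text):
    # Helper for the made-up sample data below.
    return time.strptime(text, "%Y-%m-%d")

feed_a = [
    {"title": "a1", "published_parsed": _day("2009-03-01")},
    {"title": "a2", "published_parsed": _day("2009-01-15")},
]
feed_b = [
    {"title": "b1", "published_parsed": _day("2009-02-10")},
]

merged = merge_entries(feed_a, feed_b)
titles = [entry["title"] for entry in merged]
```

Entries without a date would need a fallback key, which this sketch does not handle.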
...
While editing this and that in Vim, I often find that its syntax highlighting (for some filetypes) has some defects. I can't remember any examples at the moment, but someone surely will. Usually, it consists of strings badly highlighted in some cases, some things with arithmetic and boolean operators and a few other small things as well....
Hello everybody,
I have a .txt file like:
Symbols from __ctype_tab.o:
Name Value Class Type Size Line Section
__ctype |00000000| D | OBJECT |00000004| |.data
__ctype_tab |00000000| r | OBJECT |00000101| |.rodata
Symbols from _ashldi3.o:
Name ...
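The excerpt is cut off before it says what should be extracted, so the following is only a guess at a starting point: a Python sketch that groups the rows by object file, assuming every section starts with a `Symbols from <file>:` line and every data row has seven `|`-separated columns as in the sample above. The function name and the dict keys are made up for illustration.

```python
FIELDS = ["name", "value", "class", "type", "size", "line", "section"]

def parse_symbol_listing(text):
    """Return {object_file: [row_dict, ...]} for a listing like the one above."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Symbols from ") and line.endswith(":"):
            # Start of a new section: remember which object file we are in.
            current = line[len("Symbols from "):-1]
            result[current] = []
        elif "|" in line and current is not None:
            # Data row: split on the column separator and trim the padding.
            cells = [cell.strip() for cell in line.split("|")]
            if len(cells) == len(FIELDS):
                result[current].append(dict(zip(FIELDS, cells)))
        # Header rows ("Name Value Class ...") and blank lines are skipped.
    return result

sample = """Symbols from __ctype_tab.o:
Name Value Class Type Size Line Section
__ctype |00000000| D | OBJECT |00000004| |.data
__ctype_tab |00000000| r | OBJECT |00000101| |.rodata
Symbols from _ashldi3.o:
"""

symbols = parse_symbol_listing(sample)
```

Empty columns, such as the line number in the sample rows, come back as empty strings.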
How can I write a console application that prompts me and lets me enter LINQ expressions and it will spit out the results of that LINQ query?
What would be the easiest way to parse/evaluate an incoming string as a LINQ expression?
...
What's the smartest way to have Nokogiri select all content between the start and the stop element (including start-/stop-element)?
Check example code below to understand what I'm looking for:
require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
"<html>
<body>
<p id='para-1'>A</p>
<div clas...
I am looking for a parser generator for Java that does the following: My language project is pretty simple and only contains a small set of tokens.
Output in pure READABLE Java code so that I can modify it (this is why I wouldn't use ANTLR)
Mature library, that will run and work with at least Java 1.4
I have looked at the following and t...
I want to automatically process a WSDL file to discover defined Service / Port elements. Is this possible, using Java or some sort of Ant utility? If so, how?
...
Here is my entire script, as I can't seem to figure out where the problem is.
The symptoms are that where I addChild(book) , is not the appropriate place for this to be added properly and sequentially with the thumbs as well. As a result, and to my surprise, the only way I can get these to appear so far is by writing a faulty trace state...
Hi,
I need to convert C# code to an equivalent XML representation.
I plan to convert the C# code (C# 2.0 code snippets, no generics or nullable types) to an AST and then convert the AST to XML.
Looking for a simple lexer/parser for C# which outputs an AST.
Any pointers on converting C# code to an XML representation (which can be convert...