I'm trying to load a piece of (possibly) malformed HTML into an XMLDocument object, but it fails with XMLExceptions... since there are extra opening/closing tags, and malformed XML tags such as <img > instead of <img />
How do I get the XML to parse with all the errors in the data? Is there any XML validator that I can apply before pars...
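One way to approach this kind of input is to use a lenient HTML parser instead of a strict XML one. The sketch below uses Python's standard `html.parser` purely as an illustration; the question itself is about .NET's XMLDocument, and `TagCollector` and `collect_tags` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start/end tags without requiring well-formed XML."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for <img > just as for <img />; no closing slash is required.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Stray closing tags are reported here rather than raising an exception.
        self.events.append(("end", tag))


def collect_tags(markup):
    """Feed possibly malformed HTML to the parser and return the tag events seen."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(collect_tags('<p><img src="a.png" ></b></p>'))
```

The parser does not repair the markup; it only tolerates it, so any tree-building or tag balancing would still have to be done on top of these events.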
I am looking to apply scores (positive, negative or neutral) to short phrases of text. Short of parsing out emoticons and making assumptions based on their usage, I'm unsure of what else to try. Can anyone provide examples, research papers, articles, etc. that take a more lexical approach to this problem?
I am thinking things like adver...
I'm trying to extract the structure of an XML document in PHP without expanding the entities within. I'm aware that entities are usually expanded before the structure is parsed, and that ignoring this means that the XML may not be well-formed, but I'm parsing XML fragments which might not include the normal XML document header, and so wi...
I have a tree data structure, comprised of nodes, that I need to parse into an expression tree. My nodes look like this (simplified):
public class Node
{
public Node Left { get; set; }
public Node Right { get; set; }
public Operation OperationType { get; set; }
public object Value { get; set; }
...
Does anyone know of a good .NET library that allows me to parse source code files, and not only .NET source code files (e.g. Java, Perl, Ruby, etc.)?
I need programmatic access to the contents of various source code files (e.g. class/method/parameter names, types, etc.).
Has anyone come across something like this? I know within .NET it...
I'm trying to parse some data out of a file using Perl & Parse::RecDescent. I can't throw the full data file at the Perl script because RecDescent will take days poring over it. So I split up the huge data file into RD-sized chunks to reduce the runtime.
However, I need to extract sections within balanced brackets and the routine I hav...
Either this requirement is weird or I am confusing myself too much.
I have a rule table with 30 columns. Every row from a feed file is compared against some or all conditions based on the type of feed. The domain is banking and the application is for loan reporting (say, reporting the amount of total secured loans and unsecured loa...
Short of parsing the ASPX page myself, what's the way to determine the class of an ASPX page?
In our projects we use Web Projects in VS2008 (instead of the Web Site kind of project), which gives us a single DLL for the whole site, which is great.
Now I need to determine programmatically the class of an ASPX site.
I KNOW that the ASPX cla...
Hi all,
I was wondering if anyone could help me with parsing a full name field.
I would like to separate it into lastname, firstname, middle initial, suffix.
Here are some inputs for name followed by how I would like for them to be parsed.
Parsed Stuff Begins Here-------------------------------------
nam...
I have an xml file and a flash file. The flash file reads the xml file.
<?xml version="1.0" standalone="yes"?>
<banners>
<banner>
<title>Hello World</title>
<image>http://www.search-this.com/wp-content/themes/big-blue/images/company-logos1.gif</image>
<link>http://google.com/</link>
</banner>
</banners...
Is this possible? Given that C# uses immutable strings, one could expect that there would be a method along the lines of:
var expensive = ReadHugeStringFromAFile();
var cheap = expensive.SharedSubstring(1);
If there is no such function, why bother with making strings immutable?
Or, alternatively, if strings are already immutable for o...
Hi all
I want to write a function that parses a (theoretically) unknown XML data structure into an equivalent PHP array.
Here is my sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<content>
<title>Sample Text</title>
<introduction>
<paragraph>This is some rudimentary text</paragraph>
</introduction>
<description>
<paragra...
I need to parse potentially huge XML files, so I guess this rules out DOM parsers.
Is there any good lightweight SAX parser for C++ out there, comparable with TinyXML in footprint?
The structure of XML is very simple, no advanced things like namespaces and DTDs are needed. Just elements, attributes and cdata.
I know about Xerces, but its she...
How could you remove all characters that are not alphabetic from a string?
What about non-alphanumeric?
Does this have to be a custom function or are there also more generalizable solutions?
...
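A minimal sketch of both variants, written in Python only as an illustration since the question does not name a language. The function names are hypothetical. The first two rely on the built-in `str.isalpha` and `str.isalnum`, which also accept non-ASCII letters; the regex version assumes that only ASCII letters should survive.

```python
import re


def keep_letters(text):
    """Drop every character that is not alphabetic (Unicode-aware)."""
    return "".join(ch for ch in text if ch.isalpha())


def keep_alphanumeric(text):
    """Drop every character that is neither a letter nor a digit (Unicode-aware)."""
    return "".join(ch for ch in text if ch.isalnum())


def keep_ascii_letters(text):
    """Regex variant: assumes only A-Z and a-z count as alphabetic."""
    return re.sub(r"[^A-Za-z]", "", text)


if __name__ == "__main__":
    sample = "a1-b2 c!"
    print(keep_letters(sample))
    print(keep_alphanumeric(sample))
    print(keep_ascii_letters("héllo1"))
```

Whether accented letters count as "alphabetic" is the main design choice here, which is why both a Unicode-aware and an ASCII-only version are shown.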
I've been stuck on an interesting (i.e. mind-numbing) question for the past few hours.
I've been trying to parse operators with regex:
([<>]=?|[!=]=)
The ones that I want are: <= >= < > == !=
== and != match fine, but the ones involving < or > don't match on my Drupal site, even though they should theoretically work.
What...
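For what it's worth, the pattern itself does match all six operators when run against raw text. The sketch below checks that in Python (an illustration only; the question is about a Drupal/PHP site). It also demonstrates one possible cause, which is purely an assumption on my part and not confirmed by the question: if the text is HTML-escaped before the regex sees it, `<` and `>` become `&lt;` and `&gt;` and can no longer match. `find_operators` is a hypothetical helper name.

```python
import html
import re

# Same pattern as in the question.
OPERATOR_RE = re.compile(r"([<>]=?|[!=]=)")


def find_operators(text):
    """Return every comparison operator found in text, in order of appearance."""
    return OPERATOR_RE.findall(text)


if __name__ == "__main__":
    raw = "a <= b >= c < d > e == f != g"
    print(find_operators(raw))

    # Assumption for illustration: the input was HTML-escaped upstream.
    escaped = html.escape("a <= b == c")
    print(escaped)
    print(find_operators(escaped))

    # Unescaping first restores the original behaviour.
    print(find_operators(html.unescape(escaped)))
```

If escaping does turn out to be the cause, unescaping the input before matching would be one way around it; otherwise the problem lies somewhere this sketch does not cover.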
Hi all,
I need some help with parsing the response from ListDirectoryDetails in C#.
I only need the following fields.
File Name/Directory Name
Date Created
and the File Size.
Here's what some of the lines look like when I run ListDirectoryDetails
d--x--x--x 2 ftp ftp 4096 Mar 07 2002 bin
-rw-r--r-- 1 ftp ft...
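A rough sketch of how a line like the first one above could be split with a regular expression, shown in Python for illustration (the same pattern idea should carry over to C#'s Regex class). It assumes the server returns Unix `ls -l` style lines; the format of this listing is server-dependent, so other servers may need a different pattern. Also note that the date in this kind of listing is usually the modification time, not the creation date. `LISTING_RE` and `parse_listing_line` are hypothetical names.

```python
import re

# Assumed layout: permissions, link count, owner, group, size, date, name.
LISTING_RE = re.compile(
    r"^(?P<perms>[\-dl][\-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+"
    r"(?P<date>\w{3}\s+\d{1,2}\s+(?:\d{4}|\d{1,2}:\d{2}))\s+"
    r"(?P<name>.+)$"
)


def parse_listing_line(line):
    """Return name, size, date and directory flag, or None if the line does not fit."""
    match = LISTING_RE.match(line.strip())
    if match is None:
        return None
    return {
        "is_directory": match.group("perms").startswith("d"),
        "size": int(match.group("size")),
        # Collapse the column padding inside the date field.
        "date": " ".join(match.group("date").split()),
        "name": match.group("name"),
    }


if __name__ == "__main__":
    print(parse_listing_line("d--x--x--x   2 ftp      ftp          4096 Mar 07  2002 bin"))
```

Returning None for lines that do not fit (for example a leading "total" line) keeps the caller in control of how to handle unexpected formats.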
I have a string that should contain a list of items in the form , {0}, {1}, and {2} are strings and I want to basically extract them.
I do want to do this for part of an html parsing problem, and I have heard that parsing html with regular expressions is bad. (Like here)
I am not even sure how to do this with regular expressions.
This...
Hello, I'm receiving the following error when parsing the XML response from a web service.
An invalid character was found in text content.
The web service sends responses containing characters such as Ψ, for example, or HTML-structured text malformed with ", < and > characters.
The code used is:
Set var_xmlPostObject = CreateObject("MSXML2.Ser...
Hi all
I am looking for an open source library to parse and execute formula/functions in C#.
I would like to create a bunch of objects that derive from an interface (i.e. IFormulaEntity) which would have properties/methods/values and then allow a user to specify formulas for those objects.
For example, I might have
public class Emplo...
In web spiders/crawlers, how can I get the actual initial rendered font size that a user sees in an HTML document, taking CSS into account?
...