parsing

C# - Parse malformed XML

I'm trying to load a piece of (possibly) malformed HTML into an XmlDocument object, but it fails with XmlExceptions, since there are extra opening/closing tags and malformed XML tags such as <img > instead of <img />. How do I get the XML to parse with all the errors in the data? Is there any XML validator that I can apply before pars...

Determining "Mood" of Textual Phrases through Lexical Analysis

I am looking to apply scores (positive, negative or neutral) to short phrases of text. Short of parsing out emoticons and making assumptions based on their usage, I'm unsure of what else to try. Can anyone provide examples, research papers, articles, etc. that take a more lexical-analysis approach to this problem? I am thinking things like adver...

Parsing XML structure without expanding entities in PHP

I'm trying to extract the structure of an XML document in PHP without expanding the entities within. I'm aware that entities are usually expanded before the structure is parsed, and that ignoring this means that the XML may not be well-formed, but I'm parsing XML fragments which might not include the normal XML document header, and so wi...

How to parse a tree data structure?

I have a tree data structure, comprised of nodes, that I need to parse into an expression tree. My nodes look like this (simplified): public class Node { public Node Left { get; set; } public Node Right { get; set; } public Operation OperationType { get; set; } public object Value { get; set; } ...

.Net Library for parsing source code files?

Does anyone know of a good .NET library that allows me to parse source code files, and not only .NET source code files but also others (like Java, Perl, Ruby, etc.)? I need programmatic access to the contents of various source code files (e.g. class/method/parameter names, types, etc.). Has anyone come across something like this? I know within .NET it...

Is there an easy way to chunk a text file into brace-balanced sections?

I'm trying to parse some data out of a file using Perl & Parse::RecDescent. I can't throw the full data file at the perl script because RecDescent will take days poring over it. So I split up the huge datafile into RD-sized chunks to reduce the runtime. However, I need to extract sections within balanced brackets and the routine I hav...
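A minimal sketch of brace-balanced chunking, written in Python rather than the Perl used in the question. It tracks nesting depth with a counter and yields each top-level section. The function name `balanced_chunks` and the sample input are hypothetical, and the sketch assumes braces never appear inside quoted strings or comments, which the excerpt does not confirm.

```python
def balanced_chunks(text, open_ch="{", close_ch="}"):
    """Yield each top-level section of text enclosed in balanced braces."""
    depth = 0      # current nesting depth
    start = None   # index where the current top-level section began
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                continue  # stray closer outside any section; ignore it
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


# Hypothetical sample input, not taken from the question's data file.
sample = "a { b { c } } d { e }"
chunks = list(balanced_chunks(sample))
print(chunks)
```

Each yielded chunk could then be handed to the slower parser on its own. An unclosed section at the end of the input is silently dropped here; a real splitter would probably want to report it.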

Formula in a database row - Java

Either this requirement is weird or I am confusing myself too much. I have a rule table with 30 columns. Every row from a feed file is compared against some or all conditions based on the type of feed. The domain is banking and the application is for loan reporting (say, reporting the amount of total secured loans and unsecured loa...

How to determine the class of an ASPX page?

Short of parsing the ASPX page myself, what's the way to determine the class of an ASPX page? In our projects we use Web Projects in VS2008 (instead of the Web Site kind of projects), which gives us a single DLL for the whole site, which is great. Now I need to determine programmatically the class of an ASPX site. I KNOW that the ASPX cla...

Parse Full Name Field Oracle

Hi all, I was wondering if anyone could help me with parsing a full name field. I would like to separate it into lastname, firstname, middle initial, suffix. Here are some inputs for name followed by how I would like for them to be parsed. Parsed Stuff Begins Here------------------------------------- nam...

AS2 Parse XML Problem

I have an XML file and a Flash file. The Flash file reads the XML file. <?xml version="1.0" standalone="yes"?> <banners> <banner> <title>Hello World</title> <image>http://www.search-this.com/wp-content/themes/big-blue/images/company-logos1.gif</image> <link>http://google.com/</link> </banner> </banners...

Sharing character buffer between C# strings objects

Is this possible? Given that C# uses immutable strings, one could expect that there would be a method along the lines of: var expensive = ReadHugeStringFromAFile(); var cheap = expensive.SharedSubstring(1); If there is no such function, why bother with making strings immutable? Or, alternatively, if strings are already immutable for o...

iterating over unknown XML structure with PHP (DOM)

Hi all I want to write a function that parses a (theoretically) unknown XML data structure into an equivalent PHP array. Here is my sample XML: <?xml version="1.0" encoding="UTF-8"?> <content> <title>Sample Text</title> <introduction> <paragraph>This is some rudimentary text</paragraph> </introduction> <description> <paragra...

A lightweight XML parser efficient for large files?

I need to parse potentially huge XML files, so I guess this rules out DOM parsers. Is there any good lightweight SAX parser for C++ out there, comparable with TinyXML on footprint? The structure of the XML is very simple; no advanced things like namespaces and DTDs are needed. Just elements, attributes and CDATA. I know about Xerces, but its she...

How to strip all non-alphabetic characters from string in SQL Server?

How could you remove all characters that are not alphabetic from a string? What about non-alphanumeric? Does this have to be a custom function or are there also more generalizable solutions? ...

JS not accepting <> greater than or less than signs

I've been stuck on an interesting (i.e., mind-numbing) question for the past few hours. I've been trying to parse operators with regex: ([<>]=?|[!=]=) The ones that I want are: <= >= < > == != The == and != match great. But all the ones having to do with < or > don't on my Drupal site, even though they should theoretically work. What...
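As a quick sanity check on the pattern quoted above, this Python sketch tests it against all six operators in isolation. It does not reproduce the Drupal environment from the question; it only shows how the regex behaves on raw, unescaped input. The names `OPERATOR`, `operators`, and the sample expression are made up for illustration.

```python
import re

# The pattern exactly as quoted in the question.
OPERATOR = re.compile(r"([<>]=?|[!=]=)")

operators = ["<=", ">=", "<", ">", "==", "!="]

# True for every operator that the pattern matches in full.
matched = [OPERATOR.fullmatch(op) is not None for op in operators]

# Extract operators from a hypothetical expression string.
tokens = OPERATOR.findall("a <= b != c > d")

print(matched)
print(tokens)
```

If the same check passes on raw input but fails inside the site, a reasonable next step would be to inspect the string the regex actually receives, for example by logging it just before matching.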

Parsing FtpWebRequests ListDirectoryDetails Line

Hi all, I need some help with parsing the response from ListDirectoryDetails in C#. I only need the following fields: File Name/Directory Name, Date Created, and File Size. Here's what some of the lines look like when I run ListDirectoryDetails d--x--x--x 2 ftp ftp 4096 Mar 07 2002 bin -rw-r--r-- 1 ftp ft...
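A rough sketch of pulling those fields out of one Unix-style listing line with a regular expression, shown in Python rather than C# so it can be run standalone. It assumes the server returns `ls -l`-style lines like the sample above; the listing format is not standardized, so other servers may need a different pattern. The names `LISTING` and `parse_listing_line` are hypothetical. The date in such listings is typically the last-modified time rather than a creation time.

```python
import re

# Assumed layout: perms, link count, owner, group, size, date, name.
LISTING = re.compile(
    r"^(?P<perms>[-dl][-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+"
    r"(?P<date>\w{3}\s+\d{1,2}\s+(?:\d{4}|\d{1,2}:\d{2}))\s+"
    r"(?P<name>.+)$"
)


def parse_listing_line(line):
    """Return name, directory flag, size and date, or None if no match."""
    m = LISTING.match(line.strip())
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "is_dir": m.group("perms").startswith("d"),
        "size": int(m.group("size")),
        "date": m.group("date"),  # kept as text; year-or-time varies by age
    }


entry = parse_listing_line("d--x--x--x 2 ftp ftp 4096 Mar 07 2002 bin")
print(entry)
```

The date is left as a string because recent files usually show a time of day instead of a year, so converting it to a timestamp needs a rule for the missing year.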

Capturing the rel type and href of links in C#

I have a string that should contain a list of items in the form , {0}, {1}, and {2} are strings and I want to basically extract them. I do want to do this for part of an html parsing problem, and I have heard that parsing html with regular expressions is bad. (Like here) I am not even sure how to do this with regular expressions. This...

MSXML2.DomDocument.3.0 invalid characters

Hello, I'm receiving the following error when parsing XML returned by a webservice: An invalid character was found in text content. The webservice sends answers containing characters such as Ψ, for example, or HTML-structured text malformed with ", < and > characters. The code used is: Set var_xmlPostObject = CreateObject("MSXML2.Ser...

Parse and execute formulas with C#

Hi all, I am looking for an open source library to parse and execute formulas/functions in C#. I would like to create a bunch of objects that derive from an interface (i.e. IFormulaEntity) which would have properties/methods/values, and then allow a user to specify formulas for those objects. For example, I might have public class Emplo...

How to get the size of the font on a webpage?

In web spiders/crawlers, how can I get the actual initial rendered size of the font a user sees in an HTML document, keeping CSS in mind? ...