I have been trying to strip out some data from HTML files. I have the logic coded to get the right cells. Now I am struggling to get the actual contents of the 'cell'.
Here is my HTML snippet:
headerRows[0][10].contents
[<font size="+0"><font face="serif" size="1"><b>Apples Produced</b><font size="3">
</font></font></font>]
...
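One possible way to pull just the text out of a cell like this is to collect the character data and ignore the nested font tags. The sketch below assumes Python's standard-library html.parser rather than whichever library produced `.contents` above; `TextCollector` and `cell_text` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data and ignores all tags (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True also decodes entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def cell_text(html_fragment):
    """Return the visible text of an HTML fragment, stripped of whitespace."""
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()  # flush any buffered text
    return "".join(collector.chunks).strip()


cell = ('<font size="+0"><font face="serif" size="1"><b>Apples Produced</b>'
        '<font size="3">\n</font></font></font>')
print(cell_text(cell))
```

This only gathers text in document order; it does not try to validate or repair the markup.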
I'm trying to get the dates from entries in two different RSS feeds through feedparser.
Here is what I'm doing:
import feedparser as fp
reddit = fp.parse("http://www.reddit.com/.rss")
cc = fp.parse("http://contentconsumer.com/feed")
print(reddit.entries[0].date)
print(cc.entries[0].date)
And here's how they come out:
2008-10-21T22:23:...
I want to use Perl to extract information from a Certificate Signing Request, preferably without launching an external openssl process. Since a CSR is stored in a base64-encoded ASN.1 format, I tried the Convert::PEM module. But it requires an ASN.1 description of the content, which I haven't been able to put together (ASN.1 being the be...
I need a well-tested regular expression (.NET style preferred), or some other simple bit of code, that will parse a US/Canadian phone number into its component parts, so that:
3035551234122
1-303-555-1234x122
(303)555-1234-122
1 (303) 555 -1234-122
etc...
all parse into:
AreaCode: 303
Exchange: 555
Suffix: 1234
Extension: 122
...
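A minimal sketch of the idea, in Python rather than .NET: treat every non-digit as a separator and capture the digit groups in order. The pattern and the `parse_phone` helper are assumptions for illustration, not a well-tested validator; they only cover inputs shaped like the four examples above.

```python
import re

# Hypothetical pattern: optional leading 1, then 3-3-4 digits, then an
# optional extension, with any non-digit characters allowed in between.
PHONE_RE = re.compile(r"""
    ^\D*1?\D*      # optional country code 1
    (\d{3})\D*     # area code
    (\d{3})\D*     # exchange
    (\d{4})\D*     # suffix
    (\d*)\D*$      # extension (may be empty)
""", re.VERBOSE)


def parse_phone(text):
    """Return a dict of components, or None if the text does not fit."""
    match = PHONE_RE.match(text)
    if match is None:
        return None
    area, exchange, suffix, extension = match.groups()
    return {
        "AreaCode": area,
        "Exchange": exchange,
        "Suffix": suffix,
        "Extension": extension,
    }


for sample in ("3035551234122", "1-303-555-1234x122",
               "(303)555-1234-122", "1 (303) 555 -1234-122"):
    print(sample, "=>", parse_phone(sample))
```

Because separators are ignored entirely, this accepts a lot of malformed input; a stricter pattern would be needed if rejecting bad numbers matters.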
Is there a simple way to parse a date that may be in MM/DD/yyyy, or M/D/yyyy, or some combination? That is, the leading zero is optional before a single-digit day or month.
To do it manually, one could use:
String[] dateFields = dateString.split("/");
int month = Integer.parseInt(dateFields[0]);
int day = Integer.parseInt(dateFields[1]);
int year ...
Hi,
I need to parse a large amount of text that uses HTML font tags for formatting.
For example:
<font face="fontname" ...>Some text</font>
Specifically, I need to determine which characters would be rendered using each font used in the text. I need to be able to handle cases such as font tags nested inside another font tag.
I need to use C#...
Can anyone recommend a lightweight JavaScript XML-RPC library?
After researching this a while ago, I couldn't find anything I was comfortable with, so I kinda ended up writing my own.
However, maybe that was stupid, as there must be something suitable out there!?
My own pseudo-library is mainly missing a way to turn an XML-RPC response...
I'm looking for syntactic examples or common techniques for doing regular-expression-style transformations on words instead of characters, given a procedural language.
For example, to trace copying, one would want to create a document with similar meaning but with different word choices.
I'd like to be able to concisely define these po...
I am working with some input that can take the following forms:
$1,200
20 cents/ inch
$10
Is there a way to parse these into numbers in VB, and also to print these numbers?
EDIT: Regular expressions would be great.
EDIT: VB 6 in particular
...
I want to build a parser for a C-like language. The interesting aspect is that I want to build it in such a way that someone who has access to the source can easily modify it to extend the language (a new expression type, for instance), with the extensions being runtime configurable (they can be turned on and off).
My current in...
I have a string which is like this:
this is [bracket test] "and quotes test "
I'm trying to write something in Python to split it up by spaces while ignoring spaces within square brackets and quotes. The result I'm looking for is:
['this','is','bracket test','and quotes test ']
...
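One way this could be approached is with a single regular expression that tries a bracketed group, then a quoted group, then a plain run of non-space characters. The sketch below assumes brackets and quotes are never nested or escaped; `split_tokens` is a hypothetical name.

```python
import re

# Alternatives are tried in order: [ ... ], " ... ", then a bare word.
TOKEN_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def split_tokens(text):
    """Split on whitespace, keeping bracketed and quoted spans intact."""
    # lastindex is the number of the one group that took part in the match.
    return [m.group(m.lastindex) for m in TOKEN_RE.finditer(text)]


print(split_tokens('this is [bracket test] "and quotes test "'))
```

An unclosed bracket or quote falls through to the bare-word branch and is split on spaces like any other text.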
I've recently been trying to create unit tests for some legacy code.
I've been taking the approach of using the linker to show me which functions cause link errors, grepping the source to find the definition, and creating a stub from that.
Is there an easier way?
Is there some kind of C++ parser that can give me class definitions, in ...
Are there any Python libraries for parsing Apache config files? If not in Python, is anyone aware of such a thing in other languages (Perl, PHP, Java, C#)?
I would then be able to rewrite it in Python.
...
So let's say I'm using Python's ftplib to retrieve a list of log files from an FTP server. How would I parse that list of files to get just the file names (the last column) inside a list? See the link above for example output.
...
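A rough sketch of one approach, assuming the server returns Unix `ls -l` style lines with nine columns (the example output referred to above is not shown here, so the sample lines below are made up). `names_from_listing` is a hypothetical helper; it works on lines already collected, for example via `ftp.retrlines('LIST', lines.append)`.

```python
def names_from_listing(lines):
    """Extract the last column (file name) from ls -l style lines."""
    names = []
    for line in lines:
        # Split into at most 9 fields so names containing spaces survive.
        fields = line.split(None, 8)
        if len(fields) == 9:  # skip "total N" and other short lines
            names.append(fields[8])
    return names


# Made-up sample lines, for illustration only.
sample = [
    "total 2",
    "-rw-r--r--   1 user group     1024 Oct 21 22:23 access.log",
    "-rw-r--r--   1 user group     2048 Oct 22 01:10 error log.txt",
]
print(names_from_listing(sample))
```

If only the names are needed, ftplib's `FTP.nlst()` returns them directly, though not every server supports it in the same way.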
Hello everyone.
I'm working on an app which scrapes data from a website, and I was wondering how I should go about getting the data. Specifically, I need data contained in a number of div tags which use a specific CSS class. Currently (for testing purposes) I'm just checking for "div class = "classname"" in each line of HTML. This wor...
Is there a simple way to support wildcards ("*") when searching strings - without using RegEx?
Users are supposed to enter search terms using wildcards, but should not have to deal with the complexity of RegEx:
"foo*" => str.startswith("foo")
"*foo" => str.endswith("foo")
"*foo*" => "foo" in str
(it gets more complicated when...
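The three cases above generalize if the pattern is split on "*" and the pieces are matched in order. The following is a minimal sketch under the assumption that "*" is the only special character and matching is case-sensitive; `wildcard_match` is a hypothetical name.

```python
def wildcard_match(pattern, text):
    """Match text against a pattern where '*' means any run of characters."""
    parts = pattern.split("*")
    if len(parts) == 1:
        return pattern == text  # no wildcard: exact match only
    head, tail = parts[0], parts[-1]
    # The first piece must be a prefix and the last a suffix, without overlap.
    if len(text) < len(head) + len(tail):
        return False
    if not text.startswith(head) or not text.endswith(tail):
        return False
    pos = len(head)
    end = len(text) - len(tail)
    # Middle pieces must appear in order between the prefix and the suffix.
    for piece in parts[1:-1]:
        found = text.find(piece, pos, end)
        if found == -1:
            return False
        pos = found + len(piece)
    return True


print(wildcard_match("foo*", "foobar"))
print(wildcard_match("*foo", "barfoo"))
print(wildcard_match("*foo*", "a foo b"))
```

Python's standard fnmatch module offers similar matching, but it also gives "?" and "[...]" special meaning, which may or may not be wanted here.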
Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\<.*?>","")
will work, but things like
&amp;
won't be converted correctly, and non-HTML text between two angle brackets will be removed (i.e., whatever the .*? in the regex matches will disappear).
...
I'm trying to parse an international datetime string similar to:
24-okt-08 21:09:06 CEST
So far I've got something like:
CultureInfo culture = CultureInfo.CreateSpecificCulture("nl-BE");
DateTime dt = DateTime.ParseExact("24-okt-08 21:09:06 CEST",
"dd-MMM-yy HH:mm:ss ...", culture);
The problem is what should I use for the '......
I was reading about parsers and parser generators when I came across this statement on Wikipedia's LR parsing page:
"Many programming languages can be parsed using some variation of an LR parser. One notable exception is C++."
Why is it so? What particular property in C++ causes it to be impossible to parse with LR parsers?
I first tri...
My knowledge about implementing a parser is a bit rusty.
I have no idea about the current state of research in the area, and could use some links regarding recent advances and their impact on performance.
General resources about writing a parser (tutorials, guides, etc.) are also welcome, since much of what I had learned at college I ...