I've already worked out this solution for myself with PHP, but I'm curious how it could be done differently, or even better. The two languages I'm primarily interested in are PHP and JavaScript, but I'd be interested in seeing how quickly this could be done in any other major language today as well (mostly C#, Java, etc.).
Return only wor...
I have a project where I need to compare multi-chapter documents to a second document to determine their similarity. The issue is that I have no idea how to go about doing this, what approaches exist, or whether there are any libraries available.
My first question is... what is similar? The numbers of words that match, the number of consecutive wo...
I'm trying to create a generalized HTML parser that works well on blog posts. I want to point my parser at a specific entry's URL and get back the clean text of the post itself. My basic approach (in Python) has been to use a combination of BeautifulSoup and urllib2, which is okay, but it assumes you know the proper tags for the blog entr...
Hi,
I'm asking the same question as this: http://stackoverflow.com/questions/296738/how-can-i-parse-relative-dates-with-perl but in C#.
Sorry if this is a duplicate; I'll delete it if so.
Does such a library exist?
Thanks
...
I have lines in an ASCII text file that I need to parse.
The columns are separated by a variable number of spaces, for instance:
column1 column2 column3
How would I split this line to return an array of only the values?
thanks
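
One possible approach, sketched in Python since the question does not name a language: calling `str.split()` with no arguments splits on any run of whitespace and ignores leading and trailing blanks. The function name `split_columns` is only an illustrative choice.

```python
def split_columns(line: str) -> list:
    """Return the whitespace-separated values of one line.

    str.split() with no argument treats any run of spaces or tabs
    as a single separator and drops leading/trailing whitespace.
    """
    return line.split()


print(split_columns("column1   column2      column3"))
# ['column1', 'column2', 'column3']
```

Most languages offer an equivalent, usually a split on a "one or more whitespace" pattern with empty entries discarded.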
...
I need a portable function/subroutine to locate the position of the last non-blank character in a string. I've found two options: LEN_TRIM and LNBLNK. However, different compilers seem to have different standards. The official documentation for the following compilers suggests that LEN_TRIM is part of the Fortran 95 standard on the f...
Given the following string, I'd like to parse into a list of first names + a last name:
Peter-Paul, Mary & Joël Van der Winkel
(and the simpler versions)
I'm trying to work out if I can do this with a regex. I've got this far
(?:([^, &]+))[, &]*(?:([^, &]+))
But the problem here is that I'd like the last name to be captured in ...
How do I parse sentence-case phrases from a passage?
For example, from this passage:
Conan Doyle said that the character of Holmes was inspired by Dr. Joseph Bell, for whom Doyle had worked as a clerk at the Edinburgh Royal Infirmary. Like Holmes, Bell was noted for drawing large conclusions from the smallest observations.[1] Michael Har...
I am not sure how to go about this. Right now I am counting the spaces to get the word count of my string, but if there is a double space the word count will be inaccurate. Is there a better way to do this?
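
A minimal sketch of one alternative, in Python (the question does not say which language is in use): count the tokens produced by splitting on runs of whitespace, rather than counting the spaces themselves. The name `word_count` is hypothetical, and "word" here simply means any run of non-whitespace characters.

```python
def word_count(text: str) -> int:
    """Count words as runs of non-whitespace characters.

    Splitting on whitespace runs means double spaces, tabs and
    leading/trailing blanks do not produce empty "words".
    """
    return len(text.split())


print(word_count("one  two   three"))  # 3
```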
...
I have a C# string "RIP-1234-STOP\0\0\0\b\0\0\0???|B?Mp?\0\0\0" returned from a call to a native driver.
How can I trim all characters from the first null terminator '\0' onwards? In this case, I would just like to have "RIP-1234-STOP".
Thanks.
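
The idea is to find the first '\0' and keep only what precedes it. The sketch below shows that in Python for illustration; in C# the same idea is usually written with `IndexOf('\0')` followed by `Substring(0, index)`. The function name `trim_at_null` is an assumption made for this example.

```python
def trim_at_null(raw: str) -> str:
    """Return the part of raw before the first NUL character.

    str.partition returns the text before the first separator; if no
    NUL is present, the whole string comes back unchanged.
    """
    head, _, _ = raw.partition("\0")
    return head


raw = "RIP-1234-STOP\0\0\0\b\0\0\0???|B?Mp?\0\0\0"
print(trim_at_null(raw))  # RIP-1234-STOP
```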
...
I am attempting to find documentation on how Local Storm Reports (LSR) issued by the National Weather Service are formatted.
I am also aware of the public FTP directory where these text files are stored, but I was wondering if anyone knows whether the NWS or other sources provide these reports via a web service instead of having to manually write a par...
I figure regex is overkill, and it also takes me some time to write (I guess I should learn it properly, now that I know some regex).
What's the simplest way to separate the letters from the digits in an alphanumeric string?
It will always be LLLLDDDDD. I only want the letters (the L's); typically it's only 1 or 2 letters.
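
A regex-free sketch in Python (the question does not name a language): take characters from the front of the string for as long as they are letters. The name `leading_letters` is hypothetical, and the sketch assumes the letters always come first, as the LLLLDDDDD layout suggests.

```python
from itertools import takewhile


def leading_letters(code: str) -> str:
    """Return the alphabetic prefix of code.

    takewhile stops at the first character for which str.isalpha
    is False, so everything from the first digit onward is ignored.
    """
    return "".join(takewhile(str.isalpha, code))


print(leading_letters("AB12345"))  # AB
```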
...
Hi
I'm working on a basic networking protocol in Python, which should be able to transfer both ASCII strings (read: EOL-terminated) and binary data.
For the latter to be possible, I chose to create the grammar such that it contains the number of bytes to come which are going to be binary.
For SimpleParse, the grammar would look like th...
I was wondering how to format a ParseException thrown by JavaCC in a human-readable way. The exception's token reference includes fields such as beginLine, beginColumn, endColumn, and endLine, but no reference to the source that was parsed.
Thanks! :)
...
Hello, I want to clean up some parsed text such as
\n the said \r\n\r\n\r\n I look in your eyes my dear\r\n\r\nI see green rolling Forests\r\n\r\nI see the far away Sky\r\n\r\nThey turn into the rain\r\n\r\n\r\nI see high soaring eagles... more\n
So I want to get rid of the "\n", "\r\n", "\r\n\r\n", "\r\n\r\n\r\n", "\r\n\r\n\r\n\r\n" a...
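
Since the question is cut off, the following Python sketch rests on two assumptions: that the escapes shown are real line-break characters rather than literal backslashes, and that the goal is to keep the non-empty lines while discarding the runs of blank ones. The name `clean_lines` is made up for the example.

```python
def clean_lines(text: str) -> list:
    """Split text into lines and drop the empty ones.

    str.splitlines() recognises "\n", "\r\n" and "\r", so mixed line
    endings are handled without listing every combination.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "\n the said \r\n\r\n\r\n I look in your eyes my dear\r\n\r\nI see green rolling Forests\r\n"
print(clean_lines(sample))
# ['the said', 'I look in your eyes my dear', 'I see green rolling Forests']
```

Joining the result with `"\n".join(...)` or `" ".join(...)` then gives a single cleaned string, depending on whether the line structure should survive.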
I'm trying to turn free-form text into something more structured. I have a complex pattern that matches the great majority (well above the minimum acceptable limit) of the data available, and I'd like to use that to assist in structuring the data, rather than parsing the text character-by-character. The problem that I've just run into is...
I have the following string which will probably contain ~100 entries:
String foo = "{k1=v1,k2=v2,...}"
and am looking to write the following function:
String getValue(String key){
// return the value associated with this key
}
I would like to do this without using any parsing library. Any ideas for something speedy?
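
One library-free idea is to parse the string once into a hash map and answer every lookup from that map. The sketch below shows it in Python rather than Java, and assumes that keys and values never contain ',' or '=' themselves; `parse_pairs` and `get_value` are illustrative names.

```python
def parse_pairs(text: str) -> dict:
    """Parse a "{k1=v1,k2=v2}" style string into a dict.

    Assumption: keys and values contain no ',' or '=' characters.
    """
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]  # drop the surrounding braces
    pairs = {}
    for item in body.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed entries that have no '='
            pairs[key.strip()] = value.strip()
    return pairs


# Parse once, then each lookup is a constant-time dictionary access.
table = parse_pairs("{k1=v1,k2=v2,k3=v3}")


def get_value(key: str):
    """Return the value for key, or None if the key is absent."""
    return table.get(key)


print(get_value("k2"))  # v2
```

With only about 100 entries a linear scan per lookup would also be fast; building the map once mainly pays off when the function is called many times.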
...
What would be the best way in python to parse out chunks of text contained in matching brackets?
"{ { a } { b } { { { c } } } }"
should initially return:
[ "{ a } { b } { { { c } } }" ]
putting that as an input should return:
[ "a", "b", "{ { c } }" ]
which should return:
[ "{ c }" ]
[ "c" ]
[]
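
A minimal sketch of one way to do this, assuming only `{` and `}` act as brackets and that unbalanced input should raise an error: walk the string while tracking nesting depth, and cut out a chunk each time the depth returns to zero. The name `top_level_chunks` is hypothetical.

```python
def top_level_chunks(text: str) -> list:
    """Return the contents of each top-level {...} group in text."""
    chunks = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # content begins after the opening brace
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}' at index %d" % i)
            if depth == 0:
                chunks.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return chunks


print(top_level_chunks("{ { a } { b } { { { c } } } }"))
# ['{ a } { b } { { { c } } }']
print(top_level_chunks("{ a } { b } { { { c } } }"))
# ['a', 'b', '{ { c } }']
```

Feeding each returned chunk back into the function peels off one nesting level at a time, and a chunk with no brackets, such as "c", yields an empty list.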
...
When I write Erlang programs which do text parsing, I frequently run into situations where I would love to do a pattern match using a regular expression.
For example, I wish I could do something like this, where ~ is a "made up" regular expression matching operator:
my_function(String ~ ["^[A-Za-z]+[A-Za-z0-9]*$"]) ->
....
I know...
I'd like to find a way to take a piece of user supplied text and determine what addresses on the map are mentioned within the text. I'd be happy to use a free web service if it exists or use a script which will not consume too many resources.
One way I can imagine doing this is taking a gigantic database of addresses and searching for...