text-parsing

Using StreamReader to read lines containing "//"?

Read a text file and, for any line that starts with "//", omit that line and move on to the next line. The input text file has some separate partitions. Process it line by line and find this mark. ...
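A minimal sketch of the skip-the-comment-lines idea, written in Python rather than with .NET's StreamReader. The function name `read_without_comments` and the choice to also skip indented "//" lines are assumptions for illustration, not part of the question.

```python
import io

def read_without_comments(stream, marker="//"):
    """Yield each line of `stream` that does not start with `marker`."""
    for line in stream:
        # Assumption: a line counts as a comment even if it is indented.
        if line.lstrip().startswith(marker):
            continue
        yield line.rstrip("\n")

sample = io.StringIO("// header\nalpha\n  // indented note\nbeta\n")
print(list(read_without_comments(sample)))
```

Lines that merely contain "//" somewhere after other text are kept, since the question only describes lines that start with the marker.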

chunking/text parsing using NLTK

I am trying to parse some text and diagram it, like you would a sentence. I am new to NLTK and am trying to find something in NLTK that will help me accomplish this. So far, I have seen nltk.ne_chunk and nltk.pos_tag. I find them to be not very helpful and I am not able to find any good online documentation. I have also tried to use the...

How to detect tabular data from a variety of sources

In an experimental project I am playing with I want to be able to look at textual data and detect whether it contains data in a tabular format. Of course there are a lot of cases that could look like tabular data, so I was wondering what sort of algorithm I'd need to research to look for common features. My first thought was to write a ...

What do people mean when they say “Perl is very good at parsing”?

What do people mean when they say "Perl is very good at parsing"? How is Perl any better or more powerful than other scripting languages such as Python or Ruby? ...

Extract key sentences from a text

Hi, do you know of an effective method for extracting key sentences from a text, along with their frequency parameters etc., that can also do "stemming" (that is, also search for similar sentences)? I also wonder whether there is a software implementation. Thanks a lot ...

Converting Words(string) to number in .net with culture

Hi! I need advice. Do you know of any library that converts the string representation of a number (a group of words) to a number? It is a kind of intelligent parser that may include a cutter for stop words (symbols), stemming, etc., and that gives you not only the result but also something like a correctness index. Of course, support for various cultures (languages) is required. ...

read text file into custom data class

I have a text file which contains columns of data that are either integer, double or string. I want to read each row of data into my own record class. I know the column data types beforehand, so I am parsing a text file row something like the code below (I typed it out, so don't complain there are errors). I didn't list all of the col...
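One possible shape for this, sketched in Python under stated assumptions: the columns are whitespace-delimited, and the `Record` fields (`item_id`, `weight`, `label`) are hypothetical stand-ins because the question's actual column list is not shown.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical columns; the original column list is not shown.
    item_id: int
    weight: float
    label: str

# One converter per column, in column order.
CONVERTERS = (int, float, str)

def parse_row(line):
    """Split a whitespace-delimited row and convert each field by position."""
    fields = line.split()
    if len(fields) != len(CONVERTERS):
        raise ValueError(f"expected {len(CONVERTERS)} columns, got {len(fields)}")
    return Record(*(convert(text) for convert, text in zip(CONVERTERS, fields)))

print(parse_row("42 3.14 widget"))
```

Keeping the converters in a single tuple means the column types are declared once, and a malformed field raises `ValueError` at the row where it occurs.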

Parsing text with simple wildcards logic in Java / C / Objective-C

I'm looking for a fast library/class to parse plain text using expressions like below: Text is: <b>Name:</b>John<br><i>Age</i>32<br> Pattern is: {*}Name:</b>{%}<br>{*}Age</i>{%}<br> And it will find me two values: John and 32. Intent is to parse simple HTML web pages without involving heavy duty tools. It should not be using string op...
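A rough Python sketch of how such wildcard patterns could be evaluated, assuming `{*}` means "skip anything" and `{%}` means "capture anything". It translates the pattern into a regular expression internally, which may or may not satisfy the constraint that is cut off in the excerpt above. The helper names `wildcard_to_regex` and `extract` are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    """Translate {*} (skip) and {%} (capture) placeholders into a regex."""
    parts = re.split(r"(\{\*\}|\{%\})", pattern)
    out = []
    for part in parts:
        if part == "{*}":
            out.append(".*?")        # non-greedy skip, not captured
        elif part == "{%}":
            out.append("(.*?)")      # non-greedy capture
        else:
            out.append(re.escape(part))  # literal text matches itself
    return re.compile("".join(out), re.DOTALL)

def extract(pattern, text):
    """Return the captured values, or an empty list if the pattern does not match."""
    match = wildcard_to_regex(pattern).search(text)
    return list(match.groups()) if match else []

text = "<b>Name:</b>John<br><i>Age</i>32<br>"
pattern = "{*}Name:</b>{%}<br>{*}Age</i>{%}<br>"
print(extract(pattern, text))
```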

pyparsing question

This code works: from pyparsing import * zipRE = "\d{5}(?:[-\s]\d{4})?" fooRE = "^\!\s+.*" zipcode = Regex( zipRE ) foo = Regex( fooRE ) query = ( zipcode | foo ) tests = [ "80517", "C6H5OH", "90001-3234", "! sfs" ] for t in tests: try: results = query.parseString( t ) print t,"->", results except ParseEx...

Where can I learn more about parsing text in Java?

I'm in a Data Structures class (in Java) this semester, but we're doing a lot of parsing on text files to populate the structures we design. The focus is on the structures themselves, not on parsing algorithms. I feel sort of weak in the area and was wondering if anyone could point me to a book or site on the subject. Design patterns,...

Python: Read configuration file with multiple lines per key

Hi, I am writing a small DB test suite, which reads configuration files with queries and expected results, e.g.: query = "SELECT * from cities WHERE name='Unknown';" count = 0 level = 1 name = "Check for cities whose name should be null" suggested_fix = "UPDATE cities SET name=NULL WHERE name='Unknown';...

Java Buffered Reader Text File Parsing

I am really struggling with parsing a text file. I have a text file which is in the following format ID Float Float Float Float .... // variable number of floats END ID Float Float Float Float .... END etc However the ID can represent one of two values, 0 which means it is a new field, or -1 which means it is related to the las...
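A minimal Python sketch of reading this kind of record stream, assuming the tokens are whitespace-separated and each record is an integer ID, then any number of floats, then a literal END. It only collects `(id, floats)` pairs and does not interpret what 0 or -1 means, since that part of the question is cut off. The name `parse_records` is hypothetical.

```python
def parse_records(text):
    """Return a list of (record_id, floats) tuples from whitespace-separated tokens."""
    records = []
    tokens = iter(text.split())
    for token in tokens:
        record_id = int(token)       # first token of each record is the ID
        values = []
        for value in tokens:         # consume floats up to the END marker
            if value == "END":
                break
            values.append(float(value))
        else:
            # The inner loop ran out of tokens without seeing END.
            raise ValueError("record is missing its END marker")
        records.append((record_id, values))
    return records

print(parse_records("0 1.5 2.5 END\n-1 3.0 END"))
```

Because both loops share one iterator, the outer loop resumes at the token right after each END.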

Reading the next line using LINQ and File.ReadAllLines()

Hi, I have a file which represents items, in one line there's Item GUID followed by 5 lines describing the item. Example: Line 1: Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1 Line 2: Item details = bla bla . . Line 7: Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7 Line 8: Item details = bla bla . . ...
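The grouping idea can be sketched independently of LINQ. The Python below assumes that every item starts at a line beginning with `Guid=` and that all following lines up to the next such line belong to it; `group_items` and the dictionary keys are hypothetical names.

```python
def group_items(lines):
    """Group lines into items; each item starts at a line beginning with 'Guid='."""
    items = []
    for line in lines:
        if line.startswith("Guid="):
            items.append({"header": line, "details": []})
        elif items:
            # Lines before the first Guid line are ignored.
            items[-1]["details"].append(line)
    return items

sample = [
    "Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1",
    "Item details = bla bla",
    "Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7",
    "Item details = bla bla",
]
for item in group_items(sample):
    print(item["header"], len(item["details"]))
```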

parsing and translating from text to xml

Hello, I need to translate programs written in a domain-specific language into an XML representation. These programs are in the form of simple text files. What approach would you suggest? What API should I use to: parse the text files written in this language; write XML based on the tokens and token streams I obtain. My criteria is more...

How to read an address in multiple formats like Google Maps

Notice that on Google Maps you can input the address any way you like. As long as it is a valid address, Google Maps will read it. In some Ruby book I had seen a code snippet for something like this, but with phone numbers. Any ideas how this could be done for addresses? In the language of your choice. EDIT: I don't care about a "valid" ...

Text parsing, conditional text

Hello, I have a text template with placeholders that I parse in order to replace the placeholders with real values. Text Template: Name:%name% Age:%age% I use StringBuilder.Replace() to replace placeholders sb.Replace("%name%", Person.Name); Now I want to make a more advanced algorithm. Some lines of code are conditional. They have to b...
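A small Python sketch of the plain placeholder step only, assuming placeholders have the form `%key%` with word characters inside. It replaces all of them in one pass instead of calling a replace per key; the conditional-line logic is not attempted because its description is cut off. `render` is a hypothetical name.

```python
import re

def render(template, values):
    """Replace each %key% placeholder with values[key]; unknown keys are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return re.sub(r"%(\w+)%", substitute, template)

print(render("Name:%name% Age:%age%", {"name": "Ann", "age": 30}))
```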

Parsing a string, Grammar file.

How would I separate the below string into its parts. What I need to separate is each < Word > including the angle brackets from the rest of the string. So in the below case I would end up with several strings 1. "I have to break up with you because " 2. "< reason >" (without the spaces) 3. " . But Let's still " 4. "< disclaimer >" 5. " ...
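One way to get this kind of split, sketched in Python: `re.split` keeps the delimiters when the pattern is wrapped in a capturing group. The sketch assumes the angle-bracket tokens never nest; `split_placeholders` is a hypothetical name.

```python
import re

def split_placeholders(text):
    """Split text into plain runs and <...> tokens, keeping the angle brackets."""
    parts = re.split(r"(<[^<>]*>)", text)
    # re.split can yield empty strings at the edges or between adjacent tokens.
    return [part for part in parts if part]

print(split_placeholders("I have to break up with you because <reason>. But Let's still <disclaimer>."))
```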

How to use Wordpress' http.php in external projects?

Answer : Implemented using Curl... $file = "http://abc.com/data//output.txt"; $ch = curl_init($file); $fp = @fopen("out.txt", "w"); curl_setopt($ch, CURLOPT_FILE, $fp); curl_setopt($ch, CURLOPT_HEADER, 0); curl_exec($ch); curl_close($ch); fclose($fp); $file = "out.txt"; $fp = fopen($file, "r"); I am trying to parse data from a pipe-de...

How do I keep a scanner from throwing exceptions when the wrong type is entered? (java)

Here's some sample code: import java.util.Scanner; class In { public static void main (String[]arg) { Scanner in = new Scanner (System.in) ; System.out.println ("how many are invading?") ; int a = in.nextInt() ; System.out.println (a) ; } } If I run the program and give it an int like 4 then everythi...

Making links clickable in Javascript?

Is there a simple way of turning a string from Then go to http://example.com/ and foo the bar! into Then go to <a href="http://example.com">example.com</a> and foo the bar! in Javascript within an existing HTML page? ...