text-parsing

Parsing space/tab separated Text file and embedding into XL file

Hi, I have my text file in this format: **4 1250000209852 01 XXXX XXXX V 3054XXX, XXXX J. 73227DUONG, DUC H. 672XXX COMM HOSP 40352405 RO 07/07/201010/05/2010HO 331.5 XXX NL PRESS XXX ...

Parsing text file using C#

http://yfrog.com/bftransactionsp Looking for a good way to parse out of this text file, the values highlighted with the yellow boxes using C#. Each section is delineated by a TERM # which I forgot to highlight. Tried this: string fileName = "ATMTerminalTotals.txt"; StreamReader sr = new StreamReader(fileName); string[] delimiter = new ...

Display log file information on Web page with Asp.NET MVC paging.

Hello, I have logs stored in a txt file in the following format ======8/4/2010 10:20:45 AM========================================= Processing Donation ======8/4/2010 10:21:42A M========================================= Sending information to server ======8/4/2010 10:21:43 AM=====================================...

Matching everything between two specific words using regular expressions

I'm attempting to parse an Oracle trace file using regular expressions. My language of choice is C#, but I chose to use Ruby for this exercise to get some familiarity with it. The log file is somewhat predictable. Most lines (99.8%, to be specific) match the following pattern: # [Timestamp] [Thread] [Event] [Message...

Is there any framework for parsing a SQL-like query into its component parts?

I'm interested in writing a SQL-like query syntax for a CMS I work with. The idea would be that a CMS query could be written in a SQL-ish syntax, and I would convert that to execute through the CMS API. There would be no field or table selection, so I need some way to get from this: SELECT WHERE Something = 'something' AND (SomethingE...

How to use the pretrained MaltParser parsing models for english

I am trying to use the pretrained parsing model for English of the MaltParser by following the steps in the following page, but repeatedly getting a null pointer exception. http://maltparser.org/mco/english_parser/engmalt.html I am trying this on a MaltParser version 1.4 and Java version 6 on a Windows machine. I think the model was tra...

Parsing a text file in Java

Example from input file: ARTIST="unknown" TITLE="Rockabye Baby" LYRICS="Rockabye baby in the treetops When the wind blows your cradle will rock When the bow breaks your cradle will fall Down will come baby cradle and all " The Artist, Title & Lyrics fields have to be extracted to their respective Strings with capitalization and format ...
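
A minimal sketch of one way to pull out such `KEY="value"` fields, written in Python rather than the Java the question asks about. It assumes that values never contain a double quote and that the lyrics span several lines in the real file (the excerpt above is flattened); `parse_fields` is a hypothetical helper name.

```python
import re

# KEY="value" pairs; DOTALL lets a quoted value span several lines.
FIELD = re.compile(r'(\w+)="(.*?)"', re.DOTALL)

def parse_fields(text):
    """Return a dict mapping each field name to its raw quoted value."""
    return dict(FIELD.findall(text))

# Assumed sample: line breaks inside LYRICS are a guess at the real layout.
sample = (
    'ARTIST="unknown"\n'
    'TITLE="Rockabye Baby"\n'
    'LYRICS="Rockabye baby in the treetops\n'
    'When the wind blows your cradle will rock\n"\n'
)

fields = parse_fields(sample)
print(fields["TITLE"])  # Rockabye Baby
```

Because the value is captured verbatim, capitalization and line breaks inside the quotes are preserved.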

Which Perl modules are good for data munging?

Nine years ago, when I started parsing HTML and free text with Perl, I read the classic Data Munging with Perl. Does someone know if David is planning to update the book or if there are similar books or web pages where the new parsing modules like XML-Twig, Regexp-Grammars, etc., are explained? I assume that in the last nine years some ...

How do I remove a portion of my string?

Continuing from my previous question, I now want to remove the number once I have found it and stored it in a variable. ...

Parse string into a tree structure?

I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth. "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}" [[["Hello big" "Hi" "Hey"] ["world" "earth"]] [["Goodbye" "farewell"] ["planet" "rock" "globe" ["." "!"]]]] ...

Intelligently parse user search terms in PHP

I am in the midst of creating a search service for my PHP website and I was wondering how others have gone about intelligently parsing search terms based on quotation marks (and possibly other symbols in the future). In other words, the search term screwdriver hammer might yield an array of ['screwdriver', 'hammer'], but "flathead scr...
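
One possible approach, sketched in Python rather than PHP: a single regular expression that matches either a double-quoted phrase or a run of non-space characters. The function name `parse_terms` and the quoted example query are assumptions made for illustration, since the excerpt is cut off before its own quoted example.

```python
import re

# Either a "quoted phrase" (group 1) or a bare word (group 2).
TERM = re.compile(r'"([^"]*)"|(\S+)')

def parse_terms(query):
    """Split a search query into terms, keeping quoted phrases together."""
    terms = []
    for quoted, bare in TERM.findall(query):
        term = quoted or bare
        if term:  # skip empty quotes such as ""
            terms.append(term)
    return terms

print(parse_terms('screwdriver hammer'))
# ['screwdriver', 'hammer']
print(parse_terms('"claw hammer" nails'))
# ['claw hammer', 'nails']
```

An unmatched quote is not treated as an error in this sketch; the quote character simply stays attached to the word that follows it.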

How to strip variable spaces in each line of a text file based on special condition - one-liner in Python?

I have some data (text files) that is formatted in the most uneven manner one could think of. I am trying to minimize the amount of manual work on parsing this data. Sample Data : Name Degree CLASS CODE EDU Scores -------------------------------------------------------------------------------------- John M...

Parsing Nested Text in C#

If I have a series of strings that have this base format: "[id value]"//id and value are space delimited. id will never have spaces They can then be nested like this: [a] [a [b value]] [a [b [c [value]]] So every item can have 0 or 1 value entries. What is the best approach to go about parsing this format? Do I just use stuff li...
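
A small recursive-descent parser is one way to handle this kind of nesting. The sketch below is in Python rather than C#, and it assumes the grammar is exactly `[id]`, `[id text]`, or `[id [nested item]]` with balanced brackets; `parse_item` is a hypothetical name.

```python
def parse_item(text, pos=0):
    """Parse one "[id value]" item starting at text[pos].

    Returns ((id, value), next_pos), where value is None, a string,
    or a nested (id, value) tuple.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    # The id runs up to the first space or closing bracket.
    start = pos
    while pos < len(text) and text[pos] not in " ]":
        pos += 1
    item_id = text[start:pos]
    # Skip the spaces that separate the id from its value.
    while pos < len(text) and text[pos] == " ":
        pos += 1
    value = None
    if pos < len(text) and text[pos] == "[":
        value, pos = parse_item(text, pos)  # nested item
    elif pos < len(text) and text[pos] != "]":
        start = pos
        while pos < len(text) and text[pos] != "]":
            pos += 1
        value = text[start:pos]  # plain text value
    if pos >= len(text) or text[pos] != "]":
        raise ValueError("missing closing ']'")
    return (item_id, value), pos + 1

item, _ = parse_item("[a [b value]]")
print(item)  # ('a', ('b', 'value'))
```

Unbalanced input raises a `ValueError` instead of being guessed at, which keeps malformed items visible.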

How can I extract/parse tabular data from a text file in Perl?

I am looking for something like HTML::TableExtract, just not for HTML input, but for plain text input that contains "tables" formatted with indentation and spacing. Data could look like this: Here is some header text. Column One Column Two Column Three a b a b ...

Most popular substrings

Hi, I'm trying to parse a large number of short strings into some logical parts. It seems like an interesting problem that someone could've already solved, but I cannot find any papers / solutions (or maybe I'm trying the wrong keywords). The strings have 2-5 parts. If I substitute each word for a letter saying which "part" / "section" ...

Parsing with MaltParser engmalt

Hi, I'm trying to use the pretrained parsing model engmalt, available at "http://maltparser.org/mco/english_parser/engmalt.html". I downloaded it, I unpacked it in the directory where I downloaded the MaltParser, and I wrote in the Prompt "java -Xmx1024m -jar malt.jar -c engmalt.poly -i infile.conll -o outfile.conll -m parse", as suggest...

Parsing a file of values in order to change into an SQL insert

Hey, trying to figure out a way to use a file I have to generate an SQL insert to a database. The file has many entries of the form: 100090 100090 bill smith 1998 That is, an id number, another id (not always the same), a full name and a year. These are all separated by a space. Basically what I want to do is be able to get variables f...
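
A rough sketch of the idea in Python with the standard `sqlite3` module: treat the first two tokens as the ids, the last token as the year, and everything in between as the name, then pass the values as query parameters instead of building the SQL string by hand. The table name `people` and its column names are assumptions for illustration only.

```python
import sqlite3

def parse_line(line):
    """Split 'id other_id full name year' into typed fields."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"unexpected line: {line!r}")
    first_id, second_id, year = int(parts[0]), int(parts[1]), int(parts[-1])
    name = " ".join(parts[2:-1])  # the name may contain several words
    return first_id, second_id, name, year

# Hypothetical schema, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER, other_id INTEGER, name TEXT, year INTEGER)"
)
conn.execute(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    parse_line("100090 100090 bill smith 1998"),
)
rows = conn.execute("SELECT * FROM people").fetchall()
print(rows)  # [(100090, 100090, 'bill smith', 1998)]
```

Using `?` placeholders leaves quoting to the database driver, so a name containing an apostrophe does not break the statement.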

Ruby: How do I convert a string (ARGV) representation of integers and ranges to an array of integers

In Ruby, how can I take an array of tokens representing either integers or ranges and parse them into an array of integers that include each integer and each element in each range? Example: Given input [ "5", "7-10", "24", "29-31"] I'd like to produce output [ 5, 7, 8, 9, 10, 24, 29, 30, 31 ] Thanks. ...
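
The same expansion can be sketched in Python (not Ruby, as the question asks): split each token on the dash and extend the result with the inclusive range. This assumes non-negative integers only, since a leading minus sign would be ambiguous with the range separator; `expand_tokens` is a hypothetical name.

```python
def expand_tokens(tokens):
    """Expand tokens such as "5" or "7-10" into a flat list of integers."""
    result = []
    for token in tokens:
        if "-" in token:
            start, end = token.split("-", 1)
            # range() excludes its upper bound, so add 1 to include it.
            result.extend(range(int(start), int(end) + 1))
        else:
            result.append(int(token))
    return result

print(expand_tokens(["5", "7-10", "24", "29-31"]))
# [5, 7, 8, 9, 10, 24, 29, 30, 31]
```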

converting bibtex files to html with python (maybe pybtex?)

Hi, I want to parse a bibtex publications file and sort for specific fields (e.g. year) and filter certain content, to then put it on a website. I came across pybtex, which works as far as reading and parsing the bibtex file, but it is basically not documented and I can't figure out how to sort the entries. Is pybtex the way to go (how c...

regex help, find first 3 occurrences of a keyword and str_ireplace the content

Given a block of text, I need to parse it for the existence of a keyword. Then on the first appearance of the keyword, I need to wrap bold tags around it (if it doesn't already have them), on the second appearance of the keyword, italics, and on the third, underline. Example using the keyword "help": This is some text with the keyword "...
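
A sketch of the counting part in Python rather than PHP: a case-insensitive regex substitution whose callback hands out the tags `b`, `i`, `u` in order and leaves later matches unchanged. It does not implement the "if it doesn't already have them" check, and the sample sentence and the name `emphasize` are made up for illustration.

```python
import re

def emphasize(text, keyword):
    """Wrap the first three whole-word matches in <b>, <i>, <u> tags."""
    tags = iter(["b", "i", "u"])

    def wrap(match):
        tag = next(tags, None)
        if tag is None:  # fourth and later matches stay untouched
            return match.group(0)
        return f"<{tag}>{match.group(0)}</{tag}>"

    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return pattern.sub(wrap, text)

print(emphasize("help me, Help you, help us, help all", "help"))
# <b>help</b> me, <i>Help</i> you, <u>help</u> us, help all
```

The original casing of each match is kept because the callback reuses `match.group(0)` instead of the keyword itself.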