I need to process text files to extract relevant information for later input into R for statistical analysis. The file content typically looks like the example extract shown below. Can the board recommend which software or programming language I should use for this purpose? The critical requirements are:

  • ease/clarity of programming syntax to extract the relevant information from each line (note: not all lines will contain relevant information)
  • free/open source
  • can run on both Linux and Windows systems
  • ability to loop through a large number of separate text files in a folder/directory while writing the output to a single (csv/text) file

EXAMPLE

Full Tilt Poker Game #19911608402: Table Buggy - $0.01/$0.02 - No Limit Hold'em - 4:05:58 ET - 2010/04/08
Seat 2: BAD BeAts02 ($1.74)
Seat 3: VIVIVIVIV ($1.20)
Seat 4: pipelis ($2.87), is sitting out
Seat 5: trichinosis ($2.54)
Seat 6: Syrenski ($2)
Seat 9: evil-bunny1 ($1.20)
BAD BeAts02 posts the small blind of $0.01
VIVIVIVIV posts the big blind of $0.02
handrici sits down
pipelis stands up
Syrenski posts $0.02
The button is in seat #9
*** HOLE CARDS ***
Dealt to Syrenski [6d 3s]
handrici adds $2
trichinosis calls $0.02
Syrenski checks
pkmyers sits down
evil-bunny1 folds
BAD BeAts02 raises to $0.08
VIVIVIVIV folds
VIVIVIVIV adds $0.02
pkmyers adds $1.34
trichinosis calls $0.06
Syrenski folds
*** FLOP *** [Js 5s 8s]
pipelis sits down
BAD BeAts02 has 15 seconds left to act
BAD BeAts02 bets $0.18
AntHraX85 sits down
pipelis stands up
trichinosis folds
Uncalled bet of $0.18 returned to BAD BeAts02
BAD BeAts02 mucks
AntHraX85 adds $2
BAD BeAts02 wins the pot ($0.19)
*** SUMMARY ***
Total pot $0.20 | Rake $0.01
Board: [Js 5s 8s]
Seat 2: BAD BeAts02 (small blind) collected ($0.19), mucked
Seat 3: VIVIVIVIV (big blind) folded before the Flop
Seat 4: pipelis is sitting out
Seat 5: trichinosis folded on the Flop
Seat 6: Syrenski folded before the Flop
Seat 9: evil-bunny1 (button) didn't bet (folded)
A: 

Coincidentally, I have tinkered with parsing hand history files as well :) I think the best candidates are Python and Perl. Both are cross-platform and open source. Conceptually, the program design is straightforward: iterate over the input line by line and apply regular expressions to extract the information you need. You could do that in almost any programming language. (You might even be able to do it in pure R, who knows?) However, I would cast my vote for Perl, since it is renowned as a superb language for processing plain text files.
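To make the "loop over lines, apply regular expressions" idea concrete, here is a minimal sketch in Python (the same approach carries over to Perl). It is only an illustration: the patterns are inferred from the single example extract in the question, the output columns are my own choice, and the names `parse_lines` and `write_csv` are hypothetical. It assumes one or more hands per file, `.txt` files, and stack sizes without thousands separators.

```python
import csv
import re
from pathlib import Path

# Patterns inferred from the example extract only (an assumption);
# other game types or sites will probably need different expressions.
HEADER_RE = re.compile(
    r"^Full Tilt Poker Game #(\d+): Table (.+?) - .* - (\d{4}/\d{2}/\d{2})$"
)
SEAT_RE = re.compile(r"^Seat (\d+): (.+?) \(\$([\d.]+)\)")
SUMMARY_MARK = "*** SUMMARY ***"


def parse_lines(lines):
    """Yield (game_id, date, seat, player, stack) for each starting seat line.

    Lines that match no pattern are skipped, since not every line
    carries relevant information.
    """
    game_id = date = None
    in_summary = False
    for line in lines:
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            # A new hand starts: remember its id and date.
            game_id, date = header.group(1), header.group(3)
            in_summary = False
            continue
        if line == SUMMARY_MARK:
            # Seat lines after this marker describe results, not stacks.
            in_summary = True
            continue
        seat = SEAT_RE.match(line)
        if seat and game_id and not in_summary:
            yield (
                game_id,
                date,
                int(seat.group(1)),
                seat.group(2),
                float(seat.group(3)),
            )


def write_csv(input_dir, output_path, pattern="*.txt"):
    """Parse every matching file in input_dir into one CSV file."""
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["game_id", "date", "seat", "player", "stack"])
        for path in sorted(Path(input_dir).glob(pattern)):
            with open(path, encoding="utf-8", errors="replace") as handle:
                writer.writerows(parse_lines(handle))
```

Calling something like `write_csv("hand_histories", "hands.csv")` (both names made up here) should then give a single file that R can load with `read.csv`. Extracting bets, folds, or the board would follow the same pattern: one more regular expression and one more branch in the loop.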

Bogdev
A: 

Have a look at 'grep' (Try Wikipedia).

It can be used in PHP: http://www.php.net/manual/en/function.preg-grep.php

Some desktop text editors can do grep-style searches too, and some of them are free, e.g. TextWrangler (Mac).
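For what it's worth, the grep idea (keep only the lines that match a pattern) is easy to reproduce in a scripting language that runs on both Linux and Windows. A rough sketch in Python follows; the function names `grep_lines` and `grep_files` are made up for illustration, and it assumes the files are plain `.txt` files in one folder.

```python
import re
from pathlib import Path


def grep_lines(pattern, lines):
    """Yield (line number, line) for every line that matches pattern."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def grep_files(pattern, input_dir, glob="*.txt"):
    """Yield (file name, line number, line) across all files in a folder."""
    for path in sorted(Path(input_dir).glob(glob)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in grep_lines(pattern, handle):
                yield path.name, number, line
```

For example, `grep_lines(r"wins the pot", lines)` would keep only the lines reporting a pot winner. Note that grep-style filtering only selects lines; pulling individual fields out of them still needs capture groups or further splitting.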

radbourn3