views: 265

answers: 4

I generally need to do a fair amount of text processing for my research, such as removing the last token from all lines, extracting the first two tokens from each line, splitting each line into tokens, etc.

What is the best way to do this? Should I learn Perl for this, or should I learn some kind of shell commands? The main concern is speed: if I need to write long code for such tasks, it defeats the purpose.

EDIT:

I started learning sed on @Mimisbrunnr's recommendation and could already do what I needed. But it seems people favor awk more, so I will try that. Thanks for all your replies.

+3  A: 

For doing simple stream editing, sed is a great utility that comes standard on most *nix boxes, but for anything much more complex than that I would suggest getting into Perl. The learning curve isn't that bad, and it's great for writing most forms of regular text parsing. A great reference can be found here.
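
For instance, the two tasks from the question might look like this in sed (a sketch; `input.txt` is a placeholder filename, and `-E` for extended regexes is supported by both GNU and BSD sed):

$ sed 's/ *[^ ]*$//' input.txt                    # remove the last token from every line
$ sed -E 's/^ *([^ ]+ +[^ ]+).*/\1/' input.txt    # keep only the first two tokens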

Mimisbrunnr
+6  A: 

Perl and awk come to mind, although Python will do, if you'd rather not learn a new language.

Perl's a general-purpose language; awk's more oriented to text processing of the type you've described.
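
The question's examples are one-liners in awk (a sketch; `input.txt` is a placeholder, and note that decrementing NF to drop a field is a widely supported gawk/mawk extension rather than strict POSIX):

$ awk '{print $1, $2}' input.txt    # print the first two tokens of each line
$ awk '{NF--; print}' input.txt     # drop the last token from each line (gawk/mawk)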

ronys
“Whenever faced with a problem, some people say 'Let's use AWK.' Now, they have two problems.” -- D. Tilbrook ;)
J.F. Sebastian
@J.F, that's just nonsense.
ghostdog74
@ronys, awk is not just for text processing; you can use it as a programming language as well.
ghostdog74
@ghostdog: The quote has survived 20 years (since 1988: http://regex.info/blog/2006-09-15/247). That tells you something. Also note the `;)` at the end :)
J.F. Sebastian
Don't you think it's irrelevant and dated? awk has come a long way since then.
ghostdog74
Can you suggest any good resources for awk?
euphoria83
+1  A: 
#!/usr/bin/env python
# process.py
import fileinput

for line in fileinput.input():  # pass inplace=True here to edit the files in place
    words = line.split()  # split each line on whitespace
    all_except_last = words[:-1]
    print(' '.join(all_except_last))
    # or, to keep only the first two tokens:
    first_two = words[:2]
    print(' '.join(first_two))

Examples:

$ echo a b c | python process.py
$ ./process.py input.txt another.txt
J.F. Sebastian
`perl -lane '$,=" ";pop@F;print@F'` or `perl -lane '$,=" ";print@F[0,1]'`
Hynek -Pichi- Vychodil
@Hynek -Pichi- Vychodil: Try a little experiment: show the Perl and Python versions to somebody who knows neither language and ask them what these scripts do. And I agree that nothing beats Perl one-liners for brevity, except J (for math stuff).
J.F. Sebastian
+1  A: 

*nix tools such as awk/grep/tail/head/sed etc. are good file-processing tools. If you want to search for patterns in files and process them, you can use awk. For big files, you can use a combination of grep + awk: grep for its speed in pattern searching, and awk for its ability to manipulate text. As for sed, awk can usually do whatever sed does, so I find it redundant to use sed for file processing.
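
A grep + awk combination of the kind described might look like this (a sketch; `big.log` and the pattern `ERROR` are placeholders):

$ grep 'ERROR' big.log | awk '{print $1, $2}'    # grep narrows the lines fast, awk reformats them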

In terms of speed of processing files, awk is often on par with, and sometimes better than, Perl or other languages.

Also, two very good tools for getting the front and back portions of a file fast are head and tail. So to get the last lines, you can use tail.
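
For example (file names are placeholders):

$ head -n 10 input.txt    # first 10 lines
$ tail -n 5 input.txt     # last 5 lines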

ghostdog74
I assume that by "tokens" the OP means items on a line, not lines of the file, so `tail` would not be applicable to that case. `cut`, on the other hand...
Dave Sherohman
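A sketch of the `cut` approach mentioned above (`input.txt` is a placeholder; cut takes a single-character delimiter, so this assumes tokens separated by single spaces):

$ cut -d' ' -f1,2 input.txt    # first two fields of each line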