tags:

views:

3962

answers:

12

I currently do my textfile manipulation through a bunch of badly remembered awk, sed, bash and a tiny bit of Perl.

I've seen mentioned a few places that python is good for this kind of thing, I know a little and I would like to know more. Is python a good choice for this, and is there a good book or guide to learning how to use python to replace shell scripting, awk, sed and friends?

+2  A: 

Python would be perfectly fine for text file manipulation. For learning, check here.

EBGreen
+3  A: 

I suggest the awesome online book Dive Into Python. It's how I learned the language originally.

Beyone teaching you the basic structure of the language, and a whole lot of useful data structures, it has a good chapter on file handling and subsequent chapters on regular expressions and more.

Dan
+14  A: 

Any shell has several sets of features.

  • The Essential Linux/Unix commands. All of these are available through the subprocess library. This isn't always the best first choice for doing all external commands. Look also at shutil for some commands that are separate Linux commands, but you could probably implement directly in your Python scripts. Another huge batch of Linux commands are in the os libary; you can do these more simply in Python.

    And -- bonus! -- more quickly. Each separate Linux command in the shell (with a few exceptions) forks a subprocess. By using Python shutil and os modules, you don't fork a subprocess.

  • The shell environment features. This includes stuff that sets a command's environment (current directory and environment variables and what-not). You can easily manage this from Python directly.

  • The shell programming features. This is all the process status code checking, the various logic commands (if, while, for, etc.) the test command and all of it's relatives. The function definition stuff. This is all much, much easier in Python. This is one of the huge victories in getting rid of bash and doing it in Python.

  • Interaction features. This includes command history and what-not. You don't need this.

  • The shell file management features. This includes redirection and pipelines. This is trickier. Much of this can be done with subprocess. But some things that are easy in the shell are unpleasant in Python. Specifically stuff like (a | b; c ) | something >result. This runs two processes in parallel (with output of a as input to b), followed b a third process. The output from that sequence is run in parallel with something and the output is collected into a file named result. That's just complex to express in any other language.

Specific programs (awk, sed, grep, etc.) can often be rewritten as Python modules. Don't go overboard. Replace what you need and evolve your "grep" module. Don't start out writing a Python module that replaces "grep".

The best thing is that you can do this in steps.

  1. Replace AWK and PERL with Python. Leave everything else alone.
  2. Look at replacing GREP with Python. This can be a bit morre complex, but your version of GREP can be tailored to your processing needs.
  3. Look at replacing FIND with Python loops that use os.walk. This is a big win because you don't spawn as many processes.
  4. Look at replacing common shell logic (loops, decisions, etc.) with Python scripts.
S.Lott
Also (from http://stackoverflow.com/questions/3479728/is-it-good-style-to-call-bash-commands-within-a-python-script-using-os-systemba ), for path/filename patterns expansion there's a "glob" module: http://docs.python.org/library/glob.html
Nickolay
Running bash script from python: http://stackoverflow.com/questions/2651874/embed-bash-in-python/2654398#2654398
Nickolay
@Nickolay. You don't need to run bash from Python. You don't need bash. Simply stop using bash.
S.Lott
+1  A: 

If your textfile manipulation usually is one-time, possibly done on the shell-prompt, you will not get anything better from python.

On the other hand, if you usually have to do the same (or similar) task over and over, and you have to write your scripts for doing that, then python is great - and you can easily create your own libraries (you can do that with shell scripts too, but it's more cumbersome).

A very simple example to get a feeling.

import popen2
stdout_text, stdin_text=popen2.popen2("your-shell-command-here")
for line in stdout_text:
  if line.startswith("#"):
    pass
  else
    jobID=int(line.split(",")[0].split()[1].lstrip("<").rstrip(">"))
    # do something with jobID

Check also sys and getopt module, they are the first you will need.

Davide
+1  A: 

This is an interesting read in that context: http://www.dabeaz.com/generators/

André
+1  A: 

Your best bet is a tool that is specifically geared towards your problem. If it's processing text files, then Sed, Awk and Perl are the top contenders. Python is a general-purpose dynamic language. As with any general purpose language, there's support for file-manipulation, but that isn't what it's core purpose is. I would consider Python or Ruby if I had a requirement for a dynamic language in particular.

In short, learn Sed and Awk really well, plus all the other goodies that come with your flavour of *nix (All the Bash built-ins, grep, tr and so forth). If it's text file processing you're interested in, you're already using the right stuff.

Eric Smith
+7  A: 
  • If you want to use Python as a shell, why not have a look at IPython ? It is also good to learn interactively the language.
  • If you do a lot of text manipulation, and if you use Vim as a text editor, you can also directly write plugins for Vim in python. just type ":help python" in Vim and follow the instructions or have a look at this presentation. It is so easy and powerfull to write functions that you will use directly in your editor!
Mapad
there's an ipython profile called 'sh' that makes the interpreter *very* much like a shell.
Autoplectic
+3  A: 

In the beginning there was sh, sed, and awk (and find, and grep, and...). It was good. But awk can be an odd little beast and hard to remember if you don't use it often. Then the great camel created Perl. Perl was a system administrator's dream. It was like shell scripting on steroids. Text processing, including regular expressions were just part of the language. Then it got ugly... People tried to make big applications with Perl. Now, don't get me wrong, Perl can be an application, but it can (can!) look like a mess if you're not really careful. Then there is all this flat data business. It's enough to drive a programmer nuts.

Enter Python, Ruby, et al. These are really very good general purpose languages. They support text processing, and do it well (though perhaps not as tightly entwined in the basic core of the language). But they also scale up very well, and still have nice looking code at the end of the day. They also have developed pretty hefty communities with plenty of libraries for most anything.

Now, much of the negativeness towards Perl is a matter of opinion, and certainly some people can write very clean Perl, but with this many people complaining about it being too easy to create obfuscated code, you know some grain of truth is there. The question really becomes then, are you ever going to use this language for more than simple bash script replacements. If not, learn some more Perl.. it is absolutely fantastic for that. If, on the other hand, you want a language that will grow with you as you want to do more, may I suggest Python or Ruby.

Either way, good luck!

MattG
+1  A: 

I have built semi-long shell scripts (300-500 lines) and Python code which does similar functionality. When many external commands are being executed, I find the shell is easier to use. Perl is also a good option when there is lots of text manipulation.

Mike Davis
+1  A: 

Adding to previous answers: check the pexpect module for dealing with interactive commands (adduser, passwd etc.)

Federico Ramponi
Argh, the foreground colors in that page hurt me...
Federico Ramponi
+1  A: 

There was interesting project, but it died two years ago: http://pyshell.sourceforge.net/

vartec
A: 

While researching this topic, I found this proof-of-concept code (via a comment at http://jlebar.com/2010/2/1/Replacing_Bash.html) that lets you "write shell-like pipelines in Python using a terse syntax, and leveraging existing system tools where they make sense":

for line in sh("cat /tmp/junk2") | cut(d=',',f=1) | 'sort' | uniq:
    sys.stdout.write(line)
Nickolay