



I am about to begin writing an app that handles adding new users/repositories to my Subversion server, so that I don't have to repeatedly open vi, edit conf files and execute shell commands.

Most of my experience centers around C, C++, Objective-C and Java. Java seems decent for string manipulation with its tokenizer class; however, I haven't really looked into what is available in Java, or in any other language for that matter.

What would be the best language for reading, writing and manipulating strings in text files and executing shell commands as a command line app? If you suggest a language please give me a good argument as to why you believe that is the best. Don't just fire off a language.

Perl - it was designed for string manipulation and runs other command-line programs about as easily as you can imagine.
Python - a very structured language with good text-processing libraries. It can also run other command-line programs and read their results.
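To make the "run other commands and read their results" point concrete, here is a minimal Python sketch using the standard `subprocess` module. The helper name `run_and_capture` is made up for illustration, and the command it runs is only a stand-in so the example works anywhere; on a real server the argument list would name whatever Subversion tool you actually call.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command (given as an argument list) and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in command (an assumption for this sketch): a child Python
    # process that prints one line. Replace with your own argument list.
    output = run_and_capture(
        [sys.executable, "-c", "print('hello from a child process')"]
    )
    print(output.strip())
```

Passing the command as a list rather than a single string avoids going through a shell, so user or repository names containing spaces or quotes are not reinterpreted.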

+1: Python SVN library already written for you.
With a C++ background, I suspect Python will be easier to learn than Perl.
How long is a piece of string?

Perl, it was designed from the ground up for processing text.

Nobody ever got fired for choosing Perl for text processing :)
Unless you work for Microsoft. You should be using C# or VB. /s
Nothing wrong with C# for string processing.
Whatever you're most comfortable using is the best choice.

+1 because this doesn't deserve a downvote.
Perl builds regular expressions into the language's syntax and uses them extensively. Its name is often expanded as "Practical Extraction and Report Language". It was created by Larry Wall, who trained as a linguist.

For simple text file manipulation I wouldn't go further than a shell script using basic tools such as grep, awk and sed, or a Perl or Python script.

Just to add another option, I think Ruby has the same regular expression integration as Perl without the syntax diarrhea (at least, it has all the regex features that I have used). Anyway, this is completely subjective.

Definitely subjective. Perhaps you should look at the x flag for regexes.
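For readers who haven't met the x flag: it lets a pattern be spread over several lines with whitespace and comments. Python's counterpart is `re.VERBOSE` (also spelled `re.X`). A small sketch, assuming a hypothetical `name = permission` line format that is not taken from the question:

```python
import re

# Hypothetical line format "name = permission", e.g. "harry = rw".
ENTRY = re.compile(
    r"""
    ^\s*
    (?P<name>[A-Za-z0-9_.-]+)   # user name
    \s*=\s*                     # separator, optional spaces around it
    (?P<perm>rw|r)?             # "rw", "r", or nothing at all
    \s*$
    """,
    re.VERBOSE,
)


def parse_entry(line):
    """Return (name, permission) for a matching line, or None otherwise."""
    match = ENTRY.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("perm") or ""
```

Whether this reads better than the one-line form is, as noted, subjective.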
Among the common languages, Perl obviously stands out. However, since you asked for the best string manipulation language, if we disregard practicalities a little you might like to take a look at Icon, which developed from Snobol, the String Oriented and Symbolic Language.

Ruby incorporates some ideas which can be originally traced back to Snobol too.

There is an open source Snobol available, if you want to be really perverse.
David Thornley
"I think so, Brain, but Snobol for Windows?"
I would certainly not suggest sticking with a C/Java-type language for this job. You should probably choose Perl/Python/Ruby as other people are suggesting; not knowing these languages should not be a big issue if you already know others.

However, there is one extra option if you are on a Unix/Linux server: shell scripting! This is really the best tool for this job, since you also need to execute some programs/utilities, and bash is available on almost all servers, so you gain easy portability. If the text processing job becomes too complex for shell scripting, the defensive first choice should indeed be Perl, with Python and Ruby following after that based on developer preference.

The big reason for doing this in a non-compiled language is that you can always see what code is being run without having to compile and compare versions. Another benefit is that changes are easily made and visible right in your script file.

I'd add awk to the shell scripting.
AWK, Perl's little cousin, is great if you're running on a Unix (or a nearly-Unix) of some type. If you just want to open up a text file, manipulate it, then write it out again, AWK was designed for this. It's simple and elegant. The AWK Programming Language book is probably one of the best language-specific technical books that I've ever read. It starts simply, is easy to read, and ends with some amazing examples. More than enough good documentation.

If it's a Windows box, Cygwin has AWK, but I think just using ActiveState Perl on its own is probably a more robust option.

I agree with Simon: I would look into doing this via shell scripting if using Unix/Linux, as it is a good fit for this type of work.

If you must use a language, avoid anything compiled such as C/C++/Java, as using them will only slow you down, and go with Python/Ruby/Perl. Most of these interpreted languages will already be installed on a Linux/Unix system. Personally I prefer Python because the language is easy to read and use. It is also very easy to open/read/write/append files (not that the others are hard).
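As a rough illustration of how little code the file handling takes in Python, here is a sketch that adds a user to a password file. It assumes an svnserve-style file with a `[users]` section containing `name = password` lines; the function names `add_user` and `add_user_to_file` are made up for this example, and your server's configuration may well be laid out differently.

```python
from pathlib import Path


def add_user(text, name, password):
    """Return text with 'name = password' added at the end of the [users] section.

    Raises ValueError if the user already exists or there is no [users] section.
    """
    lines = text.splitlines()
    in_users = False
    insert_at = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header: note whether it is the one we want.
            in_users = stripped == "[users]"
            if in_users:
                insert_at = i + 1
        elif in_users and stripped and not stripped.startswith("#"):
            if stripped.split("=", 1)[0].strip() == name:
                raise ValueError(f"user {name!r} already exists")
            insert_at = i + 1  # keep inserting after the last entry seen
    if insert_at is None:
        raise ValueError("no [users] section found")
    lines.insert(insert_at, f"{name} = {password}")
    return "\n".join(lines) + "\n"


def add_user_to_file(path, name, password):
    """Read the file, add the user, and write the result back."""
    path = Path(path)
    path.write_text(add_user(path.read_text(), name, password))
```

Keeping the string manipulation in `add_user` separate from the file I/O makes it easy to try out on a plain string before pointing it at a real conf file.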

There is a Boost regex library for C++ that looks very powerful (although I've only used the most basic features myself), and something like it is going into C++0/1x. Given that you have C++ experience, and C++ generally does well at generating that sort of executable, it's something to look at.

awk I guess. Really easy to learn and still has a lot of potential. Great for stream processing.


If you like Java, then you could consider Groovy. It has the benefit of access to all of the Java APIs, but has much simpler regular expression usage.

Personally, I prefer Ruby for working with string data in my scripts. On the other hand, for bigger and commercial apps I think it is better done in Java.

Both are great languages and have a special place in my heart, but sometimes I truly despise Java as much as I love it.
