I've been working on a C program which does quite a lot of string manipulation, and very often needs to be tweaked and recompiled for some sort of special case processing. I've been thinking that embedding some scripting language with good string manipulation support might make sense for the project.

What language would provide the best string manipulation support while being easy to embed in a C program?

For some extra background...

  • Performance is pretty important (especially startup time)
  • Needs to be easily compiled on multiple platforms (Linux, Solaris, Win32 (ideally with MinGW), Darwin)
  • Needs to be a language which will still be around in 5 years' time

I've looked a little at Python (perhaps too heavyweight?) and Lua (perhaps not focused on string manipulation?) but don't really know enough about them or what other choices might be out there.

A: 

Perl. Its (original) reason for being is string manipulation.

Matthew Scharley
But not at all easy to embed within a C program, from what I understand. Do you know differently?
Matt Sheppard
Never tried. Personally, I've not used C/C++ much beyond "Hello World." I do know at least one (C++, I think) open-source project that does embed it though, if you want to poke around and see how it's done.
Matthew Scharley
Yeah, what's it called?
Matt Sheppard
Matthew Scharley
Looks like you've got to download Perl separately for kildclient, whereas I'd need to actually compile it into the binary.
Matt Sheppard
You _can_ embed Perl: http://search.cpan.org/dist/perl/pod/perlembed.pod - but I would not inflict such a brutal and powerful curse on the poor users who will have to write those scripts ;)
Pavel Minaev
Embedding Perl is like giving birth to an elephant. And its advantage in string manipulation was matched by other languages years ago. I have nothing against Perl, but it's a terrible choice for embedding.
Eli Bendersky
+2  A: 

People have been embedding tcl in larger projects for what seems like ages. It's been a while since I've had to use tcl for anything...

One of the things that sets tcl apart from other programming languages is that everything is a string.

And for your reference, here's the tcl documentation on string functions.

tcl might be easier to embed than perl, but I do have to agree with @Matthew Scharley's reasoning. Also, tcl isn't exactly known for its performance, but maybe that's changed in recent years.

Anyway, here is the tcl wiki link on embedding tcl in C applications, and a relevant quote from the page:

"How do I embed a Tcl interpreter in my existing C (or C++) application?" is a very frequently-asked question. It's straightforward, certainly far easier than doing the same with Perl or, in general, Python; moreover, this sort of "embeddability" was one of the original goals for Tcl, and many, many projects do it. There are no complete discussions of the topic available, but we can give an overview here. (RWT 14-Oct-2002)


Another alternative might be to go with Lua, as you mentioned, while extending it with another C string library of your choice (Google turns up The Better String Library, for instance).

Once you've compiled Lua into your application, you can expose your own C functions to Lua's interpreter. Or maybe the built-in string functions are adequate for you.

You certainly have a few options.

Mark Rushakoff
+7  A: 

I've never regretted using Lua.

It's very easy to embed in your application. In fact, now I usually don't write C applications, I just write C libraries and control them from Lua.

Text manipulation isn't its best feature, but it's certainly far better than C alone. And the LPEG library makes building parsers almost trivially easy, putting any regex to shame (but still has a couple of regex-like syntaxes if you prefer them).

Javier
In the embeddable language niche, Lua is simply perfect - small, perfect portability (strict C++-compilable ANSI C subset), neat and simple syntax, minimal but convenient standard library, ease of extensibility (I dare say it beats Python there), and very good documentation. My past experience with Lua has been nothing but positive. It may lack string manipulation capabilities out of the box, but you can easily provide all the custom functions that might be needed for that purpose (regex etc).
Pavel Minaev
+1 for writing C libraries controlled from Lua. That's become increasingly common for me as well.
RBerteig
Don't forget that Lua wins quite a few benchmark contests, and that is before bringing in a JIT compiler for its bytecode. And, a JIT is available too, see http://luajit.org/ for the gory details.
RBerteig
+2  A: 

We looked at both Python and Lua for scripting for a .NET product. The goal was to provide some scriptability for end users. The decision came down to Python because the powers-that-be preferred anything with Microsoft support to everything else. My choice was for Lua.

Tangurena
+1  A: 

Some people may disagree, but Sara Golemon has published a great book on extending and embedding PHP, which is becoming one of the most widely used languages around... :)

PHP's string support isn't as great as, say, Perl's, but it's very usable.

Did I mention it's written in C? </my2cents>

Nathacof
What are the advantages of using PHP as an embedded language? Any such language is usually domain-specific, so great popularity elsewhere is not a big factor. And PHP has many detractors as a language on its own.
Pavel Minaev
+1  A: 

Python is not heavyweight at all! It's quite simple to embed (here's the official guide, but you can find many tutorials as well), very powerful, great for string processing, and a pleasant and easy language to use overall. It has a huge user community and support base, which is a bonus.

Python has also been embedded into a large number of real-life applications. One cool example I can think of immediately is the Civilization IV game, most of which runs on Python scripts on top of a C++ API.

Eli Bendersky
+2  A: 

There's a good survey paper on the relative merits of the embedding APIs of various scripting languages:

H. Muhammad and R. Ierusalimschy. C APIs in extension
and extensible languages. Journal of Universal Computer
Science, 13(6):839–853, 2007.

Looking at combining both excellent string manipulation and an excellent embed API, I would suggest, in order:

  • Ruby: Excellent string support, including syntax support for regex. Well-designed embed API, very easy to use.
  • Lua: I'm not sure how its string support is, but it's supposed to be a great language for embedding.
  • Python: Less easy to embed, slightly harder to use string features than Ruby. But it has Pyrex, so that might be an easier way to embed it.
  • PHP: Nasty API, nasty language. The embed SAPI is really a second-class citizen, but it does work. There are a lot of string manipulation functions. Still, I wouldn't recommend it.
  • Perl: Nasty to embed (so far as I've heard), string support could be better.

I can't comment on TCL, but I hear it's designed for embedding.

Paul Biggar
+3  A: 
Norman Ramsey