views:

529

answers:

3

Hi again,

I have been working with postgreSQL, playing around with wikipedia's millions of hyperlinks and such, for 2 years now. I either do my thing directly by sending SQL commands, or I write a client side script in python to manage a million queries when this cannot be done productively (efficiently and effectively) manually.

I would run my python script on my 32bit laptop and have it communicate with a $6000 64bit server running POstgreSQL; I would hence have an extra 2.10 Ghz, 3 GB of RAM, psyco and a multithreaded SQL query manager.

I now realize that it is time for me to level up. I need to learn to server-side script using a procedural language (PL); I really need to reduce network traffic and its inherent serializing overhead.

Now, I really do not feel like researching all the PLs. Knowing that I already know python, and that I am looking for the means between effort and language efficiency, what PL do you guys fancy I should install, learn and use, and why and how?

Thanks again

+4  A: 

Since you already known python, PL/Python should be something to look at. And you sound like you write SQL for your database queries, so PL/SQL is a natural extension of that.

PL/SQL feels like SQL, just with all the stuff that you would expect from SQL anyway, like variables for whole rows and the usual control structures of procedural languages. It fits the way you usually interact with the database, but it's not the most elegant language of all time. I can't say anything about PL/Python, since I have never used it, but since you know python it should be easy to flip through some examples and see if you like it.

sth
True. I am hesitating between PL/Python or just plain PL\pgSQL. I know python is slow; I am wondering if PL/pgSQL is faster? I also wonder how to find out which PLs are already integrated into my postgresql server? Thank yo for your speedy answer!
Nicholas Leonard
Compared to you doing it over network form a script on your laptop, both variant should be blazing fast :). With debian, PL/pgSQL should already be installed and for the other languages there are packages available.
sth
I haven't tried PL/Python, but I imagine that it has very similar speed to PL/pgSQL. After all they are both interpreted languages. If you really need speed (and you probably don't), you'll need to write functions in C or C++ and build DLLs/shared objects.
j_random_hacker
`SELECT * FROM pg_language` will show you the languages installed in your current database. Check pg_pltemplate for the list of ones that you can install.
derobert
Thx guys! I have indeed decided to go with PL/Python. I am hoping someone will meke psyco64 one day! As for the network overhead, I am not sure it is really that bad for it seems that the network only causes more delay. If the client is multithreaded, it does not really car about delay and +2.1Ghz?
Nicholas Leonard
I did some stuff with PL/Python; after some minor hurdles, it was quite pleasant to use. I still recommend, stick with PL/SQL as far as possible, and for advanced magic use Python. I needed some advanced string processing and recursive calls, which was just impossible in PL/SQL, but easy in Python.
Mauli
+2  A: 

Why can't you run your Python on the database server? That has the fewest complexities -- you can run the program you already have.

S.Lott
+1  A: 

I was in the exact same situation as you and went with PL/Python after giving up on PL/SQL after a while. It was a good decision, looking back. Some things that bit me where unicode issues (client encoding, byte sequence) and specific postgres data types (bytea).

Koen Bok