I'm basically trying to set up my own private pastebin where I can save HTML files on my private server to test and fool around: a textarea for the initial input, a way to save the file, and, after saving, a way to view all the files I've saved.

I'm trying to write this in Python and am wondering what the most practical way of storing the file(s) or the code would be. SQLite? Straight-up flat files?

One other thing I'm worried about is the uniqueness of the files. Obviously I don't want conflicting filenames (maybe save using 'title' and a timestamp?). How should I structure it?

+1  A: 

I wrote something similar a while back in Django to test jQuery snippets. See:

http://jquery.nodnod.net/

I have the code available on GitHub at http://github.com/dz/jquerytester/tree/master if you're curious.

If you're using straight Python, there are a couple ways to approach naming:

  1. If storing as files, ask for a name, salt with current time, and generate a hash for the filename.

  2. If using SQLite or some other database, just use a numerical unique ID.
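As a rough sketch of option 1, the title can be salted with the current time and hashed to get a filename. The function name `hashed_filename`, the choice of SHA-1, and the `.html` extension are assumptions for illustration, not something from the project linked above.

```python
import hashlib
import time


def hashed_filename(title, now=None):
    """Return a filename derived from the title salted with a timestamp."""
    # `now` can be passed in for testing; by default the system clock is used.
    if now is None:
        now = time.time()
    salted = f"{title}:{now!r}".encode("utf-8")
    # SHA-1 is used only to get a fixed-length name, not for security.
    return hashlib.sha1(salted).hexdigest() + ".html"
```

Two saves with the same title at the exact same timestamp would still produce the same name, so this lowers the chance of a collision rather than ruling it out.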

Personally, I'd go for #2. It's easy, ensures uniqueness, and allows you to easily fetch various sets of 'files'.
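A minimal sketch of option 2 using Python's built-in `sqlite3` module might look like the following. The table layout and the helper names (`open_store`, `save_paste`, `list_pastes`) are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the pastes table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pastes ("
        " id INTEGER PRIMARY KEY,"  # SQLite assigns a unique integer here
        " title TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_paste(conn, title, body):
    """Insert a paste and return its unique numeric ID."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO pastes (title, body) VALUES (?, ?)", (title, body)
        )
    return cur.lastrowid


def list_pastes(conn):
    """Return (id, title) for every saved paste, oldest first."""
    return conn.execute("SELECT id, title FROM pastes ORDER BY id").fetchall()
```

Because the ID comes from the database, two pastes with the same title never conflict, and listing everything you saved is a single query.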

thedz
that's a pretty nice script ya got there, i envisioned something like that at one point.. hehe props for having it on github
meder
+1  A: 

Have you considered trying LodgeIt? It's a free pastebin which you can host yourself. I do not know how hard it is to set up.

Looking at their code, they have gone with a database for storage (SQLite will do). They have structured their paste table like this (in SQLAlchemy table declaration style). The code is just a text field.

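from sqlalchemy import (Boolean, Column, DateTime, ForeignKey, Integer,
                        MetaData, String, Table, Text)

metadata = MetaData()

# One row per paste; parent_id refers back to another row of the same table.
pastes = Table('pastes', metadata,
        Column('paste_id', Integer, primary_key=True),
        Column('code', Text),
        Column('parent_id', Integer, ForeignKey('pastes.paste_id'),
               nullable=True),
        Column('pub_date', DateTime),
        Column('language', String(30)),
        Column('user_hash', String(40), nullable=True),
        Column('handled', Boolean, nullable=False),
        Column('private_id', String(40), unique=True, nullable=True)
    )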

They have also made a hierarchy (see the self join) which is used for versioning.
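To illustrate how a self-referencing `parent_id` can be used for versioning, here is a sketch that walks from a paste back to its original with a recursive query. It uses the standard `sqlite3` module and a cut-down table; it is not LodgeIt's actual code, and the `history` helper and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pastes ("
    " paste_id INTEGER PRIMARY KEY,"
    " code TEXT,"
    " parent_id INTEGER REFERENCES pastes(paste_id))"
)
# Three versions of one paste: 2 was derived from 1, and 3 from 2.
conn.executemany(
    "INSERT INTO pastes (paste_id, code, parent_id) VALUES (?, ?, ?)",
    [(1, "v1", None), (2, "v2", 1), (3, "v3", 2)],
)


def history(conn, paste_id):
    """Return (paste_id, code) rows from the given paste back to the original."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(paste_id, code, parent_id, depth) AS (
            SELECT paste_id, code, parent_id, 0
            FROM pastes WHERE paste_id = ?
            UNION ALL
            SELECT p.paste_id, p.code, p.parent_id, c.depth + 1
            FROM pastes p JOIN chain c ON p.paste_id = c.parent_id
        )
        SELECT paste_id, code FROM chain ORDER BY depth
        """,
        (paste_id,),
    )
    return rows.fetchall()
```

Calling `history(conn, 3)` follows the `parent_id` links from the newest version back to the first one.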

David Raznick
A: 

Plain files are definitely the more efficient option here. Save your database for more complex queries.

If you need some formatting to be done on files, such as highlighting the code properly, it is better to do it before you save the file with that code. That way you don't need to apply formatting every time the file is shown.

You definitely need to ensure that all file names are unique, but this task is trivial: check whether the file already exists on disk and, if it does, add a number to its name and check again, and so on.
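The check-and-retry idea can be sketched roughly as below. The helper name `save_unique` and the `-1`, `-2` suffix scheme are assumptions; the sketch opens the file in exclusive-create mode (`"x"`) so that the existence check and the creation happen in one step.

```python
import os


def save_unique(directory, stem, html):
    """Write html to stem.html, or stem-1.html, stem-2.html, ... if taken."""
    os.makedirs(directory, exist_ok=True)
    counter = 0
    while True:
        suffix = f"-{counter}" if counter else ""
        path = os.path.join(directory, f"{stem}{suffix}.html")
        try:
            # Mode "x" raises FileExistsError instead of overwriting.
            with open(path, "x", encoding="utf-8") as handle:
                handle.write(html)
            return path
        except FileExistsError:
            counter += 1  # name is taken, try the next number
```

Compared with a separate "does it exist?" check followed by a write, this avoids the case where two requests pick the same free name at the same moment.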

Don't store them all in one directory either, since a filesystem can perform much worse when there are A LOT (~1 million) of files in a single directory. You can structure your storage like this:

FILE_DIR/YEAR/MONTH/FileID.html, and store the "YEAR/MONTH/FileID" part in the database as a unique ID for the file.
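A small sketch of that layout, assuming hypothetical helpers `storage_key` and `storage_path` and zero-padded year and month components:

```python
import datetime
import os


def storage_key(file_id, when=None):
    """Build the "YEAR/MONTH/FileID" key that would be stored in the database."""
    when = when or datetime.date.today()
    return f"{when.year:04d}/{when.month:02d}/{file_id}"


def storage_path(file_dir, key):
    """Turn a stored key back into the full path of the .html file on disk."""
    return os.path.join(file_dir, *key.split("/")) + ".html"
```

With this layout, listing everything saved in a given month amounts to listing the contents of the matching FILE_DIR/YEAR/MONTH directory.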

Of course, if you aren't worried about performance (not many users, for example), you can just store everything in the database, which is much easier to manage.

maksymko
So how would you pull say the files from the current month if you created one for each day?
meder
If performance is the ultimate concern, storing non-binary data (text, in this case) on the filesystem is almost never the way to go. A proper database allows connection pooling, load balancing, automatic mirroring, master/slave relations and a whole lot more. Not to mention the ability to run complex queries across the dataset more easily and more efficiently.
thedz
Yeah, unless your data set is really small, flat files don't scale worth beans. The solution to your "million file per directory" problem is to use a database.
Paul McMillan
@thedz Loading a file from the filesystem is LIGHTNING fast compared to a database query, since all your most-used files will end up in memory eventually. This depends on the usage scenario, however, and yes, it CAN be done in a way that makes files slower.
maksymko
All your most-used database objects end up cached somewhere in any decent scalable system, so that's not a very compelling reason to use files. There's a reason you don't see very many flat-file backends behind sites with ridiculously large visitor counts: it's much easier and more practical to scale DB-backed solutions. The tools for scaling that out are already written and proven in production.
thedz