I want to save some text to the database using the Django ORM wrappers. The problem is that this text is generated by scraping external websites, and many of them seem to declare the wrong encoding. I would like to store the raw bytes so I can improve my encoding detection over time without redoing the scrapes. But Django seems to want everything to be stored as unicode. Can I get around that somehow?

+1  A: 

You can store the data encoded as base64, for example. Or try analyzing the HTTP response headers; it may be simpler to get the proper encoding from there.
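A minimal sketch of the base64 idea, using only the standard library. The helper names `pack_raw` and `unpack_raw` are made up for illustration, and the assumption is that the resulting ASCII string would go into an ordinary text column (for instance a Django `models.TextField`), which is not shown here.

```python
import base64


def pack_raw(raw: bytes) -> str:
    """Turn arbitrary bytes into an ASCII-only string safe for a text column."""
    return base64.b64encode(raw).decode("ascii")


def unpack_raw(stored: str) -> bytes:
    """Recover the exact original bytes from the stored base64 string."""
    return base64.b64decode(stored.encode("ascii"))


# Example bytes: Latin-1 text, as a mislabeled page might deliver it.
scraped = "café".encode("latin-1")

stored = pack_raw(scraped)      # hypothetical value to assign to a text field
restored = unpack_raw(stored)   # byte-for-byte identical to what was scraped

# The encoding guess can be made (and revised) later, on the restored bytes.
text = restored.decode("latin-1")
```

The trade-off is size: base64 output is roughly a third larger than the input, and the stored value cannot be searched as text.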

Alexander Artemenko
Some percent of sites just misencode their data or have an inconsistent encoding across the page. I still want to store the raw data though so I can go back and see it exactly.
lacker
+1  A: 

Create a File with the data. Use a Django models.FileField to hold a reference to the file.

No, it does not involve a ton of I/O. If your file is small, it adds 2 or 3 I/Os (the directory read, the inode read and the data read).
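A rough sketch of the file-based approach, kept to the standard library so it runs without a Django project. The helpers `save_raw` and `load_raw` and the hash-based file naming are assumptions for illustration; in a real model, the returned name is the kind of value a `models.FileField` would hold as its reference to the file.

```python
import hashlib
import tempfile
from pathlib import Path


def save_raw(raw: bytes, root: Path) -> str:
    """Write the bytes unchanged under root and return the file name."""
    # Naming by content hash is an illustrative choice, not a Django convention.
    name = hashlib.sha256(raw).hexdigest() + ".bin"
    root.mkdir(parents=True, exist_ok=True)
    (root / name).write_bytes(raw)
    return name


def load_raw(name: str, root: Path) -> bytes:
    """Read the bytes back exactly as they were scraped."""
    return (root / name).read_bytes()


# Demo in a temporary directory standing in for the upload directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scraped = b"caf\xe9"            # bytes in an unknown or mislabeled encoding
    name = save_raw(scraped, root)  # the reference a model row would keep
    round_trip = load_raw(name, root)
```

Because the file is opened in binary mode, no decoding happens on the way in or out, so the encoding detection can be rerun on the same bytes at any time.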

S.Lott