I have a very old .sql backup of a vBulletin site that I ran around 8 years ago, and I am trying to view the file attachments stored in the DB. The script below extracts them all. I verified that the extracted files look like JPEGs by hex-dumping them and checking for the SOI (start of image) and EOI (end of image) bytes (FFD8 and FFD9, respectively), as described on the JPEG wiki page.

But when I try to open them with evince, I get this error: "Error interpreting JPEG image file (JPEG datastream contains no image)".

What could be going on here?

Some background info:

  • sqldump is around 8 years old
  • vbulletin 2.x was the software that stored the info
  • most likely php 4 was used
  • most likely mysql 4.0, possibly even 3.x
  • the column datatype these attachments are stored in is mediumtext

My Python 3.1 script:

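#!/usr/bin/env python3.1

import re

# Strip the leading columns, then the trailing columns, of each INSERT row.
trim_l = re.compile(br"""^INSERT INTO attachment VALUES\('\d+', '\d+', '\d+', '(.+)""")
trim_r = re.compile(br"""(.+)', '\d+', '\d+'\);$""")
# Split what remains into the filename and the attachment data.
extractor = re.compile(br"""^(.*(?:\.jpe?g|\.gif|\.bmp))', '(.+)$""")

with open('attachments.sql', 'rb') as fh:
    for line in fh:
        left = trim_l.findall(line)
        if not left:
            continue  # not an attachment INSERT line
        middle = trim_r.findall(left[0])
        if not middle:
            continue  # trailing columns did not match
        found = extractor.findall(middle[0])
        if not found:
            continue  # not a .jpg/.jpeg/.gif/.bmp attachment
        name, data = found[0]
        try:
            filename = 'files/%s' % str(name, 'UTF-8')
        except UnicodeDecodeError:
            continue  # skip attachments whose name is not valid UTF-8
        # The with block closes the output file even if the write fails.
        with open(filename, 'wb') as ah:
            ah.write(data)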

Update: The JPEG wiki page says that FF bytes introduce section markers, with the next byte indicating the section type. I see some that are not listed on the wiki page (specifically, I see a lot of 5C bytes, so FF5C). But that list only covers "common markers", so I'm trying to find a more complete one. Any guidance here would also be appreciated.
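To get an overview of which bytes follow each FF, a tally along these lines could help. This is only a rough sketch: the function name `count_markers` is made up for illustration, and it counts every byte that follows an FF, whether or not that pair is a real marker.

```python
from collections import Counter

def count_markers(data):
    """Count the bytes that directly follow each 0xFF byte in data."""
    counts = Counter()
    for i in range(len(data) - 1):
        if data[i] == 0xFF:
            # Indexing a bytes object in Python 3 yields an int.
            counts[data[i + 1]] += 1
    return counts

if __name__ == "__main__":
    # Made-up sample bytes, not taken from a real attachment.
    sample = b"\xff\xd8\xff\x5c\x00\xff\x5c\xff\xd9"
    for value, count in sorted(count_markers(sample).items()):
        print("FF%02X: %d" % (value, count))
```

Running it over one extracted file shows at a glance how often FF5C occurs compared with the markers the wiki page lists.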

A:

Please update your question with a sample SQL statement, including the first few lines/bytes of the JPEG string value. The data may be base64-encoded, or even stored as plain hex digits. With a sample, we can help you further.
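In the meantime, a quick heuristic check on one extracted value might narrow it down. This is a minimal sketch, not a definitive test: `guess_encoding` is a hypothetical helper, and it only looks at which characters appear in the value.

```python
import re

_HEX_RE = re.compile(br"(?:[0-9A-Fa-f]{2})+")
_B64_RE = re.compile(br"[A-Za-z0-9+/]+={0,2}")

def guess_encoding(value):
    """Guess how an attachment value is stored: raw, hex, base64 or unknown."""
    if value.startswith(b"\xff\xd8"):
        return "raw"      # begins with the JPEG SOI bytes
    if _HEX_RE.fullmatch(value):
        return "hex"      # only pairs of hex digits
    if len(value) % 4 == 0 and _B64_RE.fullmatch(value):
        return "base64"   # only base64 alphabet, padded to a multiple of 4
    return "unknown"
```

Hex is tested before base64 because every hex digit is also a valid base64 character, so a hex string could otherwise be misreported. A result of "raw" only means the value starts with FFD8, not that the rest of the data is intact.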

Also, an easier way to check the type of a file's contents is to run:

file yourfile.jpg
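
One more possibility, given the FF 5C pairs mentioned in the update: 0x5C is the ASCII backslash, and MySQL string literals in a dump escape certain bytes (NUL, newline, carriage return, backslash, quotes, Ctrl-Z) with a backslash. If the extracted data still contains those escapes, which is an assumption until a sample row is posted, the files would keep their FFD8/FFD9 bytes but fail to decode. A rough sketch of undoing the escaping, where `unescape_mysql` is a hypothetical helper name:

```python
import re

# Escape sequences MySQL recognises inside quoted string literals.
_ESCAPES = {
    b"0": b"\x00",
    b"n": b"\n",
    b"r": b"\r",
    b"t": b"\t",
    b"b": b"\x08",
    b"Z": b"\x1a",
}
_ESCAPE_RE = re.compile(br"\\(.)", re.DOTALL)

def unescape_mysql(data):
    """Undo backslash escaping in a bytes value taken from a SQL dump."""
    def replace(match):
        char = match.group(1)
        # Any other escaped byte (backslash, quotes) stands for itself.
        return _ESCAPES.get(char, char)
    return _ESCAPE_RE.sub(replace, data)
```

In the question's script this would be applied to `data` just before `ah.write(data)`. JPEG scan data stores a literal FF as FF 00, and an escaped NUL is written as backslash followed by "0", so sequences of FF 5C 30 in the extracted files would be consistent with this explanation.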
ΤΖΩΤΖΙΟΥ