views:

444

answers:

5

I built a webpage where users can submit a PDF which is then inserted into a MySQL database in a mediumblob for retrieval later.

This all works fine, except when the PDF contains images or embedded fonts, in which case the images are corrupted and any text using the font disappears (Acrobat display a message about the missing font).

I've determined the problem occurs from my passing the pdf data through the mysql_real_escape_string_function. I have switched to base64_encode/base64_decode on submission/retrieval which fixed the problem for all new files, but I have about 25 already submitted PDFs I need to be able to read.

Is it possible to reversed the effects of mysql_real_escape_string? Or are these files damaged beyond repair?

A: 

mysql_real_escape_string escapes data so that it is not misinterpreted as part of the SQL syntax, and should not have corrupted the data

e.g.

INSERT INTO `tbl` VALUES('John O\'Reilly');

Actually stores the value John O'Reilly in the field without the backslash (escape character)

Is it possible something else has caused this error?

David Caunt
A: 

Sure, should be fixable. You just need to figure out exactly what mysql_real_escape_string does. I believe you just need to remove any slashes that immediately precede a CR, LF, TAB, single-quote, double-quote, NUL, or another slash. Should be a one-line regexp fix.

Eli
+1  A: 

mysql_real_escape_string() puts backslashes to these characters.

\x00, \n, \r, \, ', " and \x1a

The thing is, that if your binary output has backslashes it it's binary data, it can be very hard to fix. That being said, there is no magical function to undo this function.

Ólafur Waage
A: 

I honestly don't know what else it could be. When I changed that bit of code it cleared up the problem, and I've found other instances online where people had the same problem (but no solutions).

Here is the insertion code:

function db_value( $mysqli, $value ) {
if( empty($value) )
 return "''";

if( get_magic_quotes_gpc() )
 $value = stripslashes($value);

if( !is_numeric($value) || ($value[0] == '0' && $value != 0) )
 $value = "'".mysqli_real_escape_string($mysqli, $value)."'";

return $value;
}

function saveToDatabase( $data, $fileTempName, $abstractFileName ) {
$fileHandle = fopen( $fileTempName, 'r' );
$abstractFile = fread( $fileHandle, filesize( $fileTempName ) );
fclose( $fileHandle );
$abstractFileMimeType = $fileUpload->get_mime();

$mysqli = connect_to_database();

if( $mysqli != FALSE ) {
 $insertQuery = "INSERT INTO `paper_submissions` (
  `name`,
  `affiliation`,
  `email`,
  `phone_number`,
  `title`,
  `abstract`,
  `abstract_file`,
  `abstract_file_name`,
  `abstract_file_mime_type`,
  `requests_financial_support`,
  `HTTP_USER_AGENT`,
  `REMOTE_ADDR`
 )
 VALUES ( 
  ".db_value( $mysqli, $data['submitter_name'] ).",
  ".db_value( $mysqli, $data['submitter_affiliation'] ).",
  ".db_value( $mysqli, $data['submitter_email'] ).",
  ".db_value( $mysqli, $data['submitter_phone'] ).",
  ".db_value( $mysqli, $data['paper_title'] ).",
  ".db_value( $mysqli, $data['abstract_text'] ).",
  ".db_value( $mysqli, $abstractFile ).",
  ".db_value( $mysqli, $abstractFileName ).",
  ".db_value( $mysqli, $abstractFileMimeType ).",
  ".db_value( $mysqli, $data['request_financial_support'] ).",
  ".db_value($mysqli, $_SERVER['HTTP_USER_AGENT']).",
  ".db_value($mysqli, $_SERVER['REMOTE_ADDR'])."
 )";

 $insertResult = $mysqli->query( $insertQuery );

 close_database( $insertResult, $mysqli );

 return $insertResult;
}

return FALSE;
}

And here is the extraction code:

$selectQuery = "SELECT `abstract_file_name`, `abstract_file_mime_type`, `abstract_file`
FROM `paper_submissions`
WHERE `id` = ".db_value( $mysqli, $id );


$result = $mysqli->query( $selectQuery );

if( $result != FALSE ) {
if( $result->num_rows ) {
 $paper = $result->fetch_array( MYSQL_ASSOC );

 $fileSize = strlen( $paper['abstract_file'] );

 header( 'Date: '.gmdate( "D, d M Y H:i:s" ).' GMT' );
 header( 'Expires: Thu, 19 Nov 1981 08:52:00 GMT' );
 header( 'Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0' );
 header( 'Pragma: no-cache' );
 header( 'Content-Type: '.$paper['abstract_file_mime_type'].'; charset=utf-8' );
 header( 'Content-Length: '.$paper['abstract_file_size'] );
 header( 'Content-Disposition: inline; filename="'.$paper['abstract_file_name'].'"' );
 echo $paper['abstract_file'];
 exit();
}
}
A: 

Ólafur,

I gathered that from the php manual, and even tried the following:

$search = array( "\\0", "\\n", "\\r", "\\\\", "\\'", "\\\"", "\Z", );
$replace = array( "\x00", "\n", "\r", "\\", "'", "\"", "\x1a" );
$desiredString = str_replace( $search, $replace, $escapedString );

This seems to works fine when dealing with text, but applying it to the binary data only further degrades PDF (e.g. paragraphs go missing).

You could have regular data that fits that criteria, thats why it's hard to fix.
Ólafur Waage