Hello, I'm wondering if anyone has experience extracting images from an RTF file. All of the resources I've found online deal with the opposite (inserting images into RTFs); extraction doesn't seem to be nearly as popular.

Basically, I'm just wondering if it's possible to upload an RTF file using a form, and dump all of the images found in the file to a directory.

Here's what I've tried so far:

<?php
//get contents of RTF file
$raw_text_string = read_file($file_data['full_path']);

//create empty array to hold all hex image strings from RTF string
$images_array = array();

//get number of images and fill $images_array with hex image strings from RTF string
$number_of_images = preg_match_all("/\{\\\pict[^\}]+\}/",$raw_text_string,$images_array,PREG_OFFSET_CAPTURE);

//create empty array to hold sanitized hex strings (with no rtf metadata)
$hex_image_strings_array = array();

//create sanitized hex image strings (I am disregarding all metadata at this point because I know
//all images in my RTF files are going to be BMP: the metadata for every one of them contains
//the \picbmp RTF tag)
foreach($images_array[0] as $image)
{
    $current_image = preg_replace("/\{\\\pict.+\n/","",$image[0]);
    $current_image = trim($current_image);
    $current_image = preg_replace("/\\\pic.+\n/","",$current_image);
    $current_image = str_replace("}","",$current_image);
    $hex_image_strings_array[] = $current_image;
}

//cycle through each hex string in the array, and build a binary string, converting the hex
//string character-by-character
foreach($hex_image_strings_array as $index => $heximage)
{
    $string_to_write = '';

    $image_as_array = str_split(trim($heximage));

    foreach($image_as_array as $character)
    {
        $string_to_write .= chr(hexdec($character));
    }

    //open in binary mode so no newline translation is applied to the image bytes
    $handle = fopen('./temp/images/image'.$index.'.bmp', 'wb');

    fwrite($handle,$string_to_write);
    fclose($handle);
}
?>

I know one of the problems I'm probably facing is missing header information, which also seems like a tricky problem to try to solve.
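
On the header problem, here is a minimal sketch of one possible approach. It assumes the picture is stored as a device-independent bitmap (the \dibitmap picture type in RTF), so that the decoded bytes start with a BITMAPINFOHEADER and only the 14-byte BMP file header is missing. That assumption has not been checked against these files, and the function name `dib_to_bmp` is made up for illustration.

```php
<?php
// Hypothetical helper: prepends a 14-byte BITMAPFILEHEADER to raw DIB bytes
// (info header + optional color table + pixel data) so they form a .bmp file.
// Assumes $dib is already binary, not hex text.
function dib_to_bmp($dib)
{
    if (strlen($dib) < 40) {
        return false; // too short to contain a BITMAPINFOHEADER
    }

    // Read the little-endian fields of the info header
    $h = unpack(
        'Vsize/Vwidth/Vheight/vplanes/vbits/Vcompression/VsizeImage/Vxppm/Vyppm/VclrUsed',
        substr($dib, 0, 36)
    );

    // Number of color table entries (4 bytes each)
    if ($h['clrUsed'] > 0) {
        $colors = $h['clrUsed'];
    } else {
        $colors = ($h['bits'] <= 8) ? (1 << $h['bits']) : 0;
    }

    // A 40-byte header with BI_BITFIELDS (3) is followed by three 4-byte masks
    $masks = ($h['size'] == 40 && $h['compression'] == 3) ? 12 : 0;

    // Offset from the start of the file to the pixel data
    $offset = 14 + $h['size'] + $masks + 4 * $colors;

    // 'BM', total file size, two reserved words, pixel data offset
    return 'BM' . pack('VvvV', 14 + strlen($dib), 0, 0, $offset) . $dib;
}
```

If the pictures turn out to be a different type (for example a device-dependent bitmap or a metafile), this wrapper would not apply and the data would need different handling.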

I've even tried copying the image data (the hex string) straight out of the RTF file into a stand-alone string in a PHP file, running my character-by-character binary conversion on it, and printing the result to the screen after using header() to set the response's content type to BMP. It still doesn't work.
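
One thing that may matter here: hex-encoded picture data in RTF uses two hex digits per byte, so converting one character at a time produces one byte per digit instead of one byte per pair. A minimal sketch of a pairwise conversion follows. The function name `rtf_hex_to_bytes` is hypothetical, and it assumes the input holds only the hex data plus whitespace or stray braces, with the RTF control words already removed.

```php
<?php
// Hypothetical helper: converts the hex text of an RTF picture into raw bytes.
function rtf_hex_to_bytes($hex)
{
    // Drop everything that is not a hex digit (line breaks, spaces, braces)
    $hex = preg_replace('/[^0-9a-fA-F]/', '', $hex);

    // An odd number of digits means the data is incomplete or was cut badly
    if (strlen($hex) % 2 !== 0) {
        return false;
    }

    // 'H*' reads the whole string as hex, high nibble first, two digits per byte
    return pack('H*', $hex);
}

// Usage: "424d" decodes to the two bytes "BM"
$bytes = rtf_hex_to_bytes("424d\n3e00");
```

The result could then be passed to fwrite() in place of the string built character by character.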

Does anyone have any ideas?