tags:

views:

160

answers:

4

Hello :)

I'm trying to write my first iterator class / container type. Basically, I want to be able to iterate though a file, have the file converted to HEX on the fly, and pass the result into the boost::xpressive library. I don't want to do a one shot conversion to a string in RAM, because some of the files I need to be able to process are larger than the expected system RAM (multiple GB).

Here is what my code looks like. I'm using MSVC++ 2008.

Header for Iterator:

class hexFile;

class hexIterator : public std::iterator<std::bidirectional_iterator_tag, wchar_t>
{
    hexFile *parent;
    __int64 cacheCurrentFor;
    wchar_t cacheCharacters[2];
    void updateCache();
public:
    bool highCharacter;
    __int64 filePosition;
    hexIterator();
    hexIterator(hexFile *file, bool begin);
    hexIterator operator++();
    hexIterator operator++(int)
    {
     return ++(*this);
    }
    hexIterator operator--();
    hexIterator operator--(int)
    {
     return --(*this);
    }
    bool operator==(const hexIterator& toCompare) const;
    bool operator!=(const hexIterator& toCompare) const;
    wchar_t& operator*();
    wchar_t* operator->();
};

Implementation for iterator:

#include <stdafx.h>
class hexFile;
hexIterator::hexIterator()
{
    parent = NULL;
    highCharacter = false;
    filePosition = 0;
}
hexIterator::hexIterator(hexFile *file, bool begin) : parent(file)
{
    if(begin)
    {
     filePosition = 0;
     highCharacter = false;
    } else
    {
     filePosition = parent->fileLength;
     highCharacter = true;
    }
}

hexIterator hexIterator::operator++()
{
    if (highCharacter)
    {
     highCharacter = false;
     filePosition++;
    } else
    {
     highCharacter = true;
    }
    return (*this);
}

hexIterator hexIterator::operator--()
{
    if (highCharacter)
    {
     highCharacter = false;
    } else
    {
     filePosition--;
     highCharacter = true;
    }
    return (*this);
}

bool hexIterator::operator==(const hexIterator& toCompare) const
{
    if (toCompare.filePosition == filePosition &&
     toCompare.highCharacter == highCharacter)
     return true;
    else return false;
}

bool hexIterator::operator!=(const hexIterator& toCompare) const
{
    if (toCompare.filePosition == filePosition &&
     toCompare.highCharacter == highCharacter)
     return false;
    else return true;
}
wchar_t& hexIterator::operator*()
{
    updateCache();
    if (highCharacter)
     return cacheCharacters[1];
    return cacheCharacters[0];
}

wchar_t* hexIterator::operator->()
{
    updateCache();
    if (highCharacter)
     return cacheCharacters + 1;
    return cacheCharacters;
}

void hexIterator::updateCache()
{
    if (filePosition == cacheCurrentFor)
     return;
    BYTE rawData;
    DWORD numberRead;
    LONG lowValue = static_cast<LONG>(filePosition);
    LONG highValue = static_cast<LONG>(filePosition >> 32);
    SetFilePointer(parent->theFile, lowValue, &highValue, FILE_BEGIN);
    if (!ReadFile(parent->theFile, &rawData, 1, &numberRead, 0))
     throw std::runtime_error(eAsciiMsg("Error reading from file."));
    static const wchar_t hexCharacters[] = L"0123456789ABCDEF";
    cacheCharacters[0] = hexCharacters[rawData & 0x0F];
    cacheCharacters[1] = hexCharacters[rawData >> 4];
    cacheCurrentFor = filePosition;
}

Header for container

class hexFile {
public: 
    HANDLE theFile;
    unsigned __int64 fileLength;
    hexFile(const std::wstring& fileName)
    {
     theFile = CreateFile(fileName.c_str(),GENERIC_READ,FILE_SHARE_DELETE|FILE_SHARE_READ|FILE_SHARE_WRITE,NULL,OPEN_EXISTING,NULL,NULL);
     if (theFile == INVALID_HANDLE_VALUE)
     {
      throw std::runtime_error(eAsciiMsg("Could not open file!"));
     }
     BY_HANDLE_FILE_INFORMATION sizeFinder;
     GetFileInformationByHandle(theFile, &sizeFinder);
     fileLength = sizeFinder.nFileSizeHigh;
     fileLength <<= 32;
     fileLength += sizeFinder.nFileSizeLow;
    };
    ~hexFile()
    {
     CloseHandle(theFile);
    };
    hexIterator begin()
    {
     hexIterator theIterator(this, true);
     return theIterator;
    };
    hexIterator end()
    {
     hexIterator theIterator(this, false);
     return theIterator;
    };
};

However, no matter what test case I try, the regex never matches. I'm assuming it's some conceptual problem with my iterator.. it's supposed to be a constant bidirectional iterator in all cases.

Anything I'm missing, or is there an easier way to do this?

Billy3

+1  A: 

operator++(int) should be defined in terms of operator++(), not the other way around. Try switching them. Same for operator--.

rlbond
No joy :( Sorry.
Billy ONeal
Edited the above to reflect differences.
Billy ONeal
+1  A: 

This is quite a bit of code to read through; at first glance, I don't see anything.

For implementation notes, this sounds like more of an iostream than an iterator, which with buffering could be significantly faster. Boost has a library to help implement those.

If you want to use an iterator, they also have a library to help with that, it makes sure you get all the methods you need while simplifying the implementation a lot.

For debugging, I'd try for unit tests; that should hopefully point out the problem; start with something small, like a file with two characters, walk through in a debugger to see what's going on, and iterate until you pass everything.

Todd Gardner
Problem is that boost::xpressive doesn't know how to use an iostream.
Billy ONeal
After testing, the iterator produces correct results. It's not a bug with the iterator code itself, it's a an assertion failure inside boost::xpressive.
Billy ONeal
You could still implement it as an iostream filter (like the base64 one) and use an istream_iterator<char> over the input. Just my 2 cents :)
Todd Gardner
A: 

Try using a Boost::transform_iterator.

rlbond
Problem is that, when converting to hex, two characters are returned for every input character. Therefore passing the ++ and -- operators to the iterator underneath doesn't operate as expected... this needs to be handled explicitly.
Billy ONeal
+1  A: 

Well... turns out I'm a complete idiot.

boost::xpressive::regex_match(fileInstance.begin(), fileInstance.end(), results, internalRegex);

Well... what I meant to do was

boost::xpressive::regex**_search**(fileInstance.begin(), fileInstance.end(), results, internalRegex);

The iterator never had a thing wrong with it.

Sorry for wasting other's time, Billy3

Billy ONeal