views:

1337

answers:

13

Hi, I need to write a circular file in c++. The program has to write lines in a file and when the code reaches a maximum number of lines, it must overwrite the lines in the beginning of the file.

Anyone have any idea? Thanks

+6  A: 

Unfortunately you can't truncate/overwrite lines at the beginning of a file without rewriting the entire thing.

New Suggestion

I've just thought of a new approach that might do the trick for you...

You could include a small header to your file that has the following structure.

Edit: Rubbish, I've just described a variant of a circular buffer!

Header Fields

  • Bytes 00 - 07 (long) - Total (current) number of lines written to the file.
  • Bytes 08 - 15 (long) - Pointer to the start of the "actual" first line of your file. This will initially be the byte after the header ends, but will change later when data gets overridden.`
  • Bytes 16 - 23 (long) - Length of the "end section" of the file. Again, this will initially be zero, but will change later when data gets overridden.

Read Algorithm (Pseudocode)

Reads the entire file.

Read the header field that points to the start of the "actual" first line
Read the header field that specifies the length of the "end section"
Read every line until the end of the file
Seek to the byte just after the end of the header
Read every line until the "end section" has been fully read

Write Algorithm (Pseudocode)

Writes an arbitrary number of new lines to the file.

Read the header field that contains the total no. of lines in the file
If (line count) + (no. of new lines) <= (maximum no. of lines) Then
    Append new lines to end of file
    Increment header field for line count by (no. of ne lines)
Else
    Append as many lines as possible (up to maximum) to end of file
    Beginning at pointer to first line (in header field), read as many lines as still need to be written
    Find the total byte count of the lines just read
    Set the header field that points to the first line to the next byte in the stream
    Keep writing the new lines to the end of the file, each at a time, until the byte count of the remaining lines is less than the byte count of the lines at the beginning of the file (it may be that this condition is true immediately, in which case you don't need to write any more)
    Write the remaining new lines to the start of the file (starting at the byte after the header)
    Set the header field that contains the length of the "end section" of the file to the number of bytes just written after the header.

Not a terribly simple algorith, I fully admit! I nonetheless think it's quite elegant in a way. Let me know if any of that isn't clear, of course. Hopefully it should do precisely what you want now.

Original Suggestion

Now, if you're lines are guaranteed to be of constant length (in bytes), you could easily enough just seek back to the appropiate point and overwrite existing data. This would seem like a rather unlikely situation however. If you don't mind imposing the restriction that your lines must have a maximum length, and additionally padding each of the lines you write to this maximum length, then that could make matters easy for you. Still, it has its disadvantages such as greatly increasing file size under certain circumstances (i.e. most lines are much shorted than the maximum length.) It all depends on the situation whether this is acceptable or not...

Finally, you may instead want to look at utilising an existing logging system, depending on your exact purpose.

Noldorin
The lines are a very variable size :(also thanks for the reply :)
Kram
@Kram: No problem. :) See my updated answer for another method.
Noldorin
If you used fixed sized lines (e.g. 80 bytes/line) you can file = fopen(...,"rb+") and then fseek(file,line_no*80,SEEK_SET) followed by some memcpy/strcpy/etc and then fwrite(charbuffer,80,file).
KitsuneYMG
A: 

that's going to be tricky since file I/O works with bytes as the underlying unit of storage, and not lines.

I mean you could just fseek() back to the beginning and clobber the earlier data, but I have a hunch that's not what you want.

Jason S
+4  A: 

The usual way to handle logging that doesn't explode in size is to use rolling log files, and roll them once a day or similar, and only keep the N latest files.

For instance, every day you create a new logfile with the filename `application_2009_05_20.log', and start writing to it, always appending.

Once you have 14 days worth of logfiles, you start deleting the oldest.

Lasse V. Karlsen
most linux systems have a utility called "logrotate" that does this for you. Perhaps it could be leveraged to accomplish this goal really simply.
rmeador
+5  A: 

Since files are byte-oriented, and you need a line-oriented service, you have two choices:

  1. implement a line-oriented wrapper around the file

  2. switch to some line-oriented device. Just of the top of my mind: SQLite has some nice C++ wrappers available.

xtofl
+1 for SQLite suggestion
Dominic Rodger
+1  A: 

Simple solution:

  1. Have some sort of delimeter for lines.
  2. Each time you add a new line, just overwrite all text from the current line onwards until it hits a delimiter.
  3. The end of the file is a special case and may have some padding to keep the file size constant.

This solution is designed to offer a constant file length rather than a constant number of lines within the file. The number of lines will vary over time depending on length. This solution makes it more difficult to seek to specific line numbers quickly, though you could stick some indicator data at the top or bottom of the file to make this easier.

"Clever" Solution (variation on solution above):

Just use the same trick that is sometimes used for deques. Just expicitely wrap around from the beginning of the file to the end, but keep track of where the beginning/end of the file is. You could write an unwrap utility to convert this file to a standard one when you wished to read it with a program that did not support it. This solution is REALLY easy to implement, but I like the version above better.

Ugly Solution:

When adding lines, add a moderate amount of padding to every line you add.

Each time you wish add a new line, do as follows:

  1. Determine the length of the current line, including padding. Note that the start of the current line is equal the end, not including padding, of the previous line.
  2. If the current line is long enough to fit within the line you are on, put it in. Add left padding to the end of the previous line equal to 1/3 of any excess space, and right padding equal to 2/3 of any excess space.
  3. If the current line is not long enough to fit within the line you are on, shift lines ahead of you (eating their padding) until their is room.
  4. If Step 3 hits some sort of threshhold, rewrite the whole file with more padding.

Note that this will work pretty poorly unless your lines are pretty consistent about length. A simpler solution is to guarantee lines are of constant length (but put in some way to create multi-line "lines" in case you exceed that length.

Brian
A: 

I've seen this done by keeping the current write position for the file somewhere. When you need to add a line, you seek to the position, write the line, and update the position in an atomic fashion. If you overflow, then you seek to zero before you write the line. We do this today for size constrained circular log files. Doing it on a line-constrained basis is a little odd, but could probably be done in a similar fashion. Our write loop looks something like:

logFile.lockForWrite();
currentPosition = logFile.getWritePosition();
logFile.seek(currentPosition);
for each line in lineBuffer {
    if ((currentPosition+line.length()) > logFile.getMaxSize()) {
        currentPosition = 0;
        logFile.seek(0);
    }
    logFile.write(line);
    currentPosition += line.length();
}
logFile.setWritePosition(currentPosition);
logFile.unlock();

The tricky part is in maintaining the current write position and finding some way to coordinate reading the file (e.g., with the tail utility) while your application is writing to it. Your reader utility has to keep track of the write position as well so it's read loop becomes:

lastPosition = logFile.getWritePosition();
while (!killed) {
    logFile.wait();
    logFile.lockForRead();
    newPosition = logFile.getWritePosition();
    logFile.seek(lastPosition);
    newLine = logFile.readFrom(lastPosition, (newPosition-lastPosition));
    lastPosition = newPosition;
    logFile.unlock();
}

This isn't in any particular language - it's just pseudocode but the idea is there. Of course, I left the handling all of the interesting edge cases to the reader.

With all of that said... I agree with the other opinions. Don't do this unless you have a really good reason. It sounds like a great idea, but:

  • the implementation is difficult to get write
  • it's even harder to make efficient
  • since the write position has to be maintained somewhere, multiple utilities have to agree on how it is read, updated, initialized, etc.
  • having a non-linear log makes log processing difficult using existing tools like grep, tail, perl, etc.

Overall, you will be better off using some existing package logging package that allows for configurable log file management. Take a look at Apache's log4cxx or Poco's Poco::Logger.

D.Shawley
+1  A: 

if the files need to be text files:
This is very problematic with varying line lengths. Your first two lines are 80 characters each, how do you overwrite that with a 100 character line?

If the new line should replace the first line, this would result in an file insert, which is a very expensive operation (basically, the entire remainder of the file needs to be read and written). You really don't want to do that for all but minimal amounts of data.

If this is for logging purpose, use rollng log files - e.g. one a day (as suggested by lassevek). I made it even simpler: when file size exceeds a limit, the old file is renamed to .bak (old .bak is deleted), and start anew. With an 1MB limit, this preserves e.g. the last 1 MB, while never occupying more than 2 MB.

You could employ a similar mechanism with two or more files. Basically, move the "rollover" to files, rather than to lines.

if the file may be in a proprietary format:
Use a basic DB engine (like SQLite as suggested), or another structured storage mechanism.

peterchen
A: 

Easy workaround:

  • Define a maximum length of 80 characters per line. wrap longer "lines" into several lines.
  • Add a line header to a line. such as "[#589] This is the 589th line" so you'll know what comes first etc.
yairchu
+1  A: 

You could use log4cxx with a RollingFileAppender to write this information into a log file. The RollingFileAppender will handle rolling over the log file when it reaches a certain size. I don't think it's exactly what you want, but it's fairly simple -- maybe it will do.

Paul Morie
+1  A: 

Just create a mapping of the file of the required size (CreateFileMapping or mmap), write the lines in the buffer and start over when the maximum number is reached.

Edouard A.
+2  A: 

Use a circular buffer and write the buffer to a file for each add.

Here is a small and simple code size solution. It is a simple circular buffer of strings and each time you add strings it writes the entire buffer of strings to the file (of course you incur a significant cost for writing all the strings for a single add operation. So this is only suitable for a small number of strings).

Simple implementation of circular buffer with output to a file:

// GLOBALS ( final implementation should not use globals )
#define MAX_CHARS_PER_LINE (1024)
#define MAX_ITEMS_IN_CIRCULARBUF (4) // must be power of two
char    lineCircBuf[MAX_ITEMS_IN_CIRCULARBUF][MAX_CHARS_PER_LINE];
int     lineCircBuf_add = 0;
int     lineCircBuf_rmv = 0; // not being used right now
uint32_t lineCircBuf_mask = MAX_ITEMS_IN_CIRCULARBUF-1;
char    FILENAME[] = "lineCircBuf.txt";
FILE *  ofp = NULL;

int addLine(char * str) {
    int i;

    // Error checking
    if( strlen(str) > MAX_CHARS_PER_LINE ) {
        return -1; // failure
    }
    if( ofp != NULL) {
        fclose(ofp);
    }

    // Copy string into circular buffer
    strncpy( &(lineCircBuf[lineCircBuf_add][0]),
             str,
             MAX_CHARS_PER_LINE );
    lineCircBuf_add = ( lineCircBuf_add + 1 ) & lineCircBuf_mask;

    // Write to file
    ofp = fopen(FILENAME,"w");
    for( i = 0; i < MAX_ITEMS_IN_CIRCULARBUF-1; i++ ) {
        fprintf( ofp, "%s\n", lineCircBuf[i] );
    }
    fprintf( ofp, "%s", lineCircBuf[i] ); // do not add a newline to the last line b/c we only want N lines in the file

    return 0; // success
}

int removeLine(int index) {
    // not implemented yet
}

void unitTest() {
    int i;

    // Dummy text to demonstrate adding string lines
    char lines[5][MAX_CHARS_PER_LINE] = {
        "Hello world.",
        "Hello world AGAIN.",
        "The world is interesting so far!",
        "The world is not interesting anymore...",
        "Goodbye world."
    };

    // Add lines to circular buffer
    for( i = 0; i < sizeof(lines)/sizeof(lines[0]); i++ ) {
        addLine(&(lines[i][0]));
    }
}

int main() {
    unitTest();
    return 0;
}

So in the above example we had 5 lines of input and our buffer was only 4 lines long. Therefore the output should have 4 lines only and the first line should be overwritten by the last line "Goodbye world". Sure enough the first line of the output confirms does have "Goodbye world":

Goodbye world.
Hello world AGAIN.
The world is interesting so far!
The world is not interesting anymore...
Trevor Boyd Smith
A: 

If you're wanting to generate this file for input to another application, I think your best bet would be to log directly to a relation database (SQL Server, MySQL, whatever..) Then periodically generate that file as needed from the logged data.

Eclipse
A: 

To get around the variable size thing, you will probably end up with an indirection and an allocation scheme. This would consist of an indirection block with a fixed number of 'pointers' into the file, and one 'next-to-be-written' pointer, that would wrap around N.

But the main trick would be in adding the indirection.

xtofl