ansaurus

Question

types of buffers

Answer 1

+5 A:

In C under UNIX (and probably other operating systems as well), there are usually two buffers, at least in your given scenario.

The first exists in the C runtime libraries where information to be written is buffered before being delivered to the OS.

The second is in the OS itself, where information is buffered until it can be physically written to the underlying media.

As an example, we wrote a logging library many moons ago that forced information to be written to the disk so that it would be there if either the program crashed or the OS crashed.

This was achieved with the sequence:

fflush (fh); fsync (fileno (fh));

The first of these actually ensured that the information was handed from the C runtime buffers to the operating system, the second that it was written to disk. Keep in mind that this is an expensive operation and should only be done if you absolutely need the information written immediately (we only did it at the SUPER_ENORMOUS_IMPORTANT logging level).

To be honest, I'm not entirely certain why your interviewer thought it would be slow unless you're writing a lot of information. The two levels of buffering already there should perform quite adequately. If it was a problem, then you could just introduce another layer yourself which wrote the messages to an in-memory buffer and then delivered that to a single fprint-type call when it was about to overflow.

But, unless you do it without any function calls, I can't see it being much faster than what the fprint-type buffering already gives you.

Following clarification in comments that this question is actually about buffering inside a kernel:

Basically, you want this to be as fast, efficient and workable (not prone to failure or resource shortages) as possible.

Probably the best bet would be a buffer, either statically allocated or dynamically allocated once at boot time (you want to avoid the possibility that dynamic re-allocation will fail).

Others have suggested a ring (or circular) buffer but I wouldn't go that way (technically) for the following reason: the use of a classical circular buffer means that to write out the data when it has wrapped around will take two independent writes. For example, if your buffer has:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|s|t|r|i|n|g| |t|o| |w|r|i|t|e|.| | | | | | |T|h|i|s| |i|s| |t|h|e| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 ^           ^
                                 |           |
                   Buffer next --+           +-- Buffer start

then you'll have to write "This is the " followed by "string to write.".

Instead, maintain the next pointer and, if the bytes in the buffer plus the bytes to be added are less than the buffer size, just add them to the buffer with no physical write to the underlying media.

Only if you are going to overflow the buffer do you start doing tricky stuff.

You can take one of two approaches:

Either flush the buffer as it stands, set the next pointer back to the start for processing the new message; or
Add part of the message to fill up the buffer, then flush it and set the next pointer back to the start for processing the rest of the message.

I would probably opt for the second given that you're going to have to take into account messages that are too big for the buffer anyway.

What I'm talking about is something like this:

initBuffer:
    create buffer of size 10240 bytes.
    set bufferEnd to end of buffer + 1
    set bufferPointer to start of buffer
    return

addToBuffer (size, message):
    while size != 0:
        xfersz = minimum (size, bufferEnd - bufferPointer)
        copy xfersz bytes from message to bufferPointer
        message = message + xfersz
        bufferPointer = bufferPointer + xfersz
        size = size - xfersz
        if bufferPointer == bufferEnd:
            write buffer to underlying media
            set bufferPointer to start of buffer
        endif
    endwhile

That basically handles messages of any size efficiently by reducing the number of physical writes. There will be optimisations of course - it's possible that the message may have been copied into kernel space so it makes little sense to copy it to the buffer if you're going to write it anyway. You may as well write the information from the kernel copy directly to the underlying media and only transfer the last bit to the buffer (since you have to save it).

In addition, you'd probably want to flush an incomplete buffer to the underlying media if nothing had been written for a time. That would reduce the likelihood of missing information on the off chance that the kernel itself crashes.

Aside: Technically, I guess this is sort of a circular buffer but it has special case handling to minimise the number of writes, and no need for a tail pointer because of that optimisation.

paxdiablo 2010-07-06 04:41:12

+1 for detailed explanation. If I were asked this question in an interview, I would have just said "stdio buffers".

R.. 2010-07-06 05:07:00

@paxdiablo Thank you very much for the detailed expalanation

mousey 2010-07-07 07:20:45

Answer 2

A:

What comes to mind for me is time-based buffers and size-based. So you could either just write whatever is in the buffer to file once every x seconds/minutes/hours or whatever. Alternatively, you could wait until there are x log entries or x bytes worth of log data and write them all at once. This is one of the ways that log4net and log4J do it.

Duracell 2010-07-06 04:43:04

Answer 3

+2 A:

There are also ring buffers which have bounded space requirements and are probably best known in the Unix dmesg facility.

msw 2010-07-06 04:43:32

Answer 4

A:

Overall, there are "First-In-First-Out" (FIFO) buffers, also known as queues; and there are "Latest*-In-First-Out" (LIFO) buffers, also known as stacks.

To implement FIFO, there are circular buffers, which are usually employed where a fixed-size byte array has been allocated. For example, a keyboard or serial I/O device driver might use this method. This is the usual type of buffer used when it is not possible to dynamically allocate memory (e.g., in a driver which is required for the operation of the Virtual Memory (VM) subsystem of the OS).

Where dynamic memory is available, FIFO can be implemented in many ways, particularly with linked-list derived data structures.

Also, binomial heaps implementing priority queues may be used for the FIFO buffer implementation.

A particular case of neither FIFO nor LIFO buffer is the TCP segment reassembly buffers. These may hold segments received out-of-order ("from the future") which are held pending the receipt of intermediate segments not-yet-arrived.

* My acronym is better, but most would call LIFO "Last In, First Out", not Latest.

Heath Hunnicutt 2010-07-06 04:45:26

Your acronym isn't any better. It is better to stay with the well-established meaning of technical terms. Inventing your own meaning or spelling could result in confusion, or it could even be seen as a sign of incompetence.

Secure 2010-07-06 07:12:48

That said, there is of course nothing wrong with inventing your own concepts that happen to collide with an already existing acronym, since there is only a limited number of acronyms. Just avoid redefinitions of existing concepts.

Secure 2010-07-06 07:18:48

Wow, you seem to feel personally insulted by my downvote and comment, reading through all my old answers and downvoting a lot of them. Sorry about this. But this sort of reaction is a sure way to receive downvotes in the future **without** actually explaining the reasons.

Secure 2010-07-06 09:05:33

You're both correct and incorrect, but mostly the latter. I have given you reasons for my downvotes, and they are just as lame as yours. You're too self-satisfied and preachy. Your answers and comments are your opinions more than technical material. I'm glad we've gotten off to this start. I think you're lame.

Heath Hunnicutt 2010-07-06 14:08:24

Finally, a personal enemy on SO. I'm honoured... I think. Please feel invited to comment all of my old answers and downvote them as you like, since I don't play SO for the highscore. Just don't expect any more answer from me on this topic, as I won't descend to this kindergarten level. If you have specific questions of clarification about a specific point, please ask and I will answer. Thank you.

Secure 2010-07-06 15:03:30

Just one last question: You've downvoted this (accepted) answer without a comment: http://stackoverflow.com/questions/3110088/c-switch-statement-has-default-to-be-the-last-case/3110478#3110478 This answer was based on ANSI C standard citations. If this is only my self-satisfied opinion and not technical material, I'm a bit at loss about what exactly you expect?

Secure 2010-07-06 15:05:59

Don't swear to a last anything, that doesn't befit your personality type. I won't bother to comment all your downvotes, but I did bother to lead you to water on 3110088, which may be accepted by the masses but is lame pedagogy, actually misleading.

Heath Hunnicutt 2010-07-06 15:40:20

You've actually downvoted the ANSI C Standard itself, from which the code in question was cited. ;) I think I can live with this answer. Thank you and goodbye.

Secure 2010-07-06 16:37:44

It's not that the example is wrong; it is not. The example was useless given the question. The non-accepted answer to 3110088 has six times (6x) the votes yours does. That means something and it concurs with my view that your answer is poor given the question. Thank you and good-bye.

Heath Hunnicutt 2010-07-07 02:15:52

Answer 5

A:

Correct me if I'm wrong, but wouldn't using a mmap'd file for the log avoid both the overhead of small write syscalls and the possibility of data loss if the application (but not the OS) crashed? It seems like an ideal balance between performance and reliability to me.

R.. 2010-07-06 05:06:00

ansaurus

tags:

views:

answers:

types of buffers

related questions