Hello. I'm not exactly sure whether or not this is a silly question, but I guess I will find out soon enough.

I'm having problems understanding exactly how getc and getwc work. It's not that I can't use them, but more that I don't know exactly what they do. Storing the result of getc in an int and printing it with `printf("%c")` shows most characters correctly, including multibyte ones like € or even £.

My question is: how exactly do these functions work, how do they read stdin exactly? Explanations and good pointers to docs much appreciated.

Edit: Please, read the comment I left in William's answer. It helps clarify the level of detail I'm after.

A: 

The answer is platform dependent. On unix-like machines, getc checks whether there is unread data in the stream's buffer. If not, it invokes read() to refill the buffer. It then returns the next byte from the buffer and advances the stream's position past it (plus some other bookkeeping). The details differ between implementations and are rarely important to the developer.
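To make the buffering step concrete, here is a minimal sketch for a POSIX system. The names toy_file and toy_getc are hypothetical and the struct is not the real layout of FILE in any C library; it only illustrates "check the buffer, refill with read() if empty, hand back one byte".

```c
#include <stdio.h>    /* EOF */
#include <unistd.h>   /* read(), ssize_t */

/* Hypothetical, simplified stand-in for FILE: not any library's real layout. */
typedef struct {
    int fd;                  /* underlying file descriptor */
    unsigned char buf[4096]; /* bytes already fetched from the kernel */
    size_t pos;              /* index of the next unread byte */
    size_t len;              /* number of valid bytes in buf */
} toy_file;

/* Return the next byte as an unsigned char converted to int, or EOF. */
static int toy_getc(toy_file *f)
{
    if (f->pos == f->len) {                       /* buffer exhausted */
        ssize_t n = read(f->fd, f->buf, sizeof f->buf);
        if (n <= 0)                               /* end of file or error */
            return EOF;
        f->len = (size_t)n;
        f->pos = 0;
    }
    return f->buf[f->pos++];                      /* one byte per call */
}
```

With `toy_file in = { .fd = 0 };`, repeated calls to `toy_getc(&in)` would walk through stdin one byte at a time, so a three-byte UTF-8 character takes three calls. A real implementation also handles locking, error flags, ungetc and so on.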

William Pursell
Thanks, William, but I would like as much info as possible about this. For instance, what do you call "data", a byte? something else? Does it really return the character? What if the character is multibyte, and it still prints it ok (as is the case with "€")?
Dervin Thunk
There are many encodings where € is not multibyte. Anyway, if getc() reads a byte from a multibyte input stream (e.g. UTF-8 encoded), it just means it will take a couple more getc/putc calls until the character is visible on screen. getc reads one byte at a time on unixes.
nos
+3  A: 

If you are on a system with 8-bit chars (that is, UCHAR_MAX == 255), which is almost any system you are likely to come across today, then getc() will return a single 8-bit character. The reason it returns an int is so that the EOF value can be distinguished from any of the possible character values.
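A small sketch of why the int matters, using a hypothetical helper named count_bytes. If c were declared as a plain char, a 0xFF byte could compare equal to EOF (typically -1) where char is signed, and the loop might never see EOF at all where char is unsigned.

```c
#include <stdio.h>

/* Count the bytes in a stream, reading one at a time with getc(). */
static long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                       /* int, not char: must be able to hold EOF */
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Because getc() hands back each byte as an unsigned char converted to int, a 0xFF byte arrives as 255 and stays distinct from the negative EOF value.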

The reason that fgetc() is apparently working for multibyte characters for you is that the bytes making up the multibyte character are being read in separately, written out separately and then interpreted as a multibyte character by your console. If you change your printf to:

printf("%c ", somechar);

(that is, put a space after each character) then you should see multibyte characters broken up into their constituent bytes, which will probably look quite weird.
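If the raw bytes are too garbled to read on the console, printing them as hex makes the split easier to see. This is only a sketch; dump_bytes is a hypothetical helper name, not a library function.

```c
#include <stdio.h>

/* Copy `in` to `out`, writing each byte as two hex digits and a space. */
static void dump_bytes(FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF)
        fprintf(out, "%02X ", (unsigned)c);
}
```

Calling `dump_bytes(stdin, stdout)` on UTF-8 input containing € would show the three bytes E2 82 AC, one per getc() call. In an encoding such as ISO-8859-15 the same character is a single byte, so only one value would appear.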

caf
+1  A: 

If you really want to know how they work, check the source for glibc.

For starters, getc() from libio/getc.c will call _IO_getc_unlocked(), which is defined in libio/libio.h and will call __uflow() from libio/genops.c on underflow.

Tracking the call chain can get a bit tedious, but you asked for it ;)

Christoph