At present I get the size of a GtkTextBuffer's contents in bytes like this:

GtkTextIter start, end;
GtkTextBuffer *buf = gtk_text_view_get_buffer(...);
gtk_text_buffer_get_bounds(buf, &start, &end);
gchar *data = gtk_text_buffer_get_text(buf, &start, &end, TRUE); // newly allocated copy
gsize size = strlen(data); // ouch
g_free(data);

But this is rather ugly. I found (and tested) gtk_text_iter_get_offset() but it returns the size in characters, not physical bytes.

A: 

There's no corresponding gtk_text_buffer_get_byte_count() or gtk_text_iter_get_index() function, unfortunately. If you need an absolute upper bound on the number of bytes required to store the buffer text, you could take the value from gtk_text_buffer_get_char_count() and multiply it by 4, the maximum number of bytes required to encode one UTF-8 character. If it's allocating and deallocating a string holding the full text of the buffer you're worried about, you could do the following:

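glong bytecount = 0;
GtkTextIter iter;
gtk_text_buffer_get_start_iter(buf, &iter);
do {
    /* Count the current line (delimiters included) before advancing, so the first line isn't skipped. */
    bytecount += gtk_text_iter_get_bytes_in_line(&iter);
} while (gtk_text_iter_forward_line(&iter));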

I don't claim that this isn't ugly.
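
To make the "multiply by 4" bound concrete, here is a minimal plain-C sketch that needs no GTK. The helper names utf8_char_count() and utf8_max_bytes() are made up for illustration, and the counting assumes the input is valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the characters in a NUL-terminated UTF-8
 * string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: worst-case storage for a given character count,
 * using 4 bytes as the longest UTF-8 encoding of one character. */
static size_t utf8_max_bytes(size_t chars)
{
    return chars * 4;
}
```

For the string "héllo €", strlen() reports 10 bytes, utf8_char_count() reports 7 characters, and the bound is 28 bytes, so the bound is safe but can be well above the real size.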

ptomato
+2  A: 

Since GTK+ stores all text in UTF-8 by definition, I think your solution of getting a pointer to the characters and using a plain old strlen() is awesome.

UTF-8 guarantees that the byte with value 0 does not occur, so strlen() will perform the proper counting operation and return the length of the buffer in bytes. Plus, it's a classic C runtime function that is well-known and very probably as highly optimized as possible.

unwind
'\0' _is not_ invalid UTF-8: http://www.mail-archive.com/[email protected]/msg08985.html Some GTK+ APIs even take a length parameter so that NUL bytes can be embedded.
ntd
@ntd: Interesting. But is it possible to type (or otherwise cause) a NUL-byte to appear in a GtkTextBuffer?
unwind
@unwind: I don't know; NUL handling is still an open issue. I suspect that embedding a NUL byte, although valid UTF-8, will break a lot of code anyway. My comment was more academic than practical.
ntd
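
The NUL-byte caveat from the comments can be illustrated in plain C, independent of GTK. This is only a sketch: has_embedded_nul() and bytes_before_first_nul() are hypothetical helper names, and whether a GtkTextBuffer can actually end up holding a NUL byte is left open, as it is in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a buffer of known length contains a
 * NUL byte anywhere in its first len bytes. */
static bool has_embedded_nul(const char *data, size_t len)
{
    assert(data != NULL);
    return memchr(data, '\0', len) != NULL;
}

/* Hypothetical helper: the count strlen() would give, but bounded by
 * len so it never reads past the end of the buffer. */
static size_t bytes_before_first_nul(const char *data, size_t len)
{
    const char *nul;
    assert(data != NULL);
    nul = memchr(data, '\0', len);
    return nul ? (size_t)(nul - data) : len;
}
```

With the 5-byte buffer "ab\0cd", strlen() stops at 2 while the real length is 5, so a strlen()-based size is only trustworthy when has_embedded_nul() would return false.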