ansaurus

Question

Understanding the `ctags -e` file format (ctags for emacs)

Answer 1

+3 A:

It's the number of bytes of tag data following the newline after the number.

Edit: The count also doesn't include the ^L^J line that separates one file's tag data from the next. Remember that etags comes from a time long ago, when reading a 500KB file was an expensive operation. ;)

Here's a complete tags file. I'm showing it two ways. The first writes every control character in ^X notation, so nothing is invisible. The end-of-line characters implicit in your example are ^J here:

^L^J
hello.cc,45^J
int main(^?5,41^J
int foo(^?9,92^J
int bar(^?13,121^J
^L^J
hello.h,15^J
#define X ^?2,1^J

Here's the same file displayed in hex:

0000000    0c  0a  68  65  6c  6c  6f  2e  63  63  2c  34  35  0a  69  6e
          ff  nl   h   e   l   l   o   .   c   c   ,   4   5  nl   i   n
0000020    74  20  6d  61  69  6e  28  7f  35  2c  34  31  0a  69  6e  74
           t  sp   m   a   i   n   ( del   5   ,   4   1  nl   i   n   t
0000040    20  66  6f  6f  28  7f  39  2c  39  32  0a  69  6e  74  20  62
          sp   f   o   o   ( del   9   ,   9   2  nl   i   n   t  sp   b
0000060    61  72  28  7f  31  33  2c  31  32  31  0a  0c  0a  68  65  6c
           a   r   ( del   1   3   ,   1   2   1  nl  ff  nl   h   e   l
0000100    6c  6f  2e  68  2c  31  35  0a  23  64  65  66  69  6e  65  20
           l   o   .   h   ,   1   5  nl   #   d   e   f   i   n   e  sp
0000120    58  20  7f  32  2c  31  0a                                    
           X  sp del   2   ,   1  nl

There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.
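To check those two counts by hand, here is a small Python sketch that just adds up the lengths of the tag lines shown above. The names hello_cc_tags, hello_h_tags and section_size are made up for this illustration; \x7f is the DEL byte (^?) and \n is ^J.

```python
# Tag lines copied from the example above, as raw bytes.
# \x7f is DEL (shown as ^?), \n is the newline (shown as ^J).
hello_cc_tags = [
    b"int main(\x7f5,41\n",
    b"int foo(\x7f9,92\n",
    b"int bar(\x7f13,121\n",
]
hello_h_tags = [b"#define X \x7f2,1\n"]


def section_size(tag_lines):
    """Total bytes of tag data for one file, newlines included."""
    return sum(len(line) for line in tag_lines)


print(section_size(hello_cc_tags))  # 15 + 14 + 16 = 45
print(section_size(hello_h_tags))   # 15
```

The header line itself ("hello.cc,45^J") and the ^L^J separator are not part of either total.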

The hello.cc data starts on the line following "hello.cc,45^J" and runs for 45 bytes, which here ends exactly on a line boundary. The size is given in bytes so that code reading the file can allocate room for a 45-byte string and read exactly 45 bytes. The "^L^J" line comes after the 45 bytes of tag data. Use it as a marker that more files follow, and also to verify that the file is properly formatted.

The hello.h data starts on the line following "hello.h,15^J" and runs for 15 bytes.
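A reader that follows this description might look roughly like the Python sketch below. It is not how Emacs or ctags reads the file. It handles only the simple "filename,size" header form shown in this answer, and parse_etags and SAMPLE are names invented for the example.

```python
def parse_etags(data: bytes):
    """Return a list of (filename, tag_data) pairs from etags-format bytes."""
    sections = []
    pos = 0
    while pos < len(data):
        # Every section must open with form feed + newline (^L^J).
        if data[pos:pos + 2] != b"\x0c\n":
            raise ValueError(f"expected ^L^J at byte {pos}")
        pos += 2
        # Header line: "<filename>,<size>" terminated by a newline.
        eol = data.index(b"\n", pos)
        filename, _, size = data[pos:eol].rpartition(b",")
        pos = eol + 1
        # The size says exactly how many bytes of tag data follow.
        nbytes = int(size)
        sections.append((filename.decode(), data[pos:pos + nbytes]))
        pos += nbytes
    return sections


# The complete tags file from this answer, as raw bytes.
SAMPLE = (
    b"\x0c\nhello.cc,45\n"
    b"int main(\x7f5,41\n"
    b"int foo(\x7f9,92\n"
    b"int bar(\x7f13,121\n"
    b"\x0c\nhello.h,15\n"
    b"#define X \x7f2,1\n"
)

for name, tags in parse_etags(SAMPLE):
    print(name, len(tags))
```

Because the reader jumps ahead by the stated byte count, a wrong count shows up right away: the next two bytes are not ^L^J and the sketch raises an error.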

Ken Fox 2010-01-02 05:42:13

Thanks, but now I have another question, but I'll put it in the main post.

AlexCombas 2010-01-02 07:13:05

Thanks a lot for the help, that makes sense now but what is the {bytes_offset}? I've updated the edit to the main post.

AlexCombas 2010-01-02 21:47:15

Answer 2

+1 A:

The {byte_offset} for a tag entry is the number of bytes from the start of the file the function is defined in. The number before the byte offset is the line number. In your example:

hello.c,79^J
float foo (float x) {^?foo^A3,20^J

the foo function begins 20 bytes from the start of hello.c. You can verify that with a text editor that shows your cursor position in the file. You can also use the Unix tail command to display a file starting a number of bytes in. Note that tail -c +N starts output at byte N counting from 1, so skipping the first 20 bytes takes +21:

tail -c +21 hello.c
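The same check can be scripted. The Python sketch below splits the tag line from the example into its fields and then looks up the byte offset in the source. It assumes the offset counts from zero, so that offset 20 means 20 bytes come before the line. The contents of HELLO_C are invented to make the numbers work out, and parse_tag_line and line_at are made-up helper names.

```python
def parse_tag_line(line: bytes):
    """Split one etags tag line into (pattern, name, line_number, byte_offset)."""
    # DEL (^?) ends the pattern text copied from the source line.
    pattern, _, rest = line.rstrip(b"\n").partition(b"\x7f")
    # ^A ends the optional explicit tag name.
    name, sep, position = rest.partition(b"\x01")
    if not sep:  # no explicit name, only "line,offset"
        name, position = b"", rest
    line_number, _, byte_offset = position.partition(b",")
    return pattern.decode(), name.decode(), int(line_number), int(byte_offset)


def line_at(source: bytes, byte_offset: int) -> bytes:
    """Return the source line that starts at byte_offset (zero-based, assumed)."""
    return source[byte_offset:].split(b"\n", 1)[0]


# Hypothetical hello.c in which line 3 starts 20 bytes into the file.
HELLO_C = b"#include <stdio.h>\n\nfloat foo (float x) {\n    return x;\n}\n"

pattern, name, line_number, offset = parse_tag_line(
    b"float foo (float x) {\x7ffoo\x01" b"3,20\n"
)
print(pattern, name, line_number, offset)
print(line_at(HELLO_C, offset))
```

If the text found at the offset starts with the pattern stored in the tag line, the entry is still in sync with the source file.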

Ken Fox 2010-01-02 22:43:34

Thanks again Ken!

AlexCombas 2010-01-02 22:59:04
