I was wondering how people use tags in Emacs when working on a large codebase (approx. 50,000 .cpp, .h and .cs files). Some of my colleagues use indexing tools (the names escape me) which return all the results over the codebase in seconds. I can't seem to get anywhere near that sort of performance with Emacs and tags, even though it's essentially the same thing!

Some approaches I've tried:

  1. Create one TAGS file for the whole repository. This is usually quite large and generally awkward to use (sometimes too many tags match).
  2. Create separate TAGS files for cpp|h|cs. A bit more focused if I know, at least roughly, which programming language the thing I'm looking for was written in.
  3. Smaller tags files on a section of the repository. These are great when I'm pretty sure of the area but the management overhead is a pain. I'll usually generate these as and when needed.
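
For illustration, approach 2 could be sketched roughly as below. This assumes Exuberant Ctags (for the `-e`, `-L` and `-f` options); the names `write_file_list`, `build_tags_per_language`, `files.<ext>.list` and `TAGS.<ext>` are made up for the example and are not any kind of convention.

```shell
#!/usr/bin/env bash
# Sketch of approach 2: one file list (and one TAGS file) per language.
# All function and file names here are hypothetical.

# write_file_list ROOT EXT OUT: list every *.EXT file under ROOT into OUT.
write_file_list() {
    find "$1" -type f -name "*.$2" | sort > "$3"
}

# build_tags_per_language ROOT: write TAGS.cpp, TAGS.h and TAGS.cs
# into the current directory.
build_tags_per_language() {
    for ext in cpp h cs; do
        write_file_list "$1" "$ext" "files.$ext.list"
        # Only run ctags if it is installed and the list is non-empty.
        # Assumes Exuberant Ctags: -e = Emacs format, -L = read file list.
        if command -v ctags >/dev/null 2>&1 && [ -s "files.$ext.list" ]; then
            ctags -e -L "files.$ext.list" -f "TAGS.$ext"
        fi
    done
}
```

Running `build_tags_per_language /path/to/repo` from a scratch directory would then leave one TAGS file per language there, and only the relevant one needs to be visited in Emacs.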

Any suggestions or examples of your workflow are appreciated.

+3  A: 

Here is a script I use:

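#!/usr/bin/env bash
# Run from the top of the source tree.
echo "Creating list of files to build tags..."
# Group the -name tests so the expression stays correct if more tests are added.
find "$PWD" \( -name '*.c' -o -name '*.h' \) > cscope_files

echo "Building cscope and ctags databases..."
# -b: build only, -q: inverted index, -k: kernel mode, -i: read the file list
cscope -bqki cscope_files
# -e: write an Emacs-style TAGS file, -L: read the file list
ctags -eL cscope_files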

> cat cscope_files | wc -l
10700

For 10700 files, which is comparable to your project, jumps between tags are instantaneous. As you can see, this builds both cscope and etags databases. I run this script from the top of the tree. In case it helps, these are the keybindings in my .emacs.el file.

(defun hide-cscope-buffer ()
  "Toggle display of the cscope results buffer."
  (interactive)
  (if (not cscope-display-cscope-buffer)
      (progn
        (setq cscope-display-cscope-buffer t)
        (message "Turning ON display of cscope results buffer."))
    (setq cscope-display-cscope-buffer nil)
    (message "Turning OFF display of cscope results buffer.")))


(global-set-key [f9] 'cscope-find-this-symbol)
(global-set-key [f10] 'cscope-find-global-definition-no-prompting)
(global-set-key [f11] 'cscope-find-functions-calling-this-function)
(global-set-key [f12] 'cscope-find-this-file)
(global-set-key (kbd "C-t") 'cscope-pop-mark)
(global-set-key (kbd "C-n") 'cscope-next-symbol)
(global-set-key (kbd "C-p") 'cscope-prev-symbol)
(global-set-key (kbd "C-b") 'hide-cscope-buffer)
(global-set-key [S-f7] 'cscope-next-file)
(global-set-key [S-f8] 'cscope-prev-file)
(global-set-key [S-f9] 'cscope-find-this-text-string)
(global-set-key [S-f10] 'cscope-find-global-definition)
(global-set-key [S-f11] 'cscope-find-egrep-pattern)
(global-set-key [S-f12] 'cscope-find-files-including-file)

I am not a regular Emacs user, though. I used these when I was trying to switch to Emacs, but I then went back to vim, where I found ways to do everything I had been happy doing in Emacs.

Update: For multiple tags files in a directory hierarchy, take a look at the Multiple tags files section of this article.
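
One way to get a high-level TAGS file that pulls in tags from subdirectories (not necessarily what the article describes) is an "umbrella" TAGS file made only of include entries. The sketch below writes those entries by hand, assuming the section layout that `etags --include` emits (form feed, newline, `<file>,include`); `make_umbrella_tags` is a made-up name, and it assumes each subdirectory already has its own TAGS file.

```shell
#!/usr/bin/env bash
# make_umbrella_tags ROOT: write ROOT/TAGS containing only "include"
# entries, one for every TAGS file found below ROOT's subdirectories.
# (Hypothetical helper; assumes the per-directory TAGS files already exist.)
make_umbrella_tags() {
    root=$1
    out="$root/TAGS"
    : > "$out"
    # -mindepth 2 skips ROOT/TAGS itself, so the file never includes itself.
    find "$root" -mindepth 2 -type f -name TAGS | sort |
    while IFS= read -r sub; do
        # Store the path relative to ROOT, where the umbrella file lives.
        rel=${sub#"$root"/}
        # Include section: form feed, newline, "<path>,include", newline.
        printf '\f\n%s,include\n' "$rel" >> "$out"
    done
}
```

Visiting the top-level TAGS file in Emacs should then also search the included tables, while the per-directory files remain usable on their own.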

Sudhanshu
The jumps between tags aren't really the problem - they're almost instant. It's really when I'm doing a tags search on the whole repo: with a single file, the number of matches can be huge. If I go for smaller chunks of the repo, I really need to be sure where I'm looking. I guess it'd be nice to have a tree-like tags hierarchy - I visit a high-level one and it pulls in tags from any subdirs.
cristobalito
Take a look at the link in the answer above.
Sudhanshu
Excellent - will give that a go!
cristobalito
BTW, cscope performance really improves with an inverted index. The "-q" option in the cscope command in the script above makes it build an inverted index for its database. It gives me instantaneous results for a symbol across all 10700 files as well. You may not need to deal with tag files in multiple directories at all if you use "-q" with cscope. Another note: I pass the "-k" option to cscope to tell it that these are kernel files. You wouldn't want to use that option for your project if it's not an OS kernel.
Sudhanshu
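
To make the "-q" point concrete, here is a small sketch for checking that a database was built with an inverted index and for querying it outside Emacs. It assumes cscope's default database file names (cscope.out, plus cscope.in.out and cscope.po.out when built with "-q"); the function names are made up for the example.

```shell
#!/usr/bin/env bash
# has_inverted_index DIR: succeed if DIR holds a cscope database that
# was built with -q (assumes the default database file names).
has_inverted_index() {
    [ -f "$1/cscope.out" ] &&
    [ -f "$1/cscope.in.out" ] &&
    [ -f "$1/cscope.po.out" ]
}

# find_symbol DIR SYMBOL: print every reference to SYMBOL, one per line.
# -d: do not rebuild the database, -L: line-oriented output,
# -0: "find this C symbol" query.
find_symbol() {
    ( cd "$1" && cscope -d -L -0 "$2" )
}
```

For example, `has_inverted_index . && find_symbol . some_function` run at the top of the tree would list the references without opening the interactive interface (`some_function` being a placeholder).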
cscope does look very interesting - my only problem with it at the moment is that it doesn't support C#. It seems like it can be extended to support C++/Java, so perhaps C# is also possible.
cristobalito
+2  A: 

I wrote some packages to help with managing multiple tag hits and many TAGS files:

http://www.emacswiki.org/emacs/EtagsSelect

http://www.emacswiki.org/emacs/EtagsTable

scottfrazer
That looks really useful - great stuff. It would certainly address some of the issues I've come across. I'll definitely be giving it a go when I get back to the office.
cristobalito
I've been using etags-select for a while, and it's great (thank you, Scott!). Bind `M-.` to `etags-select-find-tag` instead of `find-tag` and you're immediately better off.
phils
+2  A: 

You might want to have a look at GNU Global as well. It supports C and C++ (along with Yacc, Java, PHP4 and assembly), so it might work acceptably well for C# (I've not written any C#, though, so I might also be talking utter nonsense).

If it works, it ought to be dramatically faster than a regular TAGS file.

phils
+1  A: 

As mentioned, GNU Global is pretty good. Here's a short introduction for using it from within emacs.

Note, if your project is really 50K files (and 50K locs), the first run (to index all that stuff) could be a bit slow.
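
Since only the first indexing run is the expensive one, a wrapper along these lines could build the database once and update it incrementally afterwards. This is just a sketch: it assumes `gtags` and `global` are on the PATH and that the database lives at the top of the tree, and `index_action` and `refresh_index` are made-up names.

```shell
#!/usr/bin/env bash
# index_action DIR: print "update" if DIR already has a GTAGS database,
# otherwise print "build".
index_action() {
    if [ -f "$1/GTAGS" ]; then
        echo update
    else
        echo build
    fi
}

# refresh_index DIR: full (slow) build the first time, incremental
# update on later runs.
refresh_index() {
    command -v gtags >/dev/null 2>&1 || { echo "gtags not found" >&2; return 1; }
    case $(index_action "$1") in
        build)  ( cd "$1" && gtags ) ;;      # creates GTAGS, GRTAGS, GPATH
        update) ( cd "$1" && global -u ) ;;  # re-indexes only changed files
    esac
}
```

After that, `global -x name` lists definitions and `global -rx name` lists references from the command line.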

djcb
Thanks for the comment - I'd actually seen that post ages back (I subscribe to the blog) but had forgotten. Any idea what the quickest way to install GNU Global on Win7 is?
cristobalito
On Windows? That might be hard; I remember that GNU Global for win32 became unmaintained a few years ago... Maybe another tagging system like ctags (exuberant or not) would be easier to set up, but I haven't used those.
djcb