Hello, guys!

Since I've converted to the Church of Emacs, I've been trying to do everything from inside it, and I was wondering how to do some text processing quickly and efficiently with it.

As an example, let's take this list that I was editing a few minutes ago in org-mode.

** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b

It is a list of names associated with tags, and I want to get a list of tags associated with names.

In bash, I would first echo the whole pasted text in single quotes and pipe it to awk, looping over each line, adding each of its parts to the right temporary variable, and then tweaking the result until it looks the way I want.

echo '** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
' | awk '{sub(":","");for (i=3;i<=NF;i++) members[$i] = members[$i] " " $2}; END{for (j in members) print j ": " members[j]}' | sort

... and TA-DA! The expected output in less than 2 minutes, done in an intuitive and incremental way. Can you show me how to do something like this in elisp, preferably in an emacs buffer, with elegance and simplicity?

Thanks!

+4  A: 

There is a function shell-command-on-region that pretty much does what it says. You can highlight a region, do M-|, type a shell command, and the region's text is piped to that command. Give it a prefix argument and the region is replaced with the output of the command.

For a trivial example, highlight a region, type 'C-u 0 M-| wc' (control-u, zero, meta-pipe and then 'wc') and the region will be replaced with the line, word and character counts of that region.
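
If you'd rather call this from Lisp than from the keyboard, something like the following might work. It is only a sketch: my-invert-tags-command and my-invert-tags-in-region are names made up for illustration, the command string is the awk pipeline from the question, and it assumes awk and sort are available on your PATH. Passing t as the fifth argument (REPLACE) is what makes the output replace the region.

```elisp
;; Hypothetical names; the awk program is the one from the question.
(defvar my-invert-tags-command
  (concat "awk '{sub(\":\",\"\"); "
          "for (i=3;i<=NF;i++) members[$i] = members[$i] \" \" $2}; "
          "END{for (j in members) print j \": \" members[j]}' | sort")
  "Shell pipeline that turns `** NAME: TAGS' lines into `TAG: NAMES' lines.")

(defun my-invert-tags-in-region (start end)
  "Replace the text between START and END with the inverted tag list.
Assumes awk and sort are installed."
  (interactive "r")
  ;; Arguments: START END COMMAND OUTPUT-BUFFER REPLACE
  (shell-command-on-region start end my-invert-tags-command nil t))
```

Select the list and run M-x my-invert-tags-in-region; the sort order of the result depends on your locale.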

Another thing you can do is figure out how to manipulate one line, make it a macro, and then run the macro repeatedly. For example, 'C-x ( C-s foo RET bar C-x )' will search for the word "foo", then type the word "bar", changing it to "foobar". You can then do 'C-u 0 C-x e', which will run the macro over and over until it doesn't find any more occurrences of "foo".
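
For comparison, the same kind of edit can be written as a small Lisp loop. This is a minimal sketch, not something from the answer above: my-append-after-word is a made-up name, and unlike the macro it always starts from the beginning of the buffer and leaves point where it was.

```elisp
;; Hypothetical helper: append SUFFIX after every occurrence of WORD.
(defun my-append-after-word (word suffix)
  "Insert SUFFIX after each occurrence of WORD in the current buffer.
Return the number of insertions made."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      ;; `search-forward' returns nil (instead of signaling an error)
      ;; when the third argument is t, which ends the loop.
      (while (search-forward word nil t)
        (insert suffix)
        (setq count (1+ count))))
    count))
```

Evaluating (my-append-after-word "foo" "bar") in a buffer has the same effect as running the macro until it fails.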

Bryan Oakley
Additionally, modern Emacsen have convenient bindings for keyboard macros. <f3> is bound to `kmacro-start-macro-or-insert-counter', <f4> is bound to `kmacro-end-or-call-macro' -- this saves typing. Ignoring counter functionality (as always, "C-h k <f3>" for full documentation), this lets you hit "<f3> C-s foo RET bar <f4> <f4> ..." -- the first <f4> ends the macro definition, the second executes it.
ariels
+3  A: 

Ok, here is my first attempt in elisp:

  1. I start a buffer with elisp and paredit modes on, open double quotes and paste the text
  2. I bind it to a symbol using let
(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
"))
  foobar)

Now I change foobar to something fancy.

  1. First I remove the symbols with a regexp and split the text into lines using (split-string)
  2. Then I do a mapcar to turn each line into a list of words
(mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))
  3. Then I create a hash table and bind it to temphash ((temphash (make-hash-table :test 'equal)))
  4. And then I loop over the nested lists to add the elements to the hash table. I think I'm not supposed to do non-functional programming with mapcar, but nobody is looking ;)
(mapcar #'(lambda (l)
              (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (let ((tempel (gethash m temphash)))
                                                            (if tempel tempel ""))) temphash)) (cdr l)))
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))) 
  5. Next, I extract the elements from the hash table into another set of nested lists with a handy function stolen from Xah Lee's webpage,
  6. And finally I pretty print it to another buffer with M-x pp-eval-last-sexp

It's a little mind-bending, especially the double mapcar, but it sorta works. Here is the full "code":

;; Stolen from Xah Lee's page


(defun hash-to-list (hashtable)
  "Return a list that represent the hashtable."
  (let (mylist)
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable)
    mylist
  )
)

;; Code

(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
")
      (temphash  (make-hash-table :test 'equal)))
  (mapcar #'(lambda (l)
              (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (let ((tempel (gethash m temphash)))
                                                            (if tempel tempel ""))) temphash)) (cdr l)))
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t)))
  (hash-to-list temphash)) 

And here is the output:

(("clô" "anão ")
 ("clo" "george ")
 ("q" "Erick ")
 ("de" "walrus ")
 ("h" "henrique ")
 ("cb" "leandro ")
 ("lang" "Peter ")
 ("est" "Peter ")
 ("fur" "Aldo ")
 ("pol" "Peter Aldo ")
 ("qt" "davidatenas Gabriel eumané henrique LZZ ")
 ("mmu" "Luca ")
 ("prog" "Luca ")
 ("gnu" "Luca ")
 ("rpg" "Erick raphael ")
 ("mimimi" "george rol Vitor ")
 ("an" "davidatenas eumané rol CarlosIsaksen GustavoKyon William LZZ tony ")
 ("mu" "daniel ")
 ("gif" "kenny ")
 ("cri" "walrus kenny ")
 ("7arte" "davidatenas jeff rol frederico CarlosIsaksen Luca raphael caue ")
 ("c" "Rodrigo ")
 ("pseudo" "Igor FilipePinheiro rol Peter Aldo caue Andre ")
 ("maia" "Andre ")
 ("1997" "davidatenas anão Erick henrique Peter CarlosIsaksen William Luca tony Jost ")
 ("hq" "anão CarlosIsaksen Jost ")
 ("pc" "William Luca Alan ")
 ("mil" "Peter Aldo Andre Alan ")
 ("gtk" "jeff Erick henrique frederico Peter CarlosIsaksen GustavoKyon Epic daniel GP ")
 ("lit" "FilipePinheiro mathias frederico Peter Luca GP ")
 ("etc" "GustavoPupo ")
 ("tr" "GustavoPupo ")
 ("pinto," "GustavoPupo ")
 ("esp" "davidatenas tony FelipeAugusto ")
 ("pr0n" "Gabriel daniel Herbert um ")
 ("rsrs" "anão Gabriel daniel caue Herbert um ")
 ("jo" "anão Erick mathias leandro CarlosIsaksen William Vitor Jost Alan Koma ")
 ("QI" "bruno-gil Diego ")
 ("b" "Eduardo HHahaah anão Erick Igor rol leandro Aldo William Luca raphael Vitor daniel caue Herbert Jost bruno-gil Diego "))
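
In case the nested mapcar is hard to follow, here is roughly the same idea written with dolist. Treat it as a sketch under a few assumptions: my-invert-tag-lines is a made-up name, it returns an alist of (TAG NAME ...) instead of strings, and it sorts the tags with string<, so uppercase tags come first.

```elisp
;; Hypothetical rewrite of the steps above using `dolist'.
(defun my-invert-tag-lines (text)
  "Return a sorted alist of (TAG NAME ...) built from TEXT.
Each line of TEXT is expected to look like \"** NAME: TAG TAG ...\"."
  (let ((table (make-hash-table :test #'equal))
        result)
    (dolist (line (split-string text "\n" t))
      ;; Drop the stars and the colon, then split into (NAME TAG TAG ...).
      (let* ((words (split-string
                     (replace-regexp-in-string "[:*]" "" line) " " t))
             (name (car words)))
        (dolist (tag (cdr words))
          (puthash tag (cons name (gethash tag table)) table))))
    ;; Names were consed on newest-first, so reverse them to input order.
    (maphash (lambda (tag names)
               (push (cons tag (reverse names)) result))
             table)
    (sort result (lambda (a b) (string< (car a) (car b))))))
```

For example, (my-invert-tag-lines "** Diego: b QI\n** Koma: jo\n** Vitor: b jo\n") returns (("QI" "Diego") ("b" "Diego" "Vitor") ("jo" "Koma" "Vitor")).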
konr
+5  A: 

The first thing I would do is to take advantage of org-mode's tag support. Instead of

** Diego: b QI

You would have

** Diego                          :b:QI:

Which org-mode recognizes as the tags "b" and "QI".

To transform your current format to the standard org-mode format, you can use the following (assuming the buffer with your source is called "asdf")

(with-current-buffer "asdf"
  (goto-char (point-min))
  (replace-string " " ":")
  (goto-char (point-min))
  (replace-string "**:" "** ")
  (goto-char (point-min))
  (replace-string "::" " :")
  (goto-char (point-min))
  (replace-string "\n" ":\n")
  (org-set-tags-command t t))

It's not pretty or efficient, but it gets the job done.
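
If the chain of replacements leaves stray colons where the input has doubled or trailing spaces, a per-line conversion with a single regexp might be more forgiving. This is only a sketch: my-line-to-org-tags is a made-up name, it works on one line given as a string, and it does not check that each tag is made of characters org-mode accepts in tags.

```elisp
;; Hypothetical helper: convert one headline to org-mode tag syntax.
(defun my-line-to-org-tags (line)
  "Convert \"** NAME: TAG1 TAG2\" in LINE to \"** NAME :TAG1:TAG2:\".
Return LINE unchanged if it does not have that shape."
  (if (string-match "\\`\\(\\*+ [^:\n]+?\\) *: *\\(.*?\\) *\\'" line)
      (let ((heading (match-string 1 line))
            ;; Omit empty strings so repeated spaces do not create empty tags.
            (tags (split-string (match-string 2 line) " " t)))
        (if tags
            (concat heading " :" (mapconcat #'identity tags ":") ":")
          heading))
    line))
```

For example, (my-line-to-org-tags "** Diego: b QI") returns "** Diego :b:QI:".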

After that, you can use the following to produce a buffer that has the format you wanted from the shell script:

(let ((results (get-buffer-create "results"))
      tags)
  (with-current-buffer "asdf"
    (goto-char (point-min))
    (while (org-on-heading-p)
      ;; Collect each tag once; `push' also works when `tags' is a lexical variable.
      (mapc (lambda (item) (when (and item (not (member item tags))) (push item tags))) (org-get-local-tags))
      (outline-next-visible-heading 1)))
  (setq tags (sort tags 'string<))
  (with-current-buffer results
    (erase-buffer)
    (mapc (lambda (item)
            (insert (format "%s: %s\n"
                            item
                            (with-current-buffer "asdf"
                              (org-map-entries '(substring-no-properties (org-get-heading t)) item)))))
          tags)
    (goto-char (point-min))
    (replace-regexp "[()]" "")))

This puts the results in a buffer called "results", creating it if it doesn't already exist. Basically, it is collecting all the tags in the buffer "asdf", sorting them, then looping through each tag and searching for each headline with that tag in "asdf" and inserting it to "results".

With a bit of cleaning up, this could be made into a function; basically just replacing "asdf" and "results" with arguments. If you need that done, I can do that.

haxney
+1  A: 

The previous alternatives are interesting, but I do not believe they capture the "how would I do this in Emacs as a recent convert" aspect of the question. I suspect someone learning Emacs with an eye to using Emacs Lisp to do the whole job might start out with something like:

(defun create-tags-to-name (buffer-name)
  "Create a buffer filled with lines containing `** TAG:
LIST-OF-NAMES' by transposing the lines after point that match the
format `** NAME: LIST-OF-TAGS', where the list items are
whitespace separated."
  (interactive "BOutput buffer: ")
  (let ((buf (get-buffer-create buffer-name))
        (tag-to-name-list (list))
        name tags element)
    ;; Clear the destination buffer
    (with-current-buffer buf
      (erase-buffer))
    ;; Build the list of tag to name associations.  The name pattern
    ;; accepts any character except a colon, so accented names such as
    ;; "eumané" are not skipped.
    (while (re-search-forward "^\\*\\* \\([^:\n]+?\\) *:\\(.+\\)$" nil t)
      (setq name (match-string-no-properties 1)
            tags (split-string (match-string-no-properties 2)))
      ;; For each tag add the name to the tag's name list
      (while tags
        (let ((tag (car tags)))
          (setq element (assoc tag tag-to-name-list)
                tags (cdr tags))
          (if element
              (setcdr element (append (list name) (cdr element)))
            (setq tag-to-name-list (append (list (cons tag (list name))) tag-to-name-list))))))
    ;; Dump the associations to the target buffer
    (with-current-buffer buf
      (while tag-to-name-list
        (setq element (car tag-to-name-list)
              tag-to-name-list (cdr tag-to-name-list))
        (insert (concat "** " (car element) ":"))
        (let ((tag-list (cdr element)))
          (while tag-list
            (insert " " (car tag-list))
            (setq tag-list (cdr tag-list))))
        (insert "\n")))))
pajato0
A: 
konr
This may just be a nitpick, but I find that when learning a new language it helps to learn the code indentation that is idiomatic for that language; the indentation and parenthesization the language expects help keep me in its flow. Also, as someone who uses Lisp quite a bit, I find seeing it parenthesized that way jarring, like seeing code in all caps in a language other than BASIC or FORTRAN.
Justin Smith
Do you mean the macro or the hash-to-list function? If it's the macro, could you show me how to indent it properly? The function was simply copied from Xah Lee's page
konr