views:

142

answers:

4

This is more of an algorithmic dilemma than a language-specific problem, but since I'm currently using Ruby I'll tag this as such. I've already spent over 20 hours on this and I would've never believed it if someone told me writing a LaTeX parser was a walk in the park in comparison.

I have a loop to read hierarchies (that are prefixed with \m) from different files

art.tex: \m{Art}
graphical.tex: \m{Art}{Graphical}
me.tex: \m{About}{Me}
music.tex: \m{Art}{Music}
notes.tex: \m{Art}{Music}{Sheet Music}
site.tex: \m{About}{Site}
something.tex: \m{Something}
whatever.tex: \m{Something}{That}{Does Not}{Matter}

and I need to sort them alphabetically and print them out as a tree

About
    Me (me.tex)
    Site (site.tex)
Art (art.tex)
    Graphical (graphical.tex)
    Music (music.tex)
        Sheet Music (notes.tex)
Something (something.tex)
    That
        Does Not
            Matter (whatever.tex)

in (X)HTML

<ul>
<li>About</li>
<ul>
<li><a href="me.tex">Me</a></li>
<li><a href="site.tex">Site</a></li>
</ul>
<li><a href="art.tex">Art</a></li>
<ul>
<li><a href="graphical.tex">Graphical</a></li>
<li><a href="music.tex">Music</a></li>
<ul>
<li><a href="notes.tex">Sheet Music</a></li>
</ul>
</ul>
<li><a href="something.tex">Something</a></li>
<ul>
<li>That</li>
<ul>
<li>Doesn't</li>
<ul>
<li><a href="whatever.tex">Matter</a></li>
</ul>
</ul>
</ul>
</ul>

using Ruby without Rails, which means that at least Array.sort and Dir.glob are available.

All of my attempts were formed like this (as this part should work just fine).

def fss_brace_array(ss_input)#a concise version of another function; converts {1}{2}...{n} into an array [1, 2, ..., n] or returns an empty array
    ss_output = ss_input[1].scan(%r{\{(.*?)\}})
rescue
    ss_output = []
ensure
    return ss_output
end
#define tree
s_handle = File.join(:content.to_s, "*")
Dir.glob("#{s_handle}.tex").each do |s_handle|
    File.open(s_handle, "r") do |f_handle|
        while s_line = f_handle.gets
            if s_all = s_line.match(%r{\\m\{(\{.*?\})+\}})
                s_all = s_all.to_a
                #do something with tree, fss_brace_array(s_all) and s_handle
                break
            end
        end
    end
end
#do something else with tree
A: 

I am not familiar with Ruby, but this is how could you do it in most languages:

  1. Make an empty tree structure.
  2. For each line:
  3. For each {} element after \m:
  4. If this element can be found in a tree at the same level, do nothing.
  5. Otherwise create a tree node that is a child of the previous element or a root if it is the first one
  6. At the end of the line, attach the foo.tex part the the leafmost node.

I do not know how this translates to Ruby, specifically, how tree structure is represented there.

Laurynas Biveinis
Trees can be represented as hashes in hashes in Ruby: tree = {"a" => {"b" => {"c" => nil, "d" => nil}}, "e" => {"f"}}. The only problem is that Array.sort will convert this to an array and also take the file paths into account.
Tuplanolla
A: 

There was a question very similar to this one recently, take a look at mine and other answers:

http://stackoverflow.com/questions/2403006/how-to-handle-recursive-parent-child-problems-like-this/2403291#2403291

Mladen Jablanović
+1  A: 

Important: I can't SSH into my linux box from work right now, which means I cannot test this code. Not even the tiniest bit. It could have silly, obvious syntax errors or logic since I wrote it from scratch right in the input box. But it LOOKS right... I think. I'll check it when I get home from work.

SOURCE = <<-INPUT
  art.tex: \m{Art}
  graphical.tex: \m{Art}{Graphical}
  me.tex: \m{About}{Me}
  music.tex: \m{Art}{Music}
  notes.tex: \m{Art}{Music}{Sheet Music}
  site.tex: \m{About}{Site}
  something.tex: \m{Something}
  whatever.tex: \m{Something}{That}{Does Not}{Matter}
INPUT
HREF = '#href'

def insert_leaves(tree,node_list)
  next = node_list[0]
  rest = node_list[1..-1]
  tree[next] ||= {}
  if not rest.empty?
    insert_leaves(tree[next],rest)
  else
    tree[next]
    # recursively, this will fall out to be the final result, making the
    # function return the last (deepest) node inserted.
  end
end

tree = {}

SOURCE.each_line do |line|
  href, folder_string = line.split(': \\m') #=> ['art.tex','{Art}{Graphical}']
  folders = folder_string.scan(/[^{}]+/)     #=> ['Art','Graphical']
  deepest_folder = insert_leaves(tree,folders)
  deepest_folder[HREF] = href
end

# After this insertion, tree looks like this:
#
# {
#   About = {
#     Me = {
#       #href = me.tex
#     }
#     Site = {
#       #href = site.tex
#     }
#   }
#   Art = {
#     Graphical = {
#       #href = graphical.tex
#     }
#     ...
#
# Edge case: No category should be named '#href'.

def recursive_html_construction(branch, html)
  return if branch.keys.reject(HREF).empty? # abort if the only key is
                                            # an href.
  html << '<ul>'
  branch.keys.sort.each do |category|
    next if category == HREF # skip href entries.
    html << '<li>'
    if branch[category].key?(HREF)
      html << "<a href='#{branch[category][HREF]}'> #{category}</a>"
    else
      html << category
    end
    html << '</li>'
    recursive_html_construction(branch[category],html)
  end
  html << '</ul>'
end

html = ""

recursive_html_construction(tree,html)

puts html # => '<ul><li>About</li><ul><li><a href='me.tex'>Me</a></li><li>
          #     <a href='site.tex'>Site</a></li></ul><li>Art</li><ul><li>
          #     <a href='graphical.tex'>Graphical</a></li>...
Myrddin Emrys
+1 that looks like it should work, which I don't think I could do without irb
Jim Schubert
@Jim I did cheat on looking up a few functions online. Of course, I never got around to testing it... forgot when I got home. :-) And of course I'm at work again...
Myrddin Emrys
A: 

Here's a working solution in Python after simply pushing all of the entries into an array.

import operator
import itertools
def overput(input):
    for betweenput, output in itertools.groupby(input, key=operator.itemgetter(0)):
        yield '<li>'
        yield betweenput
        output = [x[1:] for x in output if len(x) > 1]
        if output:
            yield '<ul>'
            for part in overput(output):
                yield part
            yield '</ul>'
        yield '</li>'
Tuplanolla