views:

1077

answers:

5

Given a moderately complex XML structure (dozens of elements, hundreds of attributes) with no XSD and a desire to create an object model, what's an elegant way to avoid writing boilerplate from_xml() and to_xml() methods?

For instance, given:

<Foo bar="1"><Bat baz="blah"/></Foo>

How do I avoid writing endless sequences of:

class Foo
  attr_reader :bar, :bat

  def from_xml(el)
     @bar = el.attributes['bar']
     @bat = Bat.new()
     @bat.from_xml(XPath.first(el, "./bat")
  end
 etc...

I don't mind creating the object structure explicitly; it's the serialization that I'm just sure can be taken care of with some higher-level programming...


I am not trying to save a line or two per class (by moving from_xml behavior into initializer or class method, etc.). I am looking for the "meta" solution that duplicates my mental process:

"I know that every element is going to become a class name. I know that every XML attribute is going to be a field name. I know that the code to assign is just @#{attribute_name} = el.[#{attribute_name}] and then recurse into sub-elements. And reverse on to_xml."


I agree with suggestion that a "builder" class plus XmlSimple seems the right path. XML -> Hash -> ? -> Object model (and Profit!)


Update 2008-09-18 AM: Excellent suggestions from @Roman, @fatgeekuk, and @ScottKoon seem to have broken the problem open. I downloaded HPricot source to see how it solved the problem; key methods are clearly instance_variable_set and class_eval . irb work is very encouraging, am now moving towards implementation .... Very excited

A: 

Could you define a method missing that allows you to do:

@bar = el.bar? That would get rid of some boilerplate. If Bat is always going to be defined that way, you could push the XPath into the initialize method,

class Bar
  def initialize(el)
    self.from_xml(XPath.first(el, "./bat"))
  end
end

Hpricot or REXML might help too.

James Deville
+1  A: 

You could use Builder instead of creating your to_xml method, and you could use XMLSimple to pull your xml file into a Hash instead of using the from _xml method. Unfortunately, I'm not sure you'll really gain all that much from using these techniques.

Dan
A: 

Could you try parsing the XML with hpricot and using the output to build a plain old Ruby object? [DISCLAIMER]I haven't tried this.

ScottKoon
A: 

I would subclass attr_accessor to build your to_xml and from_xml for you.

Something like this (note, this is not fully functional, only an outline)

class XmlFoo
  def self.attr_accessor attributes = {}
    # need to add code here to maintain a list of the fields for the subclass, to be used in to_xml and from_xml
    attributes.each do |name, value|
      super name
    end
  end

  def to_xml options={}
    # need to use the hash of elements, and determine how to handle them by whether they are .kind_of?(XmlFoo)
  end

  def from_xml el
  end
end

you could then use it like....

class Second < XmlFoo
  attr_accessor :first_attr => String, :second_attr => Float
end

class First < XmlFoo
  attr_accessor :normal_attribute => String, :sub_element => Second
end

Hope this gives a general idea.

fatgeekuk
+1  A: 

I suggest using a XmlSimple for a start. After you run the XmlSimple#xml_in on your input file, you get a hash. Then you can recurse into it (obj.instance_variables) and turn all internal hashes (element.is_a?(Hash)) to the objects of the same name, for example:

obj.instance_variables.find {|v| obj.send(v.gsub(/^@/,'').to_sym).is_a?(Hash)}.each do |h|
  klass= eval(h.sub(/^@(.)/) { $1.upcase })

Perhaps a cleaner way can be found to do this. Afterwards, if you want to make an xml from this new object, you'll probably need to change the XmlSimple#xml_out to accept another option, which distinguishes your object from the usual hash it's used to receive as an argument, and then you'll have to write your version of XmlSimple#value_to_xml method, so it'll call the accessor method instead of trying to access a hash structure. Another option, is having all your classes support the [] operator by returning the wanted instance variable.

Roman