I want to convert my XML document to a Hash in Ruby/Rails. The default conversion by Hash.from_xml in Rails works for me except in one case.

I have a list of items contained in an <item-list> element, but these items can be of different types: for instance, standard-item and special-item, each of which has a different set of child elements.

<item-list>
  <standard-item>
    <foo>...</foo>
    <bar>...</bar>
  </standard-item>
  <standard-item>
    <foo>...</foo>
    <bar>...</bar>
  </standard-item>
  <special-item>
    <baz>...</baz>
  </special-item>
</item-list>

This XML structure can be confusing for Hash.from_xml, as it does not know that standard-item and special-item are both items and should be at the same level. Hence, from the above XML, the hash generated by Hash.from_xml will be:

{ 'item-list' => { 'standard-item' => [ { 'foo' => '...', 'bar' => '...' },
                                        { 'foo' => '...', 'bar' => '...' } ],
                   'special-item'  => { 'baz' => '...' } }}

But what I want is to have all items as list members, like this:

{ 'item-list' => [ { 'standard-item' => { 'foo' => '...', 'bar' => '...' } },
                   { 'standard-item' => { 'foo' => '...', 'bar' => '...' } },
                   { 'special-item'  => { 'baz' => '...' } } ] }

Is it possible to extend or customize from_xml so that it behaves the way I want in this case? If not, what is the best way to achieve this? Given that this is the only element that deviates from the general XML-to-Hash conversion, it does not seem right to reimplement the whole conversion routine when it has probably already been implemented a thousand times.

One more small note: Hash.to_xml also replaces all dashes with underscores. Is there a way to prevent this replacement?

A: 

This is correct behavior.

<a>
  <b>one</b>
  <b>two</b>
</a>

Think about how this would be converted to a hash: there cannot be two values assigned to the key 'b'. The only solution is to make the value of 'b' an array containing 'one' and 'two'.

{ 'a' => {
  'b' => ['one', 'two']
} }

This is simply how Rails represents XML. You will need to check for an array value in the hash and act accordingly.
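
The array check can be wrapped in a small helper. This is only a minimal sketch in plain Ruby, assuming the hash has the shape Hash.from_xml returns; wrap_in_array is a hypothetical name, not a Rails method (if ActiveSupport is loaded, Array.wrap does the same job).

```ruby
# Hypothetical helper: always return a list, whether the parser produced
# a single value (one child element) or an array (repeated child elements).
def wrap_in_array(value)
  return [] if value.nil?

  value.is_a?(Array) ? value : [value]
end

# Sample hashes in the shape described above (values are illustrative).
single   = { 'a' => { 'b' => 'one' } }
repeated = { 'a' => { 'b' => ['one', 'two'] } }

wrap_in_array(single['a']['b'])    # => ["one"]
wrap_in_array(repeated['a']['b'])  # => ["one", "two"]
```

Kernel#Array is deliberately avoided here: called on a Hash it returns key/value pairs rather than a one-element list, which would break for elements that have children.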

Karl
I am aware that this is the correct behavior. However, there is one case where the sub-elements can be of different types and I want them to be in a single array. For instance, let's say that the element "two" is <i> instead of <b>, but I still want both "one" and "two" to be in the same array.
ejel
After looking at it again, it can work. However, you'll probably have to do it yourself with Nokogiri or something similar. I think it'd be easier just to write some code to abstract the Hash's structure so that it treats the elements as if they were on the same level.
Karl
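
One way to treat the elements as if they were on the same level, sketched in plain Ruby. It assumes the input is the grouped hash that Hash.from_xml produces for the question's XML; flatten_items is a hypothetical helper name and the '1'..'5' values are made-up placeholders.

```ruby
# Hypothetical helper: turn { 'type' => item_or_items, ... } into
# [ { 'type' => item }, ... ] so that every item sits at the same level.
def flatten_items(grouped)
  grouped.flat_map do |type, items|
    # A single child arrives as a Hash, repeated children as an Array.
    list = items.is_a?(Array) ? items : [items]
    list.map { |item| { type => item } }
  end
end

# Grouped structure, as in the question (placeholder values).
parsed = {
  'item-list' => {
    'standard-item' => [{ 'foo' => '1', 'bar' => '2' },
                        { 'foo' => '3', 'bar' => '4' }],
    'special-item'  => { 'baz' => '5' }
  }
}

result = { 'item-list' => flatten_items(parsed['item-list']) }
```

Note that the items come out grouped by tag name, not in document order: once the XML has been parsed into the grouped hash, any interleaving of standard-item and special-item elements is already lost. If the original order matters, walking the XML nodes directly with a parser is probably the safer route.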
The reason I'm doing this is to convert my XML to JSON to be sent to client-side JavaScript. Given the structure of hash/JSON, I guess the only way to represent the given XML is to use a sub-hash?
ejel
If you want to maintain your structure, yes, the JSON format is structured in the same way as the Hash.
Karl
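
To illustrate that the JSON mirrors the Hash, here is a small sketch using Ruby's standard json library. The items hash is a made-up example in the list-of-items shape the question asks for, with placeholder values.

```ruby
require 'json'

# Placeholder data: a list of single-key hashes, one per item.
items = {
  'item-list' => [
    { 'standard-item' => { 'foo' => '1', 'bar' => '2' } },
    { 'special-item'  => { 'baz' => '3' } }
  ]
}

# Ruby arrays become JSON arrays and hashes become JSON objects.
json = JSON.generate(items)
puts json
```

On the JavaScript side, the parsed object then exposes item-list as an array whose entries each carry a single key naming the item type.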