If you can help with this you're a genius.

Basically, I will have some text like this:

<parent wealthy>
   <parent>
      <children female>
        <child>
          jessica
          <hobbies>
            basketball, soccer, video games
          </hobbies>
        </child>
        <child>
          jane
          <hobbies>
            cooking, shopping, boys
          </hobbies>
        </child>         
      </children female>
      <children male>
       <child>
         josh
         <hobbies>
           tennis, swimming
         </hobbies>
       </child>
      </children male>
    </parent>
   </parent wealthy>
   <parent poor>
     <parent>
       <children male>
         <child>
          ---
          <hobbies>...</hobbies>
         </child>
       </children male>
     </parent>
   </parent poor>

So in all, I will have a parent-child hierarchy like this:

- parent wealthy/ parent poor /parent something else
  -- parent
     -- children male/ children female / children something else
        -- child
         -- (name of the child is given without any tags around it)
         -- hobbies

I'm wondering how I can parse all this info out and store it in a PHP array/object/variable while maintaining the order in which the elements appear. For example, if <parent wealthy> appears above <parent poor>, I would like to keep them in that order, and the same goes if <children male> appears before <children female>.

This would be almost perfectly valid XML and I could use SimpleXML to parse it; however, the problem is that the name of the child doesn't appear between any tags, and the client wants to keep it this way for user friendliness. For example:

    <child>
      jane
      <hobbies>
        cooking, shopping, boys
      </hobbies>
    </child>      

Here 'jane' appears outside any tags, and the <hobbies> appear between some tags.

How can this be parsed? Please give some advice. If you suggest using regexps, please give the regexps that can be used for your answer to be accepted, as I don't know regexps.

Thanks.
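A minimal sketch of one way to approach this, not a definitive implementation. It assumes the only thing keeping the text from being well-formed XML is the bare qualifier word in tags such as `<parent wealthy>` and `</parent wealthy>`. Two regexps rewrite those into `<parent type="wealthy">` and `</parent>`, and SimpleXML then walks the elements in document order. The helper names `normalizeTags` and `parseFamilies`, the `type` attribute, and the `<root>` wrapper are all made up for illustration.

```php
<?php
// Hypothetical helper: make the client's markup well-formed XML.
function normalizeTags(string $raw): string
{
    // Closing tags: "</parent wealthy>" becomes "</parent>".
    $xml = preg_replace('~</(\w+)\s+[^<>]*>~', '</$1>', $raw);
    // Opening tags: "<parent wealthy>" becomes <parent type="wealthy">.
    // Assumes the qualifier contains no quotes, "=", "/" or "&".
    $xml = preg_replace('~<(\w+)\s+([^<>=/"]+?)\s*>~', '<$1 type="$2">', $xml);
    // Several top-level blocks need a single enclosing root element.
    return '<root>' . $xml . '</root>';
}

// Hypothetical helper: build a nested array, preserving document order.
function parseFamilies(string $raw): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $root = simplexml_load_string(normalizeTags($raw));
    if ($root === false) {
        throw new InvalidArgumentException('Input is not well-formed after normalization');
    }

    $result = [];
    foreach ($root->parent as $group) {             // <parent wealthy>, <parent poor>, ...
        $entry = ['type' => (string) $group['type'], 'children' => []];
        foreach ($group->parent as $inner) {        // the plain <parent> wrapper
            foreach ($inner->children as $set) {    // <children female>, <children male>, ...
                $kids = [];
                foreach ($set->child as $child) {
                    $kids[] = [
                        // The string cast yields the child's own text, not its sub-elements.
                        'name'    => trim((string) $child),
                        'hobbies' => array_map('trim', explode(',', (string) $child->hobbies)),
                    ];
                }
                $entry['children'][] = ['type' => (string) $set['type'], 'kids' => $kids];
            }
        }
        $result[] = $entry;
    }
    return $result;
}
```

Numerically indexed PHP arrays keep insertion order, so `$result[0]` is whichever `<parent ...>` block appears first in the text. Note that if a `<child>` has loose text after `<hobbies>` as well, the string cast would include that text in the name.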

Edit: The main problem is that the client wants to mix normal text with text in tags. For example:

text text text <hobbies>...</hobbies>. text text text <age>30</age>

How can that be parsed?
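A rough sketch of how mixed text and tags might be split, assuming the tags themselves are well-formed (every opening tag has a matching closing tag). PHP's DOM extension exposes loose text as text nodes alongside the element nodes, so walking `childNodes` returns both kinds in the order they were typed. The function name `splitMixedContent`, the `<root>` wrapper, and the shape of the returned array are assumptions for illustration.

```php
<?php
// Hypothetical helper: split mixed content into ordered text/tag segments.
function splitMixedContent(string $fragment): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = new DOMDocument();
    // Wrap in a root element so loose text and several tags can sit side by side.
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        throw new InvalidArgumentException('Fragment is not well-formed');
    }

    $segments = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            $text = trim($node->nodeValue);
            if ($text !== '') {                      // skip whitespace-only gaps
                $segments[] = ['text' => $text];
            }
        } elseif ($node->nodeType === XML_ELEMENT_NODE) {
            $segments[] = ['tag' => $node->nodeName, 'value' => trim($node->textContent)];
        }
    }
    return $segments;
}

$segments = splitMixedContent('text text text <hobbies>...</hobbies>. text text text <age>30</age>');
```

Each entry in `$segments` is either a `text` item or a `tag` item, in the same order as the input, so the surrounding prose and the tagged values can be processed separately.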

+2  A: 

When using markup like this:

<child>
  jane
   <hobbies>
    cooking, shopping, boys
   </hobbies>
 </child>     

jane will be in the nodeValue attribute of the child element when parsed with SimpleXML.

Just remember to trim() the value, as it's likely to contain white space because of the following tag(s).
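A small sketch of the trim step. One caveat: `nodeValue` is a property of PHP's DOM classes; with SimpleXML, the usual way to read an element's own text is a string cast, which is also what a later comment in this thread reports working. The variable names are only for illustration.

```php
<?php
$child = simplexml_load_string(
    "<child>\n  jane\n   <hobbies>\n    cooking, shopping, boys\n   </hobbies>\n </child>"
);

// The cast returns the text directly inside <child>, including the
// surrounding line breaks and indentation, so trim() is needed.
$name = trim((string) $child);

// Child elements are reached by name and cast the same way.
$hobbies = trim((string) $child->hobbies);
```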

Pekka
Yep, but I can't do that, that's the problem. The client wants it to be user friendly for people to type this in.
Click Upvote
Then I don't understand what your question is. Can you make an example how you need it to look like?
Pekka
Maybe you misunderstood? I rephrased my answer.
Pekka
Ah OK, I understand now, but what if there was more text below the last `</hobbies>` tag? E.g. `<child>jane<hobbies>..</hobbies>some extra text</child>`?
Click Upvote
Tricky. In that case, you would have to find a way to access the "full" text value of a node - *including XML element children* - and manually parse out everything before the first `<`. I don't know of such a function in SimpleXML - maybe it's worth asking a separate question. You can *maybe* detect an element in between because there *could* be some whitespace after `jane` - you'd have to try out what SimpleXML does with those.
Pekka
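One possible way to get "everything before the first `<`" without string parsing, offered as a sketch rather than a recommendation: `dom_import_simplexml()` converts a SimpleXML element into its DOM counterpart, whose `childNodes` list keeps text nodes and element nodes apart. The helper name `textBeforeFirstElement` is made up for illustration.

```php
<?php
// Hypothetical helper: return only the text that precedes the first child element.
function textBeforeFirstElement(SimpleXMLElement $element): string
{
    $text = '';
    foreach (dom_import_simplexml($element)->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            break;                                   // stop at the first tag
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $text .= $node->nodeValue;
        }
    }
    return trim($text);
}

$child = simplexml_load_string('<child>jane<hobbies>..</hobbies>some extra text</child>');
$name  = textBeforeFirstElement($child);
```

Here `$name` holds only the leading text, so "some extra text" after `</hobbies>` does not leak into it.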
I just did a test, using `<stuff><parents><parent>test<hobbies>...</hobbies></parent></parents></stuff>`. When I parsed it using SimpleXML and did a print_r() on the XML object returned, I didn't see the childNode mentioned anywhere. Can you elaborate on how it can be accessed?
Click Upvote
Sorry, I think I mixed up two XML libraries. Can you try accessing the element directly? `echo (string)$element;` returns `jane` for me. I also get a line break between `jane` and anything I enter after `hobbies`.
Pekka
A: 

I saw your reply on one of the answers: "... the client wants it to be user friendly for people to type this in." An XML structure is one of the unfriendliest means of entering information; it's actually pretty much masochistic. Rather, use YAML and parse it with spyc.

yannis
But the problem with YAML remains the same. The client wants to mix normal text with tags. For example: <child>jenny<hobbies>..</hobbies>some more text<new_tag>..</new_tag></child>. That's the problem. Can YAML help with that, if so, how?
Click Upvote
@Click Upvote: Why would you want a client editing raw XML? "You're entering a world of pain." Give them some decent UI, for crying out loud.
Wim Hollebrandse
Sure, giving a UI would be the best option! I thought there was a holy reason to keep it in a text file! Second best: YAML :)
yannis
Well, you can still have a UI with a primary storage mechanism and an export or even sync function to the sh1tty format.
Wim Hollebrandse
+1  A: 

I feel people are trying to answer the question from a technical point of view, but the issue here is process.

Why oh why? Your client is insisting on entering data like that? That is completely ridiculous. You will have a nightmare even validating it. Let alone parsing it properly.

Tell him/her you'll roll a decent user interface for them and choose your own storage mechanism; that will eliminate all the problems and incorrect formatting that users will introduce by entering it like that. It is madness.

Another completely different thing to note is that it seems that children come from one parent. I wasn't aware homo sapiens was autogamous.

Wim Hollebrandse