I'm aware there are several XML libraries out there, but unfortunately I am unable to use them for a school project I am working on.

I have a program that created this XML file.

<theKey>
<theValue>23432</theValue>
</theKey>

What I am trying to do is parse out "23432" between the tags. However, there are random tags in the file, so the value may not always be on the second line from the top. Also, I don't know how many digits the number between the tags has.

The code I have developed so far is basic because I don't know what part of the C++ language I can use to parse the value out. My hunch, from working with Java, is to use something from the string library, but so far I am coming up short on what I can use.

Can anyone give me direction or a clue on what I can do/use? Thanks a lot.

Here is the code I developed so far:

#include <cstdlib>   // for system()
#include <iostream>
#include <fstream>
#include <string>

using std::cout;
using std::cin;
using std::endl;
using std::fstream;
using std::string;
using std::ifstream;


int main()
{
 ifstream inFile;
 inFile.open("theXML.xml");

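 if (!inFile)
 {
  // Report the failure instead of silently continuing with a bad stream.
  std::cerr << "Could not open theXML.xml" << endl;
  return 1;
 }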

 string x;
 while (inFile >> x)
 {
  cout << x << endl;
 }

 inFile.close();

 system ( "PAUSE" );


 return 0;
}
+1  A: 

You will need to write functions that do at least the following:

  • If the node is a container node, then
    • Identify/parse elements (beginnings and ends) and attributes, if any
    • Parse children recursively
  • Otherwise, extract the value, trimming any leading and trailing whitespace that is not significant

The std::string class provides quite a few useful member functions, such as find, find_first_of and substr. Try to use these in your functions.
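
As a minimal sketch of the "extract the value while trimming whitespace" step, the following uses only std::string member functions. The helper names trim and textOfElement are hypothetical, and the code assumes a tag with no attributes and no nesting of the same element; it is not a general XML parser.

```cpp
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";                       // nothing but whitespace
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: returns the trimmed text between the first "<name>"
// and the next "</name>" in xml, or "" if either tag is missing.
// Attributes, nesting and comments are deliberately not handled.
std::string textOfElement(const std::string& xml, const std::string& name)
{
    const std::string open = "<" + name + ">";
    const std::string close = "</" + name + ">";

    std::string::size_type start = xml.find(open);
    if (start == std::string::npos)
        return "";
    start += open.size();                // skip past the opening tag

    std::string::size_type end = xml.find(close, start);
    if (end == std::string::npos)
        return "";

    return trim(xml.substr(start, end - start));
}
```

Calling textOfElement(contents, "theValue") on the sample file's contents would give the digits as a string; converting them to a number is a separate step.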

dirkgently
A: 

The C++ Standard Library provides no XML parsing features. If you want to write this on your own, I suggest looking at std::getline() to read your data into strings (don't try to use operator>> for this), and then at the std::string class's basic features, like the substr() function, to chop it up. But be warned that writing your own XML parser, even a basic one, is very far from trivial.

anon
Why is it preferred to use std::getline() over >>?
different
The stream operator>> is basically intended for reading space-delimited numeric values. You can make it work for other values, but it is particularly bad at reading strings, which may contain spaces.
anon
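
To illustrate the difference described in the comment above, here is a small sketch that reads the same text both ways. The two function names are hypothetical, and std::istringstream stands in for the file stream.

```cpp
#include <sstream>
#include <string>

// Reads one token with operator>>, which stops at the first whitespace.
std::string readWithExtraction(const std::string& text)
{
    std::istringstream in(text);
    std::string token;
    in >> token;
    return token;
}

// Reads one full line with std::getline, keeping embedded spaces.
std::string readWithGetline(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For the input "<name>John Smith</name>", the first function returns only "<name>John", while the second returns the whole line, which is why getline is the safer choice when element text may contain spaces.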
I have figured out a solution based on your ideas. Here is my basic algorithm:
  • Read the XML file into a string
  • Use an iterator to iterate through the string
  • Find my tag and record its location
  • Parse out the value
This may not be the best solution, but it works.
different
+4  A: 

To parse arbitrary XML, you really need a proper XML parser. When you include all the character-model nooks and DTD-related crannies of the language, it is not at all simple to parse, and it's a terrible faux pas to write a parser that only understands an arbitrary subset of XML.

In the real world, it would be wrong to use anything but a proper XML parser library to implement this. If you can't use a library and you can't change the program's output format to something more easily parsed (e.g. newline-separated key/value pairs), you're in an untenable position. Any school project that requires you to parse XML without an XML parser is totally misguided.

(Well, unless the whole point of the project is to write an XML parser in C++. But that would be a very cruel assignment.)

bobince
+1  A: 

Here's an outline of what your code should look like (I've left out the tedious parts as an exercise):

std::string whole_file;

// TODO:  read your whole XML file into "whole_file"

std::size_t found = whole_file.find("<theValue>");

// TODO: ensure that the opening tag was actually found ...

std::string aux = whole_file.substr(found);
found = aux.find(">");

// TODO: ensure that the closing angle bracket was actually found ...

aux = aux.substr(found + 1);

std::size_t end_found = aux.find("</theValue>");

// TODO: ensure that the closing tag was actually found ...

std::string num_as_str = aux.substr(0, end_found); // "23432"

int the_num;

// TODO: convert "num_as_str" to int

This is not a proper XML parser of course, just something quick and dirty that solves your problem.
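
For reference, here is one possible way to fill in the TODOs, wrapped in a function so the error cases can be reported. The function name extractTheValue and the bool-plus-out-parameter interface are assumptions for illustration, not part of the outline above; the conversion uses std::istringstream, though std::stoi would also work with a C++11 compiler.

```cpp
#include <sstream>
#include <string>

// Hypothetical wrapper around the outline: returns true and stores the
// number in the_num on success; returns false if either tag is missing or
// the text between the tags does not start with an integer.
bool extractTheValue(const std::string& whole_file, int& the_num)
{
    const std::string open = "<theValue>";
    const std::string close = "</theValue>";

    std::string::size_type found = whole_file.find(open);
    if (found == std::string::npos)
        return false;                    // opening tag missing
    found += open.size();

    std::string::size_type end_found = whole_file.find(close, found);
    if (end_found == std::string::npos)
        return false;                    // closing tag missing

    // Convert the text between the tags to an int.
    std::istringstream conv(whole_file.substr(found, end_found - found));
    int parsed;
    if (!(conv >> parsed))
        return false;                    // no leading integer

    the_num = parsed;
    return true;
}
```

The whole_file string can be filled from an std::ifstream, for example by appending each line read with std::getline.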

Manuel
Except that it doesn't necessarily solve his problem. It'll produce the wrong value for something like: "<theValue>123</theValue><theKey><theValue>345</theValue></theKey>".
Jerry Coffin
At least I hope that this will get him started.
Manuel
@Jerry - I've replaced a backslash with a slash in the literal "</theValue>". Is that why you said my code didn't work?
Manuel
No -- at least according to the sample he gave, he only wants a "theValue" that's inside of a theKey, whereas your code appears to look for any theValue anywhere in the file.
Jerry Coffin
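
One way to address this objection, sketched under the assumption that only the first theKey element matters and that theKey elements are never nested: locate the theKey element first, then accept a theValue only if it lies inside that range. The function name valueInsideKey is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the text of the first <theValue> found
// inside the first <theKey>...</theKey> pair, or "" if there is none.
std::string valueInsideKey(const std::string& xml)
{
    const std::string keyOpen = "<theKey>";
    const std::string keyClose = "</theKey>";
    const std::string valOpen = "<theValue>";
    const std::string valClose = "</theValue>";

    // Find the bounds of the theKey element's content.
    std::string::size_type keyStart = xml.find(keyOpen);
    if (keyStart == std::string::npos)
        return "";
    keyStart += keyOpen.size();

    std::string::size_type keyEnd = xml.find(keyClose, keyStart);
    if (keyEnd == std::string::npos)
        return "";

    // Search for theValue starting inside theKey, and reject matches
    // that begin or end beyond the closing </theKey>.
    std::string::size_type valStart = xml.find(valOpen, keyStart);
    if (valStart == std::string::npos || valStart >= keyEnd)
        return "";
    valStart += valOpen.size();

    std::string::size_type valEnd = xml.find(valClose, valStart);
    if (valEnd == std::string::npos || valEnd > keyEnd)
        return "";

    return xml.substr(valStart, valEnd - valStart);
}
```

With the input from the first comment, this skips the stray theValue that precedes theKey and returns "345".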
Ah, OK, I had missed that. Thanks for pointing it out.
Manuel