I have cricket player profiles saved as .xml files in a folder. Each file contains these tags:

 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>

The playerid matches the name of the .xml file. There are about 500 files, each between 1 KB and 5 KB. What I need is to extract the playername, majorteam, and playerid from all of these files into a list. I will convert that list to XML later, but if you know how I can produce the XML directly I would be very thankful.

If there is a way to do it with C#, Windows batch files, or VBScript, that would be ideal; I can also use Java. I just need to get my data (ID and name) in one place.

+1  A: 

Why don't you just do cat *.xml > all.xml?

nico
Can I extract specific tags with this command?
LifeH2O
@LifeH2O: This will just concatenate all the files. You can then use the language of your choice to parse the XML. I'm sure there are libraries for all the major languages. For instance, in PHP you just need one call to `simplexml_load_file` to get an object with all the values in it.
nico
This was surely the simplest way to do it. I just concatenated all the files, and now I am going to parse the result with C# XPath.
LifeH2O
After that I simply parsed and printed that file according to my needs.
LifeH2O
+1  A: 

Use xsd.exe to generate a schema and class from your XML file.

Open a Visual Studio 2008 Command Prompt.
From the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xml

This generates an XML Schema based on your XML file.

Next, from the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xsd /classes /language:CS

This creates a new class based on your schema.

Now write code to deserialise the XML file using the class you generated; you can place this code in a loop to handle more than one file.

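// Requires: using System.IO; using System.Xml.Serialization;
// Player is assumed to be the name of the class generated by xsd.exe above.
using (FileStream fs = new FileStream("Player.XML", FileMode.Open))
{
    // Create an XmlSerializer object to perform the deserialization
    XmlSerializer xs = new XmlSerializer(typeof(Player));

    Player p = xs.Deserialize(fs) as Player;
    if (p != null)
    {
        // process player here
    }
}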
Zamboni
This is very helpful; I am using this approach to load the whole XML file and save it to a database.
LifeH2O
A: 

If I had to do this task, I'd probably do it in Perl. The previous suggestion to concatenate (cat) all the files isn't really correct, since what you'll end up with will not be a valid XML file, but rather a bunch of valid XML files back to back.
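To make the point about concatenation concrete, here is a rough Python 3 sketch (not Perl, and not anyone's code from this thread). The two sample documents and the helper names `wrap_documents` and `is_well_formed` are made up for illustration. It shows that back-to-back documents do not parse, and that stripping each XML declaration and wrapping everything in a single root element does. It does not handle DOCTYPE declarations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for two player files.
FILE_A = ('<?xml version="1.0"?>\n'
          '<player><playerid>547</playerid><playername>Don</playername></player>')
FILE_B = ('<?xml version="1.0"?>\n'
          '<player><playerid>548</playerid><playername>Len</playername></player>')


def wrap_documents(documents, root_tag="players"):
    """Join several XML documents under one root, dropping their XML declarations."""
    bodies = [re.sub(r"^\s*<\?xml[^>]*\?>", "", doc).strip() for doc in documents]
    return "<%s>%s</%s>" % (root_tag, "".join(bodies), root_tag)


def is_well_formed(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Plain concatenation, which is what `cat *.xml > all.xml` produces.
print(is_well_formed(FILE_A + FILE_B))
# The same documents wrapped under one <players> root.
print(is_well_formed(wrap_documents([FILE_A, FILE_B])))
```

A parser that insists on well-formed input will reject the first string, so the concatenated file needs a wrapper like this (or some equivalent clean-up) before it can be loaded.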

Perl has a module archive called CPAN, which contains all sorts of modules for getting tasks done. If you install an XPath module from it, it should be pretty easy to search for the nodes you want and output them in a list.

If XPath is too burdensome, you might also want to look into regular expressions, colloquially known as regexes. Perl has amazing regex support.
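As a rough idea of what the regex route looks like, here is a sketch in Python 3 rather than Perl. The tag names come from the question; the function name `extract_fields` and the sample text are invented for illustration. It assumes each tag appears at most once per file, carries no attributes, and holds plain text (no CDATA or entities), which is exactly where regexes on XML tend to break down.

```python
import re

# Tag names taken from the question.
FIELDS = ("playerid", "majorteam", "playername")


def extract_fields(text):
    """Return {tag: value} for the first occurrence of each tag in FIELDS."""
    record = {}
    for tag in FIELDS:
        # Non-greedy match of whatever sits between <tag> and </tag>.
        match = re.search(r"<%s>\s*(.*?)\s*</%s>" % (tag, tag), text, re.DOTALL)
        record[tag] = match.group(1) if match else None
    return record


# Hypothetical file contents.
sample = """<player>
 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>
</player>"""

print(extract_fields(sample))
```

A missing tag comes back as `None`, so it is easy to spot files that do not follow the expected layout.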

If I had to use Java, I'd probably use its support for regular expressions. If I wanted to really get nitty-gritty with the XML nodes of the documents, I'd likely use Sun's Streaming API for XML (StAX).

jasonmp85
I don't know how to use Perl.
LifeH2O
There is something better than SAX. If you use VTD-XML, you can use getContentFragment() to get the offsets and lengths of the child elements under root. Then you can concatenate those fragments directly into a file...
vtd-xml-author
I was suggesting StAX, not SAX. StAX is streaming-based and lets you look for the nextElement, nextAttribute, etc. SAX is event-driven and fires off events when it encounters new nodes, etc. DOM is tree based. StAX is somewhere in the middle. You manipulate a cursor and pull new information from the file.
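For a feel of the pull style described here, the sketch below uses Python 3's `xml.etree.ElementTree.iterparse`. That is not StAX, only a loosely similar standard-library way to step through a document and stop early. The function name `pull_fields` and the sample bytes are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tag names taken from the question.
WANTED = {"playerid", "majorteam", "playername"}


def pull_fields(source):
    """Step through the document and keep the first value of each wanted tag."""
    record = {}
    # Each iteration hands back one element as soon as its end tag is read.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in WANTED and elem.tag not in record:
            record[elem.tag] = (elem.text or "").strip()
        if len(record) == len(WANTED):
            break  # every field has been seen, so stop reading
    return record


# Hypothetical file contents, passed as a file-like object.
sample = io.BytesIO(
    b"<player><playerid>547</playerid><majorteam>England</majorteam>"
    b"<playername>Don</playername><bio>long text</bio></player>"
)
print(pull_fields(sample))
```

The caller drives the loop and decides when to stop, which is the main practical difference from SAX-style callbacks.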
jasonmp85
A: 

Pick your scripting tongue of choice. Mine's Python.

In that language, this is about what you're looking for:

import xml.dom.minidom
import glob
from xml.parsers.expat import ExpatError

base_doc = xml.dom.minidom.parseString('<players/>')
doc_element = base_doc.documentElement

for filename in glob.glob("*.xml"):
    f = open( filename )
    x = f.read()
    f.close()
    try:
        player = xml.dom.minidom.parseString(x)
    except ExpatError:
        print "ERROR READING FILE %s" % filename
        continue
    print "Read file %s" % filename
    # Append a deep copy of this player's root element, keeping files in read order
    doc_element.appendChild(base_doc.importNode(player.documentElement, True))

f = open( "all_my_players.xml", "w" )
f.write(doc_element.toxml())
f.close()
Dan Menes
And if you don't have Python, go get it. ActiveState's distribution is pretty all-inclusive, easy to set up, and a free download. The above script is for Python 2.x, not Python 3.x
Dan Menes
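If only the three fields from the question are wanted rather than whole profiles, something along these lines might do. It is a sketch for Python 3 using `xml.etree.ElementTree`, not a tested solution for the real files. The function names, the output layout (`<players>` containing `<player>` elements), and the folder and file names in the usage note are all assumptions.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Tag names taken from the question.
FIELDS = ("playerid", "majorteam", "playername")


def collect_players(folder):
    """Return one dict per readable profile, holding only the fields of interest."""
    players = []
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            print("Skipping unreadable file %s" % path)
            continue
        record = {}
        for tag in FIELDS:
            # iter(tag) searches the whole tree, so nesting depth does not matter.
            node = next(root.iter(tag), None)
            record[tag] = (node.text or "").strip() if node is not None else ""
        players.append(record)
    return players


def write_summary(players, out_path):
    """Write the records as <players><player>...</player></players>."""
    root = ET.Element("players")
    for record in players:
        player = ET.SubElement(root, "player")
        for tag in FIELDS:
            ET.SubElement(player, tag).text = record[tag]
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```

Usage would be roughly `write_summary(collect_players("profiles"), "summary.xml")`, with the output written outside the profiles folder so that a second run does not read its own summary back in.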