I need to design a Windows application that will reside within an organization's intranet. The application will be deployed on a user's machine, and the user will generate output in an XML file that has a predefined schema. This XML will be written out to a networked folder that is accessible to other users. The files are named userid_output.xml, where "userid" is pulled from the application environment. While using the application, a user should be able to search all the XML files generated by the universe of users up to that point. The information retrieved will drive the user to shape his/her application input. A very firm requirement is not to use any RDBMS (Oracle, SQL Server, MySQL, et al.) to store the XML. The shared network folder is "THE REPOSITORY" and is only used for storing the XML files. The machine hosting the shared folder may not run any services that might assist with indexing the XML files or optimizing the data for search purposes.

Given these limitations, does anybody know of any design techniques/tools/mechanisms to perform fast information retrieval from this "dataset"?

Thanks

+1  A: 

You could use XQuery. The collection() function allows you to query a directory of XML files.

Here's an example using Saxon (I'm not sure whether other implementations behave the same way):

collection("file:///C:/sample_xml?select=*.xml;")

This would select all of the *.xml files in the C:\sample_xml directory.

You could also narrow down the results by using XPath:

collection("file://///srv1/dir1/sample_xml?select=*.xml;")/doc/sample1[@id='someID']

This would return only the sample1 elements that had an attribute id that was equal to someID.
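For readers who do not have an XQuery processor at hand, the same selection can be approximated with a plain directory scan. The following is only a minimal sketch in Python using the standard library, not what Saxon does internally. The function name find_sample1 is hypothetical, and the doc/sample1 structure is taken from the XPath example above.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_sample1(directory, wanted_id):
    """Return (file name, element) pairs matching /doc/sample1[@id=wanted_id].

    Hypothetical helper: scans every *.xml file in `directory`, roughly
    mirroring collection(...)/doc/sample1[@id='someID'] from the answer.
    """
    matches = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        if root.tag != "doc":
            continue  # the XPath only matches documents rooted at <doc>
        for elem in root.findall("sample1"):
            if elem.get("id") == wanted_id:
                matches.append((os.path.basename(path), elem))
    return matches
```

A call such as find_sample1(r"\\srv1\dir1\sample_xml", "someID") would scan the shared folder from the example. Note that this sketch opens and parses every file on the client, so its cost grows with the number and size of the files in the folder.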

DevNull
@DevNull - Thanks. I have no prior experience with XQuery, but in your snippet above, are you using collection() to form an in-memory representation of the XML files in the C:\sample_xml directory, which is stored on the client's machine? What happens if we have 7000 sample.xml files and are only interested in the <sample1 id="someID"></sample1> elements whose id attribute is equal to "someID"? How does XQuery help with returning that subset in an optimized manner without imposing a tremendous amount of overhead?
sc_ray
@DevNull - How does XQuery differ from something like Linq2Xml?
sc_ray
@sc_ray - Sorry, I have no experience with Linq2Xml. I will add another example to my answer to show what I would do to narrow down the results.
DevNull
I also used a UNC path in the second example to show how I would access the network directory.
DevNull
@DevNull - Thanks. But is XQuery doing the heavy lifting on the network folder itself, or does it "select, transfer, and then process massive amounts of data"? I was reading something along these lines in the following post: http://stackoverflow.com/questions/214060/using-xquery-in-linq-to-sql
sc_ray