Hello,

I need to implement a search engine. I have a dictionary, stored as a hash table, that contains words. I also have some texts, and I need to go over all of them and write to the posting file, for each word, the text number and the positions at which the word occurs in that text.

Each time I encounter a word that already exists in the posting file, I need to add another occurrence of it, meaning I have to update the line where that word appears in the posting file. But the posting file looks something like this:

word1: 1(2,4,5) 4(66,42,21)
word2: 1(3,66) 6(12,19)

I can't write anything new on line 1 because, as I understand it, that would affect line 2.

So the question is: how can I do this? Instead of just writing strings into the file, can I somehow write a data structure, like a hash table? Then each word would have a hash table in the posting file, and if I see that the word already exists in the posting file, I would read its hash table, update it, and rewrite it to the file.

Or is there something better?

Thanks in advance,

Greg

A: 

Have you thought about using XML to do this? A simple structure like:

<searchkeys>
   <key name="word1">
      <text id="1">2,4,5</text>
      <text id="4">66,42,21</text>
   </key>
   <key name="word2">
      <text id="1">3,66</text>
      <text id="6">12,19</text>
   </key>
</searchkeys>

You can use the XmlDocument, XmlReader, and XmlWriter classes (among others) to manipulate the file, and get fancier from there.
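
As a rough illustration of the structure above, here is a minimal sketch that builds the same XML from an in-memory index. It is written in Python with `xml.etree.ElementTree` rather than the .NET classes named above, and the function name `build_index_xml` and the nested-dict layout are assumptions made for the example, not something from the question.

```python
import xml.etree.ElementTree as ET


def build_index_xml(index):
    """Turn {word: {text_id: [positions]}} into a <searchkeys> element.

    Hypothetical helper: the dict layout is an assumption for this sketch.
    """
    root = ET.Element("searchkeys")
    for word, postings in index.items():
        key = ET.SubElement(root, "key", name=word)
        for text_id, positions in postings.items():
            text = ET.SubElement(key, "text", id=str(text_id))
            # Positions are stored as a comma-separated list, as in the sample.
            text.text = ",".join(str(p) for p in positions)
    return root


# The two words from the question's posting file.
index = {
    "word1": {1: [2, 4, 5], 4: [66, 42, 21]},
    "word2": {1: [3, 66], 6: [12, 19]},
}
root = build_index_xml(index)
xml_string = ET.tostring(root, encoding="unicode")
```

To persist the result, `ET.ElementTree(root).write(path, encoding="utf-8")` writes the whole tree to a file in one call.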

If this is going to contain a lot of data, you might consider using a database instead (Access, MS SQL Server (Express or Standard), SQLite, MySQL, etc.).
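
For the database route, a minimal sketch using SQLite through Python's built-in `sqlite3` module might look like the following. The table name `postings`, its columns, and the helper functions are all assumptions chosen for illustration. The point is that adding an occurrence becomes a single row insert, so no existing line has to be rewritten.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a posting store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS postings (
               word     TEXT    NOT NULL,
               text_id  INTEGER NOT NULL,
               position INTEGER NOT NULL,
               PRIMARY KEY (word, text_id, position)
           )"""
    )
    return conn


def add_occurrence(conn, word, text_id, position):
    # One row per occurrence; a repeated insert of the same row is ignored.
    conn.execute(
        "INSERT OR IGNORE INTO postings VALUES (?, ?, ?)",
        (word, text_id, position),
    )


def lookup(conn, word):
    """Return {text_id: [positions]} for one word, positions ascending."""
    rows = conn.execute(
        "SELECT text_id, position FROM postings "
        "WHERE word = ? ORDER BY text_id, position",
        (word,),
    )
    result = {}
    for text_id, position in rows:
        result.setdefault(text_id, []).append(position)
    return result


conn = open_index()
for text_id, position in [(1, 2), (1, 4), (1, 5), (4, 66), (4, 42), (4, 21)]:
    add_occurrence(conn, "word1", text_id, position)
add_occurrence(conn, "word1", 1, 2)  # duplicate, silently ignored
conn.commit()
```

Here `lookup(conn, "word1")` returns the positions grouped by text number. Passing a file path to `open_index` instead of the default `":memory:"` would keep the index on disk.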

GrayWizardx
So you suggest that instead of writing simple text into the file, I write it as an XML document? If I want to add some new data to word1, how would I do that with an XML document?
You can do it two ways easily (there are more, I am sure). One: load the whole file into memory, update it there (keeping it in memory for the lifetime of your application), and then write it out again. Two: use an XPathNavigator to find the node you are interested in, update it, and then call Save on the XmlDocument to write it back. There should be examples of this readily available, as it is a common usage scenario.
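
The find-and-update approach described in the comment above can be sketched roughly as follows. This uses Python's `xml.etree.ElementTree` and its limited XPath support as a stand-in for the .NET classes, and the helper name `add_occurrence` is hypothetical. It assumes words contain no quote characters, since they are pasted straight into the path expression.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<searchkeys>
   <key name="word1">
      <text id="1">2,4,5</text>
   </key>
</searchkeys>"""


def add_occurrence(root, word, text_id, position):
    """Append one position for (word, text_id), creating nodes as needed."""
    # Assumption: `word` has no quotes, so it is safe inside the predicate.
    key = root.find(f"key[@name='{word}']")
    if key is None:
        key = ET.SubElement(root, "key", name=word)
    text = key.find(f"text[@id='{text_id}']")
    if text is None:
        text = ET.SubElement(key, "text", id=str(text_id))
    if text.text:
        # Existing list of positions: extend it.
        text.text = f"{text.text},{position}"
    else:
        text.text = str(position)


root = ET.fromstring(SAMPLE)
add_occurrence(root, "word1", 1, 9)   # existing word, existing text
add_occurrence(root, "word1", 7, 3)   # existing word, new text
add_occurrence(root, "word3", 2, 10)  # new word
```

After the updates, `ET.ElementTree(root).write(path, encoding="utf-8")` saves the document again. Note that this still rewrites the whole file on save, so it suits small to medium indexes.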
GrayWizardx