Hi,

I have a txt file with some data that looks like this:

a:1(2,3) 55(33,45,67)
b:2(1,33,456) 4(123,12444)

which means that the word "a" appears in text 1 at positions 2 and 3, and in text 55 at positions 33, 45 and 67.

I have a set of texts that I scan one by one. Whenever the word "a" appears in a text, I need to update the file above accordingly. (The data for "a" can of course be longer than one line.)

How can I update the line for "a" without damaging the line for "b"? I saw here on Stack Overflow that I could maybe use an XML file. If I use an XML file, can I read all the data for "a", update it, and then write it back without damaging the "b" line? Or could the data for each word be kept in some data structure that I can read from the file, update, and then write back to the same position?

Thanks in advance,

Greg

A: 

If you change this text file to XML, you can easily manipulate the file using LINQ to XML.

Take a look here. Specifically, the manipulation section.

Jim Schubert
How would LINQ to XML help with simple string manipulation?
Hellfrost
As I understood from that link, you can simply create an XML document, navigate to the XElement whose contents you want to replace, and then use the ReplaceNodes() method.
@hellfrost, if this were simple string manipulation, it wouldn't help. But Greg wants to maintain a file of records that tracks the locations of words. If there are many words and many locations, manually processing the file line by line is a lot of unnecessary work. Using XML allows incremental access to the records.
@Greg, correct. But you can also find an XML element (the word) and insert, update, or delete the nodes that reference that word in a text.
Jim Schubert
A: 

Use string.Insert.

But I would use a 2D array: int Places[text][place].

Your question has nothing to do with XML. You can use XML for this, but a simple data structure would be simpler.

edit:

OK, use this:

 Dictionary<string, Dictionary<int, List<int>>>

Use a dictionary whose key is your word and whose value is another dictionary that maps text numbers to lists of locations. You can also serialize all of this to a file, by the way.
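
A minimal sketch of that nested-dictionary shape, written in Python rather than .NET purely for illustration. The helper name `add_occurrence` is hypothetical. The inner containers grow on demand, so the number of occurrences does not need to be known up front:

```python
from collections import defaultdict

# word -> text number -> list of positions.
# Missing keys are created on first access, so nothing has to be pre-sized.
index = defaultdict(lambda: defaultdict(list))


def add_occurrence(index, word, text_id, place):
    """Record that `word` occurs in text `text_id` at position `place`."""
    places = index[word][text_id]
    if place not in places:  # ignore duplicates from re-scanning a text
        places.append(place)


# Rebuild the sample data for "a" from the question.
for text_id, place in [(1, 2), (1, 3), (55, 33), (55, 45), (55, 67)]:
    add_occurrence(index, "a", text_id, place)

print(dict(index["a"]))  # {1: [2, 3], 55: [33, 45, 67]}
```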

Hellfrost
But I don't know the size of the array, because I have to go over all the texts and add the occurrences of the words to the text file, and I don't know how many occurrences there will be. Also, if I hold some data structure for each word, how can I navigate to the exact place in the text file where the word I need to update is stored?
+1  A: 

You might encode the same information in XML with

<words>
  <word name="a">
    <text id="1">
      <place id="2" />
      <place id="3" />
    </text>
    ...
  </word>
  ...
</words>

As you can see, XML has the same issue as your text format: updates require rewrites, so it won't be as much help as you're hoping.
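
To make that concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree` (my choice for illustration, not something from the question). The helper `add_place` is hypothetical. Note that the whole document is parsed into memory and the whole document is serialized again afterwards:

```python
import xml.etree.ElementTree as ET


def add_place(root, word, text_id, place):
    """Add <place id=.../> under the matching <word>/<text>, creating parents as needed."""
    word_el = next((w for w in root.findall("word") if w.get("name") == word), None)
    if word_el is None:
        word_el = ET.SubElement(root, "word", name=word)
    text_el = next((t for t in word_el.findall("text") if t.get("id") == str(text_id)), None)
    if text_el is None:
        text_el = ET.SubElement(word_el, "text", id=str(text_id))
    if not any(p.get("id") == str(place) for p in text_el.findall("place")):
        ET.SubElement(text_el, "place", id=str(place))


# The entire document is loaded, even though only one word changes.
root = ET.fromstring(
    '<words><word name="a"><text id="1">'
    '<place id="2" /><place id="3" /></text></word></words>'
)
add_place(root, "a", 1, 7)  # new position for an existing word
add_place(root, "b", 2, 1)  # new word

# Saving means serializing every element again, not just the changed one.
xml_text = ET.tostring(root, encoding="unicode")
```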

Given the sample in your question, I assume you're not indexing a huge corpus. If so, perform updates in the following steps:

  1. Read the current index file into the data structure you're using for the index.
  2. Update your data structure to reflect the contents of the newly added texts in the corpus.
  3. Write the contents of the entire updated index to a new temporary file.
  4. On success, rename the temporary file to the master index file from step 1.
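
The steps above can be sketched roughly as follows, in Python for illustration. The function names are hypothetical, and the parser assumes the exact line format from the question (one word per line, no ":" inside a word):

```python
import os
import re
import tempfile

ENTRY = re.compile(r"(\d+)\(([\d,]*)\)")  # matches e.g. "55(33,45,67)"


def parse_line(line):
    """Turn 'a:1(2,3) 55(33,45,67)' into ('a', {1: [2, 3], 55: [33, 45, 67]})."""
    word, _, rest = line.strip().partition(":")
    texts = {int(t): [int(p) for p in places.split(",") if p]
             for t, places in ENTRY.findall(rest)}
    return word, texts


def format_line(word, texts):
    """Inverse of parse_line; texts and positions are written in sorted order."""
    entries = " ".join(
        f"{t}({','.join(map(str, sorted(ps)))})" for t, ps in sorted(texts.items()))
    return f"{word}:{entries}"


def load_index(path):
    """Step 1: read the whole index file into a dictionary."""
    index = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    word, texts = parse_line(line)
                    index[word] = texts
    return index


def save_index(index, path):
    """Steps 3 and 4: write a temporary file, then rename it over the master file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for word in sorted(index):
                f.write(format_line(word, index[word]) + "\n")
        os.replace(tmp, path)  # the old index stays intact until this succeeds
    except BaseException:
        os.unlink(tmp)
        raise
```

Step 2 is then an ordinary in-memory update between `load_index("index.txt")` and `save_index(index, "index.txt")`, for example `index.setdefault("a", {}).setdefault(7, []).append(9)`. The line for "b" is never edited in place; it is simply written out again unchanged.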

If you do have a very large corpus, the text-based index is inappropriate. Consider using a real database.
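
As one possible illustration (SQLite is my assumption here; any relational database would do), each occurrence becomes a row, so adding a position for "a" never touches the rows for "b". The table and function names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead for a persistent index
conn.execute("""
    CREATE TABLE IF NOT EXISTS occurrence (
        word    TEXT    NOT NULL,
        text_id INTEGER NOT NULL,
        place   INTEGER NOT NULL,
        PRIMARY KEY (word, text_id, place)
    )
""")


def add_occurrence(conn, word, text_id, place):
    """Insert one occurrence; duplicates are ignored thanks to the primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO occurrence VALUES (?, ?, ?)",
            (word, text_id, place),
        )


def places(conn, word):
    """Return every (text_id, place) pair recorded for `word`, in order."""
    rows = conn.execute(
        "SELECT text_id, place FROM occurrence WHERE word = ? ORDER BY text_id, place",
        (word,),
    )
    return rows.fetchall()


add_occurrence(conn, "a", 1, 2)
add_occurrence(conn, "a", 1, 3)
add_occurrence(conn, "b", 2, 1)
```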

Greg Bacon
+1 This is exactly what I described. What would you consider a "very large corpus"? The data could also easily be loaded from this structure into a DataSet, processed in memory, and then re-saved after every "transaction". I've worked with DataSets of 6,000+ records and only found them to become slow around 20,000 records (about 10 megabytes). In that case, paging is required. This may be a limitation of my LINQ to XML answer.
Jim Schubert