views:

354

answers:

3

Hello everyone,

I'm busy with programming a class that creates an index out of a text-file ASCII/BINARY. My problem is that I don't really know how to start. I already had some tries but none really worked well for me. I do NOT need to find the address of the file via the MFT. Just loading the file and finding stuff much faster by searching for the key in the index-file and going in the text-file to the address it shows.

The index-file should be build up as follows:
KEY ADDRESS
1 0xABCDEF
2 0xFEDCBA
. .
. .

We have a text-file with the following example value:
1, 8752 FW, ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++, **************************************************************************, ------------------------------------------------------------------------------;

I hope that this explains my question a bit better. Thanks!

+1  A: 

Your code snippet isn't so much of an idea as it is the functionality you wish to have in the end.

Recognize that "indexing" merely means "remembering" where things are located. You can accomplish this using any data structure you wish... B-Tree, Red/Black tree, BST, or more andvanced structures like suffix trees/suffix arrays.

I recommend you look into such data structures.

edit:

with the new information, I would suggest making your own key/value lookup. Build an array of keys, and associate their values somehow. this may mean building a class or struct that contains both the key and the value, or instead contains the key and a pointer to a struct or class with a value, etc.

Once you have done this, sort the key array. Now, you have the ability to do a binary search on the keys to find the appropriate value for a given key.

You could build a hash table in a similar manner. you could build a BST or similar structure like i mentioned earlier.

San Jacinto
(Working on the same assignment). It doesn't have to be so complicated. The only thing that needs to be stored is the key + the offset of the record in another file.
Ikke
Ja right... it's a bit to complicated for our assignment. ;)But indeed it's interesting! Maybe I'll use it later in some programs.
shevron
A: 

I still don't really understand the question (work on your question asking skillz), but as far as I can tell the algorithm will be:

  1. scan the file linearly, the first value up to the first comma (',') is a key, probably. All other keys occur wherever a ';' occurs, up to the next ',' (you might need to skip linebreaks here). If it's a homework assignment, just use scanf() or something to read the key.
  2. print out the key and byte position you found it at to your index file

AFAIUI that's the algorithm, I don't really see the problem here?

wds
+1  A: 

It seems to me that all your class needs to do is store an array of pointers or file start offsets to the key locations in the file.

It really depends on what your Key locations represent.

I would suggest that you access the file through your class using some public methods. You can then more easily tie in Key locations with the data written.

For example, your Key locations may be where each new data block written into the file starts from. e.g. first block 1000 bytes, key location 0; second block 2500 bytes, key location 1000; third block 550 bytes; key location 3500; the next block will be 4050 all assuming that 0 is the first byte.

Store the Key values in a variable length array and then you can easily retrieve the starting point for a data block.

If your Key point is signified by some key character then you can use the same class, but with a slight change to store where the Key value is stored. The simplest way is to step through the data until the key character is located, counting the number of characters checked as you go. The count is then used to produce your key location.

ChrisBD