Say I write in a file

Mesh: 1
    Vertices: 345
    Indices: 123
    V: 1,3,4 1,4,5 ..
Mesh: 2
    Vertices: 456
    Indices: 42
etc.

How do I seek to an arbitrary position in such a file? For example, I want to jump to the Vertices: line of Mesh 2, or the V: line of Mesh 3.

What's the proper way to go about these things?

+3  A: 

You would normally use a binary format. One way would be to allocate a certain amount of space as a header in the file. Here, you put the mesh numbers, vertex and index counts, and an offset into the file where the vertex data begins. You read the header when loading the file, then seek to the appropriate place to read the data you want.
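As a rough sketch of that idea (not a definitive format): the file below starts with a mesh count and a directory of fixed-size entries, each holding the byte offset of that mesh's vertex data. The `MeshEntry` layout and the names `write_meshes` and `read_mesh_vertices` are made up for illustration. It assumes three floats per vertex, writes structs in the machine's native byte order and padding (so the file is not portable between platforms), and uses 32-bit offsets, which limits the file to 4 GiB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical directory entry: one per mesh, stored at the start of the file. */
typedef struct {
    uint32_t id;
    uint32_t vertex_count;
    uint32_t index_count;
    uint32_t data_offset;   /* byte offset of this mesh's vertex data */
} MeshEntry;

/* Write the header (mesh count + directory), then each mesh's vertex data.
   verts[i] points to entries[i].vertex_count * 3 floats (assumed layout).
   Fills in entries[i].data_offset. Returns 0 on success, -1 on failure. */
int write_meshes(const char *path, MeshEntry *entries, uint32_t count,
                 const float *const *verts)
{
    assert(path && entries && verts);
    FILE *fp = fopen(path, "wb");
    if (!fp) return -1;

    /* Vertex data starts right after the count and the directory. */
    uint32_t offset = (uint32_t)(sizeof count + count * sizeof *entries);
    for (uint32_t i = 0; i < count; i++) {
        entries[i].data_offset = offset;
        offset += entries[i].vertex_count * 3 * (uint32_t)sizeof(float);
    }

    int ok = fwrite(&count, sizeof count, 1, fp) == 1 &&
             fwrite(entries, sizeof *entries, count, fp) == count;
    for (uint32_t i = 0; ok && i < count; i++) {
        size_t n = (size_t)entries[i].vertex_count * 3;
        ok = fwrite(verts[i], sizeof(float), n, fp) == n;
    }
    return (fclose(fp) == 0 && ok) ? 0 : -1;
}

/* Read the vertex data of the mesh with the given id into out, which has room
   for cap floats. Returns the number of floats read, or -1 on error or if the
   id is not in the directory. */
long read_mesh_vertices(const char *path, uint32_t id, float *out, size_t cap)
{
    assert(path && out);
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    long result = -1;
    uint32_t count;
    if (fread(&count, sizeof count, 1, fp) == 1) {
        for (uint32_t i = 0; i < count; i++) {
            MeshEntry e;
            if (fread(&e, sizeof e, 1, fp) != 1) break;
            if (e.id != id) continue;
            /* Found the entry: jump straight to the data, skipping other meshes. */
            size_t n = (size_t)e.vertex_count * 3;
            if (n <= cap && fseek(fp, (long)e.data_offset, SEEK_SET) == 0 &&
                fread(out, sizeof(float), n, fp) == n)
                result = (long)n;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Reading Mesh 2 then touches only the header and that mesh's own bytes, no matter how much data the earlier meshes hold.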

sje397
This. There is no way to seek directly to a point if you don't know where to find that point, and C only knows about byte offsets, not the structure of your data. If you want to preserve the text format, and you actually need to seek within the file rather than simply loading it into memory and working on it there, then you'll have to parse the file and calculate the offsets first.
Porculus
In general, a textual format is preferred. See, e.g., "The importance of being textual" (http://www.faqs.org/docs/artu/ch05s01.html). This problem seems quite suitable for a plain-text format, so why binary?
ArunSaha
@ArunSaha - from your link: "The only good justification for a binary protocol is if you're going to be manipulating large enough data sets...Formats for large images and multimedia are sometimes an example..." I don't agree with much of what's at that link myself - XML is textual, intrinsically supports a type of backward compatibility, and yet is not at all healthy for humans.
sje397
@sje397: Good point. XML is a peculiar beast that manages to combine the disadvantages of a text format with the disadvantages of binary formats. Sometimes I wonder why the hell anyone uses it at all...
slacker
A: 

Open the file for reading and read it line by line until end-of-file (EOF) is reached. For each line, check whether it matches your query. If it matches, report it and return. Otherwise, move on to the next line.

The main work is checking for matches. Use a well-defined, easily parseable format for the lines to make that job easy.
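A minimal sketch of such a scan in C, assuming the layout from the question ("Mesh: N" lines followed by indented "Key: value" lines). The function name `find_in_mesh` is made up for illustration. Lines longer than the 4096-byte buffer are handled by looking only at their first chunk, so a very long value would come back truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan fp from the start for a line beginning with key (e.g. "Vertices:")
   inside the block that begins with "Mesh: <id>". On success, copies the
   rest of that line into value and returns 0; returns -1 if not found. */
int find_in_mesh(FILE *fp, int id, const char *key, char *value, size_t cap)
{
    assert(fp && key && value && cap > 0);
    char line[4096];
    int in_mesh = 0, at_start = 1, n;

    rewind(fp);
    while (fgets(line, sizeof line, fp)) {
        int whole = at_start;
        at_start = strchr(line, '\n') != NULL;
        if (!whole) continue;                 /* tail of an over-long line */

        const char *p = line + strspn(line, " \t");   /* skip indentation */
        if (sscanf(p, "Mesh: %d", &n) == 1) {
            if (in_mesh) break;               /* next mesh reached: key absent */
            in_mesh = (n == id);
        } else if (in_mesh && strncmp(p, key, strlen(key)) == 0) {
            p += strlen(key);
            p += strspn(p, " \t");
            size_t len = strcspn(p, "\r\n");  /* drop the line ending */
            if (len >= cap) len = cap - 1;
            memcpy(value, p, len);
            value[len] = '\0';
            return 0;
        }
    }
    return -1;
}
```

For example, `find_in_mesh(fp, 2, "Vertices:", buf, sizeof buf)` would leave the text after "Vertices:" for Mesh 2 in `buf`. Every lookup rereads the file from the beginning, so this is only reasonable for small files or rare queries.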

ArunSaha
+1  A: 

There is no efficient way to seek randomly in a text file format, because you cannot know the right offset without first reading everything that comes before it. The only way to process such a file is sequentially, from beginning to end.

So read and parse the entire file into some data structure in memory. Then use this structure instead of the file as needed.

If the file is too large to keep everything in memory (these days it's highly improbable), read through the file without storing everything in memory - instead store just file offsets to the beginning of each Mesh in an array. Then you can easily seek to the right place.
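The offset-array approach might look roughly like this. The names `index_meshes` and `seek_to_mesh` are made up for illustration, and the sketch assumes every mesh starts with an unindented "Mesh: N" line, as in the question. Opening the file in binary mode ("rb") is the safe choice here, since `ftell` values on text-mode streams are only guaranteed to be usable with `fseek` and are not plain byte counts on every platform.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One pass over the file: record the offset of every line that starts with
   "Mesh: <number>". Stores at most max offsets and returns how many were stored. */
size_t index_meshes(FILE *fp, long *offsets, size_t max)
{
    assert(fp && offsets);
    char line[4096];
    size_t count = 0;
    int at_line_start = 1, id;

    rewind(fp);
    for (;;) {
        long pos = ftell(fp);             /* offset before reading the line */
        if (pos < 0 || !fgets(line, sizeof line, fp)) break;
        if (at_line_start && count < max && sscanf(line, "Mesh: %d", &id) == 1)
            offsets[count++] = pos;
        /* A chunk without '\n' means the line continues in the next fgets. */
        at_line_start = strchr(line, '\n') != NULL;
    }
    return count;
}

/* Position fp at the start of the k-th mesh (0-based) recorded in offsets.
   Returns 0 on success, -1 if k is out of range or the seek fails. */
int seek_to_mesh(FILE *fp, const long *offsets, size_t count, size_t k)
{
    assert(fp && offsets);
    return (k < count && fseek(fp, offsets[k], SEEK_SET) == 0) ? 0 : -1;
}
```

Only one `long` per mesh stays in memory, so the index remains small even when the vertex data itself is huge.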

slacker
I don't think a multi-terabyte database is at all improbable, these days. Mercy on the poor fool who is faced with processing a plain text data set in that format...
TokenMacGuy
@TokenMacGuy: Well, anyone who has to dump/restore a multi-terabyte SQL database. Not every DB but many dump in a text format by default.
Zan Lynx
Ah, yes, true. Fortunately those dumps aren't online databases. But even in the hundreds of megabytes, they're not that fun to play with.
TokenMacGuy
A: 

As other answers have pointed out, C can only seek to byte offsets within files.

However, if your "Mesh" objects are always stored in the file in numerical order, then you do not have to read the entire file sequentially to find the Mesh that you are after. You can instead perform a binary search on the file - whenever you seek to a position in the file, scan ahead to find the next Mesh.
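Here is one way that binary search could be sketched, under these assumptions: the file is opened in binary mode, each mesh starts with an unindented "Mesh: N" line, and the mesh numbers are strictly increasing through the file. The names `next_mesh` and `find_mesh` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the first "Mesh: N" line that starts at an offset in [from, limit).
   Stores N in *id and returns the line's offset, or -1 if there is none. */
static long next_mesh(FILE *fp, long from, long limit, int *id)
{
    char line[4096];
    int c, at_start = 1;

    if (from > 0) {
        /* Back up one byte and skip to the end of that line, which lands on
           the first line start at or after `from`. */
        if (fseek(fp, from - 1, SEEK_SET) != 0) return -1;
        while ((c = fgetc(fp)) != EOF && c != '\n') { }
    } else if (fseek(fp, 0, SEEK_SET) != 0) {
        return -1;
    }
    for (;;) {
        long pos = ftell(fp);
        if (pos < 0 || pos >= limit || !fgets(line, sizeof line, fp)) return -1;
        if (at_start && sscanf(line, "Mesh: %d", id) == 1) return pos;
        at_start = strchr(line, '\n') != NULL;   /* long lines span chunks */
    }
}

/* Binary search for the header line of mesh `id`.
   Returns its offset in the file, or -1 if it is not present. */
long find_mesh(FILE *fp, int id)
{
    assert(fp);
    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    long lo = 0, hi = ftell(fp);   /* the header, if any, starts in [lo, hi) */

    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        int found;
        long pos = next_mesh(fp, mid, hi, &found);
        if (pos >= 0 && found == id) return pos;
        if (pos >= 0 && found < id)
            lo = pos + 1;   /* target starts strictly after this header */
        else
            hi = mid;       /* no header in [mid, hi), or it is past the target */
    }
    return -1;
}
```

After a successful call, `fseek(fp, pos, SEEK_SET)` puts the stream on the "Mesh: N" line itself. Each probe still scans forward to the next header, so the saving over a plain sequential read depends on how large the individual meshes are.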

caf