views:

983

answers:

6

I'm having an issue trying to parse the ASCII part of a file and then, once I hit the end tag, IMMEDIATELY start reading in the bytes from that point on. Everything I know in Java for reading a line or a whole word creates a buffer, which ruins any chance of getting the bytes immediately following my stop point. Is the only way to do this to read byte by byte, find newlines, reconstruct everything prior to the newline, see if it's my end tag, and go from there?

A: 

Yup, you're right about the byte-by-byte. Abstraction has its disadvantages.

Crimson
@crimson: AAAAAAARRRRRRRRRRRRGGGGGGGGGGGGGG
hatorade
Java's strong distinction between character and byte streams, while useful for ensuring that you always are dealing with data correctly and distinguishing between strings and encodings thereof, does make this a bit difficult.
Michael E
+2  A: 

It is possible, but as far as I know not with the classes from the API.

You can do it manually - open it as a BufferedInputStream, which supports mark/reset. You read block by block (byte[]) and you parse it as ASCII. Eventually you accumulate it in a buffer until you hit the marker. But before you read you call mark. If you believe you read all you needed in ASCII, you call reset and then you call read to dump the rest of the ASCII part. And now you have a BufferedInputStream (which is an InputStream) ready for reading the binary part of the file.
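
A minimal sketch of the mark/reset approach described above, not a definitive implementation. The class name `MarkResetSplitter`, the method names, and the `blockSize` parameter are hypothetical; the end tag is assumed to be a non-empty ASCII byte sequence. To handle a tag that straddles two blocks, the search looks at the accumulated bytes rather than at the current block alone.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class MarkResetSplitter {

    /**
     * Reads blocks until the end tag is found. Returns the ASCII text before
     * the tag and leaves the stream positioned on the first byte after the tag.
     * Returns null if the stream ends without the tag appearing.
     */
    static String readTextUpToTag(BufferedInputStream in, byte[] tag, int blockSize)
            throws IOException {
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        byte[] block = new byte[blockSize];
        while (true) {
            in.mark(blockSize);                  // remember where this block starts
            int n = in.read(block, 0, blockSize);
            if (n < 0) {
                return null;                     // end of stream, no tag
            }
            int before = text.size();            // bytes accumulated before this block
            text.write(block, 0, n);
            // Copying on every block is simple but wasteful; fine for a sketch.
            byte[] all = text.toByteArray();
            // Start early enough to catch a tag that began in the previous block.
            int from = Math.max(0, before - tag.length + 1);
            int idx = indexOf(all, tag, from);
            if (idx >= 0) {
                int tagEnd = idx + tag.length;
                in.reset();                      // rewind to the start of this block
                for (long left = tagEnd - before; left > 0; ) {
                    long skipped = in.skip(left);
                    if (skipped <= 0) {
                        throw new IOException("could not skip past the end tag");
                    }
                    left -= skipped;
                }
                return new String(all, 0, idx, StandardCharsets.US_ASCII);
            }
        }
    }

    /** Naive search for tag in data, starting at index from; -1 if absent. */
    static int indexOf(byte[] data, byte[] tag, int from) {
        outer:
        for (int i = from; i <= data.length - tag.length; i++) {
            for (int j = 0; j < tag.length; j++) {
                if (data[i + j] != tag[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

After the call returns, the same `BufferedInputStream` can be read as an ordinary `InputStream` to get the binary part.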

Marian
wait, how would this work? i don't know how far down the end tag is, so the only data structure i can think of is an arraylist. looking at buffer it seems i need to know how much to allocate it, which i don't. is the best way to deal with this stuff an arraylist?
hatorade
You read 100 bytes. Does the block contain the end marker (easy to test, because of the ASCII encoding)? If not, it's part of the string, so remember it somewhere (to parse it as a string later). You read the next block. Again, it doesn't contain the end marker, so you keep track of it. And so on. At some point, you read a block that has the end marker. You cut off the first part (before the marker) and store it for String parsing. You rewind to the beginning of the block, read/skip bytes until just after the marker, and you have the right binary input stream. You concatenate the accumulated pieces and use a `Reader`. TBC
Marian
You will need to be careful about the end marker spanning two consecutive blocks. You can store the `byte[]`s in a `List<byte[]>` before concatenation, to avoid repeated `System.arraycopy` calls. BTW, that 100 is bad. You should use something like 4096 or 16384.
Marian
+2  A: 

I think the best idea would be to abandon the concept of "lines". To find the end tag, create a ring buffer that's just big enough to contain the end tag, read into it byte-by-byte, and after each byte check if it contains the tag.

There are more sophisticated and efficient search algorithms, but the difference is only relevant with longer search terms (presumably your end tag is short).
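
The ring-buffer idea can be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: the class name `RingBufferTagFinder` and its methods are hypothetical, the tag is assumed to be non-empty, and the text before the tag is assumed to be ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class RingBufferTagFinder {

    /**
     * Reads one byte at a time, keeping the last tag.length bytes in a ring.
     * Returns the ASCII text before the tag; the stream is left positioned on
     * the first byte after the tag. Returns null if the tag never appears.
     */
    static String readTextUpToTag(InputStream in, byte[] tag) throws IOException {
        byte[] ring = new byte[tag.length];
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        long count = 0;                          // total bytes read so far
        int b;
        while ((b = in.read()) >= 0) {
            ring[(int) (count % tag.length)] = (byte) b;   // overwrite the oldest slot
            count++;
            seen.write(b);
            if (count >= tag.length && ringMatches(ring, tag, count)) {
                byte[] all = seen.toByteArray();
                return new String(all, 0, all.length - tag.length,
                        StandardCharsets.US_ASCII);
            }
        }
        return null;
    }

    /** Compares the ring contents, oldest byte first, against the tag. */
    static boolean ringMatches(byte[] ring, byte[] tag, long count) {
        int oldest = (int) (count % tag.length); // the next slot to be overwritten
        for (int j = 0; j < tag.length; j++) {
            if (ring[(oldest + j) % tag.length] != tag[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Because nothing is read past the tag, wrapping the file in a `BufferedInputStream` for speed is safe here as long as the binary part is then read from that same wrapped stream.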

Michael Borgwardt
I don't think he can choose the file format. I have seen the kind of files he describes. For example, I believe that the Java2SE installation kit for Linux is stored in the same way.
Marian
I'm not saying he has to change the file format, just that he should try to read it one byte at a time rather than depending on the concept of "lines".
Michael Borgwardt
@michael: is there a standard java class for ring buffer? couldn't find a corresponding java site after googling "ring buffer java"
hatorade
Sorry, I read only the first sentence, I admit :-D
Marian
No, there's no implementation in the standard API. But it's a very simple data structure to implement yourself. Alternatively, you could abuse an ArrayDeque for this purpose by calling removeFirst() for each add() once its length equals the end tag's.
Michael Borgwardt
+1  A: 

How big is this file? My first thought is to read the whole thing into a ByteBuffer or a ByteArrayOutputStream without trying to process it, then locate the tag by comparing byte values. Once you know where the text part ends and the binary part begins, you process each part as appropriate.
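
For a small file, the idea might look roughly like this. It is only a sketch: `WholeFileSplitter` and its methods are hypothetical names, the tag is assumed to be non-empty, and the buffer is assumed to hold the entire file (for example `ByteBuffer.wrap(Files.readAllBytes(path))`).

```java
import java.nio.ByteBuffer;

final class WholeFileSplitter {

    /**
     * Splits a buffer holding the whole file at the end tag.
     * Returns {text part, binary part}, both views sharing the original bytes,
     * with the tag itself excluded. Returns null if the tag is not present.
     */
    static ByteBuffer[] split(ByteBuffer whole, byte[] tag) {
        int idx = indexOf(whole, tag);
        if (idx < 0) {
            return null;
        }
        ByteBuffer text = whole.duplicate();
        text.position(0);
        text.limit(idx);                         // everything before the tag
        text = text.slice();

        ByteBuffer binary = whole.duplicate();
        binary.position(idx + tag.length);       // everything after the tag
        binary = binary.slice();

        return new ByteBuffer[] { text, binary };
    }

    /** Naive search using absolute gets; compares the first byte, then the rest. */
    static int indexOf(ByteBuffer buf, byte[] tag) {
        outer:
        for (int i = 0; i <= buf.limit() - tag.length; i++) {
            for (int j = 0; j < tag.length; j++) {
                if (buf.get(i + j) != tag[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

The text slice can then be decoded with `StandardCharsets.US_ASCII.decode(...)`, and the binary slice read with the usual `ByteBuffer` getters.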

Alan Moore
not very big file; i like the simplicity of this. i'll give it a try.
hatorade
actually i really like this now that i read about it. so the plan would be to read the whole thing into a ByteBuffer (i know how big the file is in bytes so this buffer will be the right size). then i search the ByteBuffer for my end tag, and then i slice the buffer right there. would that work? i imagine searching for my end tag would involve searching for the first byte, and if found, checking the second, third etc to confirm.
hatorade
That's what I was thinking.
Alan Moore
is the most flexible option in terms of processing the bytes to read both sections into separate byte arrays (byte[])? Is there some way, instead of passing a FileInputStream into a FileReader, to pass a byte array? one of the byte arrays would be full of ascii encoded text and i would like to buffer it if possible and read out lines (like with BufferedReader or Scanner). Is such a thing possible?
hatorade
ah, i could just pass my byte array into a ByteArrayInputStream, which I could pass into my InputStreamReader to convert bytes to chars, right? And from there to a FileReader and then maybe a BufferedReader?
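
A rough sketch of that chain, assuming the byte array holds only the ASCII part. The class name `AsciiLines` is hypothetical. Note that no `FileReader` is involved: `FileReader` only opens files, while an `InputStreamReader` can be wrapped directly in a `BufferedReader`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AsciiLines {

    /** Decodes the bytes as US-ASCII and returns them line by line. */
    static List<String> readLines(byte[] asciiBytes) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(asciiBytes),
                        StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Buffering inside the reader is harmless at this point, because the binary part already lives in a separate array.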
hatorade
actually, i decided i don't need anything more than a bufferedinputstream to do any of this. i'll find the end tag, read everything else into a byte array, reset the buffer, and read the first part normally into eventually a scanner or bufferedreader to get the ascii out easily.
hatorade
A: 

Is the file growing, or is it static?

If it's static, see http://java.sun.com/javase/6/docs/api/java/nio/MappedByteBuffer.html

mikaelhg
it's static, but i don't see how a mappedbytebuffer really offers me much more than a normal bytebuffer for just reading all the bytes into arrays and such.
hatorade
A: 

I have two text file i want to compare there field ex. accno1 in first file accno2 in athoe if accno1=accno2 then action how can i do this opeartion in java i used IDE

wks
This is not an answer and doesn't have anything to do with the question. You should ask this as a new question, but you need to give more information about the file format. And please pay some attention to forming complete sentences with proper interpunction.
Michael Borgwardt