I'm using the following Groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate: the toList() or the for-loop. Thanks!

AcctNum = '1234567890'   // a String, so that contains() accepts it
done = false
match = 'NO'
chunks = 0               // number of lines checked before the match

if (testfile.exists())
{
  lines = testfile.readLines()
  words = lines.toList()
  for (word in words)
  {
    if (word.contains(AcctNum)) { done = true; match = 'YES'; break }
    chunks += 1
  }
}
+3  A: 

Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...

Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?

EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:

for (line in lines) 
{
  if (line.contains(AcctNum)) 
  {
     // Grab the results you need here
     break;
  }
}
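The closure form mentioned above could look roughly like the following. This is only a sketch: `findRecord` is a hypothetical helper name, and the sample file and its comma-separated layout are invented for the demo.

```groovy
// Hypothetical helper: returns the first line containing acctNum, or null.
String findRecord(File file, String acctNum) {
    // find() stops at the first element for which the closure returns true
    file.readLines().find { line -> line.contains(acctNum) }
}

// Demo with a throwaway file; the record layout here is made up.
File demo = File.createTempFile('accounts', '.txt')
demo.deleteOnExit()
demo.text = 'A,1111111111,Smith\nB,1234567890,Jones\nC,2222222222,Brown\n'

String record = findRecord(demo, '1234567890')
println(record ?: 'no match')
```

Note that readLines() still loads the whole file into memory; the closure only tidies up the search itself.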
Jon Skeet
A: 

I should have explained it better: if I find a record with the AcctNum, I extract other information from that record, so I thought I needed to split the file into multiple lines.

+1  A: 

When you say efficient, you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often the two lie on opposite sides and you have to pick a trade-off.

If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of reading it all at once, which I suspect readLines() does (I could be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
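A minimal sketch of the line-by-line approach, assuming a plain text file with one record per line. `findRecordStreaming` is a hypothetical name and the sample data is invented; only one line is held in memory at a time, and reading stops at the first match.

```groovy
// Hypothetical helper: streams the file and stops at the first matching line.
String findRecordStreaming(File file, String acctNum) {
    // withReader opens a BufferedReader, closes it afterwards,
    // and returns whatever the closure returns
    file.withReader { reader ->
        String line
        while ((line = reader.readLine()) != null) {
            if (line.contains(acctNum)) {
                return line   // leaves the closure early with the match
            }
        }
        return null           // end of file reached without a match
    }
}

// Demo with a throwaway file; the record layout here is made up.
File sample = File.createTempFile('accounts', '.txt')
sample.deleteOnExit()
sample.text = 'A,1111111111,Smith\nB,1234567890,Jones\nC,2222222222,Brown\n'

println(findRecordStreaming(sample, '1234567890') ?: 'no match')
```

An explicit loop is used here rather than eachLine, because eachLine offers no simple way to stop early once the record has been found.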

If you want it to run quickly, I'd suggest, as already mentioned, reading in the whole file at once and searching it for the given pattern. Instead of just checking with contains, you could use indexOf to get the position of the match and then read the record as needed from that position.
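The indexOf idea might be sketched like this, assuming records are separated by '\n'. `findRecordByIndex` is a hypothetical name; in practice the text would come from something like the file's getText().

```groovy
// Hypothetical helper: finds acctNum in the full text and returns the
// line that surrounds the match, or null if there is no match.
String findRecordByIndex(String text, String acctNum) {
    int hit = text.indexOf(acctNum)
    if (hit < 0) {
        return null
    }
    // lastIndexOf returns -1 when no newline precedes the match,
    // so adding 1 yields 0, the start of the text
    int start = text.lastIndexOf('\n', hit) + 1
    int end = text.indexOf('\n', hit)
    if (end < 0) {
        end = text.length()   // match is on the last line, no trailing newline
    }
    return text.substring(start, end)
}

// Invented sample data standing in for the file contents.
String contents = 'A,1111111111,Smith\nB,1234567890,Jones\nC,2222222222,Brown'
println(findRecordByIndex(contents, '1234567890') ?: 'no match')
```

With Windows line endings the returned record would keep a trailing '\r', which would need to be trimmed separately.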

gix
A: 

If you control the format of the file you are reading, the solution is to add an index.

In fact, this is how databases are able to locate records so quickly.
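As a rough illustration of the idea, here is an in-memory version built in one pass over the file. It rests on an assumption the question does not state: that the account number is the first comma-separated field of each record. `buildIndex` is a hypothetical name, and a persistent index would more likely store file offsets than whole lines.

```groovy
// Hypothetical helper: maps each account number to its record.
// ASSUMPTION: the account number is the first comma-separated field.
Map<String, String> buildIndex(File file) {
    Map<String, String> index = [:]
    file.eachLine { line ->
        if (line) {                         // skip blank lines
            String key = line.split(',')[0]
            index[key] = line               // a later duplicate overwrites an earlier one
        }
    }
    return index
}

// Demo with a throwaway file; the record layout here is made up.
File data = File.createTempFile('accounts', '.txt')
data.deleteOnExit()
data.text = '1111111111,Smith,10.00\n1234567890,Jones,42.50\n2222222222,Brown,7.25\n'

Map<String, String> acctIndex = buildIndex(data)
println(acctIndex['1234567890'] ?: 'no match')
```

Building the map costs one full scan, so it only pays off when many account numbers are looked up against the same file.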

But for 30MB of data, I think a plain scan on a modern computer with a decent hard drive should do the trick, without overcomplicating the program.

Chii