For a given set of text files, I need to find every "\" character and replace it with "\\". This is a Windows system, and my scripting language options are JavaScript, VBScript, or Perl.

These files are largish (~10 MB apiece), and there are a good number of them (~15,000). I've already come up with the following JavaScript:

function EscapeSlashes(inFilePath)
{
    var readOnly = 1;
    var fso  = WScript.CreateObject("Scripting.FileSystemObject");
    var outFile = fso.CreateTextFile(inFilePath + "escaped.js", true);
    var inFile = fso.OpenTextFile(inFilePath, readOnly);

    var currChar;
    while(!inFile.AtEndOfStream)
    {
        currChar = inFile.Read(1);

        //check for single backslash
        if(currChar != "\\")
        {
            outFile.Write(currChar);
        }
        else
        {
            //write out a double backslash
            outFile.Write("\\\\");
        }
    }

    outFile.Close();
    inFile.Close();
}

I'm worried that the above might be a bit slow. Is there any way to improve the algorithm? Since I'm replacing one character with two, I don't think this can be done in-place.

Is there any performance advantage to reading line by line, rather than character by character?

Do Perl or VBScript have any advantages over JavaScript in this case?

+3  A: 

You can't do it in place, but generally it's a good idea to read data in chunks rather than reading a single value at a time. Read a chunk, and then iterate through it. Read another chunk, etc - until the "chunk" is of length 0, or however the call to Read indicates the end of the stream. (On most platforms the call to Read can indicate that rather than you having to call a separate AtEndOfStream function.)
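A minimal sketch of the chunked approach, written against the same FileSystemObject TextStream methods the question already uses (Read and AtEndOfStream). The names copyEscaped and chunkSize are made up for illustration, and the sketch assumes Read(n) returns fewer than n characters when the file runs out. Because a backslash is a single character, a match can never be split across two chunks, so each chunk can be processed on its own.

```javascript
// Sketch only: copyEscaped and chunkSize are illustrative names,
// not part of the question's code.
// `reader` must provide Read(n) and AtEndOfStream; `writer` must provide
// Write(s). The FileSystemObject TextStream objects in the question fit.
function copyEscaped(reader, writer, chunkSize) {
    while (!reader.AtEndOfStream) {
        // Assumption: Read returns up to chunkSize characters,
        // and fewer at the end of the file.
        var chunk = reader.Read(chunkSize);
        // Split on single backslashes and rejoin with doubled ones.
        writer.Write(chunk.split("\\").join("\\\\"));
    }
}
```

In the question's function this would replace the while loop, for example copyEscaped(inFile, outFile, 65536). The chunk size is a guess; it is worth timing a few values.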

Also, I wouldn't be surprised if Perl could do this in a single line. Or use sed if you can :)

Jon Skeet
A: 

Like Jon said, Perl could be a good choice.
If you can, use Cygwin (which I think has the tools needed for this).

shahkalpesh
+3  A: 

I'd suggest reading and writing bigger chunks (be it lines or a large number of bytes). This should cut down on the number of I/O calls and let the script run faster. However, your files may be too large to comfortably hold in memory all at once, so play with the read/write sizes and see what's fastest for you.

C. Ross
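+3  A: 

perl -spi.og -e 's/\\/\\\\/gm' infile

This rewrites infile in place and keeps the original as infile.og, your backup. Note that the Windows cmd.exe shell does not treat single quotes as quoting characters, so there the script needs double quotes instead: perl -spi.og -e "s/\\/\\\\/gm" infile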

Beau Simensen
+2  A: 

This is the kind of task that Perl is built for, and it would almost certainly be faster, but it's only worth it if you are already familiar with the language. That being said, you can easily tweak your JavaScript code by reading in a bigger buffer and doing your replacement with a regex. Have a look at the String.replace method.
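As a rough illustration of the String.replace suggestion, the per-character test can become a single global regex replacement over whatever buffer has been read. The function name escapeBackslashes is made up for this sketch.

```javascript
// escapeBackslashes is an illustrative name, not part of the question's code.
function escapeBackslashes(text) {
    // /\\/g matches every literal backslash in the buffer.
    // In a replacement string only "$" is special, so "\\\\" is
    // simply two backslash characters.
    return text.replace(/\\/g, "\\\\");
}
```

Inside the question's function this could be used as outFile.Write(escapeBackslashes(inFile.ReadAll())), assuming a ~10 MB file fits comfortably in memory; otherwise apply it to each chunk or line as it is read.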

jiggy