Every day we get a flat text file. Some days the file contains lines that need to be deleted before it can be processed. These lines can appear in different places, but they always start with the characters 6999 or 7999. We would like to run a script that deletes these particular lines. However, and this is way beyond me, wherever there is a line that starts with 6999 there will be a line immediately before it that starts with 5442 and also needs to be deleted, but only if it appears immediately before the 6999 line.

We are a Windows shop and would run this script as part of a simple batch file in Windows. We do not use Unix or Linux nor desire to.

The file name extension reflects the date: today's file is file.100621, tomorrow's will be file.100622. I am having trouble with this aspect, as it seems VBScript does not like file.*

Here is a sample of the text file:

4006006602    03334060000100580                                                 
40060066039    0334070000100580                                                 
700600000011571006210060001255863                                               
544264287250111000025000000000040008000801                                      
6999001000000000000000000000000000000000000000000000000000                      
6999001000000000000000000000000000000000000000000000000000                      
6999001000000000000000000000000000000000000000000000000000                      
799900000011571006210030000000000                                               
8007000000115710062102530054008920  

We'd like to remove 5 lines from this file (the 5442 line, the three 6999 lines, and the 7999 line).

Here is a sample of a script that I found on this site and have modified with some success, but I don't know how to delete the lines (I only know how to replace data in a line). I realize this will either need major modifications or need to be thrown out altogether, but I post it to give an idea of what I think we are looking for. I put this in a directory with cscript.exe and call it from a simple batch file:

Set objFS = CreateObject("Scripting.FileSystemObject")
strFile = "c:\temp\file.100621"
Set objFile = objFS.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If InStr(strLine,"6999")> 0 Then
        strLine = Replace(strLine,"6999","delete line")
    End If 
    WScript.Echo strLine
Loop

which gets me this:

40060066039    0334070000100580                                                 
700600000011571006210060001255863                                               
544264287250111000025000000000040008000801                                      
delete line001000000000000000000000000000000000000000000000000000                      
delete line001000000000000000000000000000000000000000000000000000                      
delete line001000000000000000000000000000000000000000000000000000                      
799900000011571006210030000000000                                               
8007000000115710062102530054008920  

Close! I just need to delete the lines instead of writing "delete line". So here are my specific needs based on what I know:

  1. get the script to process any file in the directory (and there will only ever be 1 at a time, but the extension changes every day)
  2. get the script to delete any line that starts with a 5442 that is immediately before a line that starts 6999
  3. get the script to totally delete those lines that start with 6999 and 7999

Thanks in advance for any and all help!

+1  A: 

I think this would work (but I'm not that good at VBS so no promises):

Set objFS = CreateObject("Scripting.FileSystemObject")
strFile = "c:\temp\file.100621"
Set objFile = objFS.OpenTextFile(strFile)
Dim cachedLine
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine

    ' Keep a cached 5442 line only when the current line does NOT start with 6999.
    If Len(cachedLine) > 0 And InStr(strLine,"6999") <> 1 Then
         WScript.Echo cachedLine        
    End If
    cachedLine = ""

    If InStr(strLine,"5442") = 1 Then
        cachedLine = strLine
    Else
        If InStr(strLine,"6999") = 1 Or InStr(strLine,"7999") = 1 Then
            ' do nothing
        Else
            WScript.Echo strLine        
        End If
    End If     
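Loop

' If the file ends with a 5442 line, nothing follows it, so keep it.
If Len(cachedLine) > 0 Then WScript.Echo cachedLine
objFile.Close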

Note that I think you were checking whether the lines contained the numbers anywhere, but you said the rule is that they start with the numbers. That's why I compare the InStr result against 1 rather than checking for > 0.
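To illustrate the difference between "contains" and "starts with", here is a minimal sketch. The `StartsWith` helper and the sample lines are made up for illustration and are not part of the original answer.

```vbscript
' Hypothetical helper: True only when strPrefix occupies the first
' characters of strLine.
Function StartsWith(strLine, strPrefix)
    StartsWith = (Left(strLine, Len(strPrefix)) = strPrefix)
End Function

' Made-up sample: contains 6999 at position 5, but does not start with it.
strSample = "8007699900115710062102530054008920"

' InStr returns the position of the first match, or 0 when there is none.
WScript.Echo CStr(InStr(strSample, "6999") > 0)     ' matches anywhere in the line
WScript.Echo CStr(InStr(strSample, "6999") = 1)     ' matches only at the start
WScript.Echo CStr(StartsWith(strSample, "6999"))    ' same test, using Left
WScript.Echo CStr(StartsWith("6999001000", "6999"))
```

For this sample, only the first and last checks evaluate to True, so a "contains" test would wrongly flag the 8007 line.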

ho1
@ho1, I think you're really close here, but don't you want `= 1` instead of `<> 1` in all but the first case?
Bryan Ash
@Bryan Ash: Sounds right (though I think it should be in all cases)
ho1
@ho1: the first case says "if we have cached a line (because it began with "5442") and we're looking at a line that begins with "6999" then keep the cached line". We should only be keeping the cached "5442" line if we're NOT looking at a "6999"
Bryan Ash
@Bryan: I agree, but isn't that what the code does?
ho1
when i run this code i get a script error in Line 21 Char 1 Error is "loop without do".
Jack
@Jack: Missing `End If` at the end, added that in now so please check if it works (I'm not sure so I'm curious :))
ho1
@ho1: if you look below, the code I need is a little more involved, as the file name changes every day.
Jack
@ho1: but thanks very much for your help.
Jack
A: 

I made some changes to try to eliminate the blank line, I also added a function to loop through the output file and remove any blank lines. Hope this one works.

Select Case Wscript.Arguments.Count
    case 1:
        strInput = GetFile(WScript.Arguments(0))
        RemoveUnwantedLines strInput, strInput
        RemoveBlankLines strInput
    case 2:
        strInput = GetFile(WScript.Arguments(0))
        strOutput = Wscript.Arguments(1)
        RemoveUnwantedLines strInput, strOutput
        RemoveBlankLines strOutput
End Select

Function GetFile(strDirectory)
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strDirectory)
    dateLastModified = Null
    strFile = ""
    For Each objFile in objFolder.Files
        If IsNull(dateLastModified) Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        ElseIf dateLastModified < objFile.DateLastModified Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        End If
    Next
    GetFile = strFile
End Function

Sub RemoveUnwantedLines(strInputFile, strOutputFile)
        'Open the file for reading.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
        'Read the entire file into memory.
    strFileText = objFile.ReadAll
        'Close the file.
    objFile.Close
        'Split the file at the new line character. *Use the Line Feed character (Char(10))
    arrFileText = Split(strFileText,Chr(10))
        'Open the file for writing.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strOutputFile,2,true)
        'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
            'If the line is not blank process it.
        If arrFileText(i) <> "" Then
                'If the line starts "5442", see if the next line is "6999".
            If Left(arrFileText(i),4) = "5442" Then
                    'Make sure the next line exists (Don't want an out of bounds exception).
                If i + 1 <= UBound(arrFileText) Then
                        'If the next line is not "6999" 
                    If Left(arrFileText(i + 1), 4) <> "6999" Then
                            'Write the "5442" line to the file.
                        objFile.WriteLine(arrFileText(i))
                    End If
                Else
                        'If the next line does not exist, write the "5442" line to the file.
                    objFile.WriteLine(arrFileText(i))
                End If              
                'If the line does not start with "6999" and the line does not start with "7999".
            Elseif Left(arrFileText(i),4) <> "6999"  AND Left(arrFileText(i),4) <> "7999" Then
                    'Write the line to the file.
                objFile.WriteLine(arrFileText(i))
            End If
        End If
    Next
        'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub

Sub RemoveBlankLines(strInputFile)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
        'Read the entire file into memory.
    strFileText = objFile.ReadAll
        'Close the file.
    objFile.Close
        'Split the file at the new line character.
    arrFileText = Split(strFileText,VbNewLine)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,2,true)
        'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
            'If the line is not blank.
        if arrFileText(i) <> "" Then
                'If there is another element.
            if i + 1 <= UBound(arrFileText) Then    
                    'If the next element is not blank.
                if arrFileText(i + 1) <> "" Then
                        'Write the line to the file.
                    objFile.WriteLine(arrFileText(i))
                Else
                        'Write the line to the file (Without a blank line).
                    objFile.Write(arrFileText(i))
                End If
            Else
                    'Write the line to the file (Without a blank line).
                objFile.Write(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub 

To use it, call it from the command line in one of two ways:

RemoveUnwantedLines "C:\TestDirectory\" "C:\Output.txt"

or

RemoveUnwantedLines "C:\TestDirectory\"
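
If other files could ever share the directory, a variant of GetFile could also filter on the base name so that only file.* is considered. This is only a sketch: `GetFileByBaseName` is a hypothetical name, and it assumes the base name stays "file" while the extension changes daily.

```vbscript
' Hypothetical variant of GetFile: returns the path of the most recently
' modified file in strDirectory whose base name equals strBaseName,
' or "" when nothing matches.
Function GetFileByBaseName(strDirectory, strBaseName)
    Dim objFSO, objFolder, objFile, dateNewest, strFound
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strDirectory)
    strFound = ""
    For Each objFile In objFolder.Files
        ' GetBaseName drops the last extension: "file.100621" -> "file".
        If LCase(objFSO.GetBaseName(objFile.Name)) = LCase(strBaseName) Then
            If strFound = "" Or objFile.DateLastModified > dateNewest Then
                dateNewest = objFile.DateLastModified
                strFound = objFile.Path
            End If
        End If
    Next
    GetFileByBaseName = strFound
End Function
```

It could be called as `strInput = GetFileByBaseName("C:\temp", "file")`, with an empty result meaning that no matching file has arrived yet.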
Tester101
Tester 101 - I created a text file, inserted your code into it, and saved it as removeunwantedlines.vbs. I then run the following from a command line in the directory: removeunwantedlines "c:\temp\file.100621" "c:\newfile.txt". Nothing happens. No error message.
Jack
and no change to the file and no newfile.txt created.
Jack
@Jack: the code is not setup to be run from the command line, it is only the function that should delete the lines from the text file. I could modify it to run on the command line if you need, or you could add the relevant code to handle command line parameters.
Tester101
Tester101 - first off, thanks very much for taking the time to help me out on this one. I'm not a programmer and don't pretend to be one. I am a long time IT Manager who is trying to make this thing work because I'm short a DBA/programmer. What I'd like to do is have this run in a batch file, so I can set it up as a Scheduled Task on a Windows 2003 server (I realize this is rudimentary, but it's also a workaround until I can hire said DBA). Can you show me the way to make this work in a batch file? Thanks.
Jack
oh, also, the file will have the same name but a different extension every successive day - is there any way to do wildcard in the code for the file name?
Jack
@Jack: Updated original post to include code to run from the command line.
Tester101
@Jack: Did you say that the file will be the only file with that name in the directory? or are there a bunch of files with the same name but different extensions?
Tester101
There will only be one file in the directory each day. Its extension will change daily: today it is file.100623, tomorrow it will be file.100624.
Jack
If it is the only file in the directory, getting it will be easy. The problem comes when you have a bunch of files in the directory, and you have to determine which one to use. (When you do hire your DBA, think about changing the files name rather than its extension).
Tester101
@Jack: just updated my post with a script that will hopefully do what you need.
Tester101
@Jack: I just added a new GetFile function that will get the most recently modified file in the directory. Just copy the script above and replace the GetFile function with the most recent one.
Tester101
OK, Tester, here's where I'm at: sometimes it removes the lines, sometimes it does not, but it always saves the file. When it does not remove the lines, it adds a blank line to the bottom of the file. if I go in and remove any blank lines (there is usually one) at the end so that the file ends with a character and not a CR, the script works. Not sure why this is.
Jack
I have 4 files I am testing with: the 16th works every time and it has a blank line at the end. no other file ever works, each has a blank line at the end, and a CR is added at the end of each. If I remove the blank line at the end of the text files, each one works. so, maybe writing the script to remove the blank line at the end will solve this.
Jack
BTW Tester, the script is all good so far. Hopefully this blank line issue can be resolved. Thanks very much for all your help so far.
Jack
@Jack: I posted an updated script that will hopefully take care of the blank line problem, try it out and let me know how it works.
Tester101
Tester: same behavior. Doesn't delete the 6999 or 7999 lines if there is a blank line at the end of the file (which there always is), except for this one particular file. Any chance it would be easier for you if I could give you a copy of the files in question? they are small - 10 - 50K.
Jack
I think I misunderstood. I thought the script was creating extra lines. You're saying the files you are trying to process have an extra blank line?
Tester101
The file has a blank line at the end. However, it seems that your script was creating a blank line in addition to the existing line because when I went into the text file after running the script there was more than one blank line. If I remove all blank lines at the end (and at the end is the only place they appear) then your script works.
Jack
@Jack: I updated the post again. this time blank lines should all be removed.
Tester101
@Tester: sorry, still not removing the blank line. if I remove the line manually, the script does its job. However, I should clarify: by blank line I mean that the last line of text must have a carriage return, because I can put my cursor onto the first character position immediately below the last line of text. If I do this, then hit the backspace key, my cursor moves up to the last position of the last line of text, and your script removes the 6999 and 7999 lines.
Jack
@Jack: You're running the script exactly as last posted? It should not matter that there are blank lines in the input file; after the script removes the 6999, 7999 and 5442 lines, it should go back through and remove all blank lines in the output file.
Tester101
@Tester: Yes, I am doing a copy and paste of the entire script. I have three files that the script has no effect on (except to save the file with the current timestamp) EXCEPT if I do the following: if I open the text file with WordPad and save it and close it then run the script the script operates as designed.
Jack
Really weird. The save that your script does has no effect, but if I open and hit save without making any change, the script then does work.
Jack
@Jack: Are you specifying a separate output file, or are you modifying the original file in place? Also make sure you are not opening the file before the script has completed.
Tester101
I am modifying the original file in place. I am not using a different output file. So, I modified my batch file to put an output file in a different dir, and the script creates the file, but it still has the 6999 and 7999 lines in it. I am definitely not opening the file before the script has completed - it takes like .75 seconds to complete.
Jack
Tester, any chance I can post these files someplace and you can grab them to see the results for yourself?
Jack
@Jack: Not sure where to post them, but if you do let me know and I will take a look at them.
Tester101
Fixed by Tester101
Jack
A: 

This would be my pseudo-algorithm for solving this issue:

(I would rather teach you how I would think through the problem than provide the code itself.)

  1. Pass the file in as a parameter (so it can be flexible), or make a "spooler" folder which this program checks for new content when run, like an "Inbox" for mail. Then you also need an "Outbox". This way you can process files as they come, without knowing what they are named, and move them to the "Outbox" when processed.

  2. Make a simple "config" file for this program too. Each line could represent the "filter" and later you could add actions to the lines too, if needed.

    7999 delete

    6999 delete

    5442 delete

    as in [pattern] [action]

  3. Now after reading the config into an array of "keys", then check the "Inbox" for files. For each file, process it with the key-set.

  4. Processing file "XXXXXXXXX.log" (or whatever name): load all the lines at once if there aren't too many, or use ReadLine to grab a single line at a time (depending on performance and memory usage).

  5. For each line, take the first 4 letters from the string...

Now we will need a line to parse:

strLine = objInput.ReadLine   ' objInput: the open input TextStream (illustrative name)
sLine = Left(strLine, 4)

as we only need the first 4 chars to decide if we need to keep it.

If this "sLine" (string) is in our array of filters/patterns, then we have a match, so do whatever action we have configured (in your current setup, delete = ignore the line).

6a. If ignore, then go on to next line in text file, goto #7

6b. If no match in pattern array, then we have a line to keep. Write this into the OUTPUT stream.

  7. If there are more lines, NEXT (go to #5).

  8. Close the input and output files.

  9. Delete/move the input file from the Inbox (perhaps to a backup?).

  10. If there are more files in the [inbox] directory, parse the next one (go to #4).

This isn't pure VBScript, but an algorithm idea for any language.

I hope you can see my idea in it; if not, just leave a comment and I will try to elaborate. I hope this makes a useful answer.
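
As a rough VBScript sketch of steps 4 to 8 above: the names `MatchesPattern`, `FilterFile` and `arrPatterns` are made up for illustration, and the array stands in for the config file. Like the config shown, it drops every 5442 line regardless of what follows it, which is broader than the rule in the question.

```vbscript
' Hypothetical helper: True when the first four characters of strLine
' equal one of the configured patterns.
Function MatchesPattern(strLine, arrPatterns)
    Dim strPattern
    MatchesPattern = False
    For Each strPattern In arrPatterns
        If Left(strLine, 4) = strPattern Then
            MatchesPattern = True
            Exit Function
        End If
    Next
End Function

' Hypothetical helper: copies strInputPath to strOutputPath, skipping
' every line that matches a pattern (action "delete" = ignore the line).
Sub FilterFile(strInputPath, strOutputPath, arrPatterns)
    Dim objFSO, objIn, objOut, strLine
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objIn = objFSO.OpenTextFile(strInputPath, 1)          ' 1 = ForReading
    Set objOut = objFSO.OpenTextFile(strOutputPath, 2, True)  ' 2 = ForWriting, create if missing
    Do Until objIn.AtEndOfStream
        strLine = objIn.ReadLine
        If Not MatchesPattern(strLine, arrPatterns) Then
            objOut.WriteLine strLine
        End If
    Loop
    objIn.Close
    objOut.Close
End Sub

' Stand-in for the config file: every pattern listed here means "delete".
arrPatterns = Array("7999", "6999", "5442")
```

A call such as `FilterFile "C:\Inbox\file.100621", "C:\Outbox\file.100621", arrPatterns` would then write the kept lines to the Outbox copy (both paths are placeholders).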

BerggreenDK
A: 

OK, here is the final script as awesomely assembled by Tester101. This script removes lines that are not needed as outlined above. It also deals with the line feeds that are at the end of every line (unbeknown to me)

Select Case Wscript.Arguments.Count
    case 1:
        strInput = GetFile(WScript.Arguments(0))
        RemoveUnwantedLines strInput, strInput
        RemoveBlankLines strInput
    case 2:
        strInput = GetFile(WScript.Arguments(0))
        strOutput = Wscript.Arguments(1)
        RemoveUnwantedLines strInput, strOutput
        RemoveBlankLines strOutput
End Select

Function GetFile(strDirectory)
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strDirectory)
    dateLastModified = Null
    strFile = ""
    For Each objFile in objFolder.Files
        If IsNull(dateLastModified) Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        ElseIf dateLastModified < objFile.DateLastModified Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        End If
    Next
    GetFile = strFile
End Function

Sub RemoveUnwantedLines(strInputFile, strOutputFile)
    'Open the file for reading.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character. *Use the Line Feed character (Char(10))
    arrFileText = Split(strFileText,Chr(10))
    'Open the file for writing.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strOutputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank process it.
        If arrFileText(i) <> "" Then
            'If the line starts "5442", see if the next line is "6999".
            If Left(arrFileText(i),4) = "5442" Then
                'Make sure the next line exists (Don't want an out of bounds exception).
                If i + 1 <= UBound(arrFileText) Then
                    'If the next line is not "6999"
                    If Left(arrFileText(i + 1), 4) <> "6999" Then
                        'Write the "5442" line to the file.
                        objFile.WriteLine(arrFileText(i))
                    End If
                Else
                    'If the next line does not exist, write the "5442" line to the file (without a new line).
                    objFile.WriteLine(arrFileText(i))
                End If
            'If the line does not start with "6999" and the line does not start with "7999".
            Elseif Left(arrFileText(i),4) <> "6999" AND Left(arrFileText(i),4) <> "7999" Then
                'Write the line to the file.
                objFile.WriteLine(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub

Sub RemoveBlankLines(strInputFile)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character.
    arrFileText = Split(strFileText,VbNewLine)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank.
        if arrFileText(i) <> "" Then
            'If there is another element.
            if i + 1 <= UBound(arrFileText) Then
                'If the next element is not blank.
                if arrFileText(i + 1) <> "" Then
                    'Write the line to the file.
                    objFile.WriteLine(arrFileText(i))
                Else
                    'Write the line to the file (Without a blank line).
                    objFile.Write(arrFileText(i))
                End If
            Else
                'Write the line to the file (Without a blank line).
                objFile.Write(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub
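
Since the line endings were the sticking point here (the script splits on Chr(10) because these files end each line with a bare line feed), here is a minimal sketch of a splitter that does not care which convention a file uses. `SplitLines` is a hypothetical helper, not part of Tester101's script, and the sample text is made up.

```vbscript
' Hypothetical helper: splits text into lines whether it uses
' CRLF (Windows), LF only, or CR only as the line ending.
Function SplitLines(strText)
    Dim strNormalized
    strNormalized = Replace(strText, vbCrLf, vbLf)      ' Windows endings first
    strNormalized = Replace(strNormalized, vbCr, vbLf)  ' then any lone carriage returns
    SplitLines = Split(strNormalized, vbLf)
End Function

' Made-up sample that mixes LF and CRLF endings.
arrLines = SplitLines("5442A" & vbLf & "6999B" & vbCrLf & "8007C")
WScript.Echo UBound(arrLines) + 1   ' 3
```

With a splitter like this, the Left(..., 4) comparisons see the same text either way, and no stray carriage return is left at the end of a line when a file does use CRLF.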

Jack