Batch file to remove duplicate strings (containing double quotes) and keep blank lines

Note: The final output must contain the original strings, with their double quotes, and the blank lines.

I have been working on this for a long time and I cannot find a solution; thanks in advance for your assistance. When I get the duplicate removal working, something else breaks... I know it looks like I haven't done much work, but I have trimmed this down for clarity.
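To pin down the requirement stated above, here is a small reference model. It is written in Python rather than batch, so it is only a yardstick for checking what a batch script should produce; the function name dedupe_keep_blanks is hypothetical. It keeps the first occurrence of each non-blank line, keeps every blank line, and compares lines exactly, so double quotes, % signs and trailing spaces are both preserved and significant.

```python
def dedupe_keep_blanks(lines):
    """Hypothetical reference model of the required output.

    - Every empty line is kept.
    - A non-empty line is kept only the first time it appears.
    - Lines are compared exactly and returned unchanged, so double
      quotes, '%' signs and trailing spaces are preserved.
    """
    seen = set()
    result = []
    for line in lines:
        if line == "":
            result.append(line)      # blank lines always survive
        elif line not in seen:
            seen.add(line)           # remember the first occurrence
            result.append(line)
    return result


sample = [
    'a "quoted" line',
    'a "quoted" line',
    "",
    "",
    "a path with 30% full.txt",
    'a "quoted" line',
]
for out_line in dedupe_keep_blanks(sample):
    print(out_line)
```

Only truly empty lines count as blank in this model, which matches the batch loop's `if defined _line` test; a line made of spaces is compared like any other line.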


@echo on
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION

REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g

rem The finished program will remove duplicate lines

:START
set "_duplicates=TRUE"

set "_infile=copybuffer.txt"
set                        "_oldstr=the"
set                                    "_newstr=and"

call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr% 
pause
goto :SHOWINTELL
goto :eof


:BATCHSUBSTITUTE

type nul> %TEMP%.\TEMP.DAT

if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
    set "_line=%%B"
    if defined _line (
        if "%_duplicates%"=="TRUE" (
            set "_unconverted=!_line!"
            set "_converted=!_line:"=""!"
            FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
            if errorlevel==1 (
                >> %TEMP%.\TEMP.DAT echo !_unconverted!
            )
        ) 
    ) ELSE (
        echo(>> %TEMP%.\TEMP.DAT
    )
)
goto :eof


:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof
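The loop above relies on a common trick that is worth spelling out. FOR /F skips empty lines, so the file is first piped through find /n /v "", which prefixes every line with [n]. With "tokens=1,* delims=]", %%A receives [n and %%B receives the rest of the line. For a blank line %%B is empty, so `set "_line=%%B"` undefines _line and the ELSE branch writes an empty line. The Python sketch below models that round trip; the function names are hypothetical, and it models only the rules just described (for example, it ignores FOR /F's default eol=; handling, which the [n] prefix sidesteps anyway).

```python
def find_n_prefix(lines):
    """Model of `type file | find /n /v ""`: prefix every line with [n]."""
    return [f"[{n}]{line}" for n, line in enumerate(lines, start=1)]


def for_f_plain(lines):
    """Model of plain FOR /F over the raw lines: empty lines are skipped.
    (Only this rule is modelled; eol and token handling are ignored.)"""
    return [line for line in lines if line != ""]


def for_f_rest_token(numbered):
    """Model of %%B for FOR /F "tokens=1,* delims=]".

    Token 1 is "[n"; the run of "]" after it is skipped, and "*" yields
    the rest of the line, which is empty for a blank input line.
    """
    _, _, rest = numbered.partition("]")
    return rest.lstrip("]")


lines = ["first", "", 'has "quotes"', "]starts with a bracket"]
print(for_f_plain(lines))            # the blank line is lost
for numbered in find_n_prefix(lines):
    print(repr(numbered), "->", repr(for_f_rest_token(numbered)))
```

The last sample line shows the known caveat of this trick: a line that starts with ] loses those leading brackets, because FOR /F treats consecutive delimiters as one.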

Input: copybuffer.txt

this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have duplicates 
this test 'data' may have duplicates 


this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may have Blank Lines 
this test 'data' may have Blank Lines 
this test 'data' may have "Double Quoted text" in the middle of the string 
this test 'data' may have "Double Quoted text" in and middle of and string 
this test 'data' may have "Trouble with the find" command 
this test 'data' may have "Trouble with and find" command 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS"

Actual Output: doubleFree.txt (Note: last two lines are NOT duplicates)

this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have duplicates 


this test 'data' may drive "YOU NUTS" 
this test 'data' may have Blank Lines 
this test 'data' may have "Double Quoted text" in the middle of the string 
this test 'data' may have "Double Quoted text" in and middle of and string 
this test 'data' may have "Trouble with the find" command 
this test 'data' may have "Trouble with and find" command 
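One detail helps when reading the output above: FIND reports success when the search string occurs anywhere in a line of TEMP.DAT, not only when a whole line is equal to it. In the trace below, the last input line has no trailing space, so it is a substring of the earlier "YOU NUTS" line and is skipped, whereas an exact comparison would keep it. A Python sketch of both checks (the function names are hypothetical, and only this matching rule is modelled):

```python
def keep_substring_check(lines):
    """Model of the script's test: a non-blank line is skipped if it occurs
    anywhere inside a line already written (FIND-style substring match)."""
    written = []
    for line in lines:
        if line == "" or not any(line in prev for prev in written):
            written.append(line)
    return written


def keep_exact_check(lines):
    """Same loop, but a line is skipped only if an identical line was written."""
    written = []
    for line in lines:
        if line == "" or line not in written:
            written.append(line)
    return written


with_space = "this test 'data' may drive \"YOU NUTS\" "
without_space = "this test 'data' may drive \"YOU NUTS\""
data = [with_space, "", "", with_space, without_space]

print(keep_substring_check(data))   # without_space is dropped
print(keep_exact_check(data))       # without_space is kept
```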

The output with echo on, when run on my Vista computer, is:

    ----

C:\Users\foo\Documents\morefoo>REM -- Prepare the Command Processor -- 

C:\Users\foo\Documents\morefoo>SETLOCAL ENABLEEXTENSIONS 

C:\Users\foo\Documents\morefoo>SETLOCAL EnABLEDELAYEDEXPANSION 

C:\Users\foo\Documents\morefoo>REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g 

C:\Users\foo\Documents\morefoo>prompt=$g 

>rem The finished program will remove duplicates lines 

>set "_duplicates=TRUE" 

>set "_infile=copybuffer.txt" 

>set                        "_oldstr=the" 

>set                                    "_newstr=and" 

>call :BATCHSUBSTITUTE copybuffer.txt the and  

>type nul 1>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT 

>if "the" == "" findstr "^::" "C:\Users\foo\Documents\morefoo\copybuffer3.bat"  & GOTO:EOF

>for /F "tokens=1,* delims=]" %A in ('"type copybuffer.txt|find /n /v """') do (
set "_line=%B"  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have a path C:\Users\Documents\30% full.txt "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have a path C:\Users\Documents\30% full.txt "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have duplicates "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have duplicates "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line="  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line="  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have Blank Lines "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have Blank Lines "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have "Double Quoted text" in the middle of the string "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have "Double Quoted text" in and middle of and string "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have "Trouble with the find" command "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may have "Trouble with and find" command "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS" "  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>(
set "_line=this test 'data' may drive "YOU NUTS""  
 if defined _line (if "TRUE" == "TRUE" (
set "_unconverted=!_line!"  
 set "_converted=!_line:"=""!"  
 FIND "!_converted!" C:\Users\foo\AppData\Local\Temp.\TEMP.DAT  1>nul  
 if errorlevel 1 (echo !_unconverted! 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) )  ELSE (echo( 1>>C:\Users\foo\AppData\Local\Temp.\TEMP.DAT ) 
) 

>goto :eof 

>pause
Press any key to continue . . . 

>goto :SHOWINTELL 
        1 file(s) moved.

>start doubleFree.txt 

>goto :eof 


    ----
A: 

Hi Edoctoor,

The main problem seems to be the expansion of special characters, like quotation marks.

You can avoid this by using delayed expansion, so special characters in the expanded text are not interpreted. (It is not perfect, but nearly: only exclamation marks and carets still cause problems.)

The next problem is searching for strings that contain quotation marks with the FIND command: you have to double each quotation mark.
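The doubling step is what !_line:"=""! does in the loop. A Python sketch of how the FIND argument is built, plus the inverse under the rule stated above that "" inside the quoted argument stands for one literal quote (the function names are hypothetical, and the inverse is a model of that rule, not of FIND's actual parser):

```python
def find_search_arg(line):
    """Build the quoted FIND argument like !_line:"=""! does:
    double every embedded quote, then wrap the whole string in quotes."""
    return '"' + line.replace('"', '""') + '"'


def parse_search_arg(arg):
    """Inverse, assuming (as the answer states) that "" inside the quoted
    argument stands for one literal quote character."""
    if len(arg) < 2 or not (arg.startswith('"') and arg.endswith('"')):
        raise ValueError("argument must be wrapped in double quotes")
    return arg[1:-1].replace('""', '"')


arg = find_search_arg('may drive "YOU NUTS"')
print(arg)                     # "may drive ""YOU NUTS"""
print(parse_search_arg(arg))   # may drive "YOU NUTS"
```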

@echo off
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION

REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g

rem The finished program will remove duplicate lines

:START
set "_duplicates=TRUE"

set "_infile=copybuffer.txt"
set                        "_oldstr=the"
set                                    "_newstr=and"

call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr% 
pause
goto :SHOWINTELL
goto :eof


:BATCHSUBSTITUTE

type nul> %TEMP%.\TEMP.DAT
type nul> %TEMP%.\TEMP2.DAT

if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
    set "_line=%%B"
    if defined _line (
        if "%_duplicates%"=="TRUE" (
            set "_unconverted=!_line!"
            set "_converted=!_line:"=""!"

            FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
            if errorlevel==1 (
                >> %TEMP%.\TEMP.DAT echo !_unconverted!
            )
            call set "_converted=%%_line:"=#%%"
        ) 
    ) ELSE (
        echo(>> %TEMP%.\TEMP.DAT
    )
)
goto :eof


:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof
jeb
echo(>> %TEMP%.\TEMP.DAT
Question: Why did you use the ( in the echo? I have not seen that before. Was it intentional or a typo?
Edoctoor
echo. fails if a file named "echo" exists. The other variants of echo (echo:, echo/, ...) have the same problem, or the problem of displaying the help, e.g. echo=/?. Also, "echo." performs a search on disk, while "echo(" does not.
jeb
On my system (Vista) it works with your copybuffer.txt. Try using @echo on and show us where it breaks.
jeb
Works for me too. I am going to mark it as solved; however, I will comment here if I have any issues when I plug this into the full code. Special thanks for your time; I cannot tell you how much trouble this little batch file has given me. It turned out that the full code was replacing "the" with "and", and I didn't notice that my source data had changed... Good eye... Thank you, thank you, THANK YOU!
Edoctoor
A: 

Use a good tool for file processing. If you have the luxury of downloading tools, you can try gawk for Windows.

C:\test> gawk "!NF || !a[$0]++" file
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have duplicates


this test 'data' may drive "YOU NUTS"
this test 'data' may have Blank Lines
this test 'data' may have "Double Quoted text" in the middle of the string
this test 'data' may have "Double Quoted text" in and middle of and string
this test 'data' may have "Trouble with the find" command
this test 'data' may have "Trouble with and find" command
this test 'data' may drive "YOU NUTS"

If not, a native language like VBScript is still better than batch for this kind of job.

strFile = WScript.Arguments(0)
Set objFS = CreateObject("Scripting.FileSystemObject")
Set d = CreateObject("Scripting.Dictionary")
Set objFile = objFS.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If Len(strLine) = 0 Then
        ' Always keep blank lines
        WScript.Echo strLine
    ElseIf Not d.Exists(strLine) Then
        ' First occurrence: remember it and print it unchanged
        d.Add strLine, 1
        WScript.Echo strLine
    End If
Loop
objFile.Close

Usage:

C:\test>cscript //nologo myscript.vbs file
ghostdog74
Awesome effort; however, I am looking for a BATCH solution. I can appreciate that others searching for this may be happy to find your answer, so thanks for your efforts.
Edoctoor