Hi, I'm using the FINDSTR command to filter text files, but it's failing on extended ASCII characters. I tried using the CharToOEM function, but I still have characters such as 'à', which FINDSTR doesn't seem to recognize.

I want to use FINDSTR because the text files I work with are about 100 MB each, so I need something fast. Is there a function that rewrites the strings so they contain no 'weird' characters?

The code is:

CharToOEM(PChar(lASCFileNameFull), PChar(lASCFileNameFull));
renameFile(Format('%s.bak', [lASCFileNameFullBak]), Format('%s.bak', [lASCFileNameFull]));

Si.dwFlags := STARTF_USESHOWWINDOW;
Si.wShowWindow := SW_SHOWNORMAL;

SetFileApisToOEM;
CreateProcess(nil,
  PChar(Format('cmd.exe /K echo on && echo Processing filter...&& findstr "%s" %s.bak > %s',
    [commandString, lASCFileNameFull, lASCFileNameFull])),
  nil, nil, True, 0, nil, nil, Si, Pi);
WaitForSingleObject(Pi.hProcess, INFINITE);
SetFileApisToANSI;

Unfortunately, FINDSTR can't find the file. Edit: This is Delphi 2007.

Edit: I thought of using a loop like:

while not Eof(mySrcFile) do begin
  ReadLn(mySrcFile, currentLine);
  if strContains(currentLine, searchSyntax) then
    WriteLn(destFile, currentLine);
end;

Unfortunately, I can't find such a "strContains" function (and it would probably be slow). The search string is nothing complicated, just a bunch of hex values: "C2 | 1AF | B8 | ..."
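For reference, the standard Pos function can stand in for the "strContains" test in a loop like the one above. This is a minimal sketch, assuming the search string is a list of alternative terms and that a plain substring match is good enough. LineMatchesAny and the sample values are hypothetical names chosen for illustration, not part of the original code:

```pascal
program MatchDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: True when Line contains at least one of Terms.
// Pos does a plain substring search, so 'C2' is also found inside '1C25'.
function LineMatchesAny(const Line: string; const Terms: array of string): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := Low(Terms) to High(Terms) do
    if Pos(Terms[I], Line) > 0 then
    begin
      Result := True;
      Exit;
    end;
end;

begin
  // Sample line and terms are made up for the demonstration.
  if LineMatchesAny('00 1AF 7E', ['C2', '1AF', 'B8']) then
    WriteLn('match')
  else
    WriteLn('no match');
end.
```

If whole-token matching is required (so that 'C2' does not match inside a longer value), the line would have to be split on its separator first; that step is left out here.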

Final edit: Sometimes it's better to get back to basics :) I just replace every extended character with an underscore by testing the character value:

// Replace every character above 127 (outside 7-bit ASCII) with an underscore.
for I := 1 to Length(lASCFileNameFull) do begin
  if Ord(lASCFileNameFull[I]) > 127 then
    lASCFileNameFull[I] := '_';
end;

I hope someone will use this someday :) Thanks for the help, Gramm

A: 

For the search to succeed, two things are necessary:

  • You have to match your non-Unicode language to the language used in your ANSI-encoded file. If it's not your current language, change it temporarily:

    Control Panel\Regional and Language Options\Advanced\Language for non-Unicode Programs

  • To perform a case-insensitive search, you have to use the /i option of FINDSTR.

Maksee
I cannot ask each user to change their regional settings when using the software :). Case sensitivity is not an issue, as the files are well formatted (well, except for the file names).
gramm
A: 

Why don't you simply code it in Delphi? One could use simple text I/O (with a slightly enlarged file buffer), or go all the way and try binary block-level access.
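A minimal sketch of the text I/O variant, assuming a single plain search term and a 64 KB buffer per file. FilterFile, the buffer size, and the command-line handling are illustrative choices, not something taken from the question:

```pascal
program FilterLines;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: copies every line of SrcName that contains Term
// to DstName and returns the number of lines written.
function FilterFile(const SrcName, DstName, Term: string): Integer;
var
  Src, Dst: TextFile;
  SrcBuf, DstBuf: array[0..65535] of Byte;  // assumed size; tune as needed
  Line: string;
begin
  Result := 0;
  AssignFile(Src, SrcName);
  SetTextBuf(Src, SrcBuf);  // enlarge the I/O buffer before opening the file
  Reset(Src);
  try
    AssignFile(Dst, DstName);
    SetTextBuf(Dst, DstBuf);
    Rewrite(Dst);
    try
      while not Eof(Src) do
      begin
        ReadLn(Src, Line);
        if Pos(Term, Line) > 0 then  // plain substring match
        begin
          WriteLn(Dst, Line);
          Inc(Result);
        end;
      end;
    finally
      CloseFile(Dst);
    end;
  finally
    CloseFile(Src);
  end;
end;

begin
  if ParamCount = 3 then
    WriteLn(FilterFile(ParamStr(1), ParamStr(2), ParamStr(3)), ' line(s) written')
  else
    WriteLn('Usage: FilterLines <source> <destination> <term>');
end.
```

Because the file names are passed straight to AssignFile, no OEM conversion or command-line quoting is involved. Whether this is fast enough for 100 MB files would need to be measured.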

Marco van de Voort