ansaurus

Question

Convert Hi-Ansi chars to Ascii equivalent (é -> e) in Delphi(2007)

Answer 1

+3 A:

I believe your best bet is creating a lookup table.

Padu Merloti 2009-12-11 22:22:53

Also, if you're using a decent regex library with Delphi, that could work as well, but it is still essentially a lookup table.

Padu Merloti 2009-12-11 22:28:40

Thanks Padu. That's what I thought. I'll nevertheless accept Craig's answer because it's more generic.

François 2009-12-14 18:23:19
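
The lookup-table approach suggested in this answer might look something like the sketch below. It is not from the original thread: the name `StripAccentsLookup` is hypothetical, the input is assumed to be encoded in Windows-1252, and only a handful of mappings are filled in for illustration.

```pascal
// Hypothetical helper, not from the original answer.
// Assumes AInput is encoded in Windows-1252; extend the table as needed.
function StripAccentsLookup(const AInput: AnsiString): AnsiString;
var
  Table: array[AnsiChar] of AnsiChar;
  C: AnsiChar;
  I: Integer;
begin
  // Identity mapping for every byte value
  for C := Low(AnsiChar) to High(AnsiChar) do
    Table[C] := C;

  // A few sample Windows-1252 mappings (illustrative, far from complete)
  Table[#$C0] := 'A'; // À
  Table[#$E0] := 'a'; // à
  Table[#$CB] := 'E'; // Ë
  Table[#$EB] := 'e'; // ë
  Table[#$E9] := 'e'; // é
  Table[#$C7] := 'C'; // Ç
  Table[#$E7] := 'c'; // ç

  // Replace each character with its table entry
  Result := AInput;
  for I := 1 to Length(Result) do
    Result[I] := Table[Result[I]];
end;
```

The table is rebuilt on every call to keep the sketch short. In real code it would more likely be a typed constant or be initialized once in the unit's initialization section.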

Answer 2

+1 A:

What you are looking for is normalization.

Michael Kaplan wrote a nice blog article about normalization.

It does not immediately solve your problem, but points you in the right direction.

--jeroen

Jeroen Pluimers 2009-12-11 23:19:19

NFKD + removal of combining marks works a lot of the time. However, there are characters like `ÆÐØÞßæðøþ` that do not decompose and have to be dealt with manually.

dan04 2010-07-02 02:30:13
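
The characters listed in the comment above have no decomposition, so they need an explicit mapping of their own. A minimal sketch of that manual step follows. The function name `TransliterateSpecial` is hypothetical, and the replacements chosen (for example `Þ` to `TH`, `ß` to `ss`) are one common convention assumed here, not something the thread specifies.

```pascal
// Hypothetical helper, not from the original thread.
// Maps characters that do not decompose to an assumed ASCII spelling.
function TransliterateSpecial(const AInput: WideString): WideString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(AInput) do
    case Ord(AInput[I]) of
      $00C6: Result := Result + 'AE'; // Æ
      $00D0: Result := Result + 'D';  // Ð
      $00D8: Result := Result + 'O';  // Ø
      $00DE: Result := Result + 'TH'; // Þ
      $00DF: Result := Result + 'ss'; // ß
      $00E6: Result := Result + 'ae'; // æ
      $00F0: Result := Result + 'd';  // ð
      $00F8: Result := Result + 'o';  // ø
      $00FE: Result := Result + 'th'; // þ
    else
      // Everything else passes through unchanged
      Result := Result + AInput[I];
    end;
end;
```

Because some replacements are two characters long, the output can be longer than the input, which is why this step does not fit a one-to-one character table.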

Answer 3

+12 A:

WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.

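// Requires the Windows unit in the uses clause.
function BestFit(const AInput: AnsiString): AnsiString;
const
  CodePage = 20127; // 20127 = US-ASCII
var
  WS: WideString;
begin
  // Convert from the system ANSI code page to UTF-16
  WS := WideString(AInput);
  // First call: ask for the required output length in bytes
  SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
    Length(WS), nil, 0, nil, nil));
  // Second call: write the best-fit conversion into Result
  WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Length(Result), nil, nil);
end;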

procedure TForm1.Button1Click(Sender: TObject);
begin
   ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
end;

Calling that with your examples produces the results you're looking for, including the em-dash-to-minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (any WinNT release).
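
As a rough illustration of the FoldString route (not part of the original answer), the sketch below decomposes the input with `FoldStringW` and the `MAP_COMPOSITE` flag, then drops code points in the Combining Diacritical Marks block (U+0300 to U+036F). The name `StripDiacritics` is hypothetical, and limiting the filter to that single block is a simplifying assumption; other combining ranges are not handled.

```pascal
// Requires the Windows unit in the uses clause.
// Hypothetical helper: decompose, then remove combining marks.
function StripDiacritics(const AInput: WideString): WideString;
var
  Decomposed: WideString;
  Len, I, J: Integer;
begin
  Result := '';
  if AInput = '' then
    Exit;

  // First call: query the required length in characters
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput), nil, 0);
  if Len = 0 then
    Exit; // the API call failed; return an empty string in this sketch

  // Second call: split accented characters into base + combining mark
  SetLength(Decomposed, Len);
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput),
    PWideChar(Decomposed), Len);

  // Copy everything except U+0300..U+036F (assumed filter range)
  SetLength(Result, Len);
  J := 0;
  for I := 1 to Len do
    if (Ord(Decomposed[I]) < $0300) or (Ord(Decomposed[I]) > $036F) then
    begin
      Inc(J);
      Result[J] := Decomposed[I];
    end;
  SetLength(Result, J);
end;
```

Unlike the best-fit conversion above, this leaves characters such as the em dash or the euro sign untouched, so it only covers the diacritic-stripping part of the problem.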

Of course, if you're only doing this for one character set and want to avoid the overhead of converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.

Craig Peterson 2009-12-12 05:33:34

Thanks Craig. That's a more generic solution than the lookup. It had a typo in the magic number, so I corrected it and used a constant instead. But anyway, it works on D2007 as well as D2009.

François 2009-12-14 18:20:36
