
I am writing a test app for a larger project and can't seem to retrieve Unicode CSV data from the Windows clipboard. I can retrieve CF_UNICODETEXT successfully using the built-in GetClipboardData API call. However, when I place Unicode CSV on the clipboard from MS Excel and try to retrieve it in the CSV format, I get bad data. Here is some code:

procedure TForm1.Button7Click(Sender: TObject);
var
   hMem : THandle;
   ps1  : PWideChar;   // CF_UNICODETEXT is always UTF-16, whatever the Delphi version
   s    : string;
begin
   RichEdit1.Lines.Clear;
   if not OpenClipboard( Handle ) then
      RaiseLastOSError;
   try
      if Clipboard.HasFormat( CF_UNICODETEXT ) then
      begin
         hMem := GetClipboardData( CF_UNICODETEXT );
         ps1 := GlobalLock( hMem );   // nil if hMem = 0
         if ps1 <> nil then
         try
            // The data is null-terminated, so the assignment copies everything up
            // to the #0. No StrAlloc buffer is needed (the old one was never freed).
            s := ps1;
         finally
            GlobalUnlock( hMem );
         end;
         RichEdit1.Lines.Add( s );
      end
      else
         ShowMessage( 'No CF_UNICODETEXT on Clipboard!' );
   finally
      CloseClipboard;
   end;
end;

This code should work for CSV as well, but when I change the clipboard format to the one I want, the app does not get proper data. It might be important to know that I can get tab-delimited Unicode just fine, just not the CSV I want.

+3  A: 

The CSV clipboard format Excel uses is ANSI encoded, not Unicode.

From dumping the Excel 2007 clipboard, the ones that are Unicode enabled are:

  • CF_UNICODETEXT
  • "HTML Format"
  • "Rich Text Format"
  • "XML Spreadsheet"
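A dump like the one above can be produced with a short console program. This is a minimal sketch assuming Delphi 2009 or later; `ClipFormatName` is a hypothetical helper, while `EnumClipboardFormats` and `GetClipboardFormatName` are the Win32 calls doing the work. Predefined formats such as CF_UNICODETEXT have no registered name, so the helper names a few of them itself.

```pascal
program DumpClipboardFormats;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: returns a readable name for a clipboard format ID.
// Predefined formats have no registered name (GetClipboardFormatName fails
// for them), so a few common ones are named here; others fall back to '#<id>'.
function ClipFormatName(Fmt: UINT): string;
var
  Buf: array[0..255] of Char;
  Len: Integer;
begin
  case Fmt of
    CF_TEXT:        Result := 'CF_TEXT';
    CF_OEMTEXT:     Result := 'CF_OEMTEXT';
    CF_UNICODETEXT: Result := 'CF_UNICODETEXT';
    CF_LOCALE:      Result := 'CF_LOCALE';
  else
    // Registered formats such as "HTML Format" or "CSV" do have names.
    Len := GetClipboardFormatName(Fmt, Buf, Length(Buf));
    if Len > 0 then
      SetString(Result, PChar(@Buf[0]), Len)
    else
      Result := Format('#%d', [Fmt]);
  end;
end;

var
  Fmt: UINT;
begin
  if not OpenClipboard(0) then
    RaiseLastOSError;
  try
    // EnumClipboardFormats(0) starts the enumeration; it returns 0 when done.
    Fmt := EnumClipboardFormats(0);
    while Fmt <> 0 do
    begin
      Writeln(Fmt:6, '  ', ClipFormatName(Fmt));
      Fmt := EnumClipboardFormats(Fmt);
    end;
  finally
    CloseClipboard;
  end;
end.
```

Run it right after copying a range in Excel to see which formats are offered.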

"XML Spreadsheet" and "HTML Format" both have well defined tables/rows, so they shouldn't be too hard to pull data from.
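As a starting point for the "HTML Format" route, here is a sketch assuming Delphi 2009 or later. It relies on the CF_HTML layout: an ASCII header whose `StartFragment`/`EndFragment` values are byte offsets into UTF-8 data. `GetClipboardBytes` and `HtmlFragment` are hypothetical names. The sketch only extracts the copied fragment as a Unicode string; walking its `<table>`/`<tr>`/`<td>` elements is left out.

```pascal
program ReadHtmlFormat;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: raw bytes stored under Fmt ('' if absent).
// The caller must already have the clipboard open.
function GetClipboardBytes(Fmt: UINT): RawByteString;
var
  hMem: THandle;
  P: Pointer;
begin
  Result := '';
  hMem := GetClipboardData(Fmt);
  if hMem = 0 then
    Exit;
  P := GlobalLock(hMem);
  if P = nil then
    Exit;
  try
    SetString(Result, PAnsiChar(P), GlobalSize(hMem));
  finally
    GlobalUnlock(hMem);
  end;
end;

// Hypothetical helper: returns the fragment between the StartFragment and
// EndFragment byte offsets, decoded from UTF-8 ('' if the header is missing).
function HtmlFragment(const Data: RawByteString): string;
var
  Header: string;
  I, StartFrag, EndFrag: Integer;

  // Reads the decimal number after 'Key:' in the header, or -1 if absent.
  function Value(const Key: string): Integer;
  var
    P: Integer;
  begin
    Result := -1;
    P := Pos(Key + ':', Header);
    if P = 0 then
      Exit;
    P := P + Length(Key) + 1;
    Result := 0;
    while (P <= Length(Header)) and CharInSet(Header[P], ['0'..'9']) do
    begin
      Result := Result * 10 + Ord(Header[P]) - Ord('0');
      Inc(P);
    end;
  end;

begin
  Result := '';
  // Widen byte-for-byte: lossless for the ASCII header, and character
  // positions stay equal to byte positions.
  SetLength(Header, Length(Data));
  for I := 1 to Length(Data) do
    Header[I] := Char(Ord(Data[I]));
  StartFrag := Value('StartFragment');
  EndFrag := Value('EndFragment');
  if (StartFrag < 0) or (EndFrag < StartFrag) or (EndFrag > Length(Data)) then
    Exit;
  // Offsets are zero-based; the bytes between them are UTF-8.
  Result := UTF8ToString(Copy(Data, StartFrag + 1, EndFrag - StartFrag));
end;

begin
  if not OpenClipboard(0) then
    RaiseLastOSError;
  try
    // Console output may not display every Unicode character.
    Writeln(HtmlFragment(GetClipboardBytes(RegisterClipboardFormat('HTML Format'))));
  finally
    CloseClipboard;
  end;
end.
```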

Craig Peterson
But, but, but, getting data from the clipboard should do an automatic conversion between CF_TEXT and CF_UNICODETEXT. See "Synthesized Clipboard Formats" in http://msdn.microsoft.com/en-us/library/ms649013(VS.85).aspx. Or could it be that CF_UNICODETEXT pulls UTF-8 from the clipboard instead of UTF-16LE? That would be strange, though, since Windows is natively UTF-16LE.
Marjan Venema
@Marjan: CF_UNICODETEXT works, but it's *tab* delimited, not comma separated. Excel also puts a second format on the clipboard that's ANSI-encoded CSV, and that's what wfoster is asking about. His question is actually: "This code works correctly, but fails if I replace `CF_UNICODETEXT` with `RegisterClipboardFormat('CSV')`".
Craig Peterson
@Craig: Got it. Thanks for the clarification
Marjan Venema
Great, thanks for the help. I'll probably use the tab-delimited format then. I also stumbled upon http://blogs.msdn.com/b/michkap/archive/2005/09/17/470413.aspx; it would seem that Excel should be able to produce Unicode CSV, since it can save any language to a CSV file, but I guess I can't have everything.
wfoster
+1  A: 

You need to request the CF_CSV format. AFTER you get the data as CF_CSV, then you can treat it as an AnsiString, and then convert to a UnicodeString, if you desire.
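A minimal sketch of that approach, assuming Delphi 2009 or later and assuming (from this thread, not from Microsoft documentation) that the data is text in the system ANSI code page. Because CF_CSV is not predefined, the ID comes from `RegisterClipboardFormat('CSV')`, as noted in the comments below. `AnsiBytesToString` and `ClipboardCsvAsString` are hypothetical names.

```pascal
program ReadExcelCsv;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: copies at most Size bytes, stopping at the first #0,
// and converts them from the system ANSI code page to a UnicodeString.
function AnsiBytesToString(P: PAnsiChar; Size: NativeUInt): string;
var
  Len: NativeUInt;
  Raw: AnsiString;
begin
  Len := 0;
  while (Len < Size) and (P[Len] <> #0) do
    Inc(Len);
  SetString(Raw, P, Len);
  // Characters the ANSI code page could not represent were already lost
  // when the data was written; this conversion cannot bring them back.
  Result := string(Raw);
end;

// Hypothetical helper: returns Excel's "CSV" clipboard data, or '' if absent.
function ClipboardCsvAsString: string;
var
  CF_CSV: UINT;
  hMem: THandle;
  P: PAnsiChar;
begin
  Result := '';
  // Registering an existing name returns the same ID Excel received.
  CF_CSV := RegisterClipboardFormat('CSV');
  if CF_CSV = 0 then
    RaiseLastOSError;
  if not OpenClipboard(0) then
    RaiseLastOSError;
  try
    hMem := GetClipboardData(CF_CSV);
    P := GlobalLock(hMem);   // nil when hMem = 0, i.e. no CSV on the clipboard
    if P <> nil then
    try
      Result := AnsiBytesToString(P, GlobalSize(hMem));
    finally
      GlobalUnlock(hMem);
    end;
  finally
    CloseClipboard;
  end;
end;

begin
  Writeln(ClipboardCsvAsString);
end.
```

The copy is bounded by `GlobalSize` rather than trusting a terminating #0, since `GlobalSize` can report more bytes than were actually written.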

Here's a screenshot showing 6 cells copied from Excel 2007. I captured them into ClipMate as CF_CSV, then displayed them with ClipMate's hex viewer. You'll see that the fields are separated by commas (hex 2C) and each row is terminated by CRLF (hex 0D 0A). The screenshot is an annotated composite showing Excel, the region copied, and ClipMate's rendering of the CF_CSV data as hex bytes.

Also, interesting reading in this related thread: http://stackoverflow.com/questions/967878/get-csv-data-from-clipboard-pasted-from-excel-that-contains-accented-characters

Chris Thornton
The standard clipboard formats are listed at http://msdn.microsoft.com/en-us/library/ff729168%28VS.85%29.aspx, and CSV isn't one of them, so yes, you need to use RegisterClipboardFormat. Since the question was specifically about Unicode data on the clipboard, saying that he can convert from ANSI isn't that helpful either.
Craig Peterson
There is no CF_CSV format declared in Windows.pas
wfoster
@wfoster, @Craig - oops! Sorry about that. CF_CSV does need to be registered. But it is what it is, and it isn't going to be Unicode. So you can either treat it as Ansi, or you can build your own CSV from the UnicodeText, and guess at where the cols should break.
Chris Thornton
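When starting from Excel's tab-delimited CF_UNICODETEXT, building your own CSV mostly means turning tabs into commas and quoting fields. Here is a naive sketch (Delphi 2009 or later; `CsvField` and `TsvToCsv` are hypothetical names). It quotes fields containing a comma, double quote, or line break in the RFC 4180 style. It assumes no cell contains a tab or line break itself. Excel may wrap such cells in quotes in its text format, and this splitter does not handle that.

```pascal
program TabToCsv;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Quotes a field (doubling embedded quotes) when it contains a comma,
// a double quote or a line break; otherwise returns it unchanged.
function CsvField(const S: string): string;
begin
  if (Pos(',', S) > 0) or (Pos('"', S) > 0) or
     (Pos(#13, S) > 0) or (Pos(#10, S) > 0) then
    Result := '"' + StringReplace(S, '"', '""', [rfReplaceAll]) + '"'
  else
    Result := S;
end;

// Tab separates columns, CRLF (or a lone CR or LF) separates rows.
function TsvToCsv(const Tsv: string): string;
var
  I: Integer;
  Field: string;
begin
  Result := '';
  Field := '';
  I := 1;
  while I <= Length(Tsv) do
  begin
    case Tsv[I] of
      #9:
        begin
          Result := Result + CsvField(Field) + ',';
          Field := '';
        end;
      #13, #10:
        begin
          Result := Result + CsvField(Field) + #13#10;
          Field := '';
          // Treat CRLF as a single row break.
          if (Tsv[I] = #13) and (I < Length(Tsv)) and (Tsv[I + 1] = #10) then
            Inc(I);
        end;
    else
      Field := Field + Tsv[I];
    end;
    Inc(I);
  end;
  // Last row when the text does not end with a line break.
  if Field <> '' then
    Result := Result + CsvField(Field);
end;

begin
  Writeln(TsvToCsv('a'#9'b'#13#10'c,d'#9'e"f'#13#10));
end.
```

The input can be the string read from CF_UNICODETEXT in the question's code, so the result keeps every Unicode character.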