Has anyone noticed that if you retrieve HTML from the clipboard, it gets the encoding wrong and injects weird characters?

For example, executing a command like this:

string s = (string) Clipboard.GetData(DataFormats.Html);

Results in stuff like:

<FONT size=-2>  <A href="/advanced_search?hl=en">Advanced 
Search</A><BR>  <A href="/preferences?hl=en">Preferences</A><BR>  <A 
href="/language_tools?hl=en">Language 
Tools</A></FONT>

Not sure how Markdown will render this, but there are weird characters in the resulting markup above.

It appears that the bug is with the .NET framework. What do you think is the best way to get correctly-encoded HTML from the clipboard?

A: 

Here's a PowerShell script that you could modify to read the clipboard and fix any encoding problems.

http://www.johndcook.com/blog/2008/10/17/manipulating-the-clipboard-with-powershell/

John D. Cook
+1  A: 

Some useful information from this link:

http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework.windowsforms/topic25839.aspx

Turnkey
A: 

You have to interpret the data as UTF-8. See http://stackoverflow.com/questions/189640/ms-office-hyperlinks-change-code-page.
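
As a rough sketch of what "interpret the data as UTF-8" can look like in C#: the clipboard HTML format is UTF-8, so if the string you get back was decoded with a single-byte code page, you can turn it back into bytes with that same code page and decode those bytes as UTF-8. The helper name `ReinterpretAsUtf8` and the class `ClipboardHtmlFix` are made up for illustration, and which encoding was wrongly applied is an assumption you have to check on your own system (on the .NET Framework it is typically `Encoding.Default`).

```csharp
using System.Text;

// Hypothetical helper, not part of the .NET framework.
public static class ClipboardHtmlFix
{
    // Undo a wrong decode: 'garbled' holds UTF-8 bytes that were
    // mistakenly decoded with 'wrongEncoding' (assumed to be a
    // single-byte code page that can round-trip every byte it saw).
    public static string ReinterpretAsUtf8(string garbled, Encoding wrongEncoding)
    {
        // Recover the raw bytes the clipboard actually held.
        byte[] raw = wrongEncoding.GetBytes(garbled);

        // Decode them the way the clipboard HTML format intends.
        return Encoding.UTF8.GetString(raw);
    }
}
```

With the call from the question, that would be used roughly as `string html = ClipboardHtmlFix.ReinterpretAsUtf8((string) Clipboard.GetData(DataFormats.Html), Encoding.Default);`. If the string was already decoded correctly, this round trip will damage it, so only apply it when you actually see the stray characters.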

Ken Paul