views:

97

answers:

4

I am trying to fetch records from Oracle 9i database whose value are 'FOO - BAR' for a particular column.

The 'FOO - BAR' condition values are present in the table, But when I run the query, the results are not coming.

After breaking my head for sometime, figured out that there is a difference in the size of the hypen's.

'FOO – BAR'
'FOO - BAR'

This time I copied (through SQLDeveloper) the DB value and pasted the value in Eclipse IDE. I did NOT see any difference in the Eclipse. When I run the query from IDE, to my surprise it worked and the values were coming.

I deployed the same class file to our servers(UX box), and again back to square no 1 - the values are not coming.

I gave a try changing the file encoding of my IDE from default(Cp1252) to UTF-8 and everything is fine now.

Can some one explain what's happening? Will the size of hyphen varies with encoding? do we have three different kinds of dashes?

+5  A: 

There are plenty of different Unicode characters for em-dash, en-dash, hyphen etc. They're used in different contexts in text.

You may want to read the "Dash" wikipedia article for more information. Note that a hyphen isn't quite the same thing :)

As for how encodings come into things - it will depend on how you're inserting, retrieving and then displaying the data. I wouldn't start using cut and paste - that may very well try to find the closest available value to the actual data represented in the database.

Jon Skeet
+2  A: 

Do we have three different kinds of dashes?

Yes. In fact, depending on exactly what you count as a dash, there are many more than 3 Unicode code points for hyphens and dashes. (And the page that @Jon Skeet linked to doesn't even list all of the dash characters. For instance there are 4 "horizontal dash" codepoints in the U2500 code plane ... together with "vertical dashes" and "double dashes".)

Will the size of hyphen vary with encoding.

That's an unanswerable question. The actual size of a hyphen depends on how it is rendered on the screen or page by the rendering software. However, the fact remains that in some character sets, there are different code-points that conceptually represent different sizes of hyphen or dash. Strictly speaking, this has nothing to do with encodings per se.

What is probably going on here is that characters are being mapped from one character set to another character set at various points. At some points the target character set does not support multiple hyphen / dash characters, and the mapping is quietly converting "funky" hyphens to "ordinary" hyphens.

Cutting and pasting characters isn't going to help you. You need a way to see the actual codepoint values in hexadecimal.

Stephen C
A: 

try select rawtohex(col) from table

  • I'm away from my oracle instance but I think this works from memory it gives you a hex dump and then lookup the codes
daven11
A: 

You can discover real codes stored in your DB with help of ASCII() Oracle function. E.g.

select ascii(substr(t.strng_field, 5, 1)) from string_table t

But real function(s) to deal with such cases depends on database character set of string column and sources of data for filling table.

You can choose from string character set translation functions like convert, decompose or unistr listed here.

ThinkJet