views:

2243

answers:

5

This is from a Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we would could paste screen shots in here)

It looks like this when I copy and paste the results.

spouse�s

is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?

Edit: I need to change only the results in a SELECT statement for reporting purposes, I can't change the Database.


I ran this

select dump('�') from dual;

which returned

Typ=96 Len=3: 239,191,189

This seems to work so far

select translate('What is your spouse�s first name?', '�', '''') from dual;

but this doesn't work

select translate(Fieldname, '�', '''') from TableName


Select FN from TN

What is your spouse�s first name?

SELECT DUMP(FN, 1016) from TN

Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f


EDIT: So I have established that is the backquote character. I can't get the DB updated so I'm trying this code

SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN

and I"m getting ORA-00904:"Regex_Replace":invalid identifier

A: 

I'm going to propose a front-end application-based, client-side approach to the problem:

I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have the glyph for the character's code.

The Oracle database will dutifully return whatever characters were INSERTed into its' column. It's more up to you, and your application, to interpret what it will look like given the font you are trying to display your data with in your application, so I suggest investigating as to what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().

Try running the following query to get the character code:

SELECT DUMP(<column with weird character>, 1016) 
FROM <your table> 
WHERE <column with weird character> like '%spouse%';

If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.

Once you've found the code for the character, you could just replace the character by using the regex_replace() built-in function by determining the raw hex code of the character and then supplying the ASCII / C0 Controls and Basic Latin character 0x0027 ('), using code similar to this:

UPDATE <table>
    set <column with offending character> 
            = REGEX_REPLACE(<column with offending character>,
                            "<character code of �>",
                            "'")
WHERE regex_like(<column with offending character>,"<character code of �>");


If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.


EDIT: If your'e seeing 0x92, there's likely a charset mismatch here:

0x92 in CP-1252 (default Windows code page) is a backquote character, which looks kinda like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in IS0-8859-1 either. So probably either the database is in CP-1252 encoding (don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your db client connection isn't expecting CP-1252. Hence, the wierd question mark.

And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:

INSERT (<normal table columns>,...,<column with offending character>) INTO <table>
SELECT <all normal columns>, REGEX_REPLACE(<column with offending character>,
                             "\0092",
                             "\0027")  -- for ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE regex_like(<column with offending character>,"\0092");

DELETE FROM <table> WHERE regex_like(<column with offending character>,"\0092");
sheepsimulator
"The Oracle database will dutifully return whatever characters were INSERTed into its' column"Not quite. If the data is character (eg VARCHAR2) not RAW, Oracle can do conversion from the characterset of the client inserting the data to the characterset for the database and then again to the characterset for the client requesting the data. Generally, if the database is set up with UTF8 and the clients are set the same you don't get a problem though.
Gary
Assuming you have control over said charactersets. In the end, it's all about bits and what you interpret them to mean. A robust application ought to deal with issues like this internally too.
sheepsimulator
I can't do any updates
FashionHouseJewelry.com
How do I get the character code?
FashionHouseJewelry.com
Try running the following query: SELECT DUMP(<column with weird character>, 1016) FROM <your table> WHERE <column with wierd character> like '%spouse%'.
sheepsimulator
I ran it an added it to my post at the top.
FashionHouseJewelry.com
I think it is a backquote. The apps-old vb6-(actually the sql) that people use here can't handle apostrophes so I think the dba told them to use backquotes.
FashionHouseJewelry.com
I can't update the tables... and it would break the apps. I'll see if I can get the select to work though for my app I'm building. thanks
FashionHouseJewelry.com
I"m getting ORA-00904:"Regex_Replace":invalid identifierSELECT REGEX_REPLACE(FN,"\0092","\0027")FROM TN
FashionHouseJewelry.com
@Billy - If your'e making this conversion in your app anyway, you could also try replacing the character in your client code. That way, you could comment thoroughly why you're replacing the character. Or, I'd change your client app code page to show the backquote correctly.
sheepsimulator
A: 

Before you do this you need to understand what actually happened. It looks to me that someone inserted non-ascii strings in the database. For example Unicode or UTF-8. Before you fix this, be very sure that this is actually a bug. The apostrophe comes in many forms, not just the "'".

St3fan
+6  A: 

This seems a problem with your charset configuracion. Check your NLS_LANG and others NLS_xxx enviroment/regedit values. You have to check the oracle server, your client and the client of the inserter of that data.

Try to DUMP the value. you can do it with a select as simple as:

SELECT DUMP(the_column)
  FROM xxx
 WHERE xxx

UPDATE: I think that before try to replace, look for the root of the problem. If this happens because a charset trouble you can get big problems with bad data.

UPDATE 2: Answering the comments. The problem may be is not on the database server side, may be is in the client side. The problem (if this is the problem) can be a translation on server to/from client comunication. It's for a server-client bad configuracion-coordination. For instance if the server has defined UTF8 charset and your client uses US7ASCII, then all acutes will appear as ?.

Another approach can be that if the server has defined UTF8 charset and your client also UTF8 but the application is not able to show UTF8 chars, then the problem is in the application side.

UPDATE 3: On your examples:

  • select translate('What . It works because the � is exactly the same char: You have pasted on both sides.
  • select translate(Fieldname. It does not work because the � is not stored on database, it's the char that the client receives may be because some translation occurs from the data table until it's showed to you.

Next step: Look in DUMP syntax and try to extract the codes for the mysterious char (from the table not pasting �!).

FerranB
+1 for the DUMP info, it's very useful.
sheepsimulator
Note that the OP may have no control over this. He may not be a Oracle admin and may not be able to change or enforce these client-side settings in all deployment areas, including inbound interfaces from other systems. My guess, from looking at his questions, that he is an app developer, not a DB admin. From that perspective, he may better off with just replacing them. It all depends upon his DBAs and how easy they are to work with. :)
sheepsimulator
Anyway if the problem exists this is a workaround for a concrete point. But if the don't solve (or request DBA to solve) the root problem, the database may be is not in a good state.
FerranB
I'm not a DB admin. There's no chance of any changes being made to the database.
FashionHouseJewelry.com
Answered above.
FerranB
Have you tried using nested replace() ?
Sathya
Given his character response, there likely is a charset mismatch between the database and the clients. I wouldn't have thought of that; it'll be something I'll need to be more cognizant of in the future.
sheepsimulator
+2  A: 

I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 91-94 (using a Windows encoding), or Unicode U+2018, U+2019, U+201C, and U+201D.

Adam Crume
A: 

TRANSLATE() is a useful function for replacing or eliminating known single character codes.

David Aldridge