I'm building a GWT app that stores the text of arbitrary web pages in a datastore Text field. The text is often UTF-8 encoded. All of my app's files are saved as UTF-8, and when I run the application on my local machine the whole process works: UTF-8 text is stored as such and can be retrieved from the local version of App Engine as UTF-8. However, when I deploy the app to Google App Engine, somewhere between storing the text and retrieving it the text stops being UTF-8, and non-ASCII characters are displayed as ?.

When I view the datastore in the App Engine control panel, all the special characters appear as ?, which leads me to believe that the problem occurs when writing to the database.

Does anyone know how to fix this?

The app itself is a little big. Here's some pseudocode:

Text webPageText = new Text(<STRING THAT CONTAINS UNICODE CHARACTERS>);

/*Some Code to store Text object on datastore
Specifically I'm using javax.jdo.PersistenceManager to do this.
Some Code to retrieve text from datastore. */

String retrievedText = webPageText.getValue();

The problem is that retrievedText comes back with ? instead of unicode characters.

Here's a similar problem in Python that I found, though my app is not getting any errors:
http://stackoverflow.com/questions/3094391/trying-to-store-utf-8-data-in-datastore-getting-unicodeencodeerror

Unfortunately, I think Java strings are UTF-8 by default, and I can't find any code that will let me declare them explicitly as UTF-8.
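
A note on declaring the encoding: a Java String holds UTF-16 code units internally, so there is nothing to declare on the String itself. The encoding only comes into play when a String is converted to or from bytes, and that is where UTF-8 can be named explicitly. The sketch below illustrates this with plain JDK calls, assuming Java 7 or later for `StandardCharsets`; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8 {
    // Encode a String to bytes, naming UTF-8 explicitly instead of
    // relying on the platform default charset.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding does not matter.
        String original = "caf\u00e9";
        byte[] encoded = toUtf8(original);
        System.out.println(encoded.length); // 5: the "é" takes two bytes in UTF-8
        System.out.println(fromUtf8(encoded).equals(original)); // true
    }
}
```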

Edit: I've now built a small webapp that takes in Unicode text, stores it in the datastore, and then retrieves it with no problems. I still have no idea where the problem is in my original source code, but I'm going to change the way my code handles webpage retrieval to match the smaller app that I just built. Thank you everyone for your help.

A: 

These links may prove useful after all:

http://stackoverflow.com/questions/960330/how-to-set-google-app-engine-java-content-type-to-utf-8

http://code.google.com/appengine/docs/python/tools/webapp/buildingtheresponse.html

George Marian
I don't know Python very well, but I don't think those are quite what I'm looking for. I only serve one webpage, which is also UTF-8, and I can send UTF-8 text between the client and the server. The problem only occurs when UTF-8 text is stored in or retrieved from the App Engine datastore.
Richard Wallis
My statement above may be incorrect. I'm not sure that I can send UTF-8 text between client and appengine server. Will check this tomorrow.
Richard Wallis
A: 

I tried converting the String to a byte array and then storing it as a datastore Blob.

//Save String as Blob
Blob webPageText = new Blob(<STRING THAT CONTAINS UNICODE CHARACTERS>.getBytes());

//Retrieve Blob as String
String retrievedText = new String(webPageText.getBytes());

I originally thought this had solved the problem, but I had mistakenly tested it only on my local server. This code still returns ? instead of Unicode characters, which leads me to believe that the problem isn't in the datastore but in the transfer from App Engine to the client.
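
One thing worth ruling out with code like the Blob snippet: `getBytes()` and `new String(byte[])` without arguments use the platform default charset, which can differ between a local machine and a deployed server. The sketch below is only an illustration of that effect, not a diagnosis of this app. It uses US-ASCII as a stand-in for a hypothetical non-UTF-8 default, and the class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode and decode with the same charset, as the Blob snippet
    // does implicitly with whatever the platform default happens to be.
    static String roundTrip(String text, Charset charset) {
        byte[] stored = text.getBytes(charset);
        return new String(stored, charset);
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u4f60\u597d"; // "naïve 你好"
        // US-ASCII cannot represent these characters, so each one
        // is replaced with '?' during encoding and is lost for good.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // na?ve ??
        // Naming UTF-8 on both sides preserves the text.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

If the characters were already replaced before they reached this code, naming the charset here would not bring them back, which is consistent with the ? also showing up in the datastore viewer.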

Richard Wallis
A: 

I encountered the same problem. Did you find a way to resolve it?

simon
I have resolved the problem. It was caused by the form encoding, not the production database. Fix: before the code that persists the data to the database, call 'req.setCharacterEncoding( "utf-8" );'.
simon
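
To illustrate why the request encoding matters: a browser sends form fields as percent-encoded bytes, and the server has to pick a charset to turn them back into text. When the request does not name one, servlet containers typically fall back to ISO-8859-1 unless `req.setCharacterEncoding("UTF-8")` is called before the first parameter is read. The sketch below imitates that decoding step with `URLEncoder` and `URLDecoder` from the JDK so it runs without a servlet container; the class and helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecodingDemo {
    // Percent-encode a value the way a UTF-8 form submission would.
    static String encodeAsUtf8Form(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded form value using the given charset name,
    // standing in for what the container does when parameters are read.
    static String decode(String formValue, String charsetName) {
        try {
            return URLDecoder.decode(formValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9"; // "café"
        String posted = encodeAsUtf8Form(typed);
        System.out.println(posted); // caf%C3%A9
        // Decoding with the charset the client used restores the text.
        System.out.println(decode(posted, "UTF-8").equals(typed)); // true
        // Decoding with a different charset garbles it before it is ever stored.
        System.out.println(decode(posted, "ISO-8859-1").equals(typed)); // false
    }
}
```

In a real servlet the equivalent step is the single `req.setCharacterEncoding("UTF-8")` call from the comment above, placed before any `getParameter` call.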