I'm using the GAE datastore for a Java application, and storing some text that will be in numerous languages. In my servlet, I'm first checking to see if there's any data in the data store, and, if not, I'm creating some, similar to the following:

ArrayList<Lang> list = new ArrayList<Lang>();
list.add(new Lang("EN", "English", 1));
list.add(new Lang("ES", "Español", 0));
//more languages here...

PersistenceManager pm = PMF.get().getPersistenceManager();
try {
  for (Lang l : list) {
    pm.makePersistent(l);
  }
} finally {
  pm.close(); // always release the PersistenceManager
}

Since this is using JDO, I guess I should include the relevant parts of the Lang class too:

@PersistenceCapable
public class Lang {
  @PrimaryKey
  private String code;
  @Persistent
  private String name;
  @Persistent
  private int popularity;
  // getters & setters & constructors...
}

However, the non-ASCII characters are giving me grief. I've set my Eclipse project to use the UTF-8 encoding instead of the default Cp1252, so I think I'm okay from that perspective, but when I use the App Engine Data Viewer to look at my data, that Español entry becomes EspaÃ±ol, and when I click on it to view it, I get a 500 Server Error. (There are some other entries with right-to-left text that don't even show up in the Data Viewer at all, but one problem at a time...)

Is there anything special I can do in my code to set the character encoding, or specify to GAE that the data I'm storing is UTF-8? Or is the problem on the Eclipse side, and is there something I should be doing with my Java code?

A: 

Are you sure you have a problem with your data? I ran into similar issues before, but it turned out to be a problem in the Python version of the Data Viewer. I can retrieve my data fine in Java.

ZZ Coder
Yes, I'm sure it's a problem with the data. When I enter the data through the Data Viewer manually, I see the data fine, and my app is able to get the data back properly through JSON as well. But when I create the data through Java code, it somehow gets garbled on its way to the database.
sernaferna
Maybe your string is messed up in Java already. Say your editor is in UTF-8 but your server is in Latin-1. You will get that garbled text.
ZZ Coder
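
The mismatch described in the comment above can be reproduced in isolation. The following is only a minimal sketch, not code from the question: the class and method names are made up for illustration, and it assumes Java 8 or later. It encodes the text as UTF-8, decodes the same bytes as Latin-1 (ISO-8859-1), and prints the code points of both strings, which is also a way to check whether a string is already garbled in Java before it reaches the datastore.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as Latin-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // List the code points so the content can be checked without
    // depending on the encoding of the console or log viewer.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The n-with-tilde is written as an escape so this file compiles
        // the same way under any source encoding.
        String original = "Espa\u00f1ol";
        String garbled = misdecode(original);
        System.out.println(codePoints(original));
        System.out.println(codePoints(garbled));
    }
}
```

If the string is intact, the fifth code point is U+00F1; in the mis-decoded version it is replaced by the pair U+00C3 U+00B1, which displays as EspaÃ±ol.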
+1  A: 

" I've set my Eclipse project to use the UTF-8 encoding instead of the default Cp1252, so I think I'm okay from that perspective"

This is not relevant to this problem. The character encoding you set in Eclipse is only used for your source files; it does not affect which encoding is used to load from or save to the data store.

Kees Kist
Well, it's relevant from the point of view that at least I know the original data was correct; if it was in Cp1252 in the source files, then it would have been incorrect right from the start. But to your point, how does one determine and/or set the encoding when saving to the data store?
sernaferna
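
One way to take the source-file encoding out of the picture, assuming the garbling happens when the string literals are compiled rather than in the datastore itself, is to write non-ASCII characters as Unicode escapes, which the Java compiler interprets the same way regardless of the file's encoding. The sketch below is illustrative only: the class name, constant, and helper are hypothetical and not part of the question's code.

```java
// Hypothetical helper class, for illustration only.
class EscapedLiterals {
    // Same text as the literal in the question, with the non-ASCII
    // character written as a Unicode escape.
    static final String SPANISH = "Espa\u00f1ol";

    // Convert a string to its ASCII-only Java literal form, so existing
    // text can be pasted into source code safely.
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Emit a backslash, the letter u, and four hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(SPANISH.length());
        System.out.println(toJavaEscapes(SPANISH));
    }
}
```

If the escaped literal is stored correctly while the plain literal is not, that would point at the compile step (for example, the compiler reading the sources with a different encoding than Eclipse saved them in) rather than at the datastore.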
A: 

Hi. I think I had the same encoding problem several months ago. You can take a look at my sources; maybe they'll help: 1) http://code.google.com/p/vocrecaptor/source/browse/trunk/vocrecaptorweb/src/com/vocrecaptor/web/server/DictionaryServiceImpl.java

2) And the class /com/vocrecaptor/web/server/servlet/AbstractServiceServlet.java

sandlex