In Java, I have written a program that reads a UTF8 text file. The text file contains a SQL query of the SELECT kind. The program then executes the query on the Microsoft Access 2007 database and writes all fields of the first row to a UTF8 text file.
The problem I have is when a row is returned that contains unicode characters, such as "♪". These characters show up as "?" in the text file.
I know that the text files are read and written correctly, because a dummy UTF8 character ("◎") is read from the text file containing the SQL query and written to the text file containing the resulting row. The UTF8 character looks correct when the written text file is opened in Notepad, so the reading and writing of the text files are not part of the problem.
This is how I connect to the database and how I execute the SQL query:
Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/database.accdb;Pwd=temp");
ResultSet r = c.createStatement().executeQuery(sql);
I have tried making a charSet property to the Connection but it makes no difference:
Properties p = new Properties();
p.put("charSet", "utf-8");
p.put("lc_ctype", "utf-8");
p.put("encoding", "utf-8");
Connection c = DriverManager.getConnection("...", p);
Tried with "utf8"/"UTF8"/"UTF-8", no difference. If I enter "UTF-16" I get the following exception:
java.lang.IllegalArgumentException: Illegal replacement
Been searching around for hours with no results and now turn my hope to you. Please help!
I also accept workaround suggestions. =) What I want to be able to do is to make a Unicode query (for example one that searches for posts that contain the "あ" character) and to have results with Unicode characters receieved and saved correctly.
Thank you!
Update. Here is a self-contained example of the issue:
package test;
import java.io.BufferedReader;import java.io.File;import java.io.FileInputStream;import java.io.FileOutputStream;import java.io.InputStreamReader;import java.io.OutputStreamWriter;import java.nio.charset.Charset;import java.sql.Connection;import java.sql.DriverManager;import java.sql.ResultSet;import java.util.Properties;
public class Standalone {
public static void main(String[] args) {
try {
Properties p = new Properties();
p.put("charSet", "UTF8");
Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=./dummy.accdb;Pwd=pass", p);
ResultSet r = c.createStatement().executeQuery("SELECT TOP 1 * FROM main;");
r.next();
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(new File("results.txt")), Charset.forName("UTF-8"));
osw.write(new BufferedReader(new InputStreamReader(new FileInputStream("utf8.txt"), Charset.forName("UTF-8"))).readLine() +" : "+ r.getString("content"));
osw.close();
c.close();
System.out.println("Done.");
} catch (Exception e) {
e.printStackTrace();
}
}
}
What the example does is that it opens the database "dummy.accdb" encrypted with the password "pass" and pulls the first post out of the table "main". It then reads the text file "utf8.txt" and writes a text file "results.txt" which will contain the first row of "utf8.txt" plus the value of the field "content" it got from the database.
In the file "utf8.txt" I have stored "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙". In the database's "main" table's "content" field I have stored "♫♪あキタℳℴℯ♥∞۞♀♂".
After the application has finished running the "results.txt" has the following content: "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙ : ?????Moe?8???".
It successfully read and write the UTF8 characters of the "utf8.txt" text file, but failed to obtain the correct characters from the database. This is where the problem lies.
Update. Thought I should mention that the field in the database is of the type "memo", I have tried havig "Unicode Compression" set both to "No" and to "Yes" (recreating the post between tries to make sure no compression were there when "No" was selected). To my understanding Access uses UTF-16 when it saves Unicode characters, however with compression on it changes to UTF-8. In any case this did not make any difference.
Bonus question, anyone know how to connect to the database using a pure ODBC provider in Java? Or any other kind of method? This would provide me with a good workaround.
Update. I have been trying to feed these four to getConnection:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Driver={Microsoft.Jet.OLEDB.4.0};Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.ACE.OLEDB.12.0;Data Source=./dummy.accdb"
The first give the error "java.sql.SQLException: No suitable driver found for Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb" and the two in the middle gets "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified". The last one gets "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name too long".
I don't understand what getConnection wants. The parameter description is as follows: "url - a database url of the form jdbc:subprotocol:subname". Huh? I clearly don't get what that means.
Anyone know any alternative working ways of connecting to the Access 2007 database through Java? Maybe the providers I tried aren't supported but some other might be?