views:

69

answers:

5

Hey,

I really seem to have a big problem here. I'm using MySQL to store part-of-speech tagged sentences in a table. The Table looks like this:

+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| idTitle    | varchar(25)      | NO   | PRI | NULL    |       |
| idReview   | int(10) unsigned | NO   | PRI | NULL    |       |
| idSentence | int(10) unsigned | NO   | PRI | NULL    |       |
| content    | text             | NO   |     | NULL    |       |
| POSInfo    | text             | YES  |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+

These are the indexes on the table:

+-----------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table           | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| reviewsentences |          0 | PRIMARY                     |            1 | idSentence  | A         |          23 |     NULL | NULL   |      | BTREE      |         |
| reviewsentences |          0 | PRIMARY                     |            2 | idTitle     | A         |       32087 |     NULL | NULL   |      | BTREE      |         |
| reviewsentences |          0 | PRIMARY                     |            3 | idReview    | A         |     2470720 |     NULL | NULL   |      | BTREE      |         |
| reviewsentences |          1 | fk_ReviewSentences_Reviews1 |            1 | idTitle     | A         |         983 |     NULL | NULL   |      | BTREE      |         |
| reviewsentences |          1 | fk_ReviewSentences_Reviews1 |            2 | idReview    | A         |      494144 |     NULL | NULL   |      | BTREE      |         |
+-----------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

I'm trying to read the reviewsentences that blong to a certain review an add them to the a review object. I'm accessing the Database via JDBC and the reads take forever!! I'm talking 2 minutes for 26 rows! This is the java code I'm using to query the database:

public List<Review> fillupReviews(List<Review> reviews, boolean tagged){

    try {
        Statement stmt = dbConnection.createStatement() ;


        for (Review review : reviews) {
            ResultSet rs=null;
            if(tagged==true){
                rs = stmt.executeQuery("SELECT idSentence, POSInfo FROM reviewsentences WHERE idTitle="+review.getMovieID()+" and idReview="+review.getReviewID()+";") ;
            }else{
                rs = stmt.executeQuery("SELECT idSentence, content FROM reviewsentences WHERE idTitle="+review.getMovieID()+" and idReview="+review.getReviewID()+";") ;
            }

            while(rs.next()){
                review.addTaggedSentence(rs.getInt(1),rs.getString(2));
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }

    return reviews;
}

If I access the same table with the same query via the MySQL Workbench it takes 0.296 seconds?? So my guess is there has to be something seriously wrong! But I really can not see what goes wrong or what to change to speed this darn thing up. Please can someone give me a hint?

It's me again, I finally found the solution! Is called Prepared Statement!! <-- who would have guessed!? Here is the code:

public List<Review> fillupReviews(List<Review> reviews, boolean tagged){

        try {

            PreparedStatement selectReview=null;
            if(tagged==true){
                selectReview = dbConnection.prepareStatement("SELECT idSentence, POSInfo FROM reviewsentences WHERE idTitle= ? AND idReview= ?;");
            }else{
                selectReview = dbConnection.prepareStatement("SELECT idSentence, Content FROM reviewsentences WHERE idTitle= ? AND idReview= ?;");
            }

            for (Review review : reviews) {

                selectReview.setString(1, review.getMovieID());
                selectReview.setInt(2, review.getReviewID());

                ResultSet rs = selectReview.executeQuery();

                while(rs.next()){
                    review.addTaggedSentence(rs.getInt(1),rs.getString(2));
                }
                rs.close();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }

        return reviews;
    } 

Now this hole thing runs like hell (nearly as fast as MySQL Workbench does[0.3 sec]). What i do not exactly get is why a normal statement is so slow? Has someone an explanation for that?

A: 

If it's local MySQL server, then I would try to comment MySQL fetch, and replace with dummy code to check performance of your code (don't forget to use yout GetMovieId() etc things).

BarsMonster
A: 

First, are you timing just this method call?

Where do you get the db connection, are you timing just the time to execute the query or the time to get the connection also?

Are you using connection pooling? Maybe there is an issue there, try getting a new connection first to narrow it down.

Regardless it shouldn't take this long, something is wrong, I suspect with your connection setup maybe the way java is finding mysql (is it local, are you using dns etc).

Also I would use prepared statements, they are more secure and better performing.

Also what driver are you using?

Joelio
I get the db connection in the constructor of my class like this this.dbConnection = MySQLConnectionFactory.getConnection("moviereviews");. Im timing just the time till the rs = stmt.executeQuery(... returns. No i don't use connection pooling. The database runs one the same machine where the java code runs so i don't you any dns. This is the driver I'm using http://www.mysql.com/downloads/connector/j/
evermean
Just a WAG, but is there enough memory on your box, are you swapping like crazy? Is it possible the JVM is using up your memory that normally mysql would make use of?
Joelio
Wheres the love, see my suggestion for prepared statement....
Joelio
A: 

Comment out review.addTaggedSentence(rs.getInt(1),rs.getString(2)); Does it still take that much time ?

You are not closing the ResultSet, you need to do rs.close() after your while(rs.next()) is done.

Print out the actual SQL you're running in Java - are you 100% sure it's the same query you run in in the MySQL workbench ?

You're also running the query in a for (Review review : reviews), so how many queries does that run ?

You're not telling us what takes 2.36, min. so - profile or do some simple System.out.println()s in your java app, so you're really certain what SQL you're actually running, and how many of them gets run.

nos
1. If commented out review.addTaggedSentence(rs.getInt(1),rs.getString(2)); it doesn't help :(2. Added rs.close() still the same.3. Checked and Yes 100% sure!4. exactly 159598! But it takes 2min for every single one! So the problem is not the number of 159598 but that each query takes 2min :(5. It takes 2min for the rs = stmt.executeQuery(... to return an even get to the part that assigns the values to the review.
evermean
Are you sure you really get 159598 rows back when you do this in MySQL workbench ? run the `mysqladmin pr` status on the machine with mysql on, and see what it's doing while your java code is executing
nos
A: 

It appears you have 2470720 reviews. Compared to running locally in Mysql Workbench, running remotely (in your code) will certainly take longer to query, return, and transfer that many values.

Mysql workbench likely fetches the count of the results and paginates what it gives you -- only returning the results as necessary; also, your workbench could be enabling compression on the connection while JDBC is not, thus creating an even faster connection.

nessence
A: 

First of all. Can you give specific detail on which part of your code that takes 2m++. Is it in statement stmt.executeQuery(String) or another part?

I myself happen to face this problem with a big major DBMS. But my data was larger than 150K. Unfortunately, i don't have a ready-baked solution for your problem. But there are some footprints of what i did.

  1. I tried to switch from driver to driver. Please keep in mind that some driver may runs faster but it asks you to sacrifice portability.
  2. I tried to switch from hard-coded connection to connection pool. Unfortunately, it wasn't really helpful.
  3. I tried not to use "WHERE" clause on VARCHAR field.
  4. I tried to index some field which i frequently "WHERE"d
  5. I tried to use Prepared Statement just to make sure that the DBMS doesn't re-hash same query.

There are other things but i think they are DMBS-specific.

jancrot
Yeah its the stmt.executeQuery() part that takes that long. Thanks for your answer!
evermean