views:

565

answers:

8

Hi All,

I'm using Java 1.6, JTDS 1.2.2 (also just tried 1.2.4 to no avail) and SQL Server 2005 to create a CallableStatement to run a stored procedure (with no parameters). I am seeing the Java wrapper running the same stored procedure 30% slower than using SQL Server Management Studio. I've run the MS SQL profiler and there is little difference in I/O between the two processes, so I don't think it's related to query plan caching.

The stored proc takes no arguments and returns no data. It uses a server-side cursor to calculate the values that are needed to populate a table.

I can't see how the calling a stored proc from Java should add a 30% overhead, surely it's just a pipe to the database that SQL is sent down and then the database executes it....Could the database be giving the Java app a different query plan??

I've posted to both the MSDN forums, and the sourceforge JTDS forums (topic: "stored proc slower in JTDS than direct in DB") I was wondering if anyone has any suggestions as to why this might be happening?

Thanks in advance,

-James

(N.B. Fear not, I will collate any answers I get in other forums together here once I find the solution)

Java code snippet:

sLogger.info("Preparing call...");
stmt = mCon.prepareCall("SP_WB200_POPULATE_TABLE_limited_rows");
sLogger.info("Call prepared.  Executing procedure...");
stmt.executeQuery();
sLogger.info("Procedure complete.");

I have run sql profiler, and found the following:

Java app : CPU: 466,514 Reads: 142,478,387 Writes: 284,078 Duration: 983,796

SSMS : CPU: 466,973 Reads: 142,440,401 Writes: 280,244 Duration: 769,851

(Both with DBCC DROPCLEANBUFFERS run prior to profiling, and both produce the correct number of rows)

So my conclusion is that they both execute the same reads and writes, it's just that the way they are doing it is different, what do you guys think?

It turns out that the query plans are significantly different for the different clients (the Java client is updating an index during an insert that isn't in the faster SQL client, also, the way it is executing joins is different (nested loops Vs. gather streams, nested loops Vs index scans, argh!)). Quite why this is, I don't know yet (I'll re-post when I do get to the bottom of it)

Epilogue

I couldn't get this to work properly. I tried homogenising the connection properties (arithabort, ansi_nulls etc) between the Java and Mgmt studio clients. It ended up the two different clients had very similar query/execution plans (but still with different actual plan_ids). I posted a summary of what I found to the MSDN SQL Server forums as I found differing performance not just between a JDBC client and management studio, but also between Microsoft's own command line client, SQLCMD, I also checked some more radical things like network traffic too, or wrapping the stored proc inside another stored proc, just for grins.

I have a feeling the problem lies somewhere in the way the cursor was being executed, and it was somehow giving rise to the Java process being suspended, but why a different client should give rise to this different locking/waiting behaviour when nothing else is running and the same execution plan is in operation is a little beyond my skills (I'm no DBA!).

As a result, I have decided that 4 days is enough of anyone's time to waste on something like this, so I will grudgingly code around it (if I'm honest, the stored procedure needed re-coding to be more incremental instead of re-calculating all data each week anyway), and chalk this one down to experience. I'll leave the question open, big thanks to everyone who put their hat in the ring, it was all useful, and if anyone comes up with anything further, I'd love to hear some more options...and if anyone finds this post as a result of seeing this behaviour in their own environments, then hopefully there's some pointers here that you can try yourself, and hope fully see further than we did.

I'm ready for my weekend now!

-James

A: 

Does the Java case include transmission of the results to the Java server (network overhead) plus some Java processing? A 12 minute query might produce quite a large amount of data.

djna
Thanks, I've updated my original description, but no, the stored proc doesn't return any data. Literally the call should fire the sql at the database and then block until the procedure is finished, and this just takes a lot longer in Java.
James B
A: 

If you are looking at the profiler and there is no difference between the executions then the difference must be with the client systems.

4 mins does seem like to long just to prepare a statement to send so the 12 min wait must cause some other effect -- no idea what it is.

Hogan
Thanks, I've updated my original description with a bit of code from the Java client.The 16 minutes is pure database time spent executing the stored procedure, hence the direct comparison with the database taking 12 minutes in Mgmt Studio. i.e. it is 16 minutes between seeing the messages "Executing procedure..." and "Procedure complete."
James B
+2  A: 

You can attach the Profiler and monitor for the events SQL:BatchCompleted and SP:Completed, with a filter on duration > 1000. Run the procedure from your Java client and from SSMS. Compare the Reads and the Writes of the two events (Java vs. SSMS). Are they significantly different? This would indicate considerably different execution paths or plans, with significant difference in I/O.

Also try to capture the Showplan XML event of the two and compare the plans (save the event as a .sqlplan file, open it in SSMS to easy analysis). Do they have similar plans? Are there wild differences in Estimate vs. Actual (rows, rewinds, rebinds)? Do they have same degree of parallelism? The plans can aso be retrieved from sys.dm_exec_requests view.

Are there any warning events raised, like Missing Column Statistics, Sort Warnings, Hash Warning, Execution Warnings, Blocked Process?

the point is that you have at your disposal a whole arsenal of investigation tools. Once you find the root cause of the difference, you can trace it down to what is different between your Java environment settings and the SSMS environment (ADO.Net SqlClient). Things like default transaction isolation level, ANSI settings etc etc.

Remus Rusanu
Thanks, this is definitely along the same lines I have been thinking, I have run sql profiler, and found the following: Java app: CPU: 466,514 Reads: 142,478,387 Writes: 284,078 Duration: 983,796 SSMS: CPU: 466,973 Reads: 142,440,401 Writes: 280,244 Duration: 769,851 So I concluded from this that they're doing similar work, it's just how they are doing it that's causing the bottleneck? I've just asked a new question as to how to get the xml showplan using a JDBC connection: http://stackoverflow.com/questions/1791494/is-it-possible-to-get-the-query-plan-out-using-jdbc-on-sql-server
James B
They have very similar IO pattern and CPU is almost identical. The only significant differente is the duration. This would indicate Java application does more waiting. Run `DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)` before each test, then collect the `sys.dm_os_wait_stats` *after* each test, it should point out what if being waited on.
Remus Rusanu
WOAH!....The plans are different for the different clients! (keeping in mind that the stored proc call is _exactly the same_!) Okay, now I have something to go on (instead of the usual bug report of "it's slow"). I'll close off this question and if I need to, I'll ask a more targeted one regarding query plan re-use and reference it here. Thanks for all your help Remus, I'll credit you with the asnwer for this one.
James B
+1  A: 

Checking: Is your problem that two applications (SSMS, Java) are making the exact same identical call to SQL Server, and SQL Server is acting differently for each? If so, I hit things like this every year or two, and they hurt my brain for days.

Once, I ultimately isolated each process call and logging everything for the entire process in Profiler. I eventually noticed that the Login event (under TextData) showed a host of information, like so:

-- network protocol: TCP/IP
set quoted_identifier on
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language us_english
set dateformat mdy
set datefirst 7
set transaction isolation level read committed

The "Existing Connection" event will show this information as well--but, sometimes immediately subsequent calls (batches, RPCs, I disremember just now) are sent [ISQL or OSQL did this, I think] to immediately reset some of these -- Arithabort and Quoted_Identifier seem to be favorites, and other SET options also get modified depending on the settings or requirements of whatever connectivity protocols your application's database interface is using.

Another one: some settings are kept as attributes of a procedure at "create" time, and others are factored in at compile time. On the one hand, your connection's SET values may be being overwritten by the configuration saved at the time the procedure was created; on the other hand, your two connections may differ so much that two execution plans are generated for one procedure. (All of this information is, after sufficient research, available in the sys. tables and DMVs.)

In short, it seems to me that SQL obscurities are messing you up. To this day, I loathe all these goombah settings. Things below my notice keep messing around with them [I mean, really, what fool would set implicit_transaction for a connection pool on? But once they did...] and it's hard to build structures when the ground (rules) keep changing out from underneath you. After all, remember what the guy said about building castles in a swamp...

Philip Kelley
Thanks Philip, this is confirming some suspicions that I was having, I had originally discounted the "goombah" settings, as I thought the query plans were different between the different clients, however after several runs, they converge (I think that's due to the stats that are collected on the table usage over time). My plan today is to look into the "goombah" (loving that term, BTW) settings, especially as I found the same behaviour with sqlcmd as with my Java client...I'm still on the case and I have to solve this one, I'll report back what I find. cheers,-James
James B
I'd also originally discounted the settings because they look so inoccuous, especially for a stored proc execution, although there are subtle differences.
James B
If memory serves, I think Arithabort was a key malefactor, as SQL thinks (*has* to think) that Off or On makes a big difference. Or if not this, it was one like it.
Philip Kelley
Artithabort is definitely a well-known culprit elsewhere on the net for this kind of behaviour (which is why I mentioned it), however, I tried it on and off in both cases and still no joy. I've posted to an MSDN list on this (as I was seeing the same behaviour with sqlcmd), but I'm afraid I'm going to have to admit defeat (that's tough to admit, especially on this forum!) on this one and code my way around it.
James B
A: 

Sorry, I've not found a correct answer to this, so I don't want to allocate any of these as correct, so I am going to mark this answer as correct, and wish anyone luck who comes across anything similar!

James B
@downvoter, I think the downvote on this is pretty rude, I was just trying to wrap up the discussion
James B
A: 

Did you know that Microsoft ship JDBC drivers for their databases?

These may be more performant.

Obviously.. you may have resolved the problem by now.

Fortyrunner
A: 

I have similar problem, where executing stored proc from java takes forever (1 hour plus), while from SSMS it takes roughly 35 min. The stored proc is not meant to return any data, just does a lot of manipulation within itself. I'm using java 1.5 to call the stored proc via the merlia driver. Has the original poster/anyone else been able to solve this problem?

PG2
@PG2 I've not tried the Merlia driver, have you tried any of the things I tried above? I didn't get any satisfactory conclusion to this whole sorry episode! The only thing I can suggest is running traces on the DB and see if the query plans etc. are the same..alternatively, give the jtds driver a go and see if you get the same problem. Best of luck with this...sorry I can't be more positive!
James B
A: 

I recall having a similar issue a while ago, because JTDS was silently converting a string parameter to Unicode or something similar. As a result of that conversion, SQL Server was unable to use the index which is was using when we ran the stored proc from SSMS.

HIH

AlexKuznetsov