views:

1432

answers:

5

I have transactional replication running between two databases. I fear they have fallen slightly out of sync, but I don't know which records are affected. If I knew, I could fix it manually on the subscriber side.

SQL Server is giving me this message:

The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)

I've looked around to try to find out what table, or even better what record is causing the issue, but I can't find that information anywhere.

The most detailed data I've found so far is:

Transaction sequence number: 0x0003BB0E000001DF000600000000, Command ID: 1

But how do I find the table and row from that? Any ideas?

+2  A: 

If your database is not prohibitively large, I would stop replication, re-snapshot and then re-start replication. This technet article describes the steps.

If it got out of sync due to a user accidently changing data on the replica, I would set the necessary permissions to prevent this.

This replication article is worth reading.

Mitch Wheat
Yeah, I'm trying to avoid that since the tables are really large. We'll see if I have to end up going there.
jeremcc
Do you have the RedGate Data Compare tool? (it has a 14 day trial) That would at least let you find the rows quickly.
Mitch Wheat
Yep, that's exactly what I'm using. :-) The problem is that the current diff tells me x rows are missing (since replication is lagging), but I don't know which one is the culprit. I think I've found the workaround though. It has to do with a "SkipErrors" flag on the distributor. I'll post it soon.
jeremcc
+2  A: 

I'll answer my own question with a workaround I ended up using.

Unfortunately, I could not figure out which table was causing the issue through the SQL Server replication interface (or the Event Log for that matter). It just didn't say.

So the next thing I thought of was, "What if I could get replication to continue even though there is an error?" And lo and behold, there is a way. In fact, it's easy. There is a special Distribution Agent profile called "Continue on data consistency errors." If you enable that, then these types of errors will just be logged and passed on by. Once it is through applying the transactions and potentially logging the errors (I only encountered two), then you can go back and use RedGate SQL Data Compare (or some other tool) to compare your two databases, make any corrections to the subscriber and then start replication running again.

Keep in mind, for this to work, your publication database will need to be "quiet" during the part of the process where you diff and fix the subscriber database. Luckily, I had that luxury in this case.

jeremcc
A: 

of course if you check the error when the replication fails it also tells you which record is at fault and you could extract that data from the core system and just insert it on the subscriber.

This is better than skipping errors as with the SQL Data Compare it will lock the table for the comparison and if you have millions of rows this can take a long time to run.

Tris

A: 

Profile changing to "Continue on data consistency errors" wont work alwayz......... obviously it ll reduces or nullifies an error but u wont get the whole proper data... i will skip the rows by which an error occurs and hence you fails to get an exact data.

Shahid (DBA)

Shahid
+1  A: 

This gives you the table the error is against

use distribution go

select * from dbo.MSarticles where article_id in ( select article_id from MSrepl_commands where xact_seqno = 0x0003BB0E000001DF000600000000)

And this will give you the command (and the primary key (ie the row) the command was executing against)

exec sp_browsereplcmds @xact_seqno_start = '0x0003BB0E000001DF000600000000', @xact_seqno_end = '0x0003BB0E000001DF000600000000'

Matthew