View Issue Details

IDProjectCategoryView StatusLast Update
0001590SymmetricDSBugpublic2020-05-21 12:56
Reporterazarubkin Assigned To 
Prioritynormal 
Status resolvedResolutionfixed 
Product Version3.5.13 
Fixed in Version3.5.22 
Summary0001590: Router job deadlocked
DescriptionThere are deadlocks happening from time to time. The log file says:
{{{
2014-02-18 09:05:45,406 ERROR [main] [RouterJob] [main-job-17] Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
org.jumpmind.db.sql.SqlException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    at org.jumpmind.db.sql.AbstractSqlTemplate.translate(AbstractSqlTemplate.java:288)
    at org.jumpmind.db.sql.AbstractSqlTemplate.translate(AbstractSqlTemplate.java:279)
    at org.jumpmind.db.sql.JdbcSqlTemplate.execute(JdbcSqlTemplate.java:417)
    at org.jumpmind.db.sql.JdbcSqlTemplate.update(JdbcSqlTemplate.java:285)
    at org.jumpmind.db.sql.AbstractSqlTemplate.update(AbstractSqlTemplate.java:212)
    at org.jumpmind.symmetric.service.impl.OutgoingBatchService.updateAbandonedRoutingBatches(OutgoingBatchService.java:119)
    at org.jumpmind.symmetric.service.impl.RouterService.routeData(RouterService.java:156)
    at org.jumpmind.symmetric.job.RouterJob.doJob(RouterJob.java:40)
    at org.jumpmind.symmetric.job.AbstractJob.invoke(AbstractJob.java:180)
    at org.jumpmind.symmetric.job.AbstractJob.run(AbstractJob.java:224)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:53)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.sql.SQLException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:368)
    at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2820)
    at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2258)
    at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:632)
    at net.sourceforge.jtds.jdbc.JtdsStatement.processResults(JtdsStatement.java:584)
    at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQL(JtdsStatement.java:546)
    at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.execute(JtdsPreparedStatement.java:560)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    at org.jumpmind.db.sql.JdbcSqlTemplate$5.execute(JdbcSqlTemplate.java:308)
    at org.jumpmind.db.sql.JdbcSqlTemplate$5.execute(JdbcSqlTemplate.java:285)
    at org.jumpmind.db.sql.JdbcSqlTemplate.execute(JdbcSqlTemplate.java:415)
    ... 15 more
}}}

And this is the information from Microsoft SQL Server log:
{{{
2014-02-18 09:05:45.39 spid18s deadlock-list
2014-02-18 09:05:45.39 spid18s deadlock victim=process3ec52e8
2014-02-18 09:05:45.39 spid18s process-list
2014-02-18 09:05:45.39 spid18s process id=process3ec52e8 taskpriority=0 logused=0 waitresource=PAGE: 5:1:631780 waittime=828 ownerId=124466588 transactionname=UPDATE lasttranstarted=2014-02-18T09:05:44.443 XDES=0x1291d55d0 lockMode=U schedulerid=2 kpid=2136 status=suspended spid=52 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2014-02-18T09:05:44.443 lastbatchcompleted=2014-02-18T09:05:44.443 clientapp=jTDS hostname=SERVER-MAIN-1 hostpid=123 loginname=symmetricds isolationlevel=read committed (2) xactid=124466588 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056
2014-02-18 09:05:45.39 spid18s executionStack
2014-02-18 09:05:45.39 spid18s frame procname=adhoc line=1 stmtstart=78 sqlhandle=0x020000000755fa071b8c380923157f23928fec9bb98b3ec9
2014-02-18 09:05:45.39 spid18s update sym_outgoing_batch set status= @P0 where status = @P1
2014-02-18 09:05:45.39 spid18s inputbuf
2014-02-18 09:05:45.39 spid18s (@P0 nvarchar(4000),@P1 nvarchar(4000))update sym_outgoing_batch set status= @P0 where status = @P1
2014-02-18 09:05:45.39 spid18s process id=process5549438 taskpriority=0 logused=760 waitresource=PAGE: 5:1:631232 waittime=828 ownerId=124466602 transactionname=implicit_transaction lasttranstarted=2014-02-18T09:05:44.550 XDES=0x80034dd0 lockMode=IX schedulerid=8 kpid=556 status=suspended spid=54 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2014-02-18T09:05:44.550 lastbatchcompleted=2014-02-18T09:05:44.550 clientapp=jTDS hostname=SERVER-MAIN-1 hostpid=123 loginname=symmetricds isolationlevel=read committed (2) xactid=124466602 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128058
2014-02-18 09:05:45.39 spid18s executionStack
2014-02-18 09:05:45.39 spid18s frame procname=adhoc line=1 stmtstart=832 sqlhandle=0x02000000ee7e1d2d3e775b0d293df106757e957f0d46bd2f
2014-02-18 09:05:45.39 spid18s update sym_outgoing_batch set status= @P0 , load_id= @P1 , extract_job_flag= @P2 , load_flag= @P3 , error_flag= @P4 , byte_count= @P5 , extract_count= @P6 , sent_count= @P7 , load_count= @P8 , data_event_count= @P9 , reload_event_count= @P10 , insert_event_count= @P11 , update_event_count= @P12 , delete_event_count= @P13 , other_event_count= @P14 , ignore_count= @P15 , router_millis= @P16 , network_millis= @P17 , filter_millis= @P18 , load_millis= @P19 , extract_millis= @P20 , sql_state= @P21 , sql_code= @P22 , sql_message= @P23 , failed_data_id= @P24 , last_update_hostname= @P25 , last_update_time= @P26 where batch_id= @P27 and node_id= @P28
2014-02-18 09:05:45.39 spid18s inputbuf
2014-02-18 09:05:45.39 spid18s (@P0 nvarchar(4000),@P1 bigint,@P2 decimal(38,0),@P3 decimal(38,0),@P4 decimal(38,0),@P5 bigint,@P6 bigint,@P7 bigint,@P8 bigint,@P9 bigint,@P10 bigint,@P11 bigint,@P12 bigint,@P13 bigint,@P14 bigint,@P15 bigint,@P16 bigint,@P17 bigint,@P18 bigint,@P19 bigint,@P20 bigint,@P21 nvarchar(4000),@P22 decimal(38,0),@P23 nvarchar(4000),@P24 bigint,@P25 nvarchar(4000),@P26 datetime,@P27 decimal(38,0),@P28 nvarchar(4000))update sym_outgoing_batch set status= @P0 , load_id= @P1 , extract_job_flag= @P2 , load_flag= @P3 , error_flag= @P4 , byte_count= @P5 , extract_count= @P6 , sent_count= @P7 , load_count= @P8 , data_event_count= @P9 , reload_event_count= @P10 , insert_event_count= @P11 , update_event_count= @P12 , delete_event_count= @P13 , other_event_count= @P14 , ignore_count= @P15 , router_millis= @P16 , network_millis= @P17 , filter_millis= @P18 , load_millis= @P19 , extract_millis= @P20 , sql_state= @P21 , sql_code= @P22 , sql_message= @P23 , failed_data_id= @P24 , last_update_hostname= @P25 , last_update_time=
2014-02-18 09:05:45.39 spid18s resource-list
2014-02-18 09:05:45.39 spid18s pagelock fileid=1 pageid=631780 dbid=5 objectname=guard.symmetricds.sym_outgoing_batch id=lock824c4280 mode=IX associatedObjectId=72057594055557120
2014-02-18 09:05:45.39 spid18s owner-list
2014-02-18 09:05:45.39 spid18s owner id=process5549438 mode=IX
2014-02-18 09:05:45.39 spid18s waiter-list
2014-02-18 09:05:45.39 spid18s waiter id=process3ec52e8 mode=U requestType=wait
2014-02-18 09:05:45.39 spid18s pagelock fileid=1 pageid=631232 dbid=5 objectname=guard.symmetricds.sym_outgoing_batch id=lock8c1c4f80 mode=U associatedObjectId=72057594055557120
2014-02-18 09:05:45.39 spid18s owner-list
2014-02-18 09:05:45.39 spid18s owner id=process3ec52e8 mode=U
2014-02-18 09:05:45.39 spid18s waiter-list
2014-02-18 09:05:45.39 spid18s waiter id=process5549438 mode=IX requestType=wait
}}}

Hope it helps you to understand what's going on.
Tagsdialect: sql-server, routing

Relationships

related to 0001708 closedchenson Disable page level locking and enable row level locking on sym_data and sym_data_event 

Activities

chenson

2014-02-18 07:21

administrator   ~0000481

How often does this happen? Do you ever do maintenance on your sym_ tables?

I think SQL Server is escalating a lock to a page lock which causes the deadlock.

You could try the following:

ALTER INDEX ALL ON SYM_OUTGOING_BATCH SET (ALLOW_ROW_LOCKS = ON);
ALTER TABLE SYM_OUTGOING_BATCH SET (LOCK_ESCALATION = DISABLE);

azarubkin

2014-02-18 08:17

reporter   ~0000482

It happens several times per day, usually. I haven't touched sym_ tables except initial configuration. Purge job runs every 5 days, however, as far as I remember default configuration.

I think it's better to explicitly disable page locks on indices, since row locks are already enabled by default (I checked):

ALTER INDEX ... WITH (ALLOW_PAGE_LOCKS = OFF, ALLOW_ROW_LOCKS = ON);

And then the second line about disabling lock escalation... Should I try them one by one, or both at the same time?

chenson

2014-02-18 08:26

administrator   ~0000483

I'd try one by one. I'd also set purge to be more aggressive and periodically reorg the runtime sym_ tables.

ALTER INDEX ALL ON SYM_INCOMING_BATCH REORGANIZE;
ALTER INDEX ALL ON SYM_OUTGOING_BATCH REORGANIZE;
ALTER INDEX ALL ON SYM_DATA_GAP REORGANIZE;
ALTER INDEX ALL ON SYM_DATA_EVENT_BATCH REORGANIZE;
ALTER INDEX ALL ON SYM_DATA REORGANIZE;

azarubkin

2014-02-18 09:09

reporter   ~0000484

Thank you, I'll try it.

By the way, SQL Server never escalates row lock to page lock. It escalates them to table lock. So, page locks were there from the very beginning of transactions, it's not the escalation, it's the query plan.

Moreover, if the escalation fails, the transaction continues and escalation is retried later. Here we see a deadlock on sym_idx_ob_node_status index, it's another situation.

It emerges because SQL Server needs to update sym_idx_ob_node_status index when we update rows in table. But it needs to scan (and possibly lock) many pages in the index, because the rows where status = @P1 are scattered. I think if the index was created as (status; node_id), i.e. the order of columns was reversed, SQL Server would need to scan lesser amount of rows (and maybe avoid page locks at all).
Moreover, we wouldn't need another index (sym_idx_ob_status) then, because the first column from sym_idx_ob_node_status could be used instead.

Please consider this proposal. I think it's better than disabling page locks because it places less burden on server.

azarubkin

2014-02-18 09:11

reporter   ~0000485

Last edited: 2014-02-18 09:12

View 2 revisions

Of course this will work if there are no queries like "SELECT * FROM SYM_OUTGOING_BATCH WHERE NODE_ID = @P0". In that case current index on (node_id; status) should be preserved or another index on (node_id) only should be created.

I think I can assist you in checking different index configurations, if you want.

chenson

2014-02-18 20:36

administrator   ~0000487

Any help in optimization is very much appreciated. If you want to do some analysis yourself, you could actually modify the symmetric-schema.xml in symmetric-core-X.XX.jar to change the indexes. Jar files can be edited by most archive tools. I think you could also extract symmetric-schema.xml and put it in the patches directory (versus updating the jar file). Editing the schema xml should be pretty straightforward. We'd be happy to review any possibly apply to the next release improvements in the indexes.

Issue History

Date Modified Username Field Change
2014-02-18 01:58 azarubkin New Issue
2014-02-18 07:21 chenson Note Added: 0000481
2014-02-18 08:17 azarubkin Note Added: 0000482
2014-02-18 08:26 chenson Note Added: 0000483
2014-02-18 09:09 azarubkin Note Added: 0000484
2014-02-18 09:11 azarubkin Note Added: 0000485
2014-02-18 09:12 azarubkin Note Edited: 0000485 View Revisions
2014-02-18 20:36 chenson Note Added: 0000487
2019-04-22 12:47 elong Tag Attached: routing
2019-04-22 12:47 elong Tag Attached: mssql
2019-04-23 16:49 admin Tag Renamed mssql => dialect: mssql
2019-04-24 08:50 admin Tag Renamed dialect: mssql => dialect: sql-server
2020-05-21 12:55 elong Relationship added related to 0001708
2020-05-21 12:56 elong Status new => resolved
2020-05-21 12:56 elong Resolution open => fixed
2020-05-21 12:56 elong Fixed in Version => 3.5.22