View Issue Details

IDProjectCategoryView StatusLast Update
0001597SymmetricDSBugpublic2014-03-02 14:49
Reportermsmith@cbnco.com Assigned Tochenson  
Prioritynormal 
Status closedResolutionfixed 
Product Version3.5.11 
Target Version3.5.18Fixed in Version3.5.18 
Summary0001597: Overlapping database pull processes after connection outage - Default pull.thread.per.server.count=1
DescriptionHi, we've been testing SymmetricDS for an application with spotty network connections. We used netem to set the bandwidth down to 512 kbit with 3% packet loss, inserted 20,000 rows with avg 19K blob each row, then, in the middle of a batch pull, blocked connections in both directions for half an hour using iptables.

The server side OS closed the connection but the client side OS did not. This is a scenario we see in production with other technologies like Oracle Streams.

As you'd expect, Symmetric kept waiting for the pull to complete. After pull.lock.timeout.ms expired (2 hours), the client started a new pull job for the same batch.

A few minutes later - 2 hours after breaking the connection - http.timeout.ms expired and the original pull completed with a java.net.SocketTimeoutException: Read timed out.

At this point, another pull job was started for the same batch. So two pull jobs were running concurrently, for different batches, offset by a few minutes, and the first worker was downloading something the second worker had already grabbed. The first one to complete committed its records to the database.

This kept going until there were no more batches to transfer - every subsequent batch was downloaded twice.

In the end everything recovered and the synchronization was successful. I just don't think it was the intended behaviour to transfer the batches twice so I figured I'd file a bug and see what you think. On a slow, unreliable link it could make a big difference.
Steps To ReproduceSet up an environment with one central node and one remote node. Each node has a SymmetricDS instance and an Oracle database.

On both ends, set up the network emulator for 800 ms RTT, 512 kbit symmetric bandwidth, 3% packet loss:

tc qdisc add dev eth0 root handle 1: htb default 12
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 512kbit ceil 512kbit
tc qdisc add dev eth0 parent 1:12 netem delay 400ms 80ms distribution normal loss 5% 25% (seems to work out to 3% loss)

Insert 20,000 records with avg 19K blob (total > 380 MB data) as separate transactions.

Use "netstat -nap | grep :<symmetric server port>" on the server to look for non-zero SendQ values - this indicates a batch is being transferred. Make a note of the client's source port. In the middle of the batch block traffic on both ends:

iptables -I INPUT -s <other node IP> -j DROP

Wait 30+ min then remove the rule: iptables -D INPUT -s <other node IP> -j DROP

If you're lucky the server will have closed the connection - the client's port number it will no longer show up as ESTABLISHED in "netstat -nap | grep :<symmetric server port>". Meanwhile on the client the connection will still show up as ESTABLISHED.

Wait for pull.lock.timeout.ms to expire (default 2 hours).

Wait again for http.timeout.ms to expire (2 hours after the first iptables rule).

Check the SymmetricDS log on the client and see each batch being transferred in parallel by two threads.
Additional InformationLogs from the client. Batch 17469 was being transferred when the connection was blocked around 12:40:00. The connection was restored at 13:11. The pull lock timeout expired at 14:18 and the original HTTP timeout expired at 14:40.

2014-02-19 12:36:32,622 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17468 for node 'Client01' has been transfer to stage processing for 1075 secon
ds. The following stats have been gathered: {STATEMENTCOUNT=1410, BYTECOUNT=35401329, LINENUMBER=1410}
2014-02-19 12:37:32,842 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17468 for node 'Client01' has been transfer to stage processing for 1135 secon
ds. The following stats have been gathered: {STATEMENTCOUNT=1625, BYTECOUNT=40782177, LINENUMBER=1625}
2014-02-19 12:38:48,899 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-10] Routed 1778 data events in 501 ms
2014-02-19 12:39:41,752 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17469 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=294, BYTECOUNT=7573892, LINENUMBER=294}
2014-02-19 12:40:00,082 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-12] 6 stats rows were purged
2014-02-19 12:41:30,777 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-20] Found a gap in data_id from 346766 to 347539. Skipping it because the gap expired
2014-02-19 12:42:31,479 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-18] Routed 1 data events in 127 ms
2014-02-19 12:44:03,815 WARN [Sym_Client-Client01] [FileSyncService] [sym_client-client01-file_pull-2] Could not communicate with Sym_Server:Server:Server at http://172.20.29.52:31
415/sync/Sym_Server-Server because: Connection timed out

...

2014-02-19 13:12:17,287 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-1] Push data sent to Sym_Server:Server:Server
2014-02-19 13:12:18,405 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-1] Pushed data to Sym_Server:Server:Server. 2 data and 2 batches were processed
2014-02-19 13:12:31,494 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-5] Routed 1 data events in 129 ms
2014-02-19 13:13:32,283 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Push data sent to Sym_Server:Server:Server
2014-02-19 13:13:33,439 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Pushed data to Sym_Server:Server:Server. 1 data and 1 batches were processed
2014-02-19 13:17:44,898 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-6] Found a gap in data_id from 350353 to 350971. Skipping it because the gap expired
2014-02-19 13:20:00,115 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-11] 6 stats rows were purged
2014-02-19 13:25:00,077 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-9] 12 stats rows were purged
2014-02-19 13:27:31,177 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-14] Found a gap in data_id from 350973 to 351709. Skipping it because the gap expired

...

2014-02-19 14:18:27,441 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-2] Starting a new process even though the previous one (processType=Database Pull,so
urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 12:18:25 EST 2014) had not finished
2014-02-19 14:18:40,487 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-2] Skipping batch Server-17476
2014-02-19 14:18:42,083 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-17] Routed 8 data events in 96 ms
2014-02-19 14:19:41,752 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=134, BYTECOUNT=3348804, LINENUMBER=134}
2014-02-19 14:20:00,119 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-8] 6 stats rows were purged
2014-02-19 14:20:42,627 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 121 second
s. The following stats have been gathered: {STATEMENTCOUNT=299, BYTECOUNT=7484376, LINENUMBER=299}
2014-02-19 14:21:43,097 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 182 second
s. The following stats have been gathered: {STATEMENTCOUNT=514, BYTECOUNT=12930054, LINENUMBER=514}
2014-02-19 14:22:43,535 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 242 second
s. The following stats have been gathered: {STATEMENTCOUNT=728, BYTECOUNT=18341337, LINENUMBER=728}

...

2014-02-19 14:27:45,010 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 543 second
s. The following stats have been gathered: {STATEMENTCOUNT=1742, BYTECOUNT=43763830, LINENUMBER=1742}
2014-02-19 14:27:55,546 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-2] Skipping batch Server-17468
2014-02-19 14:27:57,050 WARN [Sym_Client-Client01] [StagedResource] [sym_client-client01-pull-2] We had to delete /apps/symmetric-pro/bin/../tmp/Sym_Client-Client01/incoming/Server
/0000017469.create because it already existed
2014-02-19 14:28:41,024 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-8] Push data sent to Sym_Server:Server:Server
2014-02-19 14:28:46,657 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-8] Pushed data to Sym_Server:Server:Server. 1 data and 1 batches were processed
2014-02-19 14:28:57,927 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=215, BYTECOUNT=5485872, LINENUMBER=215}
2014-02-19 14:29:58,527 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 121 second
s. The following stats have been gathered: {STATEMENTCOUNT=382, BYTECOUNT=9768726, LINENUMBER=382}

...

2014-02-19 14:36:00,302 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 483 second
s. The following stats have been gathered: {STATEMENTCOUNT=1666, BYTECOUNT=42006814, LINENUMBER=1666}
2014-02-19 14:37:13,414 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-18] Routed 1786 data events in 515 ms
2014-02-19 14:38:11,796 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=342, BYTECOUNT=8693603, LINENUMBER=342}
2014-02-19 14:39:28,946 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 137 second
s. The following stats have been gathered: {STATEMENTCOUNT=546, BYTECOUNT=13882922, LINENUMBER=546}


2014-02-19 14:40:21,644 ERROR [Sym_Client-Client01] [DataLoaderService] [sym_client-client01-pull-1] Failed while parsing batch
org.jumpmind.exception.IoException: java.net.SocketTimeoutException: Read timed out
        at org.jumpmind.symmetric.io.data.reader.ProtocolDataReader.readNext(ProtocolDataReader.java:283)
        at org.jumpmind.symmetric.io.data.reader.ProtocolDataReader.nextData(ProtocolDataReader.java:335)
        at org.jumpmind.symmetric.io.data.DataProcessor.forEachDataInTable(DataProcessor.java:185)
        at org.jumpmind.symmetric.io.data.DataProcessor.forEachTableInBatch(DataProcessor.java:164)
        at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:114)
        at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromTransport(DataLoaderService.java:394)
        at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromPull(DataLoaderService.java:265)
        at org.jumpmind.symmetric.service.impl.PullService.execute(PullService.java:129)
        at org.jumpmind.symmetric.service.impl.NodeCommunicationService$2.run(NodeCommunicationService.java:307)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
        at java.lang.Thread.run(Thread.java:780)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:161)
...

2014-02-19 14:40:25,947 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-1] Pull data received from Sym_Server:Server:Server. 1779 rows and 2 batches were proces
sed
2014-02-19 14:40:27,761 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-3] Starting a new process even though the previous one (processType=Database Pull,so
urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:18:27 EST 2014) had not finished
2014-02-19 14:40:30,395 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 199 second
s. The following stats have been gathered: {STATEMENTCOUNT=739, BYTECOUNT=18696396, LINENUMBER=739}
2014-02-19 14:40:30,418 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-3] Sym_Server:Server:Server at http://172.20.29.52:31415/sync/Sym_Server-Server was busy
2014-02-19 14:40:30,419 WARN [Sym_Client-Client01] [DefaultOfflineClientListener] [sym_client-client01-pull-3] Node 'Server' was too busy to accept the connection
2014-02-19 14:41:30,857 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 259 second
s. The following stats have been gathered: {STATEMENTCOUNT=957, BYTECOUNT=24180681, LINENUMBER=957}
2014-02-19 14:41:38,229 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17478
2014-02-19 14:41:38,246 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17480
2014-02-19 14:41:38,266 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17481
2014-02-19 14:41:38,285 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17482
2014-02-19 14:41:38,302 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17485
2014-02-19 14:41:38,323 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17487
2014-02-19 14:41:38,342 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17489
2014-02-19 14:41:38,368 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17491
2014-02-19 14:41:46,415 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-3] Routed 2 data events in 386 ms
2014-02-19 14:42:31,326 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 320 second
s. The following stats have been gathered: {STATEMENTCOUNT=1167, BYTECOUNT=29440354, LINENUMBER=1167}
2014-02-19 14:42:36,961 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-8] Routed 1 data events in 104 ms
2014-02-19 14:42:41,220 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 62 seconds
. The following stats have been gathered: {STATEMENTCOUNT=6, BYTECOUNT=159930, LINENUMBER=6}
2014-02-19 14:43:31,709 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 380 second
s. The following stats have been gathered: {STATEMENTCOUNT=1378, BYTECOUNT=34707245, LINENUMBER=1378}
2014-02-19 14:43:42,942 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Push data sent to Sym_Server:Server:Server
2014-02-19 14:43:46,778 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 128 second
s. The following stats have been gathered: {STATEMENTCOUNT=11, BYTECOUNT=295268, LINENUMBER=11}

...

2014-02-19 14:47:33,660 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 622 second
s. The following stats have been gathered: {STATEMENTCOUNT=2207, BYTECOUNT=55611310, LINENUMBER=2207}
2014-02-19 14:47:37,311 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 358 second
s. The following stats have been gathered: {STATEMENTCOUNT=34, BYTECOUNT=875112, LINENUMBER=34}
2014-02-19 14:48:41,770 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-7] Routed 2293 data events in 553 ms
2014-02-19 14:48:54,767 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 436 second
s. The following stats have been gathered: {STATEMENTCOUNT=38, BYTECOUNT=967244, LINENUMBER=38}
2014-02-19 14:49:36,183 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17471 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=336, BYTECOUNT=8524759, LINENUMBER=336}
2014-02-19 14:50:00,125 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-20] 6 stats rows were purged
2014-02-19 14:50:00,854 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 502 second
s. The following stats have been gathered: {STATEMENTCOUNT=48, BYTECOUNT=1213326, LINENUMBER=48}

...

2014-02-19 15:04:31,858 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1373 secon
ds. The following stats have been gathered: {STATEMENTCOUNT=1150, BYTECOUNT=29032265, LINENUMBER=1150}
2014-02-19 15:05:00,071 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-13] 6 stats rows were purged
2014-02-19 15:05:13,450 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-3] Routed 2387 data events in 598 ms
2014-02-19 15:05:14,158 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-2] Pull data received from Sym_Server:Server:Server. 8253 rows and 13 batches were proce
ssed
2014-02-19 15:05:28,132 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-5] Starting a new process even though the previous one (processType=Database Pull,so
urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:41:27 EST 2014) had not finished
2014-02-19 15:05:32,792 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1434 secon
ds. The following stats have been gathered: {STATEMENTCOUNT=1331, BYTECOUNT=33550548, LINENUMBER=1331}
2014-02-19 15:05:49,906 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-5] Skipping batch Server-17493
2014-02-19 15:05:49,939 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-5] Skipping batch Server-17495
2014-02-19 15:05:53,964 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-12] Routed 1 data events in 99 ms
2014-02-19 15:06:36,814 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1498 secon
ds. The following stats have been gathered: {STATEMENTCOUNT=1469, BYTECOUNT=37025036, LINENUMBER=1469}
2014-02-19 15:06:50,343 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 60 seconds
. The following stats have been gathered: {STATEMENTCOUNT=125, BYTECOUNT=3172352, LINENUMBER=125}

...

2014-02-19 15:09:55,460 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 245 second
s. The following stats have been gathered: {STATEMENTCOUNT=561, BYTECOUNT=14153824, LINENUMBER=561}
2014-02-19 15:10:00,073 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-9] 6 stats rows were purged
2014-02-19 15:10:24,238 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17469
2014-02-19 15:10:25,891 WARN [Sym_Client-Client01] [StagedResource] [sym_client-client01-pull-4] We had to delete /apps/symmetric-pro/bin/../tmp/Sym_Client-Client01/incoming/Server
/0000017472.create because it already existed
2014-02-19 15:10:58,656 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 308 second
s. The following stats have been gathered: {STATEMENTCOUNT=590, BYTECOUNT=14883444, LINENUMBER=590}



Server logs. The server OS seems to have closed the original connection at 12:56:01.

2014-02-19 12:37:28,204 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 1132 seconds. The foll
owing stats have been gathered: {STATEMENTCOUNT=1660, BYTECOUNT=41697667, LINENUMBER=1660}
2014-02-19 12:39:12,386 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 60 seconds. The follow
ing stats have been gathered: {STATEMENTCOUNT=246, BYTECOUNT=6294833, LINENUMBER=246}
2014-02-19 12:39:59,096 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-12] Routed 1 data events in 102 ms
2014-02-19 12:40:13,115 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 120 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=459, BYTECOUNT=11684538, LINENUMBER=459}
2014-02-19 12:45:00,098 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-9] 13 stats rows were purged
2014-02-19 12:50:00,077 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-16] 7 stats rows were purged
2014-02-19 12:55:00,067 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-2] 7 stats rows were purged
2014-02-19 12:55:07,478 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-14] Routed 1 data events in 352 ms
2014-02-19 12:56:01,359 WARN [Sym_Server-Server] [DataExtractorService] [qtp-1870560214-31] Failed to transport batch Client01-17469. The stream was closed by the client. There i
s a good chance that a previously sent batch errored out and the stream was closed or there was a network error. The error was: Connection timed out
2014-02-19 13:00:00,050 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-20] 7 stats rows were purged
2014-02-19 13:05:00,060 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-4] 1 stats rows were purged

...

2014-02-19 14:15:00,220 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-17] 13 stats rows were purged
2014-02-19 14:19:38,058 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 60 seconds. The follow
ing stats have been gathered: {STATEMENTCOUNT=149, BYTECOUNT=3754152, LINENUMBER=149}
2014-02-19 14:20:00,073 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-10] 7 stats rows were purged
2014-02-19 14:20:38,398 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 120 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=313, BYTECOUNT=7838587, LINENUMBER=313}
2014-02-19 14:21:38,663 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 180 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=527, BYTECOUNT=13269255, LINENUMBER=527}

...

2014-02-19 14:27:41,194 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 543 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=1777, BYTECOUNT=44634477, LINENUMBER=1777}
2014-02-19 14:28:48,219 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-7] Routed 1 data events in 77 ms
2014-02-19 14:28:52,672 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 60 seconds. The follow
ing stats have been gathered: {STATEMENTCOUNT=250, BYTECOUNT=6388733, LINENUMBER=250}

...

2014-02-19 14:35:57,380 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 485 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=1721, BYTECOUNT=43420274, LINENUMBER=1721}
2014-02-19 14:37:30,484 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 60 seconds. The follow
ing stats have been gathered: {STATEMENTCOUNT=262, BYTECOUNT=6663158, LINENUMBER=262}

...

2014-02-19 14:40:26,302 WARN [Sym_Server-Server] [ConcurrentConnectionManager] [qtp-1870560214-30] Node 'Client01' requested a /sync/Sym_Server-Server/pull connection, but was reje
cted because it already has one
2014-02-19 14:40:34,397 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 244 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=823, BYTECOUNT=20813120, LINENUMBER=823}
2014-02-19 14:41:31,504 WARN [Sym_Server-Server] [StatisticManager] [qtp-1870560214-29] Starting a new process even though the previous one (processType=Service Database Pull,sourc
eNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:18:26 EST 2014) had not finished
2014-02-19 14:41:34,910 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 304 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=1035, BYTECOUNT=26158318, LINENUMBER=1035}
2014-02-19 14:42:35,545 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 365 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=1243, BYTECOUNT=31335561, LINENUMBER=1243}
2014-02-19 14:42:45,334 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 73 seconds. The follow
ing stats have been gathered: {STATEMENTCOUNT=28, BYTECOUNT=731582, LINENUMBER=28}

...

2014-02-19 15:03:28,961 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1317 seconds. The foll
owing stats have been gathered: {STATEMENTCOUNT=1161, BYTECOUNT=29309072, LINENUMBER=1161}
2014-02-19 15:04:06,273 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17471 for node 'Client01' has been send from stage processing for 970 seconds. The follo
wing stats have been gathered: {STATEMENTCOUNT=2378, BYTECOUNT=60056141, LINENUMBER=2378}
2014-02-19 15:04:07,579 INFO [Sym_Server-Server] [DataExtractorService] [qtp-1870560214-31] Reached the total byte threshold after 13 of 17 batches were extracted for node 'Client0
1'. The remaining batches will be extracted on a subsequent sync
2014-02-19 15:04:29,400 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1377 seconds. The foll
owing stats have been gathered: {STATEMENTCOUNT=1187, BYTECOUNT=29963051, LINENUMBER=1187}
2014-02-19 15:05:00,069 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-10] 7 stats rows were purged
2014-02-19 15:05:10,792 WARN [Sym_Server-Server] [AcknowledgeService] [qtp-1870560214-33] Batch 17468 was already set to OK. Not updating to OK.
2014-02-19 15:05:10,928 WARN [Sym_Server-Server] [AcknowledgeService] [qtp-1870560214-33] Batch 17476 was already set to OK. Not updating to OK.
2014-02-19 15:05:28,851 WARN [Sym_Server-Server] [StatisticManager] [qtp-1870560214-32] Starting a new process even though the previous one (processType=Service Database Pull,sourc
eNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:41:31 EST 2014) had not finished
2014-02-19 15:05:30,009 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1438 seconds. The foll
owing stats have been gathered: {STATEMENTCOUNT=1365, BYTECOUNT=34400312, LINENUMBER=1365}

...

TagsNo tags attached.

Activities

chenson

2014-02-20 13:38

administrator   ~0000488

Set pull.thread.per.server.count=1. If that works, I'll make it the default in the next release.

msmith@cbnco.com

2014-02-20 15:12

reporter   ~0000489

Thanks. So I should set pull.thread.per.server.count=1 on the client, and then there would only be one thread in the pool?

chenson

2014-02-20 15:21

administrator   ~0000490

Yep.

msmith@cbnco.com

2014-02-21 19:31

reporter   ~0000492

Yes, that fixed it. Thanks.

chenson

2014-02-25 12:46

administrator   ~0000494

Set pull, push, file.pull and file.push thread count defaults to 1.

Related Changesets

SymmetricDS: master 52e6a98d

2014-02-25 07:46:21

chenson

Details Diff
0001597: Overlapping database pull processes after connection outage - Default pull.thread.per.server.count=1 Affected Issues
0001597
mod - symmetric-core/src/main/resources/symmetric-default.properties Diff File

Issue History

Date Modified Username Field Change
2014-02-20 04:46 msmith@cbnco.com New Issue
2014-02-20 13:38 chenson Note Added: 0000488
2014-02-20 13:39 chenson Target Version => 3.6.0
2014-02-20 15:12 msmith@cbnco.com Note Added: 0000489
2014-02-20 15:21 chenson Note Added: 0000490
2014-02-21 19:31 msmith@cbnco.com Note Added: 0000492
2014-02-25 00:52 chenson Assigned To => chenson
2014-02-25 00:52 chenson Status new => assigned
2014-02-25 12:45 chenson Summary Overlapping database pull processes after connection outage => Overlapping database pull processes after connection outage - Default pull.thread.per.server.count=1
2014-02-25 12:46 chenson Note Added: 0000494
2014-02-25 12:48 chenson Fixed in Version => 3.5.18
2014-02-25 12:48 chenson Target Version 3.6.0 => 3.5.18
2014-02-25 12:48 chenson Status assigned => resolved
2014-02-25 12:48 chenson Resolution open => fixed
2014-02-25 13:00 Changeset attached => SymmetricDS trunk r7975
2014-02-25 13:00 Changeset attached => SymmetricDS 3.5 r7976
2014-03-02 14:49 chenson Status resolved => closed
2015-07-31 01:49 chenson Changeset attached => SymmetricDS master 52e6a98d