View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001597 | SymmetricDS | Bug | public | 2014-02-20 04:46 | 2014-03-02 14:49 |
Reporter | msmith@cbnco.com | Assigned To | chenson | ||
Priority | normal | ||||
Status | closed | Resolution | fixed | ||
Product Version | 3.5.11 | ||||
Target Version | 3.5.18 | Fixed in Version | 3.5.18 | ||
Summary | 0001597: Overlapping database pull processes after connection outage - Default pull.thread.per.server.count=1 | ||||
Description | Hi, we've been testing SymmetricDS for an application with spotty network connections. We used netem to set the bandwidth down to 512 kbit with 3% packet loss, inserted 20,000 rows with avg 19K blob each row, then, in the middle of a batch pull, blocked connections in both directions for half an hour using iptables. The server side OS closed the connection but the client side OS did not. This is a scenario we see in production with other technologies like Oracle Streams. As you'd expect, Symmetric kept waiting for the pull to complete. After pull.lock.timeout.ms expired (2 hours), the client started a new pull job for the same batch. A few minutes later - 2 hours after breaking the connection - http.timeout.ms expired and the original pull completed with a java.net.SocketTimeoutException: Read timed out. At this point, another pull job was started for the same batch. So two pull jobs were running concurrently, for different batches, offset by a few minutes, and the first worker was downloading something the second worker had already grabbed. The first one to complete committed its records to the database. This kept going until there were no more batches to transfer - every subsequent batch was downloaded twice. In the end everything recovered and the synchronization was successful. I just don't think it was the intended behaviour to transfer the batches twice so I figured I'd file a bug and see what you think. On a slow, unreliable link it could make a big difference. | ||||
Steps To Reproduce | Set up an environment with one central node and one remote node. Each node has a SymmetricDS instance and an Oracle database. On both ends, set up the network emulator for 800 ms RTT, 512 kbit symmetric bandwidth, 3% packet loss: tc qdisc add dev eth0 root handle 1: htb default 12 tc class add dev eth0 parent 1:1 classid 1:12 htb rate 512kbit ceil 512kbit tc qdisc add dev eth0 parent 1:12 netem delay 400ms 80ms distribution normal loss 5% 25% (seems to work out to 3% loss) Insert 20,000 records with avg 19K blob (total > 380 MB data) as separate transactions. Use "netstat -nap | grep :<symmetric server port>" on the server to look for non-zero SendQ values - this indicates a batch is being transferred. Make a note of the client's source port. In the middle of the batch block traffic on both ends: iptables -I INPUT -s <other node IP> -j DROP Wait 30+ min then remove the rule: iptables -D INPUT -s <other node IP> -j DROP If you're lucky the server will have closed the connection - the client's port number it will no longer show up as ESTABLISHED in "netstat -nap | grep :<symmetric server port>". Meanwhile on the client the connection will still show up as ESTABLISHED. Wait for pull.lock.timeout.ms to expire (default 2 hours). Wait again for http.timeout.ms to expire (2 hours after the first iptables rule). Check the SymmetricDS log on the client and see each batch being transferred in parallel by two threads. | ||||
Additional Information | Logs from the client. Batch 17469 was being transferred when the connection was blocked around 12:40:00. The connection was restored at 13:11. The pull lock timeout expired at 14:18 and the original HTTP timeout expired at 14:40. 2014-02-19 12:36:32,622 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17468 for node 'Client01' has been transfer to stage processing for 1075 secon ds. The following stats have been gathered: {STATEMENTCOUNT=1410, BYTECOUNT=35401329, LINENUMBER=1410} 2014-02-19 12:37:32,842 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17468 for node 'Client01' has been transfer to stage processing for 1135 secon ds. The following stats have been gathered: {STATEMENTCOUNT=1625, BYTECOUNT=40782177, LINENUMBER=1625} 2014-02-19 12:38:48,899 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-10] Routed 1778 data events in 501 ms 2014-02-19 12:39:41,752 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-1] Batch 17469 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=294, BYTECOUNT=7573892, LINENUMBER=294} 2014-02-19 12:40:00,082 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-12] 6 stats rows were purged 2014-02-19 12:41:30,777 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-20] Found a gap in data_id from 346766 to 347539. Skipping it because the gap expired 2014-02-19 12:42:31,479 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-18] Routed 1 data events in 127 ms 2014-02-19 12:44:03,815 WARN [Sym_Client-Client01] [FileSyncService] [sym_client-client01-file_pull-2] Could not communicate with Sym_Server:Server:Server at http://172.20.29.52:31 415/sync/Sym_Server-Server because: Connection timed out ... 2014-02-19 13:12:17,287 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-1] Push data sent to Sym_Server:Server:Server 2014-02-19 13:12:18,405 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-1] Pushed data to Sym_Server:Server:Server. 2 data and 2 batches were processed 2014-02-19 13:12:31,494 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-5] Routed 1 data events in 129 ms 2014-02-19 13:13:32,283 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Push data sent to Sym_Server:Server:Server 2014-02-19 13:13:33,439 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Pushed data to Sym_Server:Server:Server. 1 data and 1 batches were processed 2014-02-19 13:17:44,898 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-6] Found a gap in data_id from 350353 to 350971. Skipping it because the gap expired 2014-02-19 13:20:00,115 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-11] 6 stats rows were purged 2014-02-19 13:25:00,077 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-9] 12 stats rows were purged 2014-02-19 13:27:31,177 INFO [Sym_Client-Client01] [DataGapDetector] [Sym_Client-Client01-job-14] Found a gap in data_id from 350973 to 351709. Skipping it because the gap expired ... 2014-02-19 14:18:27,441 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-2] Starting a new process even though the previous one (processType=Database Pull,so urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 12:18:25 EST 2014) had not finished 2014-02-19 14:18:40,487 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-2] Skipping batch Server-17476 2014-02-19 14:18:42,083 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-17] Routed 8 data events in 96 ms 2014-02-19 14:19:41,752 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=134, BYTECOUNT=3348804, LINENUMBER=134} 2014-02-19 14:20:00,119 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-8] 6 stats rows were purged 2014-02-19 14:20:42,627 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 121 second s. The following stats have been gathered: {STATEMENTCOUNT=299, BYTECOUNT=7484376, LINENUMBER=299} 2014-02-19 14:21:43,097 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 182 second s. The following stats have been gathered: {STATEMENTCOUNT=514, BYTECOUNT=12930054, LINENUMBER=514} 2014-02-19 14:22:43,535 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 242 second s. The following stats have been gathered: {STATEMENTCOUNT=728, BYTECOUNT=18341337, LINENUMBER=728} ... 2014-02-19 14:27:45,010 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17468 for node 'Client01' has been transfer to stage processing for 543 second s. The following stats have been gathered: {STATEMENTCOUNT=1742, BYTECOUNT=43763830, LINENUMBER=1742} 2014-02-19 14:27:55,546 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-2] Skipping batch Server-17468 2014-02-19 14:27:57,050 WARN [Sym_Client-Client01] [StagedResource] [sym_client-client01-pull-2] We had to delete /apps/symmetric-pro/bin/../tmp/Sym_Client-Client01/incoming/Server /0000017469.create because it already existed 2014-02-19 14:28:41,024 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-8] Push data sent to Sym_Server:Server:Server 2014-02-19 14:28:46,657 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-8] Pushed data to Sym_Server:Server:Server. 1 data and 1 batches were processed 2014-02-19 14:28:57,927 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=215, BYTECOUNT=5485872, LINENUMBER=215} 2014-02-19 14:29:58,527 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 121 second s. The following stats have been gathered: {STATEMENTCOUNT=382, BYTECOUNT=9768726, LINENUMBER=382} ... 2014-02-19 14:36:00,302 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17469 for node 'Client01' has been transfer to stage processing for 483 second s. The following stats have been gathered: {STATEMENTCOUNT=1666, BYTECOUNT=42006814, LINENUMBER=1666} 2014-02-19 14:37:13,414 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-18] Routed 1786 data events in 515 ms 2014-02-19 14:38:11,796 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=342, BYTECOUNT=8693603, LINENUMBER=342} 2014-02-19 14:39:28,946 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 137 second s. The following stats have been gathered: {STATEMENTCOUNT=546, BYTECOUNT=13882922, LINENUMBER=546} 2014-02-19 14:40:21,644 ERROR [Sym_Client-Client01] [DataLoaderService] [sym_client-client01-pull-1] Failed while parsing batch org.jumpmind.exception.IoException: java.net.SocketTimeoutException: Read timed out at org.jumpmind.symmetric.io.data.reader.ProtocolDataReader.readNext(ProtocolDataReader.java:283) at org.jumpmind.symmetric.io.data.reader.ProtocolDataReader.nextData(ProtocolDataReader.java:335) at org.jumpmind.symmetric.io.data.DataProcessor.forEachDataInTable(DataProcessor.java:185) at org.jumpmind.symmetric.io.data.DataProcessor.forEachTableInBatch(DataProcessor.java:164) at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:114) at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromTransport(DataLoaderService.java:394) at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromPull(DataLoaderService.java:265) at org.jumpmind.symmetric.service.impl.PullService.execute(PullService.java:129) at org.jumpmind.symmetric.service.impl.NodeCommunicationService$2.run(NodeCommunicationService.java:307) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626) at java.lang.Thread.run(Thread.java:780) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:161) ... 2014-02-19 14:40:25,947 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-1] Pull data received from Sym_Server:Server:Server. 1779 rows and 2 batches were proces sed 2014-02-19 14:40:27,761 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-3] Starting a new process even though the previous one (processType=Database Pull,so urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:18:27 EST 2014) had not finished 2014-02-19 14:40:30,395 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 199 second s. The following stats have been gathered: {STATEMENTCOUNT=739, BYTECOUNT=18696396, LINENUMBER=739} 2014-02-19 14:40:30,418 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-3] Sym_Server:Server:Server at http://172.20.29.52:31415/sync/Sym_Server-Server was busy 2014-02-19 14:40:30,419 WARN [Sym_Client-Client01] [DefaultOfflineClientListener] [sym_client-client01-pull-3] Node 'Server' was too busy to accept the connection 2014-02-19 14:41:30,857 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 259 second s. The following stats have been gathered: {STATEMENTCOUNT=957, BYTECOUNT=24180681, LINENUMBER=957} 2014-02-19 14:41:38,229 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17478 2014-02-19 14:41:38,246 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17480 2014-02-19 14:41:38,266 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17481 2014-02-19 14:41:38,285 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17482 2014-02-19 14:41:38,302 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17485 2014-02-19 14:41:38,323 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17487 2014-02-19 14:41:38,342 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17489 2014-02-19 14:41:38,368 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17491 2014-02-19 14:41:46,415 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-3] Routed 2 data events in 386 ms 2014-02-19 14:42:31,326 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 320 second s. The following stats have been gathered: {STATEMENTCOUNT=1167, BYTECOUNT=29440354, LINENUMBER=1167} 2014-02-19 14:42:36,961 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-8] Routed 1 data events in 104 ms 2014-02-19 14:42:41,220 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 62 seconds . The following stats have been gathered: {STATEMENTCOUNT=6, BYTECOUNT=159930, LINENUMBER=6} 2014-02-19 14:43:31,709 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 380 second s. The following stats have been gathered: {STATEMENTCOUNT=1378, BYTECOUNT=34707245, LINENUMBER=1378} 2014-02-19 14:43:42,942 INFO [Sym_Client-Client01] [PushService] [sym_client-client01-push-3] Push data sent to Sym_Server:Server:Server 2014-02-19 14:43:46,778 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 128 second s. The following stats have been gathered: {STATEMENTCOUNT=11, BYTECOUNT=295268, LINENUMBER=11} ... 2014-02-19 14:47:33,660 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17470 for node 'Client01' has been transfer to stage processing for 622 second s. The following stats have been gathered: {STATEMENTCOUNT=2207, BYTECOUNT=55611310, LINENUMBER=2207} 2014-02-19 14:47:37,311 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 358 second s. The following stats have been gathered: {STATEMENTCOUNT=34, BYTECOUNT=875112, LINENUMBER=34} 2014-02-19 14:48:41,770 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-7] Routed 2293 data events in 553 ms 2014-02-19 14:48:54,767 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 436 second s. The following stats have been gathered: {STATEMENTCOUNT=38, BYTECOUNT=967244, LINENUMBER=38} 2014-02-19 14:49:36,183 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-2] Batch 17471 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=336, BYTECOUNT=8524759, LINENUMBER=336} 2014-02-19 14:50:00,125 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-20] 6 stats rows were purged 2014-02-19 14:50:00,854 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 502 second s. The following stats have been gathered: {STATEMENTCOUNT=48, BYTECOUNT=1213326, LINENUMBER=48} ... 2014-02-19 15:04:31,858 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1373 secon ds. The following stats have been gathered: {STATEMENTCOUNT=1150, BYTECOUNT=29032265, LINENUMBER=1150} 2014-02-19 15:05:00,071 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-13] 6 stats rows were purged 2014-02-19 15:05:13,450 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-3] Routed 2387 data events in 598 ms 2014-02-19 15:05:14,158 INFO [Sym_Client-Client01] [PullService] [sym_client-client01-pull-2] Pull data received from Sym_Server:Server:Server. 8253 rows and 13 batches were proce ssed 2014-02-19 15:05:28,132 WARN [Sym_Client-Client01] [StatisticManager] [sym_client-client01-pull-5] Starting a new process even though the previous one (processType=Database Pull,so urceNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:41:27 EST 2014) had not finished 2014-02-19 15:05:32,792 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1434 secon ds. The following stats have been gathered: {STATEMENTCOUNT=1331, BYTECOUNT=33550548, LINENUMBER=1331} 2014-02-19 15:05:49,906 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-5] Skipping batch Server-17493 2014-02-19 15:05:49,939 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-5] Skipping batch Server-17495 2014-02-19 15:05:53,964 INFO [Sym_Client-Client01] [RouterService] [Sym_Client-Client01-job-12] Routed 1 data events in 99 ms 2014-02-19 15:06:36,814 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-4] Batch 17469 for node 'Client01' has been transfer to stage processing for 1498 secon ds. The following stats have been gathered: {STATEMENTCOUNT=1469, BYTECOUNT=37025036, LINENUMBER=1469} 2014-02-19 15:06:50,343 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 60 seconds . The following stats have been gathered: {STATEMENTCOUNT=125, BYTECOUNT=3172352, LINENUMBER=125} ... 2014-02-19 15:09:55,460 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 245 second s. The following stats have been gathered: {STATEMENTCOUNT=561, BYTECOUNT=14153824, LINENUMBER=561} 2014-02-19 15:10:00,073 INFO [Sym_Client-Client01] [PurgeService] [Sym_Client-Client01-job-9] 6 stats rows were purged 2014-02-19 15:10:24,238 WARN [Sym_Client-Client01] [IncomingBatchService] [sym_client-client01-pull-4] Skipping batch Server-17469 2014-02-19 15:10:25,891 WARN [Sym_Client-Client01] [StagedResource] [sym_client-client01-pull-4] We had to delete /apps/symmetric-pro/bin/../tmp/Sym_Client-Client01/incoming/Server /0000017472.create because it already existed 2014-02-19 15:10:58,656 INFO [Sym_Client-Client01] [DataProcessor] [sym_client-client01-pull-5] Batch 17472 for node 'Client01' has been transfer to stage processing for 308 second s. The following stats have been gathered: {STATEMENTCOUNT=590, BYTECOUNT=14883444, LINENUMBER=590} Server logs. The server OS seems to have closed the original connection at 12:56:01. 2014-02-19 12:37:28,204 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 1132 seconds. The foll owing stats have been gathered: {STATEMENTCOUNT=1660, BYTECOUNT=41697667, LINENUMBER=1660} 2014-02-19 12:39:12,386 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 60 seconds. The follow ing stats have been gathered: {STATEMENTCOUNT=246, BYTECOUNT=6294833, LINENUMBER=246} 2014-02-19 12:39:59,096 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-12] Routed 1 data events in 102 ms 2014-02-19 12:40:13,115 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 120 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=459, BYTECOUNT=11684538, LINENUMBER=459} 2014-02-19 12:45:00,098 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-9] 13 stats rows were purged 2014-02-19 12:50:00,077 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-16] 7 stats rows were purged 2014-02-19 12:55:00,067 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-2] 7 stats rows were purged 2014-02-19 12:55:07,478 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-14] Routed 1 data events in 352 ms 2014-02-19 12:56:01,359 WARN [Sym_Server-Server] [DataExtractorService] [qtp-1870560214-31] Failed to transport batch Client01-17469. The stream was closed by the client. There i s a good chance that a previously sent batch errored out and the stream was closed or there was a network error. The error was: Connection timed out 2014-02-19 13:00:00,050 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-20] 7 stats rows were purged 2014-02-19 13:05:00,060 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-4] 1 stats rows were purged ... 2014-02-19 14:15:00,220 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-17] 13 stats rows were purged 2014-02-19 14:19:38,058 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 60 seconds. The follow ing stats have been gathered: {STATEMENTCOUNT=149, BYTECOUNT=3754152, LINENUMBER=149} 2014-02-19 14:20:00,073 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-10] 7 stats rows were purged 2014-02-19 14:20:38,398 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 120 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=313, BYTECOUNT=7838587, LINENUMBER=313} 2014-02-19 14:21:38,663 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 180 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=527, BYTECOUNT=13269255, LINENUMBER=527} ... 2014-02-19 14:27:41,194 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17468 for node 'Client01' has been send from stage processing for 543 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=1777, BYTECOUNT=44634477, LINENUMBER=1777} 2014-02-19 14:28:48,219 INFO [Sym_Server-Server] [RouterService] [Sym_Server-Server-job-7] Routed 1 data events in 77 ms 2014-02-19 14:28:52,672 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 60 seconds. The follow ing stats have been gathered: {STATEMENTCOUNT=250, BYTECOUNT=6388733, LINENUMBER=250} ... 2014-02-19 14:35:57,380 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17469 for node 'Client01' has been send from stage processing for 485 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=1721, BYTECOUNT=43420274, LINENUMBER=1721} 2014-02-19 14:37:30,484 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 60 seconds. The follow ing stats have been gathered: {STATEMENTCOUNT=262, BYTECOUNT=6663158, LINENUMBER=262} ... 2014-02-19 14:40:26,302 WARN [Sym_Server-Server] [ConcurrentConnectionManager] [qtp-1870560214-30] Node 'Client01' requested a /sync/Sym_Server-Server/pull connection, but was reje cted because it already has one 2014-02-19 14:40:34,397 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 244 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=823, BYTECOUNT=20813120, LINENUMBER=823} 2014-02-19 14:41:31,504 WARN [Sym_Server-Server] [StatisticManager] [qtp-1870560214-29] Starting a new process even though the previous one (processType=Service Database Pull,sourc eNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:18:26 EST 2014) had not finished 2014-02-19 14:41:34,910 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 304 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=1035, BYTECOUNT=26158318, LINENUMBER=1035} 2014-02-19 14:42:35,545 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17470 for node 'Client01' has been send from stage processing for 365 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=1243, BYTECOUNT=31335561, LINENUMBER=1243} 2014-02-19 14:42:45,334 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 73 seconds. The follow ing stats have been gathered: {STATEMENTCOUNT=28, BYTECOUNT=731582, LINENUMBER=28} ... 2014-02-19 15:03:28,961 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1317 seconds. The foll owing stats have been gathered: {STATEMENTCOUNT=1161, BYTECOUNT=29309072, LINENUMBER=1161} 2014-02-19 15:04:06,273 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-31] Batch 17471 for node 'Client01' has been send from stage processing for 970 seconds. The follo wing stats have been gathered: {STATEMENTCOUNT=2378, BYTECOUNT=60056141, LINENUMBER=2378} 2014-02-19 15:04:07,579 INFO [Sym_Server-Server] [DataExtractorService] [qtp-1870560214-31] Reached the total byte threshold after 13 of 17 batches were extracted for node 'Client0 1'. The remaining batches will be extracted on a subsequent sync 2014-02-19 15:04:29,400 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1377 seconds. The foll owing stats have been gathered: {STATEMENTCOUNT=1187, BYTECOUNT=29963051, LINENUMBER=1187} 2014-02-19 15:05:00,069 INFO [Sym_Server-Server] [PurgeService] [Sym_Server-Server-job-10] 7 stats rows were purged 2014-02-19 15:05:10,792 WARN [Sym_Server-Server] [AcknowledgeService] [qtp-1870560214-33] Batch 17468 was already set to OK. Not updating to OK. 2014-02-19 15:05:10,928 WARN [Sym_Server-Server] [AcknowledgeService] [qtp-1870560214-33] Batch 17476 was already set to OK. Not updating to OK. 2014-02-19 15:05:28,851 WARN [Sym_Server-Server] [StatisticManager] [qtp-1870560214-32] Starting a new process even though the previous one (processType=Service Database Pull,sourc eNodeId=Server,targetNodeId=Client01,status=Transferring,startTime=Wed Feb 19 14:41:31 EST 2014) had not finished 2014-02-19 15:05:30,009 INFO [Sym_Server-Server] [DataProcessor] [qtp-1870560214-29] Batch 17469 for node 'Client01' has been send from stage processing for 1438 seconds. The foll owing stats have been gathered: {STATEMENTCOUNT=1365, BYTECOUNT=34400312, LINENUMBER=1365} ... | ||||
Tags | No tags attached. | ||||
|
Set pull.thread.per.server.count=1. If that works, I'll make it the default in the next release. |
|
Thanks. So I should set pull.thread.per.server.count=1 on the client, and then there would only be one thread in the pool? |
|
Yep. |
|
Yes, that fixed it. Thanks. |
|
Set pull, push, file.pull and file.push thread count defaults to 1. |
Date Modified | Username | Field | Change |
---|---|---|---|
2014-02-20 04:46 | msmith@cbnco.com | New Issue | |
2014-02-20 13:38 | chenson | Note Added: 0000488 | |
2014-02-20 13:39 | chenson | Target Version | => 3.6.0 |
2014-02-20 15:12 | msmith@cbnco.com | Note Added: 0000489 | |
2014-02-20 15:21 | chenson | Note Added: 0000490 | |
2014-02-21 19:31 | msmith@cbnco.com | Note Added: 0000492 | |
2014-02-25 00:52 | chenson | Assigned To | => chenson |
2014-02-25 00:52 | chenson | Status | new => assigned |
2014-02-25 12:45 | chenson | Summary | Overlapping database pull processes after connection outage => Overlapping database pull processes after connection outage - Default pull.thread.per.server.count=1 |
2014-02-25 12:46 | chenson | Note Added: 0000494 | |
2014-02-25 12:48 | chenson | Fixed in Version | => 3.5.18 |
2014-02-25 12:48 | chenson | Target Version | 3.6.0 => 3.5.18 |
2014-02-25 12:48 | chenson | Status | assigned => resolved |
2014-02-25 12:48 | chenson | Resolution | open => fixed |
2014-02-25 13:00 | Changeset attached | => SymmetricDS trunk r7975 | |
2014-02-25 13:00 | Changeset attached | => SymmetricDS 3.5 r7976 | |
2014-03-02 14:49 | chenson | Status | resolved => closed |
2015-07-31 01:49 | chenson | Changeset attached | => SymmetricDS master 52e6a98d |