View Issue Details

IDProjectCategoryView StatusLast Update
0003913SymmetricDSBugpublic2019-04-24 14:25
ReportermmichalekAssigned Tommichalek 
Prioritynormal 
Status closedResolutionfixed 
Product Version3.9.0 
Target Version3.9.19Fixed in Version3.9.19 
Summary0003913: Exception during keep alive can cause concurrent loading of batches
Description1) The source sends batches to the targets.
2) The target starts loading the data.
3) The source quickly times out after 10 seconds.
4) The source tries to send the batches again. First, it is rejected.
5) Some minor IO error happens on the target. (Probably cannot send the keep alive because the client has timed out the HTTP connection) This causes the server to release the client's lock. But the batches sent in step 0000001 are still loading.
6) The client sends the batches again because of the short timeout. This time it gets in because the lock has been broken.
7) The same batches start loading again, leading to inconsistent results on the target.


Example logging where this was reprocued in dev:
2019-04-11 11:31:44,658 INFO [corp-000] [SqlRunner] [sql-runner-402] [store-001] Executing: insert into "DEMO-STORE-001"."PUBLIC"."SALE_TRANSACTION" ("TRAN_ID", "STORE_ID", "WORKSTATION", "DAY", "SEQ") values (3,'301','','',0)
2019-04-11 11:31:46,117 INFO [store-001] [RouterService] [store-001-job-16] Routed 1 data events in 4 ms
2019-04-11 11:31:50,790 INFO [store-001] [PushService] [store-001-push-default-3] Push data sent to corp:000:000
2019-04-11 11:31:50,802 INFO [corp-000] [SymmetricServlet] [qtp524223214-207] Error while processing PUT request for node: 001 at 127.0.0.1 with path: /corp-000/push The message is: Fake exception testing dell case.
2019-04-11 11:31:50,802 WARN [store-001] [PushService] [store-001-push-default-3] Expected but did not receive an ack for batch 000-22. This could be because the batch is corrupt - removing the batch from staging. (org.jumpmind.symmetric.service.impl.AbstractService.readAcks(AbstractService.java:341))
2019-04-11 11:31:55,819 INFO [store-001] [PushService] [store-001-push-default-4] Push data sent to corp:000:000
2019-04-11 11:31:55,819 WARN [corp-000] [ThroughputStatisticManager] [corp-000-dataloader-8] Starting a new process even though the previous 'Service Database Push Load' process had not finished (org.jumpmind.symmetric.statistic.StatisticManager.newProcessInfo(StatisticManager.java:117))
2019-04-11 11:31:55,820 INFO [corp-000] [ThroughputStatisticManager] [corp-000-dataloader-8] Details from the previous process: processType=Service Database Push Load,sourceNodeId=001,targetNodeId=000,queue=default,status=Loading,startTime=Thu Apr 11 11:31:50 EDT 2019
2019-04-11 11:31:55,821 INFO [corp-000] [IncomingBatchService] [corp-000-dataloader-8] Retrying batch 001-22
2019-04-11 11:31:55,830 INFO [corp-000] [SymmetricServlet] [qtp524223214-30] Error while processing PUT request for node: 001 at 127.0.0.1 with path: /corp-000/push The message is: Fake exception testing dell case.
2019-04-11 11:31:55,831 WARN [store-001] [PushService] [store-001-push-default-4] Expected but did not receive an ack for batch 000-22. This could be because the batch is corrupt - removing the batch from staging. (org.jumpmind.symmetric.service.impl.AbstractService.readAcks(AbstractService.java:341))
2019-04-11 11:32:00,810 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-7] Preparing dml: insert into "DEMO-CORP"."PUBLIC"."SALE_TRANSACTION" ("TRAN_ID", "STORE_ID", "WORKSTATION", "DAY", "SEQ") values (?,?,?,?,?)
2019-04-11 11:32:00,811 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-7] Submitting data [3, '301', '', '', 0] with types [INTEGER, VARCHAR, VARCHAR, VARCHAR, INTEGER]
2019-04-11 11:32:00,838 INFO [store-001] [PushService] [store-001-push-default-5] Push data sent to corp:000:000
2019-04-11 11:32:00,839 WARN [corp-000] [ThroughputStatisticManager] [corp-000-dataloader-9] Starting a new process even though the previous 'Service Database Push Load' process had not finished (org.jumpmind.symmetric.statistic.StatisticManager.newProcessInfo(StatisticManager.java:117))
2019-04-11 11:32:00,839 INFO [corp-000] [ThroughputStatisticManager] [corp-000-dataloader-9] Details from the previous process: processType=Service Database Push Load,sourceNodeId=001,targetNodeId=000,queue=default,status=Loading,startTime=Thu Apr 11 11:31:55 EDT 2019
2019-04-11 11:32:00,841 INFO [corp-000] [IncomingBatchService] [corp-000-dataloader-9] Skipping batch 001-22
2019-04-11 11:32:00,841 INFO [corp-000] [DataLoaderService] [qtp524223214-17] 1 data and 1 batches loaded during push request from store:001:001.
2019-04-11 11:32:00,844 INFO [store-001] [PushService] [store-001-push-default-5] Pushed data to node corp:000:000. 1 data and 1 batches were processed. (sale_transaction)
2019-04-11 11:32:05,837 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-8] Preparing dml: insert into "DEMO-CORP"."PUBLIC"."SALE_TRANSACTION" ("TRAN_ID", "STORE_ID", "WORKSTATION", "DAY", "SEQ") values (?,?,?,?,?)
2019-04-11 11:32:05,837 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-8] Submitting data [3, '301', '', '', 0] with types [INTEGER, VARCHAR, VARCHAR, VARCHAR, INTEGER]
2019-04-11 11:32:05,842 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-8] Preparing dml: update "DEMO-CORP"."PUBLIC"."SALE_TRANSACTION" set "STORE_ID" = ?, "WORKSTATION" = ?, "DAY" = ?, "SEQ" = ? where "TRAN_ID" = ?
2019-04-11 11:32:05,842 DEBUG [corp-000] [DefaultDatabaseWriter] [corp-000-dataloader-8] Submitting data ['301', '', '', 0, 3] with types [VARCHAR, VARCHAR, VARCHAR, INTEGER, INTEGER]
Steps To Reproduce1. Install 3.9.18 Demo
2. Change http timeout to 1 second
3. Force an exception to be thrown during the send of keep alive from the target DataLoaderService.java:

                        while (!executor.awaitTermination(keepAliveMillis, TimeUnit.MILLISECONDS)) {
                            outWriter.write("1=1&");
                            outWriter.flush();
                            if (true) { // force exception to be thrown here.
                                throw new IOException("Fake exception testing dell case.");
                            }
                        }
TagsNo tags attached.

Relationships

related to 0003575 closedmmichalek Starting a new process even though the previous 'File Sync Push' process had not finished 

Activities

There are no notes attached to this issue.

Related Changesets

SymmetricDS: 3.9 6d82ba32

2019-04-11 13:12:54

mmichalek

Details Diff
0003913: Exception during keep alive can cause concurrent loading of batches
0003913
mod - symmetric-core/src/main/java/org/jumpmind/symmetric/service/impl/DataLoaderService.java Diff File

SymmetricDS: 3.10 b7d38cb0

2019-04-11 13:12:54

mmichalek

Details Diff
0003913: Exception during keep alive can cause concurrent loading of batches
0003913
mod - symmetric-core/src/main/java/org/jumpmind/symmetric/service/impl/DataLoaderService.java Diff File

Issue History

Date Modified Username Field Change
2019-04-11 13:01 mmichalek New Issue
2019-04-11 13:01 mmichalek Status new => assigned
2019-04-11 13:01 mmichalek Assigned To => mmichalek
2019-04-11 13:16 mmichalek Status assigned => resolved
2019-04-11 13:16 mmichalek Resolution open => fixed
2019-04-11 13:16 mmichalek Fixed in Version => 3.9.19
2019-04-11 14:00 mmichalek Changeset attached => SymmetricDS 3.9 6d82ba32
2019-04-11 14:00 mmichalek Changeset attached => SymmetricDS 3.10 b7d38cb0
2019-04-23 11:14 elong Status resolved => closed
2019-04-24 14:25 elong Relationship added related to 0003575