View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002869 | SymmetricDS | Bug | public | 2016-10-19 14:48 | 2017-02-01 01:05 |
Reporter | iberras | Assigned To | chenson | ||
Priority | high | ||||
Status | closed | Resolution | unable to reproduce | ||
Product Version | 3.8.5 | ||||
Summary | 0002869: Memory overflow after EOF | ||||
Description | This is happening all the time when there is an EOF when a reload is occurring. On the client side i see that there is a missing table and that the batch could not be loaded: INFO PushService - Pushed data to node s-serverdevelopment:000:000. 1 data and 1 batches were processed INFO IncomingBatchService - Retrying batch 000-3470 ERROR DataLoaderService - Failed to load batch 000-3470 org.jumpmind.db.sql.SqlException: Could not find the target table EMB_EVOLUCION at org.jumpmind.symmetric.io.data.writer.AbstractDatabaseWriter.write(AbstractDatabaseWriter.java:215) at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64) at org.jumpmind.symmetric.model.ProcessInfoDataWriter.write(ProcessInfoDataWriter.java:65) at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64) at org.jumpmind.symmetric.io.data.writer.TransformWriter.write(TransformWriter.java:207) at org.jumpmind.symmetric.io.data.DataProcessor.forEachDataInTable(DataProcessor.java:200) at org.jumpmind.symmetric.io.data.DataProcessor.forEachTableInBatch(DataProcessor.java:170) at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:116) at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromTransport(DataLoaderService.java:526) at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromPull(DataLoaderService.java:285) at org.jumpmind.symmetric.service.impl.PullService.execute(PullService.java:126) at org.jumpmind.symmetric.service.impl.NodeCommunicationService$1.run(NodeCommunicationService.java:479) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) And on the server side it gets stuck util there is an out of memory error with this message: NodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:44 ART 2016 2016-10-19 11:40:52,609 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:40:52,972 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-6] Routed 1 data events in 263 ms 2016-10-19 11:40:56,447 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:40:56,639 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:40:56,844 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-5] Routed 1 data events in 77 ms 2016-10-19 11:40:58,388 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-395] Starting a new process even though the previous 'Service Database Pull' process had not finished 2016-10-19 11:40:58,388 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-395] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:49 ART 2016 2016-10-19 11:41:01,744 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:41:01,852 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:41:01,957 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-8] Routed 1 data events in 69 ms 2016-10-19 11:41:04,385 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-415] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one 2016-10-19 11:41:04,385 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-415] Starting a new process even though the previous 'Service Database Pull' process had not finished 2016-10-19 11:41:04,386 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-415] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:58 ART 2016 2016-10-19 11:41:07,642 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:41:07,662 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:41:08,097 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-5] Routed 1 data events in 77 ms 2016-10-19 11:41:10,387 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-418] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one 2016-10-19 11:41:10,388 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-418] Starting a new process even though the previous 'Service Database Pull' process had not finished 2016-10-19 11:41:10,388 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-418] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:04 ART 2016 2016-10-19 11:41:14,543 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:41:14,669 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:41:15,016 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-7] Routed 1 data events in 366 ms 2016-10-19 11:41:16,395 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-422] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one 2016-10-19 11:41:16,395 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-422] Starting a new process even though the previous 'Service Database Pull' process had not finished 2016-10-19 11:41:16,395 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-422] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:10 ART 2016 2016-10-19 11:41:22,377 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:41:22,456 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:41:22,648 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-6] Routed 1 data events in 62 ms 2016-10-19 11:41:25,404 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-425] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one 2016-10-19 11:41:25,405 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-425] Starting a new process even though the previous 'Service Database Pull' process had not finished 2016-10-19 11:41:25,405 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-425] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:16 ART 2016 2016-10-19 11:41:29,364 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION 2016-10-19 11:41:29,713 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002. 2016-10-19 11:41:30,048 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-13] Routed 1 data events in 342 ms Then, after this error, the client cannot do a reload without getting stuck. Any work around? Any solutions? | ||||
Steps To Reproduce | EOF in Reload. | ||||
Tags | No tags attached. | ||||
|
When a batch goes into error, it will block that channel until the error is fixed. If batch 000-3470 is loading table EMB_EVOLUCION, then the table needs to exist, otherwise it will get an error over and over. |
|
But this is happening in the initial load. The table does not exists, its being created when this error occurs. Clearly the table doesnt exist, but symmetric should take care of this when an initial load is occurring. What should be done to avoid this? |
|
It doesn't create the table by default, but you can tell it to do that. Set the parameter initial.load.create.first=true in your engine properties file or in the sym_parameter table. |
|
That parameter is set to true. All the tables are loaded and created correctly, all the time, in all the test i have done. But when i interrupt an initial load and an EOF occurs, this errors starts to happen. |
|
I tried steps of setting initial.load.create.first=true, sending an initial load, and then interrupting it, but everything ran okay for me. I must be missing a step, or maybe the timing needs to be just right. When the error starts to happen, can you take a look at what the batches are for that initial load? They will all have the same load_id. There should be a batch for creating the table (event_type of C) and a batch for sending data to the table (event_type of R). The batches should load in order and the status will be set OK and they are processed. |
|
You can close this. If you leave Symmetric by itself for 3-5 minutes in that "busy" state, the system recovers and starts the initial load again. |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-10-19 14:48 | iberras | New Issue | |
2016-10-26 17:18 | elong | Note Added: 0000892 | |
2016-10-26 17:50 | iberras | Note Added: 0000893 | |
2016-10-26 17:54 | elong | Note Added: 0000894 | |
2016-10-26 17:57 | iberras | Note Added: 0000895 | |
2016-11-02 16:27 | elong | Note Added: 0000898 | |
2017-01-26 14:19 | iberras | Note Added: 0000964 | |
2017-02-01 01:05 | chenson | Status | new => closed |
2017-02-01 01:05 | chenson | Assigned To | => chenson |
2017-02-01 01:05 | chenson | Resolution | open => unable to reproduce |