View Issue Details

IDProjectCategoryView StatusLast Update
0002869SymmetricDSBugpublic2017-02-01 01:05
Reporteriberras Assigned Tochenson  
Priorityhigh 
Status closedResolutionunable to reproduce 
Product Version3.8.5 
Summary0002869: Memory overflow after EOF
DescriptionThis is happening all the time when there is an EOF when a reload is occurring. On the client side i see that there is a missing table and that the batch could not be loaded:

INFO PushService - Pushed data to node s-serverdevelopment:000:000. 1 data and 1 batches were processed
INFO IncomingBatchService - Retrying batch 000-3470
ERROR DataLoaderService - Failed to load batch 000-3470
org.jumpmind.db.sql.SqlException: Could not find the target table EMB_EVOLUCION
    at org.jumpmind.symmetric.io.data.writer.AbstractDatabaseWriter.write(AbstractDatabaseWriter.java:215)
    at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64)
    at org.jumpmind.symmetric.model.ProcessInfoDataWriter.write(ProcessInfoDataWriter.java:65)
    at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64)
    at org.jumpmind.symmetric.io.data.writer.TransformWriter.write(TransformWriter.java:207)
    at org.jumpmind.symmetric.io.data.DataProcessor.forEachDataInTable(DataProcessor.java:200)
    at org.jumpmind.symmetric.io.data.DataProcessor.forEachTableInBatch(DataProcessor.java:170)
    at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:116)
    at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromTransport(DataLoaderService.java:526)
    at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromPull(DataLoaderService.java:285)
    at org.jumpmind.symmetric.service.impl.PullService.execute(PullService.java:126)
    at org.jumpmind.symmetric.service.impl.NodeCommunicationService$1.run(NodeCommunicationService.java:479)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)



And on the server side it gets stuck util there is an out of memory error with this message:




NodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:44 ART 2016
2016-10-19 11:40:52,609 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:40:52,972 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-6] Routed 1 data events in 263 ms
2016-10-19 11:40:56,447 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:40:56,639 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:40:56,844 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-5] Routed 1 data events in 77 ms
2016-10-19 11:40:58,388 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-395] Starting a new process even though the previous 'Service Database Pull' process had not finished
2016-10-19 11:40:58,388 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-395] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:49 ART 2016
2016-10-19 11:41:01,744 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:41:01,852 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:41:01,957 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-8] Routed 1 data events in 69 ms
2016-10-19 11:41:04,385 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-415] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one
2016-10-19 11:41:04,385 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-415] Starting a new process even though the previous 'Service Database Pull' process had not finished
2016-10-19 11:41:04,386 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-415] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:40:58 ART 2016
2016-10-19 11:41:07,642 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-410] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:41:07,662 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-20] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:41:08,097 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-5] Routed 1 data events in 77 ms
2016-10-19 11:41:10,387 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-418] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one
2016-10-19 11:41:10,388 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-418] Starting a new process even though the previous 'Service Database Pull' process had not finished
2016-10-19 11:41:10,388 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-418] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:04 ART 2016
2016-10-19 11:41:14,543 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:41:14,669 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:41:15,016 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-7] Routed 1 data events in 366 ms
2016-10-19 11:41:16,395 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-422] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one
2016-10-19 11:41:16,395 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-422] Starting a new process even though the previous 'Service Database Pull' process had not finished
2016-10-19 11:41:16,395 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-422] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:10 ART 2016
2016-10-19 11:41:22,377 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:41:22,456 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:41:22,648 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-6] Routed 1 data events in 62 ms
2016-10-19 11:41:25,404 WARN [serverdevelopment] [ConcurrentConnectionManager] [qtp1087019465-425] Node '002' Channel 'default' requested a /sync/serverdevelopment/pull connection, but was rejected because it already has one
2016-10-19 11:41:25,405 WARN [serverdevelopment] [StatisticManager] [qtp1087019465-425] Starting a new process even though the previous 'Service Database Pull' process had not finished
2016-10-19 11:41:25,405 INFO [serverdevelopment] [StatisticManager] [qtp1087019465-425] Details from the previous process: processType=Service Database Pull,sourceNodeId=000,targetNodeId=002,channelId=default,status=Transferring,startTime=Wed Oct 19 11:41:16 ART 2016
2016-10-19 11:41:29,364 ERROR [serverdevelopment] [AcknowledgeService] [qtp1087019465-20] The outgoing batch 002-3470 failed: Could not find the target table EMB_EVOLUCION
2016-10-19 11:41:29,713 INFO [serverdevelopment] [DataLoaderService] [qtp1087019465-410] 1 data and 1 batches loaded during push request from c-serverdevelopment:002:002.
2016-10-19 11:41:30,048 INFO [serverdevelopment] [RouterService] [serverdevelopment-job-13] Routed 1 data events in 342 ms

Then, after this error, the client cannot do a reload without getting stuck. Any work around? Any solutions?


Steps To ReproduceEOF in Reload.
TagsNo tags attached.

Activities

elong

2016-10-26 17:18

developer   ~0000892

When a batch goes into error, it will block that channel until the error is fixed. If batch 000-3470 is loading table EMB_EVOLUCION, then the table needs to exist, otherwise it will get an error over and over.

iberras

2016-10-26 17:50

reporter   ~0000893

But this is happening in the initial load. The table does not exists, its being created when this error occurs. Clearly the table doesnt exist, but symmetric should take care of this when an initial load is occurring. What should be done to avoid this?

elong

2016-10-26 17:54

developer   ~0000894

It doesn't create the table by default, but you can tell it to do that. Set the parameter initial.load.create.first=true in your engine properties file or in the sym_parameter table.

iberras

2016-10-26 17:57

reporter   ~0000895

That parameter is set to true. All the tables are loaded and created correctly, all the time, in all the test i have done. But when i interrupt an initial load and an EOF occurs, this errors starts to happen.

elong

2016-11-02 16:27

developer   ~0000898

I tried steps of setting initial.load.create.first=true, sending an initial load, and then interrupting it, but everything ran okay for me. I must be missing a step, or maybe the timing needs to be just right.

When the error starts to happen, can you take a look at what the batches are for that initial load? They will all have the same load_id. There should be a batch for creating the table (event_type of C) and a batch for sending data to the table (event_type of R). The batches should load in order and the status will be set OK and they are processed.

iberras

2017-01-26 14:19

reporter   ~0000964

You can close this. If you leave Symmetric by itself for 3-5 minutes in that "busy" state, the system recovers and starts the initial load again.

Issue History

Date Modified Username Field Change
2016-10-19 14:48 iberras New Issue
2016-10-26 17:18 elong Note Added: 0000892
2016-10-26 17:50 iberras Note Added: 0000893
2016-10-26 17:54 elong Note Added: 0000894
2016-10-26 17:57 iberras Note Added: 0000895
2016-11-02 16:27 elong Note Added: 0000898
2017-01-26 14:19 iberras Note Added: 0000964
2017-02-01 01:05 chenson Status new => closed
2017-02-01 01:05 chenson Assigned To => chenson
2017-02-01 01:05 chenson Resolution open => unable to reproduce