View Issue Details

IDProjectCategoryView StatusLast Update
0002657SymmetricDSBugpublic2016-07-22 12:03
Reporterivan.konevAssigned Tochenson 
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product Version3.7.34 
Target Version3.7.36Fixed in Version3.7.36 
Summary0002657: Nodes not cleaning staged resources of any type
DescriptionAfter several crashes, i've investigated that memory dumps contains large portions of batches in StagingManager form. I've enabled logs of this class using
    <category name="org.jumpmind.symmetric.io.stage">
        <priority value="TRACE" />
    </category>
and investigated this:
2016-06-29 17:48:40,822 TRACE [l1-24_25-4_15] [StagingManager] [l1-24_25-4_15-job-4] Cleaning staging area
2016-06-29 17:48:40,822 DEBUG [l1-24_25-4_15] [StagingManager] [l1-24_25-4_15-job-4] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:42,523 TRACE [l1-24-7] [StagingManager] [l1-24-7-job-12] Cleaning staging area
2016-06-29 17:48:42,523 DEBUG [l1-24-7] [StagingManager] [l1-24-7-job-12] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:42,540 TRACE [l1-24-13] [StagingManager] [l1-24-13-job-11] Cleaning staging area
2016-06-29 17:48:42,540 DEBUG [l1-24-13] [StagingManager] [l1-24-13-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:42,631 TRACE [l1-24_25-1_18] [StagingManager] [l1-24_25-1_18-job-11] Cleaning staging area
2016-06-29 17:48:42,631 DEBUG [l1-24_25-1_18] [StagingManager] [l1-24_25-1_18-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:43,806 TRACE [l1-25-21] [StagingManager] [l1-25-21-job-12] Cleaning staging area
2016-06-29 17:48:43,806 DEBUG [l1-25-21] [StagingManager] [l1-25-21-job-12] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:45,182 TRACE [l1-24-25] [StagingManager] [l1-24-25-job-11] Cleaning staging area
2016-06-29 17:48:45,183 DEBUG [l1-24-25] [StagingManager] [l1-24-25-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:45,421 TRACE [l1-24_25-2_17] [StagingManager] [l1-24_25-2_17-job-11] Cleaning staging area
2016-06-29 17:48:45,421 DEBUG [l1-24_25-2_17] [StagingManager] [l1-24_25-2_17-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:45,659 TRACE [l1-24-11] [StagingManager] [l1-24-11-job-11] Cleaning staging area
2016-06-29 17:48:45,659 DEBUG [l1-24-11] [StagingManager] [l1-24-11-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:46,576 TRACE [l1-24-9] [StagingManager] [l1-24-9-job-11] Cleaning staging area
2016-06-29 17:48:46,577 DEBUG [l1-24-9] [StagingManager] [l1-24-9-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:47,032 TRACE [l1-25-24] [StagingManager] [l1-25-24-job-2] Cleaning staging area
2016-06-29 17:48:47,032 DEBUG [l1-25-24] [StagingManager] [l1-25-24-job-2] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:47,670 TRACE [l1-24-6] [StagingManager] [l1-24-6-job-11] Cleaning staging area
2016-06-29 17:48:47,670 DEBUG [l1-24-6] [StagingManager] [l1-24-6-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:47,825 TRACE [l1-24-12] [StagingManager] [l1-24-12-job-3] Cleaning staging area
2016-06-29 17:48:47,825 DEBUG [l1-24-12] [StagingManager] [l1-24-12-job-3] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:48,173 TRACE [l1-25-23] [StagingManager] [l1-25-23-job-11] Cleaning staging area
2016-06-29 17:48:48,173 DEBUG [l1-25-23] [StagingManager] [l1-25-23-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:48,476 TRACE [l1-25-20] [StagingManager] [l1-25-20-job-2] Cleaning staging area
2016-06-29 17:48:48,477 DEBUG [l1-25-20] [StagingManager] [l1-25-20-job-2] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:48,852 TRACE [l1-24_25-5_14] [StagingManager] [l1-24_25-5_14-job-11] Cleaning staging area
2016-06-29 17:48:48,853 DEBUG [l1-24_25-5_14] [StagingManager] [l1-24_25-5_14-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:49,150 TRACE [l1-24_25-3_16] [StagingManager] [l1-24_25-3_16-job-11] Cleaning staging area
2016-06-29 17:48:49,151 DEBUG [l1-24_25-3_16] [StagingManager] [l1-24_25-3_16-job-11] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:49,929 TRACE [l1-24-10] [StagingManager] [l1-24-10-job-2] Cleaning staging area
2016-06-29 17:48:49,930 DEBUG [l1-24-10] [StagingManager] [l1-24-10-job-2] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:50,578 TRACE [l1-25-19] [StagingManager] [l1-25-19-job-4] Cleaning staging area
2016-06-29 17:48:50,579 DEBUG [l1-25-19] [StagingManager] [l1-25-19-job-4] Purged 1 staged memory buffers, freeing 0 bytes of memory
2016-06-29 17:48:50,760 TRACE [l1-25-22] [StagingManager] [l1-25-22-job-4] Cleaning staging area
2016-06-29 17:48:50,760 DEBUG [l1-25-22] [StagingManager] [l1-25-22-job-4] Purged 1 staged memory buffers, freeing 0 bytes of memory

After cleaning up \r stuff from engines, setting ttl to 0 ms(now reverted back, isnt source of this), updating timezones, setting correct old format for .engine file names[group-node.properties] and resetting files permissions found that cleanup returned to normal. DISK streamed batches ALSO wasnt properly deleted, at ll, none of disk streamed baches was deleted after complete/ttl+ready, but used and worked fine, just not deleted after .done state.

2016-06-29 22:42:54,902 TRACE [l1-24-25] [StagingManager] [l1-24-25-job-5] Cleaning staging area
2016-06-29 22:42:54,902 DEBUG [l1-24-25] [StagingManager] [l1-24-25-job-5] Purged 5 staged memory buffers, freeing 790 kbytes of memory
2016-06-29 22:42:54,955 TRACE [l1-24-13] [StagingManager] [l1-24-13-job-5] Cleaning staging area
2016-06-29 22:42:54,956 DEBUG [l1-24-13] [StagingManager] [l1-24-13-job-5] Purged 3 staged memory buffers, freeing 241 kbytes of memory
2016-06-29 22:42:55,126 TRACE [l1-25-21] [StagingManager] [l1-25-21-job-5] Cleaning staging area
2016-06-29 22:42:56,395 TRACE [l1-24-10] [StagingManager] [l1-24-10-job-5] Cleaning staging area
2016-06-29 22:42:57,311 TRACE [l1-24-12] [StagingManager] [l1-24-12-job-3] Cleaning staging area
2016-06-29 22:42:57,311 DEBUG [l1-24-12] [StagingManager] [l1-24-12-job-3] Purged 4 staged memory buffers, freeing 667 kbytes of memory
2016-06-29 22:42:57,316 TRACE [l1-24-6] [StagingManager] [l1-24-6-job-20] Cleaning staging area
2016-06-29 22:42:57,316 DEBUG [l1-24-6] [StagingManager] [l1-24-6-job-20] Purged 8 staged memory buffers, freeing 1469 kbytes of memory
2016-06-29 22:42:58,390 TRACE [l1-24-9] [StagingManager] [l1-24-9-job-16] Cleaning staging area
2016-06-29 22:42:59,441 TRACE [l1-25-24] [StagingManager] [l1-25-24-job-10] Cleaning staging area
2016-06-29 22:42:59,441 DEBUG [l1-25-24] [StagingManager] [l1-25-24-job-10] Purged 8 staged memory buffers, freeing 1198 kbytes of memory
2016-06-29 22:42:59,476 TRACE [l1-25-23] [StagingManager] [l1-25-23-job-6] Cleaning staging area
2016-06-29 22:42:59,781 TRACE [l1-24-11] [StagingManager] [l1-24-11-job-14] Cleaning staging area
2016-06-29 22:42:59,782 DEBUG [l1-24-11] [StagingManager] [l1-24-11-job-14] Purged 4 staged memory buffers, freeing 465 kbytes of memory
2016-06-29 22:42:59,992 TRACE [l1-25-19] [StagingManager] [l1-25-19-job-13] Cleaning staging area
2016-06-29 22:42:59,992 DEBUG [l1-25-19] [StagingManager] [l1-25-19-job-13] Purged 7 staged memory buffers, freeing 852 kbytes of memory

No any security modules was active. selinux off(real, not just config).
Steps To ReproduceProbably, it's caused by improper handling of \r in file name or something around this. Can give dumps, but its containing private data.
TagsNo tags attached.

Activities

ivan.konev

2016-06-29 16:07

reporter  

ivan.konev

2016-06-30 12:29

reporter   ~0000805

It's happening again after setting stream.to.file.ttl.ms=0 to 120000

It's sowewhere around in either StagedResource(wrong creation/update time) or cleanup ttl compare

boolean resourceIsOld = (System.currentTimeMillis() - resource
                            .getLastUpdateTime()) > ttlInMs;
                    if ((resource.getState() == State.DONE || (resource.getState() == State.READY && ttlInMs == 0))
                            && (resourceIsOld || !resource.exists())) {

in Staging Manager. Maybe something like refreshcachejob is updating staged resource ?

ivan.konev

2016-06-30 16:06

reporter   ~0000806

Last edited: 2016-06-30 16:07

View 2 revisions

Fastest way to reproduce - lower memory limit and make initial load not using reload channel, but dataloader, better more BLOB/CLOB data or just sized, but not amount of records. Amount of prepared transferred data will not be cleaned at least from source node and it's will be clearly seen using log settings above. After exhausting allowed java heap memory, symmetric will crash and restart.

It's all on default settings.

chenson

2016-07-13 12:09

administrator   ~0000813

Also wanted to note that common batches are never set to DONE so they would never get cleaned up.

chenson

2016-07-13 12:09

administrator   ~0000814

What do you have stream.to.file.threshold.bytes set to? Something big?

ivan.konev

2016-07-13 12:34

reporter   ~0000815

Something big, 10M or around. Anyway it's in StagedResourceManager, both types - file or memory not cleaned up at all, files keep resource over stream threshold on disk, other memory part not cleaned also, memory - fully in memory => more not streamed large batches, more leaks. Tried large reload batch with 600000 ttl, after this time and .done - file also not deleted.

It's great problem till some version >3.7.28 , persists in latest .33 also, .35 - not tried. I cant find which patch caused that in GIT, last idea was wrong.

ivan.konev

2016-07-13 12:38

reporter   ~0000816

Can't change "failed to reproduce" to 100% also.

ivan.konev

2016-07-13 12:42

reporter   ~0000817

Also tested for OS, SELinux and JRE version - does not matters, still same leak.

chenson

2016-07-13 21:36

administrator   ~0000818

I fixed the common batch issue which started in 3.7.34. Staging should get cleaned now.

Related Changesets

SymmetricDS: 3.7 4a16d95a

2016-07-13 21:28:05

chenson

Details Diff
0002657: Nodes not cleaning staged resources of any type
0002657
mod - symmetric-io/src/main/java/org/jumpmind/symmetric/io/stage/StagingManager.java Diff File

SymmetricDS: 3.8 ac7ba908

2016-07-13 21:28:05

chenson

Details Diff
0002657: Nodes not cleaning staged resources of any type
0002657
mod - symmetric-io/src/main/java/org/jumpmind/symmetric/io/stage/StagingManager.java Diff File

Issue History

Date Modified Username Field Change
2016-06-29 16:07 ivan.konev New Issue
2016-06-29 16:07 ivan.konev File Added: Screenshot 2016-06-29 22.56.37.png
2016-06-30 12:29 ivan.konev Note Added: 0000805
2016-06-30 16:06 ivan.konev Note Added: 0000806
2016-06-30 16:07 ivan.konev Note Edited: 0000806 View Revisions
2016-07-13 12:09 chenson Note Added: 0000813
2016-07-13 12:09 chenson Note Added: 0000814
2016-07-13 12:34 ivan.konev Note Added: 0000815
2016-07-13 12:38 ivan.konev Note Added: 0000816
2016-07-13 12:42 ivan.konev Note Added: 0000817
2016-07-13 21:19 chenson Reproducibility unable to reproduce => always
2016-07-13 21:33 chenson Fixed in Version => 3.7.36
2016-07-13 21:33 chenson Target Version => 3.7.36
2016-07-13 21:34 chenson Assigned To => chenson
2016-07-13 21:34 chenson Status new => assigned
2016-07-13 21:36 chenson Note Added: 0000818
2016-07-13 22:00 chenson Changeset attached => SymmetricDS 3.7 4a16d95a
2016-07-13 22:00 chenson Changeset attached => SymmetricDS 3.8 ac7ba908
2016-07-21 06:51 chenson Status assigned => resolved
2016-07-21 06:51 chenson Resolution open => fixed
2016-07-22 12:03 chenson Status resolved => closed