View Issue Details

IDProjectCategoryView StatusLast Update
0002669SymmetricDSImprovementpublic2019-04-24 13:01
Reporterwoehrl01 Assigned Toelong  
Prioritynormal 
Status closedResolutionno change required 
Product Version3.7.0 
Target Version3.8.2Fixed in Version3.8.2 
Summary0002669: Bad performance for getActiveTriggerHistories() and getTriggers(boolean replaceTokens)
DescriptionAfter profiling our databases I found that the two most expensive queries are coming from SymmetricDS.

They are both in the TriggerRouter.java:

getActiveTriggerHistories()

getTriggers(boolean replaceTokens)

I think both should use a cache.
TagsNo tags attached.

Activities

woehrl01

2016-07-04 07:46

reporter  

worst_queries.png (8,365 bytes)   
worst_queries.png (8,365 bytes)   

elong

2016-07-07 12:09

developer   ~0000807

How long of a time period was used to collect those statistics?

I don't think those methods are called directly, but are called from something like getTriggerRoutersCacheForCurrentNode(), which uses caching to avoid calling those methods. There is a parameter to control how long it lives in the cache, which is cache.trigger.router.time.ms=600000. It defaults to 10 minutes, so you could increase it to reduce the queries. It flushes the cache whenever it syncs the triggers. I increased the parameter and ran sync for a while without seeing those queries being run.

We might be able to change it so that a non-clustered system will cache until the sync triggers is called. I think the cache timeout would still be necessary on a cluster, where sync triggers might run on just one node and the other nodes would still have their cache.

So, see if increasing the parameter helps your system. I will bring up this issue with the team and see if we can cache until sync triggers on non-clustered systems.

woehrl01

2016-07-07 12:57

reporter   ~0000808

The time period is about 5-7 days. I'm using a push/pull, non-clustered setup.

If I looked at the source correctly I think it occures from "DataService.insertHeartbeatEvent"

I'll have a look for increasing the value. Besides increasing the cache timeout, do you have a tip what would be the best idea to analyse this better?

woehrl01

2016-07-07 13:54

reporter   ~0000809

Last edited: 2016-07-07 14:04

View 3 revisions

I just looked deeper into the time period and it occures about every 10 seconds, which lead me to the router job.

The root cause I think is the "insertInitialLoadEvents" if you have some nodes which are initial_load_enabled = 1 but aren't registered yet (sync_enabled = 0).

I think it should return only nodes from findNodesThatAreReadyForInitialLoad which are sync_enabled = 1.

which means adjusting the NodeServiceSqlMap "findNodeSecurityWithLoadEnabledSql" to include a join on sym_node. What do you think?

I'd provide a PR if you want

woehrl01

2016-07-07 14:08

reporter   ~0000810

The other one is coming from ExtractRequestMapper which calls findTriggerRouterById without "false" which causes cache invalidation. If you have pending extract requests.

elong

2016-07-07 20:33

developer   ~0000811

Thanks for digging into it. The analysis can be tough -- I was running from Eclipse and adding a Thread.dumpStack() line wherever it runs those SQL statements so I could see where the calls where coming from.

The calls from findNodesThatAreReadyForInitialLoad should be fixed in 3.8 because it's using a cache of NodeSecurity to find nodes with the initial load flag now (issue 2668 did that I believe).

I see what you mean about ExtractRequestMapper. I can't think of any reason it needs to refresh like that, so I changed it to pass refreshCache=false for 3.8.

woehrl01

2016-07-07 21:17

reporter   ~0000812

Last edited: 2016-07-07 21:49

View 2 revisions

I'm not that convinced that just caching findNodesThatAreReadyForInitialLoad is enough. As returning of nodes which are sync_enabled = 0 is still wrong, in my opinion. And returning any, lead to the call of the expensive query which could even be optimized in returning just bool as it just checks the result for count > 0. What do you think?

woehrl01

2016-07-18 21:05

reporter   ~0000819

I made a PR (https://github.com/JumpMind/symmetric-ds/pull/36) which fixes the second huge sql query from this ticket. This lead to a lot less database resource usages.

Btw don't mind the sync_enabled = 0 thing, as it leads to different queries as long as those are not cached and additionally it leads to a different behaviour.

admin

2016-09-30 14:28

administrator   ~0000872

Re-opening. Was accidentally closed when performing the release.

woehrl01

2018-04-26 18:04

reporter   ~0001170

I think this issue can be closed as the code has changed.

Related Changesets

SymmetricDS: 3.8 8f70151a

2016-07-07 16:16:36

admin

Details Diff
0002669: Bad performance for getActiveTriggerHistories() and
getTriggers(boolean replaceTokens)
Affected Issues
0002669
mod - symmetric-core/src/main/java/org/jumpmind/symmetric/service/impl/DataExtractorService.java Diff File

Issue History

Date Modified Username Field Change
2016-07-04 07:46 woehrl01 New Issue
2016-07-04 07:46 woehrl01 File Added: worst_queries.png
2016-07-07 12:09 elong Note Added: 0000807
2016-07-07 12:57 woehrl01 Note Added: 0000808
2016-07-07 13:54 woehrl01 Note Added: 0000809
2016-07-07 13:55 woehrl01 Note Edited: 0000809 View Revisions
2016-07-07 14:04 woehrl01 Note Edited: 0000809 View Revisions
2016-07-07 14:08 woehrl01 Note Added: 0000810
2016-07-07 20:33 elong Note Added: 0000811
2016-07-07 20:34 elong Assigned To => elong
2016-07-07 20:34 elong Status new => assigned
2016-07-07 20:35 elong Product Version => 3.7.0
2016-07-07 20:35 elong Target Version => 3.8.0
2016-07-07 21:00 admin Changeset attached => SymmetricDS 3.8 8f70151a
2016-07-07 21:17 woehrl01 Note Added: 0000812
2016-07-07 21:49 woehrl01 Note Edited: 0000812 View Revisions
2016-07-18 21:05 woehrl01 Note Added: 0000819
2016-08-08 15:13 admin Target Version 3.8.0 => 3.8.1
2016-08-16 18:17 elong Target Version 3.8.1 => 3.8.2
2016-09-30 01:01 admin Status assigned => closed
2016-09-30 14:28 admin Status closed => new
2016-09-30 14:28 admin Note Added: 0000872
2018-04-26 18:04 woehrl01 Note Added: 0001170
2019-04-24 13:01 elong Status new => closed
2019-04-24 13:01 elong Resolution open => no change required