View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002669 | SymmetricDS | Improvement | public | 2016-07-04 07:46 | 2019-04-24 13:01 |
Reporter | woehrl01 | Assigned To | elong | ||
Priority | normal | ||||
Status | closed | Resolution | no change required | ||
Product Version | 3.7.0 | ||||
Target Version | 3.8.2 | Fixed in Version | 3.8.2 | ||
Summary | 0002669: Bad performance for getActiveTriggerHistories() and getTriggers(boolean replaceTokens) | ||||
Description | After profiling our databases I found that the two most expensive queries are coming from SymmetricDS. They are both in the TriggerRouter.java: getActiveTriggerHistories() getTriggers(boolean replaceTokens) I think both should use a cache. | ||||
Tags | No tags attached. | ||||
|
|
|
How long of a time period was used to collect those statistics? I don't think those methods are called directly, but are called from something like getTriggerRoutersCacheForCurrentNode(), which uses caching to avoid calling those methods. There is a parameter to control how long it lives in the cache, which is cache.trigger.router.time.ms=600000. It defaults to 10 minutes, so you could increase it to reduce the queries. It flushes the cache whenever it syncs the triggers. I increased the parameter and ran sync for a while without seeing those queries being run. We might be able to change it so that a non-clustered system will cache until the sync triggers is called. I think the cache timeout would still be necessary on a cluster, where sync triggers might run on just one node and the other nodes would still have their cache. So, see if increasing the parameter helps your system. I will bring up this issue with the team and see if we can cache until sync triggers on non-clustered systems. |
|
The time period is about 5-7 days. I'm using a push/pull, non-clustered setup. If I looked at the source correctly I think it occures from "DataService.insertHeartbeatEvent" I'll have a look for increasing the value. Besides increasing the cache timeout, do you have a tip what would be the best idea to analyse this better? |
|
I just looked deeper into the time period and it occures about every 10 seconds, which lead me to the router job. The root cause I think is the "insertInitialLoadEvents" if you have some nodes which are initial_load_enabled = 1 but aren't registered yet (sync_enabled = 0). I think it should return only nodes from findNodesThatAreReadyForInitialLoad which are sync_enabled = 1. which means adjusting the NodeServiceSqlMap "findNodeSecurityWithLoadEnabledSql" to include a join on sym_node. What do you think? I'd provide a PR if you want |
|
The other one is coming from ExtractRequestMapper which calls findTriggerRouterById without "false" which causes cache invalidation. If you have pending extract requests. |
|
Thanks for digging into it. The analysis can be tough -- I was running from Eclipse and adding a Thread.dumpStack() line wherever it runs those SQL statements so I could see where the calls where coming from. The calls from findNodesThatAreReadyForInitialLoad should be fixed in 3.8 because it's using a cache of NodeSecurity to find nodes with the initial load flag now (issue 2668 did that I believe). I see what you mean about ExtractRequestMapper. I can't think of any reason it needs to refresh like that, so I changed it to pass refreshCache=false for 3.8. |
|
I'm not that convinced that just caching findNodesThatAreReadyForInitialLoad is enough. As returning of nodes which are sync_enabled = 0 is still wrong, in my opinion. And returning any, lead to the call of the expensive query which could even be optimized in returning just bool as it just checks the result for count > 0. What do you think? |
|
I made a PR (https://github.com/JumpMind/symmetric-ds/pull/36) which fixes the second huge sql query from this ticket. This lead to a lot less database resource usages. Btw don't mind the sync_enabled = 0 thing, as it leads to different queries as long as those are not cached and additionally it leads to a different behaviour. |
|
Re-opening. Was accidentally closed when performing the release. |
|
I think this issue can be closed as the code has changed. |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-07-04 07:46 | woehrl01 | New Issue | |
2016-07-04 07:46 | woehrl01 | File Added: worst_queries.png | |
2016-07-07 12:09 | elong | Note Added: 0000807 | |
2016-07-07 12:57 | woehrl01 | Note Added: 0000808 | |
2016-07-07 13:54 | woehrl01 | Note Added: 0000809 | |
2016-07-07 13:55 | woehrl01 | Note Edited: 0000809 | View Revisions |
2016-07-07 14:04 | woehrl01 | Note Edited: 0000809 | View Revisions |
2016-07-07 14:08 | woehrl01 | Note Added: 0000810 | |
2016-07-07 20:33 | elong | Note Added: 0000811 | |
2016-07-07 20:34 | elong | Assigned To | => elong |
2016-07-07 20:34 | elong | Status | new => assigned |
2016-07-07 20:35 | elong | Product Version | => 3.7.0 |
2016-07-07 20:35 | elong | Target Version | => 3.8.0 |
2016-07-07 21:00 | admin | Changeset attached | => SymmetricDS 3.8 8f70151a |
2016-07-07 21:17 | woehrl01 | Note Added: 0000812 | |
2016-07-07 21:49 | woehrl01 | Note Edited: 0000812 | View Revisions |
2016-07-18 21:05 | woehrl01 | Note Added: 0000819 | |
2016-08-08 15:13 | admin | Target Version | 3.8.0 => 3.8.1 |
2016-08-16 18:17 | elong | Target Version | 3.8.1 => 3.8.2 |
2016-09-30 01:01 | admin | Status | assigned => closed |
2016-09-30 14:28 | admin | Status | closed => new |
2016-09-30 14:28 | admin | Note Added: 0000872 | |
2018-04-26 18:04 | woehrl01 | Note Added: 0001170 | |
2019-04-24 13:01 | elong | Status | new => closed |
2019-04-24 13:01 | elong | Resolution | open => no change required |