View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000962 | SymmetricDS | Bug | public | 2012-12-18 19:29 | 2019-04-26 17:43 |
Reporter | rotten | Assigned To | | | |
Priority | high | | | | |
Status | new | Resolution | open | | |
Product Version | 3.2.0 | | | | |
Summary | 0000962: FK constraint error when syncing configuration change | | | | |
Description | Because of the new foreign key constraints, changes to tables on the config channel need to be applied on the remote nodes in the proper order. They do not appear to be. Simple updates on the registration server cause a backlog of config update failures due to FK constraint violations on the remote node. Further complicating this, when it happens the outgoing config batches get stuck in 'LD' status and never raise an error flag. |
Steps To Reproduce | Set up a simple replication pair and try toggling node_group_link back and forth. Bring up a third node and try adding triggers, trigger_routers, and routers for it while replication is still running. Then try updating node_group_link. They all get jammed up. |
Additional Information | I'm testing this with an Oracle 10g registration server, a PostgreSQL 9.1 node, and a MySQL 5.5 node using the production 3.2.0 release. |
Tags | configuration, data sync | ||||
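For reference, batches wedged in load status as described above can usually be spotted directly in the runtime tables. A minimal diagnostic sketch, assuming the column names from the SymmetricDS runtime schema and that the configuration channel id is 'config':

```sql
-- List config-channel batches stuck in load ('LD') status, with the
-- number of times each has been (re)sent. Run on the source node.
SELECT batch_id, node_id, status, sent_count, last_update_time
FROM sym_outgoing_batch
WHERE channel_id = 'config'
  AND status = 'LD'
ORDER BY batch_id;
```

A climbing sent_count with status stuck at 'LD' matches the behavior reported in the notes below.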
Note 0000168 (rotten, 2012-12-18 19:30):
Oh, and all of the foreign keys are present on the registration server, so the config updates are being installed in the correct order there.
Note 0000169 (chenson, 2012-12-18 20:48):
Were you updating configuration on multiple nodes? Configuration is meant to be managed from a central node.
Note 0000171 (rotten, 2012-12-19 13:49):
I am updating the configuration from the registration server (central node). I've pushed changes directly to the other nodes only in experiments to try to clear the FK constraint complaints.

When I originally posted this I thought I fully understood what was happening (simply a change-ordering problem). Now I am less sure that is the root cause. What I am observing is that some config channel changes get stuck in 'LD' status in sym_outgoing_batch, and the Sent Count for those batches keeps climbing, seemingly indefinitely. [Note: all of the JVMs are on the same server, communicating over plain HTTP, so I don't believe network issues are causing the repeated send retries.] At the same time, other config channel changes (including heartbeats) are showing up and passing through without any issues [but not all; sometimes they get stuck too]. So the later config changes, which would only succeed if the earlier ones hadn't gotten stuck in LD, sometimes throw the FK exceptions. Restarting the nodes and the registration server does not drain the stuck config channel outgoing batches.

This morning I have a suspicion, which I'll test in a few minutes:
1) I'm currently doing push/pull once per second.
2) The "synchronizing triggers" activity, which happens after some config changes, takes 30 seconds to a minute to complete.
3) When a config change comes through that triggers a synchronizing-triggers action on the remote node, the subsequent config changes get stuck.
4) Later, after the synchronizing-triggers action is done on the remote node, new config channel changes usually go through OK, but the ones that got stuck stay stuck.

I'll back off to 1-minute push/pull times, to give the synchronizing-triggers action time to complete, and then try a bunch of config changes and see if any still get stuck.
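In case it helps anyone reproduce this, the job periods can be changed through the parameter table rather than each engine's properties file. A sketch only, assuming the standard job.push.period.time.ms / job.pull.period.time.ms parameter keys and that the remaining sym_parameter columns are nullable or defaulted:

```sql
-- Slow the push and pull jobs from 1 s to 60 s,
-- for all nodes and node groups.
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.push.period.time.ms', '60000');
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.pull.period.time.ms', '60000');
```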
Note 0000172 (rotten, 2012-12-19 20:11):
Slowing the push/pull interval from 1 second to 60 seconds seems to have prevented config batches from getting stuck in 'LD' status. Next I'll set the push/pull thread counts to 1, to see whether single-threaded operation works more reliably than the default of 10 (at a 1-second refresh rate).
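For the single-threaded test, a sketch of the parameter change (the push.thread.per.server.count / pull.thread.per.server.count keys are assumed from the SymmetricDS parameter list; they can also be set in the engine properties file):

```sql
-- Drop the push/pull transport thread pools from the default of 10
-- down to 1 for the single-threaded test.
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'push.thread.per.server.count', '1');
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'pull.thread.per.server.count', '1');
```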
Note 0000176 (rotten, 2012-12-20 14:28):
Single- vs. multi-threaded operation did not seem to have the same impact as lengthening the refresh interval. I think the problem is related to the repeated 'synchronizing triggers' activity, but I'm still not certain. One thing that confused me and really slowed down my tests is this SymmetricDS caveat, which I'm adding to my operational notes:
"""
Changing node_group_link while the cluster is running is dangerous. It will almost always break replication. Once configured, do NOT change node_group_link settings unless you absolutely have to, and when you do, manually change them on all nodes while replication is down. Remove the change from sym_data before starting the cluster back up.
"""
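The manual cleanup in that caveat looks roughly like this. A sketch only, to be run on each node with replication stopped; it assumes captured config changes carry the sym_node_group_link table name in sym_data.table_name:

```sql
-- Remove captured node_group_link changes so they are not replayed
-- when the cluster comes back up.
DELETE FROM sym_data
WHERE table_name = 'sym_node_group_link';
```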
Date Modified | Username | Field | Change |
---|---|---|---|
2012-12-18 19:29 | rotten | New Issue | |
2012-12-18 19:30 | rotten | Note Added: 0000168 | |
2012-12-18 20:48 | chenson | Note Added: 0000169 | |
2012-12-19 13:49 | rotten | Note Added: 0000171 | |
2012-12-19 20:11 | rotten | Note Added: 0000172 | |
2012-12-20 14:28 | rotten | Note Added: 0000176 | |
2014-02-13 15:15 | elong | Priority | normal => high |
2019-04-22 14:42 | elong | Tag Attached: configuration | |
2019-04-22 14:42 | elong | Tag Attached: data sync | |
2019-04-26 17:43 | elong | Summary | config replication trips over itself => FK constraint error when syncing configuration change |