View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000962 | SymmetricDS | Bug | public | 2012-12-18 19:29 | 2019-04-26 17:43 |
Reporter | rotten | Assigned To | | | |
Priority | high | | | | |
Status | new | Resolution | open | | |
Product Version | 3.2.0 | | | | |
Summary | 0000962: FK constraint error when syncing configuration change | | | | |
Description | Because of the new foreign key constraints, changes to tables on the config channel need to be applied on the remote nodes in the proper order. They do not appear to be. Simple updates on the registration server cause a backlog of config update failures due to FK constraint violations on the remote node. Further complicating this, when it happens the outgoing config batches get stuck in 'LD' status and never raise an error flag. |
Steps To Reproduce | Set up a simple replication pair and try toggling node_group_link back and forth. Bring up a third node and try adding triggers, trigger_routers, and routers for it while replication is still running. Then try updating node_group_link. They all get jammed up. |
Additional Information | I'm testing this with an Oracle 10g registration server, a PostgreSQL 9.1 node, and a MySQL 5.5 node using the production 3.2.0 release. |
Tags | configuration, data sync | ||||
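For reference, batches wedged in load status as described above can usually be spotted directly in the runtime tables. A minimal diagnostic sketch, assuming the column names from the SymmetricDS runtime schema and that the configuration channel id is 'config':

```sql
-- List config-channel batches stuck in load ('LD') status, with the
-- number of times each has been (re)sent. Run on the source node.
SELECT batch_id, node_id, status, sent_count, last_update_time
FROM sym_outgoing_batch
WHERE channel_id = 'config'
  AND status = 'LD'
ORDER BY batch_id;
```

A climbing sent_count with status stuck at 'LD' matches the behavior reported in the notes below.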
Note 0000168 (rotten, 2012-12-18 19:30):
Oh, and all of the foreign keys are present on the registration server, so the config updates are being installed in the correct order there.
Note 0000169 (chenson, 2012-12-18 20:48):
Were you updating configuration on multiple nodes? Configuration is meant to be managed from a central node.
Note 0000171 (rotten, 2012-12-19 13:49):
I am updating the configuration from the registration server (central node). I've pushed changes directly to the other nodes only in experiments to try to clear the FK constraint complaints.

When I originally posted this I thought I fully understood what was happening (simply a change-ordering problem). Now I am less sure that is the root cause. What I am observing is that some config channel changes get stuck in 'LD' status in sym_outgoing_batch, and the Sent Count for those batches keeps climbing, seemingly indefinitely. [Note: all of the JVMs are on the same server, communicating over plain HTTP, so I don't believe network issues are causing the repeated send retries.] At the same time, other config channel changes (including heartbeats) are showing up and passing through without any issues [but not all; sometimes they get stuck too]. So the later config changes, which would only succeed if the earlier ones hadn't gotten stuck in LD, sometimes throw the FK exceptions. Restarting the nodes and the registration server does not drain the stuck config channel outgoing batches.

This morning I have a suspicion, which I'll test in a few minutes:
1) I'm currently doing push/pull once per second.
2) The "synchronizing triggers" activity, which happens after some config changes, takes 30 seconds to a minute to complete.
3) When a config change comes through that triggers a synchronizing-triggers action on the remote node, the subsequent config changes get stuck.
4) Later, after the synchronizing-triggers action is done on the remote node, new config channel changes usually go through OK, but the ones that got stuck stay stuck.

I'll back off to 1-minute push/pull times, to give the synchronizing-triggers action time to complete, and then try a bunch of config changes and see if any still get stuck.
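In case it helps anyone reproduce this, the job periods can be changed through the parameter table rather than each engine's properties file. A sketch only, assuming the standard job.push.period.time.ms / job.pull.period.time.ms parameter keys and that the remaining sym_parameter columns are nullable or defaulted:

```sql
-- Slow the push and pull jobs from 1 s to 60 s,
-- for all nodes and node groups.
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.push.period.time.ms', '60000');
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.pull.period.time.ms', '60000');
```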
Note 0000172 (rotten, 2012-12-19 20:11):
Slowing the push/pull interval from 1 second to 60 seconds seems to have prevented config batches from getting stuck in 'LD' status. Next I'll set the push/pull thread counts to 1, to see whether single-threaded operation works more reliably than the default of 10 (at a 1-second refresh rate).
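For the single-threaded test, a sketch of the parameter change (the push.thread.per.server.count / pull.thread.per.server.count keys are assumed from the SymmetricDS parameter list; they can also be set in the engine properties file):

```sql
-- Drop the push/pull transport thread pools from the default of 10
-- down to 1 for the single-threaded test.
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'push.thread.per.server.count', '1');
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'pull.thread.per.server.count', '1');
```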
Note 0000176 (rotten, 2012-12-20 14:28):
Single- vs. multi-threaded operation did not seem to have the same impact as lengthening the refresh interval. I think the problem is related to the repeated 'synchronizing triggers' activity, but I'm still not certain. One thing that confused me and really slowed down my tests is this SymmetricDS caveat, which I'm adding to my operational notes:
"""
Changing node_group_link while the cluster is running is dangerous. It will almost always break replication. Once configured, do NOT change node_group_link settings unless you absolutely have to, and when you do, manually change them on all nodes while replication is down. Remove the change from sym_data before starting the cluster back up.
"""
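The manual cleanup in that caveat looks roughly like this. A sketch only, to be run on each node with replication stopped; it assumes captured config changes carry the sym_node_group_link table name in sym_data.table_name:

```sql
-- Remove captured node_group_link changes so they are not replayed
-- when the cluster comes back up.
DELETE FROM sym_data
WHERE table_name = 'sym_node_group_link';
```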
Date Modified | Username | Field | Change |
---|---|---|---|
2012-12-18 19:29 | rotten | New Issue | |
2012-12-18 19:30 | rotten | Note Added: 0000168 | |
2012-12-18 20:48 | chenson | Note Added: 0000169 | |
2012-12-19 13:49 | rotten | Note Added: 0000171 | |
2012-12-19 20:11 | rotten | Note Added: 0000172 | |
2012-12-20 14:28 | rotten | Note Added: 0000176 | |
2014-02-13 15:15 | elong | Priority | normal => high |
2019-04-22 14:42 | elong | Tag Attached: configuration | |
2019-04-22 14:42 | elong | Tag Attached: data sync | |
2019-04-26 17:43 | elong | Summary | config replication trips over itself => FK constraint error when syncing configuration change |