- Published on Sunday, 14 October 2012 10:05
- Written by Eric Long
Replicating the database schema on remote nodes and syncing schema changes to them when tables are altered is possible with SymmetricDS using a couple of simple techniques. First, tables can be created and populated as an initial load of data when the node is registered. Second, table definitions can be sent to nodes as needed to create or alter tables using the data definition language (DDL) of the database.
Creating Tables During Initial Load
The first way to replicate the schema to a remote node is to send the table definitions at the beginning of the initial load. The initial load is the process of sending and loading each table that you've configured for synchronization. Usually the initial load is requested by the user on the command line, but it can also be automatically sent when a new node is registered (see the "auto.reload" property in the manual). To change the behavior of the initial load so that it sends table definitions, set the following property:
With that property in place on your server, you can send an initial load to a remote node, and it will also send table definitions to create the tables. For example, sending an initial load to node with ID of "00001":
Syncing Schema Changes
At some point you need to change your database schema and continue to synchronize data. You may need to create new tables, drop old ones, or alter existing ones. There are 3 steps to syncing schema changes:
- Step 1: Alter Database
- Run the data definition language (DDL) on the central database first. This should be done during a reasonably quiet period without contention. Create, drop, and alter tables, but don't update any data yet.
- Step 2: Sync Triggers
Run the "Sync Triggers" process on SymmetricDS so it can detect the schema change. It will recreate the triggers to match the table definitions. An easy way to do this is to restart the service, or by running the following command:
- Step 3: Send Schema
Run the symadmin command with send-schema subcommand to send the schema change out to remote nodes. This command detects what changes to make to the table and works across different database platforms. You can specify a specific node, like this:
symadmin send-schema --node 00001 mytableOr send it to all nodes in a group, like this:symadmin send-schema --node-group STORES mytable
- Step 3 (Alternative): Send SQL
If you know the specific SQL to run for your database, you can run the symadmin command with send-sql subcommand to send the SQL change out to remote nodes. This subcommand supports sending to a single node or group of nodes. You can specify a specific node, like this:
symadmin send-sql --node 00001 mytable "alter table mytable add newcolumn varchar(100)"
Now that triggers match the tables, you can proceed with updating data. The schema changes are queued in the change data capture log just like other changes, so they will be played back in the same order at other nodes.
- Drop Table
- Because a trigger creates a dependency on the table, you should remove the entry from sym_trigger and run "Sync Triggers" so SymmetricDS can drop the trigger first. During data synchronization, if a node receives data for a table that no longer exists, it is just ignored.
- Create Table
- Remember to add an entry in sym_trigger for change data capture. During data synchronization, if a node receives data for a table that doesn't exist yet, it is just ignored.
- Add Column
- Be careful defining a column with a "default" value or "not null". Some database systems will update each row of the table with the default value, causing all the data to be captured. Either avoid using a "default" or temporarily disable the data capture system while adding the column. A new column that is not nullable can create problems for clients that haven't received the schema update. During data synchronization, if a node receives data that is missing a new column, the value for that column is loaded as null. If the new column is not nullable, the database will reject it and the batch will be in error. Leave new columns nullable until the schema changes roll out completely.
- Drop Column
- During data synchronization, if a node receives data for a column that no longer exists, it is just ignored, and the rest of the row loads normally.
- Modify Column
- Be careful when changing to a different data type, decreasing the size, or adding restrictions. These modifications can be incompatible with data from nodes that haven't upgraded their schema yet.
Understanding how SymmetricDS detects schema change and automatically adjusts to different schemas can help you plan a roll out. Since data synchronization is asynchronous, there may be data changes captured at a node that have not been sent yet. The data may be captured using the old schema, while other nodes have upgraded to the new schema. In some situations, it may be best to ensure there are no outstanding batches before upgrading, but in most cases, SymmetricDS will try to adapt to differences in schema. Here are some of the problems to look out for with a schema upgrade:
For table alterations, the triggers become invalid and prevent data modification until the "Sync Triggers" process is run. If data synchronization gets an error, it will be retried, but applications may not retry errors, so perform table alterations during a maintenance period. Be careful with conversion or upgrade scripts that both alter tables and modify data in one process, because they will get errors from invalid triggers. If the conversion script is being run on each node, and data capture is not needed, the triggers can be dropped during conversion. When SymmetricDS service is started, it detects missing triggers and automatically creates them.
How It Works
Both the send-schema and send-sql subcommands use the sym_data table (the same place where change data is recorded) to get schema changes to remote nodes. This preserves the order of changes, so a schema change is played back first, followed by data changes using the new schema.
For send-schema, the event_type is "C" (for create) and the row_data stores the Torque XML table definition. When the event reaches the remote node, it passes the XML to the Torque library to translate it into data definition language for the specific database. If the target table is missing, it is created. If the table already exists, then it is altered. If an alter is not allowed by the database, the table is backed up, recreated with the new definition, and the data is restored.
For send-sql, the event_type is "S" (for SQL) and the row_data stores the SQL command specified by the user. Be careful to send the right SQL that is compatible with the database at the remote node. The subcommand doesn't check syntax or compatibility, so it is possible to put a batch in error with a typo or invalid syntax.