Chapter 6. Administration

Table of Contents

6.1. Solving Synchronization Issues
6.1.1. Analyzing the Issue - Outgoing Batches
6.1.2. Analyzing the Issue - Incoming Batches
6.1.3. Resolving the Issue - Outgoing Batches
6.1.4. Resolving the Issue - Incoming Batches
6.2. Changing Triggers
6.3. Maintaining multiple synchronization configurations through Grouplets
6.3.1. Grouplet Example
6.4. Re-synchronizing Data
6.5. Changing Configuration
6.6. Logging Configuration
6.7. Java Management Extensions
6.8. Temporary Files
6.9. Database Purging

6.1. Solving Synchronization Issues

By design, whenever SymmetricDS encounters an issue with a synchronization, the batch containing the error is marked as being in an error state, and all subsequent batches for that particular channel to that particular node are held and not synchronized until the error batch is resolved. SymmetricDS will retry the batch in error until the situation creating the error is resolved (or the data for the batch itself is changed).

Analyzing and resolving issues can take place on the outgoing or incoming side. The techniques for analysis differ slightly between the two cases, however, because the node with outgoing batch data also has the data and data events associated with the batch in its database, whereas on the incoming node all that is available is the incoming batch header and the data present in an incoming error table.

6.1.1. Analyzing the Issue - Outgoing Batches

The first step in analyzing the cause of a failed batch is to locate information about the data in the batch, starting with OUTGOING_BATCH. To locate batches in error, use:

select * from sym_outgoing_batch where error_flag=1;

Several useful pieces of information are available from this query:

  • The batch number of the failed batch, available in column BATCH_ID.
  • The node to which the batch is being sent, available in column NODE_ID.
  • The channel to which the batch belongs, available in column CHANNEL_ID. All subsequent batches on this channel to this node will be held until the error condition is resolved.
  • The specific data id in the batch which is causing the failure, available in column FAILED_DATA_ID.
  • Any SQL message, SQL State, and SQL Codes being returned during the synchronization attempt, available in columns SQL_MESSAGE, SQL_STATE, and SQL_CODE, respectively.

Note

Using the error_flag on the batch table, as shown above, is more reliable than using the status column. The status column can change from 'ER' to a different status temporarily as the batch is retried.

Note

The query above will also show you any recent batches that were originally in error and were changed to be manually skipped. See the end of Section 6.1.3, “Resolving the Issue - Outgoing Batches” for more details.
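
For example, to list recent batches that were manually marked as skipped (status reset to 'OK' while the error flag remains set), a query such as the following can be used:

select batch_id, node_id, channel_id, status, error_flag
    from sym_outgoing_batch
    where error_flag = 1 and status = 'OK';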

To get a full picture of the batch, you can query for information representing the complete list of all data changes associated with the failed batch by joining DATA and DATA_EVENT, such as:

select * from sym_data where data_id in 
        (select data_id from sym_data_event where batch_id='XXXXXX');

where XXXXXX is the batch id of the failing batch.

This query returns a wealth of information about each data change in a batch, including:

  • The table involved in each data change, available in column TABLE_NAME.
  • The event type (Update [U], Insert [I], or Delete [D]), available in column EVENT_TYPE.
  • A comma-separated list of the new data and (optionally) the old data, available in columns ROW_DATA and OLD_DATA, respectively.
  • The primary key data, available in column PK_DATA.
  • The channel id, trigger history information, transaction id if available, and other information.

More importantly, if you narrow your query to just the failed data id you can determine the exact data change that is causing the failure:

select * from sym_data where data_id in 
        (select failed_data_id from sym_outgoing_batch where batch_id='XXXXXX'
        and node_id='YYYYY');

where XXXXXX is the batch id and YYYYY is the node id of the batch that is failing.

The queries above usually yield enough information to be able to determine why a particular batch is failing. Common reasons a batch might be failing include:

  • The schema at the destination has a column that is not nullable yet the source has the column defined as nullable and a data change was sent with the column as null.
  • A foreign key constraint at the destination is preventing an insert or update, which could be caused by data having been deleted at the destination or by the foreign key constraint not being in place at the source.
  • The data size of a column on the destination is smaller than the data size in the source, and data that is too large for the destination has been synced.

6.1.2. Analyzing the Issue - Incoming Batches

Analyzing an incoming batch is different from analyzing outgoing batches. For incoming batches, you will rely on two tables, INCOMING_BATCH and INCOMING_ERROR. The first step in analyzing the cause of an incoming failed batch is to locate information about the batch, starting with INCOMING_BATCH. To locate batches in error, use:

select * from sym_incoming_batch where error_flag=1;

Several useful pieces of information are available from this query:

  • The batch number of the failed batch, available in column BATCH_ID. Note that this is the batch number of the outgoing batch on the outgoing node.
  • The node the batch is being sent from, available in column NODE_ID.
  • The channel to which the batch belongs, available in column CHANNEL_ID. All subsequent batches on this channel from this node will be held until the error condition is resolved.
  • The data_id that was being processed when the batch failed, available in column FAILED_DATA_ID.
  • Any SQL message, SQL State, and SQL Codes being returned during the synchronization attempt, available in columns SQL_MESSAGE, SQL_STATE, and SQL_CODE, respectively.

For incoming batches, we do not have data and data event entries in the database we can query. We do, however, have a table, INCOMING_ERROR, which provides some information about the batch.

select * from sym_incoming_error 
            where batch_id='XXXXXX' and node_id='YYYYY';

where XXXXXX is the batch id and YYYYY is the node id of the failing batch.

This query returns a wealth of information about each data change in a batch, including:

  • The table involved in each data change, available in column TARGET_TABLE_NAME.
  • The event type (Update [U], Insert [I], or Delete [D]), available in column EVENT_TYPE.
  • A comma-separated list of the new data and (optionally) the old data, available in columns ROW_DATA and OLD_DATA, respectively.
  • The column names of the table, available in column COLUMN_NAMES.
  • The primary key column names of the table, available in column PK_COLUMN_NAMES.

6.1.3. Resolving the Issue - Outgoing Batches

Once you have decided upon the cause of the issue, you'll have to decide the best course of action to fix the issue. If, for example, the problem is due to a database schema mismatch, one possible solution would be to alter the destination database in such a way that the SQL error no longer occurs. Whatever approach you take to remedy the issue, once you have made the change, on the next push or pull SymmetricDS will retry the batch and the channel's data will start flowing again.

If you have instead decided that the batch itself is wrong, does not need to be synchronized, or you wish to remove a particular data change from a batch, you do have the option of changing the data associated with the batch directly.

Warning

Be cautious when using the following two approaches to resolve synchronization issues. By far, the best approach to solving a synchronization error is to resolve what is truly causing the error at the destination database. Skipping a batch or removing a data id as discussed below should be your solution of last resort, since doing so results in differences between the source and destination databases.

Now that you've read the warning, if you still want to change the batch data itself, you do have several options, including:

  • Causing SymmetricDS to skip the batch completely. This is accomplished by setting the batch's status to 'OK', as in:
    update sym_outgoing_batch set status='OK' where batch_id='XXXXXX'
    where XXXXXX is the failing batch. On the next pull or push, SymmetricDS will skip this batch since it now thinks the batch has already been synchronized. Note that you can still distinguish between successful batches and ones that you've artificially marked as 'OK', since the error_flag column on the failed batch will still be set to '1' (in error).
  • Removing the failing data id from the batch by deleting the corresponding row in DATA_EVENT. Eliminating the data id from the list of data ids in the batch will cause future synchronization attempts of the batch to no longer include that particular data change as part of the batch. For example:
    delete from sym_data_event where batch_id='XXXXXX' and data_id='YYYYYY'
    where XXXXXX is the failing batch and YYYYYY is the data id to no longer be included in the batch.

6.1.4. Resolving the Issue - Incoming Batches

For batches in error from the incoming side, you'll also have to decide the best course of action to fix the issue. Incoming batch errors that are in conflict can be fixed by taking advantage of two columns in INCOMING_ERROR which are examined each time batches are processed. The first column, resolve_data, if filled in, will be used in place of row_data. The second column, resolve_ignore, if set, will cause this particular data item to be ignored and batch processing to continue. These are the same two columns used when a manual conflict resolution strategy is chosen, as discussed in Section 4.10, “Conflict Detection and Resolution”.
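
As a minimal sketch, ignoring a single failed row so the rest of the batch can proceed might look like the following (assuming the row is identified by batch_id, node_id, and a failed_row_number column, as in a typical installation):

-- Skip just this data item; the remainder of the batch will be retried.
update sym_incoming_error
    set resolve_ignore = 1
    where batch_id = 'XXXXXX'
        and node_id = 'YYYYY'
        and failed_row_number = 1;

where XXXXXX is the batch id and YYYYY is the node id of the failing batch.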

6.2. Changing Triggers

A trigger row may be updated using SQL to change a synchronization definition. SymmetricDS will look for changes each night or whenever the Sync Triggers Job is run (see below). For example, a change to place the table price_changes into the price channel would be accomplished with the following statement:

update SYM_TRIGGER
set channel_id = 'price',
    last_update_by = 'jsmith',
    last_update_time = current_timestamp
where source_table_name = 'price_changes';

All configuration should be managed centrally at the registration node. If enabled, configuration changes will be synchronized out to client nodes. When trigger changes reach the client nodes the Sync Triggers Job will run automatically.

Centrally, the trigger changes will not take effect until the Sync Triggers Job runs. Instead of waiting for the Sync Triggers Job to run overnight after making a Trigger change, you can invoke the syncTriggers() method over JMX or simply restart the SymmetricDS server. A complete record of trigger changes is kept in the table TRIGGER_HIST, which was discussed in Section 5.2.3, “Sync Triggers Job”.
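
For example, a query along the following lines (column names assumed from a typical TRIGGER_HIST layout) shows the history of trigger builds for a single table, most recent first:

select trigger_hist_id, trigger_id, source_table_name,
        last_trigger_build_reason, create_time, inactive_time
    from sym_trigger_hist
    where source_table_name = 'price_changes'
    order by create_time desc;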

6.3. Maintaining multiple synchronization configurations through Grouplets

As you probably know by now, SymmetricDS stores its single configuration centrally and distributes it to all nodes. By default, a trigger-router is in effect for all nodes in the source node group or target node group. Triggers will be established on each node that is a member of the source node group, and changes will be routed to all relevant nodes that are members of the target node group. If, for example, the router routes to "all" nodes, "all" means every node that is in the target node group. This is the default behavior of SymmetricDS.

Once in production, however, you will likely find you need or want to make configuration changes to triggers and routers as new features are rolled out to your network of SymmetricDS nodes. You may, for example, wish to "pilot" a new configuration, containing new synchronizations, only on specific nodes initially, and then increase the size of the pilot over time. For this purpose, SymmetricDS provides the ability to specify that only particular trigger-router combinations are applicable to particular nodes. It does this by allowing you to define an arbitrary collection of nodes, called a "grouplet", and then choosing which trigger-routers apply to the normal set of nodes (the default behavior) and which apply just to nodes in one or more "grouplets". This allows you, essentially, to filter the list of nodes that would otherwise be included as source nodes and/or target nodes. Through the use of grouplets, you can, for example, specify a subset of nodes on which a given trigger would be created. It also allows you to specify a subset of the normal set of nodes a change would be routed to. This behavior is in addition to, and occurs before, any subsetting or filtering the router might otherwise do.

In its simplest form, a grouplet is just an arbitrary collection of nodes. To define a grouplet, you start by creating a grouplet with a unique id, a description, and a link policy, as defined in GROUPLET. To define which nodes are members of (or are not members of) a grouplet, you provide a list of external ids of the nodes in GROUPLET_LINK. How those external ids are used varies based on the grouplet link policy. The grouplet_link_policy can be either I or E, representing an "inclusive" list of nodes or an "exclusive" list of nodes, respectively. In the case of "inclusive", you'll be listing each external id to be included in the grouplet. In the case of "exclusive", all nodes will be included in the grouplet except ones which have an external id in the list of external ids.

Once you have defined your grouplet and which nodes are members of a grouplet, you can tie a grouplet to a given trigger-router through the use of TRIGGER_ROUTER_GROUPLET. If a particular trigger-router does not appear in this table, SymmetricDS behaves as normal. If, however, an entry for a particular trigger-router appears in this table, the default behavior is overridden based on the grouplet_id and applies_when settings. The grouplet id provides the node list, and the applies_when indicates whether the grouplet nodes are to be used to filter the source node list, the target node list, or both (settings are "S", "T", and "B", respectively). Nodes that survive the filtering process as a source will have a trigger defined, and nodes that survive the filtering process as a target are eligible nodes that can be routed to.

6.3.1. Grouplet Example

At this point, an example would probably be useful. Picture the case where you have 100 retail stores (each containing one database, and each a member of the "store" node group) and a central office database (external id of corp, and a member of the "corp" node group). You wish to pilot two new trigger-routers for a new feature on your point-of-sale software (one which moves data from corp to store, and one which moves data from store to corp), but you only want the triggers to be installed on 10 specific stores that represent your "pilot" stores. In this case, the simplest approach would be to define a grouplet with, say, a grouplet id of "pilot". We'd use a grouplet link policy of "inclusive", and list each of the 10 external ids in the GROUPLET_LINK table.
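
A sketch of this setup in SQL (the store external id and the audit columns are illustrative assumptions) might look like:

-- Define the grouplet with an "inclusive" link policy.
insert into sym_grouplet
    (grouplet_id, grouplet_link_policy, description,
        create_time, last_update_by, last_update_time)
    values ('pilot', 'I', 'Pilot stores for the new POS feature',
        current_timestamp, 'jsmith', current_timestamp);

-- Add one link row per pilot store (repeat for each of the 10 external ids).
insert into sym_grouplet_link
    (grouplet_id, external_id, create_time, last_update_by, last_update_time)
    values ('pilot', 'store001', current_timestamp, 'jsmith', current_timestamp);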

For the trigger-router meant to send data from corp to store, we'd create an entry in TRIGGER_ROUTER_GROUPLET for our grouplet id of "pilot", and we'd specify "T" (target) as the applies_when setting. In this way, the source node list is not filtered, but the target node list used during routing will filter the potential target nodes to just our pilot stores. For the trigger-router meant to send data from a pilot store back to corp, we would have the grouplet apply when the node is in the source node list (i.e., applies_when will be "S"). This will cause the trigger to only be created for stores in the pilot list and not other stores.
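
Expressed as SQL, a sketch of these two entries (the trigger and router ids are hypothetical, and the audit columns assume a typical layout) might be:

-- Filter the target node list for the corp-to-store trigger-router.
insert into sym_trigger_router_grouplet
    (grouplet_id, trigger_id, router_id, applies_when,
        create_time, last_update_by, last_update_time)
    values ('pilot', 'new_feature', 'corp_to_store', 'T',
        current_timestamp, 'jsmith', current_timestamp);

-- Filter the source node list for the store-to-corp trigger-router.
insert into sym_trigger_router_grouplet
    (grouplet_id, trigger_id, router_id, applies_when,
        create_time, last_update_by, last_update_time)
    values ('pilot', 'new_feature_return', 'store_to_corp', 'S',
        current_timestamp, 'jsmith', current_timestamp);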

An important thing to mention in this example: since your grouplet only included the store nodes, you can't simply specify "both" for the applies_when setting. For the corp-to-store trigger, for example, if you had said "both", no trigger would have been installed in corp since the grouplet nodes represent all possible source nodes as well as target nodes, and "corp" is not in the list! The same is true for the store-to-corp trigger-router as well. You could, however, use "both" as the applies_when setting if you had included the "corp" external id in with the list of the 10 pilot store external ids.

6.4. Re-synchronizing Data

There may be times where you find you need to re-send or re-synchronize data when the change itself was not captured. This could be needed, for example, if the data changes occurred prior to SymmetricDS placing triggers on the data tables themselves, or if the data at the destination was accidentally deleted, or for some other reason. Two approaches are commonly taken to re-send the data, both of which are discussed below.

Important

Be careful when re-sending data using either of these two techniques. Be sure you are only sending the rows you intend to send and, more importantly, be sure to re-send the data in a way that won't cause foreign key constraint issues at the destination. In other words, if more than one table is involved, be sure to send any tables which are referred to by other tables via foreign keys first. Otherwise, the channel's synchronization will block because SymmetricDS is unable to insert or update a row whose foreign key relationship refers to a non-existent row in the destination!

One possible approach would be to "touch" the rows in individual tables that need to be re-sent. By "touch", we mean to alter the row data in such a way that SymmetricDS detects a data change and therefore includes the data change in the batching and synchronizing steps. Note that you have to change the data in some meaningful way (e.g., update a time stamp); setting a column to its current value is not sufficient (by default, if there's not an actual data value change, SymmetricDS won't treat the change as something which needs to be synced).
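
As a sketch, assuming a hypothetical item table that carries a last_update_time column, touching a subset of rows might look like:

-- The update fires the SymmetricDS trigger, which re-captures the rows.
update item
    set last_update_time = current_timestamp
    where item_id between 1000 and 1999;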

A second approach would be to take advantage of SymmetricDS's built-in functionality by simulating a partial "initial load" of the data. The approach is to manually create "reload" events in DATA for the necessary tables, thereby re-sending the desired rows for the given tables. Again, foreign key constraints must be kept in mind when creating these reload events. These reload events are created in the source database itself, and the necessary table, trigger-router combination, and channel are included to indicate the direction of synchronization.

To create a reload event, you create a DATA row, using the following values (examples follow the list):

  • data_id: null
  • table_name: name of table to be sent
  • event_type: 'R', for reload
  • row_data: a "where" clause (minus the word 'where') which defines the subset of rows from the table to be sent. To send all rows, one can use 1=1 for this value.
  • pk_data: null
  • old_data: null
  • trigger_hist_id: use the id of the most recent entry (i.e., max(trigger_hist_id) ) in TRIGGER_HIST for the trigger-router combination for your table and router.
  • channel_id: the channel in which the table is routed
  • transaction_id: pick a value, for example '1'
  • source_node_id: null
  • external_data: null
  • create_time: current_timestamp
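
For a single table, a minimal sketch (assuming a hypothetical item table synchronized on an item channel) might be:

-- Reload the row with item_id 1000, using the table's most recent
-- trigger history entry.
insert into sym_data (
    select null, 'item', 'R', 'item_id=''1000''', null, null,
            max(trigger_hist_id), 'item', '1', null, null, current_timestamp
        from sym_trigger_hist
        where source_table_name = 'item');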

By way of example, take our retail hands-on tutorial covered in Chapter 2, Quick Start Tutorial. Let's say we need to re-send a particular sales transaction from the store to corp over again because we lost the data in corp due to an overzealous delete. For the tutorial, all transaction-related tables start with sale_, use the sale_transaction channel, and are routed using the store_corp_identity router. In addition, the trigger-routers have been set up with an initial load order based on the necessary foreign key relationships (i.e., transaction tables which are "parents" have a lower initial load order than those of their "children"). An insert statement that would create the necessary "reload" events (three in this case, one for each table) would be as follows (where MISSING-ID is changed to the needed transaction id):

       
insert into sym_data (
    select null, t.source_table_name, 'R', 'tran_id=''MISSING-ID''', null, null,
            h.trigger_hist_id, t.channel_id, '1', null, null, current_timestamp
        from sym_trigger t inner join sym_trigger_router tr on
            t.trigger_id=tr.trigger_id inner join sym_trigger_hist h on
            h.trigger_hist_id=(select max(trigger_hist_id) from sym_trigger_hist
                where trigger_id=t.trigger_id) 
    where channel_id='sale_transaction' and
        tr.router_id like 'store_corp_identity' and 
        (t.source_table_name like 'sale_%') 
    order by tr.initial_load_order asc);
    

This insert statement generates three rows, one for each configured sale table. It uses the most recent trigger history id for the corresponding table. Finally, it takes advantage of the initial load order for each trigger-router to create the three rows in the correct order (the order in which the tables would have been initially loaded).
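
Before the next push or pull, you can verify what was queued with a query such as:

select data_id, table_name, event_type, row_data, create_time
    from sym_data
    where event_type = 'R'
    order by data_id;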

6.5. Changing Configuration

The configuration of your system as defined in the sym_* tables may be modified at runtime. By default, any changes made to the sym_* tables (with the exception of sym_node) should be made at the registration server. The changes will be synchronized out to the leaf nodes by SymmetricDS triggers that are automatically created on the tables.

If this behavior is not desired, the feature can be turned off using a parameter. Custom triggers may be added to the sym_* tables when the auto-syncing feature is disabled.

6.6. Logging Configuration

The standalone SymmetricDS installation uses Log4J for logging. The configuration file is conf/log4j.xml. The log4j.xml file has hints as to what logging can be enabled for useful, finer-grained logging.

There is a command line option to turn on preconfigured debugging levels. When the --debug option is used, conf/debug-log4j.xml is used instead of log4j.xml.

SymmetricDS proxies all of its logging through SLF4J. When deploying to an application server, or if Log4J is not being leveraged, the general rules for SLF4J logging apply.

6.7. Java Management Extensions

Monitoring and administrative operations can be performed using Java Management Extensions (JMX). SymmetricDS uses MX4J to expose JMX attributes and operations that can be accessed from the built-in web console, Java's jconsole, or an application server. By default, the web management console can be opened from the following address:

http://localhost:31416/

In order to use JConsole, you must enable JMX for the JVM. You can edit the startup scripts to set the following system properties:

          -Dcom.sun.management.jmxremote.port=31417 
          -Dcom.sun.management.jmxremote.authenticate=false 
          -Dcom.sun.management.jmxremote.ssl=false
          

More details about enabling JMX for JConsole can be found in the JConsole documentation from Oracle.

Using the Java jconsole command, SymmetricDS is listed as a local process named SymmetricLauncher. In jconsole, SymmetricDS appears under the MBeans tab under the name defined by the engine.name property. The default value is SymmetricDS.

The management interfaces under SymmetricDS are organized as follows:

  • Node - administrative operations

  • Parameters - access to properties set through the parameter service

6.8. Temporary Files

SymmetricDS creates temporary extraction and data load files with the CSV payload of a synchronization when the value of the stream.to.file.threshold.bytes SymmetricDS property has been reached. Before reaching the threshold, files are streamed to/from memory. The default threshold value is 32,767 bytes. This feature may be turned off by setting the stream.to.file.enabled property to false.

SymmetricDS creates these temporary files in the directory specified by the java.io.tmpdir Java System property.

The location of the temporary directory may be changed by setting the Java System property passed into the Java program at startup. For example,

  -Djava.io.tmpdir=/home/.symmetricds/tmp
        

6.9. Database Purging

Purging is the act of cleaning up captured data that is no longer needed in SymmetricDS's runtime tables. Data is purged through delete statements by the Purge Job. Only data that has been successfully synchronized will be purged. Purged tables include DATA, DATA_EVENT, OUTGOING_BATCH, and INCOMING_BATCH, among others.

The purge job is enabled by the start.purge.job SymmetricDS property. The timing of the three purge jobs (incoming, outgoing, and data gaps) is controlled by a cron expression as specified by the following properties: job.purge.outgoing.cron, job.purge.incoming.cron, and job.purge.datagaps.cron. The default is 0 0 0 * * *, or once per day at midnight.

Important

SymmetricDS utilizes Spring's CRON support, which includes seconds as the first parameter. This differs from the typical Unix-based implementation, where the first parameter is usually minutes. For example, */15 * * * * * means every 15 seconds, not every 15 minutes. See Spring's documentation for more details.

Two retention period properties indicate how much history SymmetricDS will retain before purging. The purge.retention.minutes property indicates the period of history to keep for synchronization tables. The default value is 5 days. The statistic.retention.minutes property indicates the period of history to keep for statistics. The default value is also 5 days.

The purge properties should be adjusted according to how much data is flowing through the system and the amount of storage space the database has. For an initial deployment it is recommended that the purge properties be kept at the defaults, since it is often helpful to be able to look at the captured data in order to triage problems and profile the synchronization patterns. When scaling up to more nodes, it is recommended that the purge parameters be scaled back to 24 hours or less.
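
As a sketch, assuming your deployment manages overrides through the parameter service (the PARAMETER table) rather than the engine properties file, scaling retention back to 24 hours (1,440 minutes) might look like:

-- 'ALL'/'ALL' applies the override to every external id and node group.
insert into sym_parameter
    (external_id, node_group_id, param_key, param_value,
        create_time, last_update_by, last_update_time)
    values ('ALL', 'ALL', 'purge.retention.minutes', '1440',
        current_timestamp, 'jsmith', current_timestamp);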