Chapter 1. Introduction

Table of Contents

1.1. What is SymmetricDS?
1.2. Background
1.3. SymmetricDS Features
1.3.1. Notification Schemes
1.3.2. Two-Way Table Synchronization
1.3.3. Data Channels
1.3.4. Transaction Awareness
1.3.5. Data Filtering and Rerouting
1.3.6. HTTP(S) Transport
1.3.7. Remote Management
1.4. System Requirements
1.5. What's new in SymmetricDS 2

This User Guide will introduce both basic and advanced concepts in the configuration of SymmetricDS. By the end of this chapter, you will have a better understanding of SymmetricDS' capabilities, and many of its basic concepts.

1.1. What is SymmetricDS?

SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale to a large number of databases, work across low-bandwidth connections, and withstand periods of network outage. It can be installed as a standalone process, deployed as a web application in a Java application server, or embedded into another Java application.

A single installation of SymmetricDS attached to a target database is called a node. A node is initialized by a properties file and is configured by inserting configuration data into a series of database tables. It then creates database triggers on the application tables to be synchronized so that database events are captured for delivery to other SymmetricDS nodes.
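To make the properties-file step concrete, the Java sketch below reads a node's startup properties and fails fast if a required key is missing. The NodeProperties class and the key names (group.id, external.id, db.url) are assumptions for illustration; consult the properties reference for the settings your version actually requires.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

// Hypothetical loader for a node's startup properties.
// The required key names are illustrative assumptions, not a definitive list.
final class NodeProperties {
    static final List<String> REQUIRED = List.of("group.id", "external.id", "db.url");

    // Parse the properties and fail fast if any required key is missing.
    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : REQUIRED) {
            if (props.getProperty(key) == null) {
                throw new IllegalArgumentException("missing property: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        String text = "group.id=store\nexternal.id=001\ndb.url=jdbc:h2:mem:store\n";
        Properties props = load(new StringReader(text));
        System.out.println(props.getProperty("external.id")); // 001
    }
}
```

Everything beyond these startup values (what to synchronize, and where) lives in the configuration tables, not in this file.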

In most databases, the transaction id is also captured by the database triggers so that the insert, update, and delete events can be replicated transactionally via the transport layer to other nodes. The transport layer is typically a CSV protocol over HTTP or HTTPS.
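The exact CSV layout used by the transport is not described in this chapter. As a rough illustration only, the sketch below assumes a hypothetical one-line-per-change format (event type, table name, transaction id, then column values) and shows how a captured change might be flattened into a CSV line with standard quote escaping. ChangeEvent and CsvEncoder are invented names, not SymmetricDS classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical captured change; this field layout is an assumption,
// not the wire format SymmetricDS actually uses.
record ChangeEvent(String type, String table, String txId, List<String> values) {}

final class CsvEncoder {
    // Wrap a field in double quotes and double any embedded quotes.
    // A null value becomes an empty, unquoted field.
    static String quote(String field) {
        if (field == null) {
            return "";
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Flatten one change into a single CSV line.
    static String encode(ChangeEvent e) {
        StringJoiner line = new StringJoiner(",");
        line.add(quote(e.type())).add(quote(e.table())).add(quote(e.txId()));
        for (String value : e.values()) {
            line.add(quote(value));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        ChangeEvent e = new ChangeEvent("insert", "sale", "tx-42",
                Arrays.asList("1001", "Widget \"XL\"", null));
        System.out.println(encode(e)); // "insert","sale","tx-42","1001","Widget ""XL""",
    }
}
```

Quoting every field keeps values containing commas or quotes intact, and writing null as an empty, unquoted field keeps it distinguishable from an empty string.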

SymmetricDS supports synchronization across different database platforms through the concept of Database Dialects. A Database Dialect is an abstraction layer that SymmetricDS uses to insulate the main synchronization logic from database-specific implementation details.
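The idea behind a Database Dialect can be shown with something as small as identifier quoting, which differs between platforms. The sketch below uses a hypothetical Dialect interface reduced to that single concern (the real SymmetricDS dialect abstraction is far broader), so that the platform-neutral logic builds SQL without knowing which database it targets.

```java
// Hypothetical dialect abstraction reduced to one concern: quoting identifiers.
interface Dialect {
    String quoteIdentifier(String name);
}

final class MySqlDialect implements Dialect {
    // MySQL quotes identifiers with backticks; an embedded backtick is doubled.
    public String quoteIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }
}

final class PostgreSqlDialect implements Dialect {
    // PostgreSQL uses standard SQL double quotes; an embedded quote is doubled.
    public String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }
}

final class SqlServerDialect implements Dialect {
    // SQL Server uses square brackets; an embedded closing bracket is doubled.
    public String quoteIdentifier(String name) {
        return "[" + name.replace("]", "]]") + "]";
    }
}

final class DialectDemo {
    // Platform-neutral logic: it never needs to know which database it targets.
    static String selectAll(Dialect dialect, String table) {
        return "select * from " + dialect.quoteIdentifier(table);
    }

    public static void main(String[] args) {
        Dialect[] dialects = {new MySqlDialect(), new PostgreSqlDialect(), new SqlServerDialect()};
        for (Dialect d : dialects) {
            System.out.println(selectAll(d, "order"));
        }
    }
}
```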

SymmetricDS is extensible through extension points. Extension points are custom, reusable Java classes that are configured via XML. They hook into key points in the life cycle of a synchronization so that custom behavior can be injected, such as publishing data to other systems, transforming data, or taking different actions based on the content or status of a synchronization.
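As a sketch of the hook idea, assuming a hypothetical BatchListener interface rather than any real SymmetricDS extension interface, the code below shows an engine that calls every registered listener before and after each batch. In SymmetricDS the wiring is done in XML; here the listener is registered directly to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical life-cycle hook; not an actual SymmetricDS extension interface.
interface BatchListener {
    void beforeBatch(long batchId);
    void afterBatch(long batchId, boolean succeeded);
}

final class SyncEngine {
    private final List<BatchListener> listeners = new ArrayList<>();

    // SymmetricDS wires extension points through XML; here they are registered directly.
    void register(BatchListener listener) {
        listeners.add(listener);
    }

    // Run one batch, notifying every listener before and after it.
    boolean runBatch(long batchId, Runnable work) {
        listeners.forEach(l -> l.beforeBatch(batchId));
        boolean succeeded = true;
        try {
            work.run();
        } catch (RuntimeException e) {
            succeeded = false;
        }
        final boolean result = succeeded;
        listeners.forEach(l -> l.afterBatch(batchId, result));
        return result;
    }
}

// Example extension: records what happened to each batch.
final class AuditListener implements BatchListener {
    final List<String> log = new ArrayList<>();

    public void beforeBatch(long batchId) {
        log.add("start " + batchId);
    }

    public void afterBatch(long batchId, boolean succeeded) {
        log.add((succeeded ? "ok " : "failed ") + batchId);
    }
}
```

The engine knows nothing about auditing; new behavior is added by supplying another listener, which is the property that makes extension points reusable.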

1.2. Background

The idea of SymmetricDS was born from a real-world need. Several years ago, some of the original developers were implementing a commercial Point of Sale (POS) system for a large retailer. The development team concluded that the software available for trickling transactions back to corporate headquarters (frequently known as the 'central office' or 'general office') did not meet the project's needs. The list of project requirements made finding the ideal solution difficult:

  • Sending and receiving data with up to 2000 stores during peak holiday loads.

  • Supporting one database platform at the store and a different one at the central office.

  • Synchronizing some data in one direction, and other data in both directions.

  • Filtering out sensitive data and re-routing it to a protected database.

  • Preparing the store database with an initial load of data from the central office.

The team ultimately created a custom solution that met the requirements and led to a successful project. From this work came the knowledge and experience that SymmetricDS benefits from today.

1.3. SymmetricDS Features

At a high level, SymmetricDS comes with a number of features that you are likely to need or want when doing data synchronization. A majority of these features were created as a direct result of real-world use of SymmetricDS in production settings.

1.3.1. Notification Schemes

After a change to the database is recorded, the SymmetricDS nodes interested in the change are notified. Change notification is configured to perform either a push (trickle-back) or a pull (trickle-poll) of data. When several nodes target their changes to a central node, it is efficient to push the changes instead of waiting for the central node to pull from each source node. If the network configuration protects a node with a firewall, a pull configuration could allow the node to receive data changes that might otherwise be blocked using push. The frequency of the change notification is configurable and defaults to once per minute.
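The push and pull choice can be pictured as one periodic job with a mode switch. The sketch below assumes a hypothetical Transport interface with push and pull operations and a hypothetical NotificationJob that repeats on a fixed delay, defaulting to the once-per-minute interval mentioned above. None of these names come from SymmetricDS.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical transport operations; the names are illustrative only.
interface Transport {
    int pushPendingChanges(); // send this node's captured changes to the remote node
    int pullRemoteChanges();  // fetch changes the remote node has waiting for this node
}

enum NotifyMode { PUSH, PULL }

final class NotificationJob {
    // Default interval taken from the text: once per minute.
    static final long DEFAULT_PERIOD_SECONDS = 60;

    private final NotifyMode mode;
    private final Transport transport;

    NotificationJob(NotifyMode mode, Transport transport) {
        this.mode = mode;
        this.transport = transport;
    }

    // One notification cycle. PULL suits a node behind a firewall, because the
    // node itself opens the outbound connection instead of waiting to be contacted.
    int runOnce() {
        return mode == NotifyMode.PUSH ? transport.pushPendingChanges() : transport.pullRemoteChanges();
    }

    // Repeat the cycle, waiting periodSeconds between the end of one run and the next.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A fixed delay, rather than a fixed rate, keeps a slow synchronization from overlapping with the next cycle.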

1.3.2. Two-Way Table Synchronization

In practice, much of the data in a typical synchronization needs to flow in only one direction. For example, a retail store sends its sales transactions to a central office, and the central office sends its stock items and pricing to the store. Other data may synchronize in both directions. For example, the retail store sends the central office an inventory document, and the central office updates the document status, which is then sent back to the store. SymmetricDS supports bi-directional, or two-way, table synchronization and avoids update loops by recording only the data changes made outside of synchronization.
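One way to picture loop avoidance, under the assumption that the data loader can mark the work it is doing, is a flag that suppresses change capture while incoming data is applied. In SymmetricDS this check happens in the database triggers; the plain-Java ChangeCapture class below is a hypothetical stand-in in which a thread-local flag plays that role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for trigger-based capture with loop prevention.
final class ChangeCapture {
    // True while the current thread is applying data received from another node.
    private static final ThreadLocal<Boolean> APPLYING_SYNC = ThreadLocal.withInitial(() -> false);

    private final List<String> captured = new ArrayList<>();

    // Plays the role of a trigger firing on a row change.
    void onRowChanged(String change) {
        if (!APPLYING_SYNC.get()) {
            captured.add(change); // only changes made outside synchronization are recorded
        }
    }

    // Apply incoming changes without capturing them again, so they are not echoed back.
    void applyIncoming(List<String> changes) {
        APPLYING_SYNC.set(true);
        try {
            changes.forEach(this::onRowChanged);
        } finally {
            APPLYING_SYNC.set(false);
        }
    }

    List<String> captured() {
        return captured;
    }
}
```

In the inventory example, the store's own edit is captured and sent, while the status update arriving from the central office is applied without being recorded, so it is never sent back.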

1.3.3. Data Channels

SymmetricDS supports the concept of channels of data. Data synchronization is defined at the table (or table subset) level, and each managed table can be assigned to a channel that helps control the flow of data. A channel is a category of data that can be enabled, prioritized and synchronized independently of other channels. For example, in a retail environment, users may be waiting for inventory documents to update while a promotional sale event updates a large number of items. If processed in order, the item updates would delay the inventory updates even though the data is unrelated. By assigning the item tables to an item channel and the inventory tables to an inventory channel, the two kinds of changes are processed independently, so inventory updates can get through despite the large amount of item data.

Channels are discussed in more detail in Section 3.5, “Choosing Data Channels”.
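The sketch below, built on hypothetical Channel and ChannelScheduler classes rather than SymmetricDS's own, shows why separate channels help: in each round, every enabled channel sends at most a fixed number of changes, in priority order, so a large backlog on the item channel cannot hold up the inventory channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical channel: a name, a priority (lower runs first), an enabled flag, its own queue.
final class Channel {
    final String name;
    final int priority;
    boolean enabled = true;
    final Deque<String> pending = new ArrayDeque<>();

    Channel(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class ChannelScheduler {
    private final List<Channel> channels = new ArrayList<>();

    void add(Channel channel) {
        channels.add(channel);
    }

    // One synchronization round: each enabled channel sends up to maxPerChannel
    // changes, so a backlog on one channel cannot block the others.
    List<String> round(int maxPerChannel) {
        List<String> sent = new ArrayList<>();
        channels.stream()
                .filter(c -> c.enabled)
                .sorted(Comparator.comparingInt((Channel c) -> c.priority))
                .forEach(c -> {
                    for (int i = 0; i < maxPerChannel && !c.pending.isEmpty(); i++) {
                        sent.add(c.name + ":" + c.pending.poll());
                    }
                });
        return sent;
    }
}
```

Even when the item channel has the higher priority and a backlog of 1,000 changes, a round with a limit of 2 sends two item changes and then the waiting inventory document.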

1.3.4. Transaction Awareness

Many databases provide a unique transaction identifier associated with the rows that are committed together as a transaction. SymmetricDS stores the transaction identifier, along with the data that changed, so it can play back the transaction exactly as it occurred originally. This means the target database maintains the same transactional integrity as its source. Support for transaction identification for supported databases is documented in the appendix of this guide.
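A minimal sketch of transactional replay, assuming each captured row carries the source transaction id: rows are grouped by that id, and each group is applied on the target inside one JDBC transaction, so either all of its rows commit or none do. CapturedRow and TransactionReplayer are hypothetical names, and ordering groups by their first captured row is a simplification; a production engine has to follow the source's commit order.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical captured row: the source transaction id plus a replayable SQL statement.
record CapturedRow(String txId, String sql) {}

final class TransactionReplayer {
    // Group rows by transaction id, keeping capture order within each group.
    // Groups are ordered by their first captured row (a simplification).
    static Map<String, List<CapturedRow>> groupByTransaction(List<CapturedRow> rows) {
        Map<String, List<CapturedRow>> txs = new LinkedHashMap<>();
        for (CapturedRow row : rows) {
            txs.computeIfAbsent(row.txId(), id -> new ArrayList<>()).add(row);
        }
        return txs;
    }

    // Apply each source transaction as one target transaction: all rows or none.
    static void replay(Connection target, Map<String, List<CapturedRow>> txs) throws SQLException {
        target.setAutoCommit(false);
        for (List<CapturedRow> tx : txs.values()) {
            try (Statement statement = target.createStatement()) {
                for (CapturedRow row : tx) {
                    statement.executeUpdate(row.sql());
                }
                target.commit();
            } catch (SQLException e) {
                target.rollback();
                throw e;
            }
        }
    }
}
```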

1.3.5. Data Filtering and Rerouting

Using SymmetricDS, data can be filtered as it is recorded, extracted, and loaded.

  • Data routing is accomplished by assigning a router type to a ROUTER configuration. Routers are responsible for identifying the target nodes to which captured changes should be delivered. Custom routers are possible by providing a class implementing IDataRouter.

  • As data changes are loaded in the target database, a class implementing IDataLoaderFilter can change the data in a column or route it somewhere else. One possible use might be to route credit card data to a secure database and blank it out as it loads into a centralized sales database. The filter can also prevent data from reaching the database altogether, effectively replacing the default data loading process.

  • Columns can be excluded from synchronization so they are never recorded when the table is changed. As data changes are loaded into the target database, a class implementing IColumnFilter can remove a column altogether from the synchronization. For example, an employee table may be synchronized to a retail store database, but the employee's password is only synchronized on the initial insert.

  • As data changes are extracted from the source database, a class implementing the IExtractorListener interface is called to filter data or route it somewhere else. By default, SymmetricDS provides a handler that transforms and streams data as CSV. Optionally, an alternate implementation may be provided to take some other action on the extracted data.
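The load-time filters described above can be sketched as follows. LoadFilter, CreditCardFilter, PasswordColumnFilter and Loader are simplified, hypothetical stand-ins for IDataLoaderFilter and IColumnFilter, whose real method signatures are not shown in this chapter. The example blanks a card number while copying it to a stand-in secure store, and drops the employee password on everything except the initial insert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-in for the loader-side filter interfaces.
interface LoadFilter {
    // Return false to keep the row out of the target database entirely.
    boolean filter(String table, String eventType, Map<String, String> row);
}

// Reroutes card numbers to a secure store and blanks them for the sales database.
final class CreditCardFilter implements LoadFilter {
    final List<String> secureStore = new ArrayList<>(); // stand-in for the secure database

    public boolean filter(String table, String eventType, Map<String, String> row) {
        String card = row.get("card_number");
        if ("sale".equals(table) && card != null) {
            secureStore.add(card);
            row.put("card_number", null);
        }
        return true; // load the scrubbed row
    }
}

// Synchronizes the employee password on the initial insert only.
final class PasswordColumnFilter implements LoadFilter {
    public boolean filter(String table, String eventType, Map<String, String> row) {
        if ("employee".equals(table) && !"insert".equals(eventType)) {
            row.remove("password");
        }
        return true;
    }
}

final class Loader {
    // Run every filter over a copy of the incoming row; null means "do not load".
    static Map<String, String> load(List<LoadFilter> filters, String table, String eventType, Map<String, String> row) {
        Map<String, String> copy = new LinkedHashMap<>(row);
        for (LoadFilter filter : filters) {
            if (!filter.filter(table, eventType, copy)) {
                return null;
            }
        }
        return copy;
    }
}
```

Returning false from a filter is the hook for replacing the default loading process entirely, as the text describes.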

1.3.6. HTTP(S) Transport

By default, SymmetricDS uses web-based HTTP or HTTPS in a style called Representational State Transfer (REST). It is lightweight and easy to manage. A series of filters is also provided to enforce authentication and to restrict the number of simultaneous synchronization streams. The ITransportManager interface allows other transports to be implemented.
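One of the filters mentioned above restricts the number of simultaneous synchronization streams. A minimal sketch of that idea, assuming a hypothetical StreamLimiter class rather than SymmetricDS's actual filter, uses a java.util.concurrent.Semaphore and answers with HTTP status 503 (Service Unavailable) when no slot is free, so the client can retry later. In a servlet container this logic would sit in a servlet filter; plain Java keeps the example self-contained.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical filter capping the number of simultaneous synchronization streams.
final class StreamLimiter {
    static final int SERVICE_UNAVAILABLE = 503;

    private final Semaphore permits;

    StreamLimiter(int maxStreams) {
        this.permits = new Semaphore(maxStreams);
    }

    // Serve the request if a stream slot is free; otherwise return 503 at once
    // so the client backs off and retries later instead of queueing.
    int handle(Supplier<Integer> request) {
        if (!permits.tryAcquire()) {
            return SERVICE_UNAVAILABLE;
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }
}
```

Rejecting immediately with tryAcquire, instead of blocking, keeps server threads free when many nodes connect at once.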

1.3.7. Remote Management

Administration functions are exposed through Java Management Extensions (JMX) and can be accessed from the Java JConsole or through an application server. Functions include opening registration, reloading data, purging old data, and viewing batches. A number of configuration and runtime properties are available to be viewed as well.
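To illustrate how administration functions can be exposed through JMX, the sketch below registers a DynamicMBean on the platform MBean server. The NodeAdmin class, its openRegistration and closeRegistration operations, its RegistrationOpen attribute, and the object name example.sync:type=NodeAdmin are hypothetical; they are not SymmetricDS's actual management interface. A DynamicMBean is used so the example fits in one file without the public MBean interface a Standard MBean would need.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Hypothetical admin bean; operation and attribute names are illustrative only.
final class NodeAdmin implements DynamicMBean {
    private volatile boolean registrationOpen;

    public Object invoke(String action, Object[] params, String[] signature) throws ReflectionException {
        switch (action) {
            case "openRegistration":
                registrationOpen = true;
                return null;
            case "closeRegistration":
                registrationOpen = false;
                return null;
            default:
                throw new ReflectionException(new NoSuchMethodException(action));
        }
    }

    public Object getAttribute(String name) throws AttributeNotFoundException {
        if ("RegistrationOpen".equals(name)) {
            return registrationOpen;
        }
        throw new AttributeNotFoundException(name);
    }

    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    public AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String name : names) {
            try {
                list.add(new Attribute(name, getAttribute(name)));
            } catch (AttributeNotFoundException ignored) {
                // unknown attributes are simply left out
            }
        }
        return list;
    }

    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    // Describes the bean to tools such as JConsole.
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = {
            new MBeanAttributeInfo("RegistrationOpen", "boolean", "Whether new nodes may register", true, false, false)
        };
        MBeanOperationInfo[] ops = {
            new MBeanOperationInfo("openRegistration", "Allow new nodes to register", null, "void", MBeanOperationInfo.ACTION),
            new MBeanOperationInfo("closeRegistration", "Stop accepting registrations", null, "void", MBeanOperationInfo.ACTION)
        };
        return new MBeanInfo(NodeAdmin.class.getName(), "Node administration (sketch)", attrs, null, ops, null);
    }

    // Register on the platform MBean server so JMX clients can find it.
    static ObjectName register() {
        try {
            ObjectName name = new ObjectName("example.sync:type=NodeAdmin");
            ManagementFactory.getPlatformMBeanServer().registerMBean(new NodeAdmin(), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the operations appear in JConsole under the given object name and can also be invoked programmatically through MBeanServer.invoke.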

SymmetricDS also provides functionality to send SQL events through the same synchronization mechanism that is used to send data. The data payload can be any SQL statement. The event is processed and acknowledged just like any other event type.

1.4. System Requirements

SymmetricDS is written in Java 5 and requires a Java SE Runtime Environment (JRE) or Java SE Development Kit (JDK) version 5.0 or above.

Any database with trigger technology and a JDBC driver has the potential to run SymmetricDS. The database is abstracted through a Database Dialect in order to support specific features of each database. The following Database Dialects have been included with this release:

  • MySQL version 5.0.2 and above

  • Oracle version 8.1.7 and above

  • PostgreSQL version 8.2.5 and above

  • SQL Server 2005

  • HSQLDB 1.8

  • H2 1.x

  • Apache Derby and above

  • IBM DB2 9.5

  • Firebird 2.0 and above

See Appendix C, Database Notes, for compatibility notes and other details for your specific database.

1.5. What's new in SymmetricDS 2

SymmetricDS 2 builds upon the existing SymmetricDS 1.x software base and incorporates a number of architectural changes and performance improvements. If you are brand new to SymmetricDS, you can safely skip this section. If you have used SymmetricDS 1.x in the past, this section summarizes the key differences you will encounter when moving to SymmetricDS 2.

The first significant architectural change involves SymmetricDS's use of triggers. In 1.x, triggers capture and record data changes, along with the nodes to which the changes must be applied, as row inserts into the DATA_EVENT table. Thus, the number of row inserts grows linearly with the number of client nodes, which can lead to an obvious performance issue as the number of nodes increases. In addition, the problem is at times made worse because synchronizing nodes update the same DATA_EVENT table as part of the batching process while the row inserts are being created.

In SymmetricDS 2, triggers capture only data changes, not the node-specific details. The node-specific row-inserts are replaced with a new routing mechanism that does both the routing and the batching of data on one thread. Thus, the real-time inserts into DATA_EVENT by applications using synchronized tables have been eliminated, and database performance is therefore improved. The database contention on DATA_EVENT has also been eliminated, since the router job is the only thread inserting data into that table. The only other access to the DATA_EVENT table is from selects by synchronizing nodes.

As a result of these changes, we gain the following benefits:

  • Synchronizing client nodes will spend less time connected to a server node,
  • Applications updating database tables that are being synchronized to a large number of nodes will not degrade in performance as more nodes are added, and
  • There should be almost no database contention on the DATA_EVENT table, unlike the possible contention in 1.x.
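The single-threaded routing and batching described above can be sketched in memory. Data and DataEvent below loosely mirror a captured-data row and a DATA_EVENT row, but their fields, the RouterJob class, and the batching rule are assumptions for illustration. The key point is that triggers record each change once, and only the router thread creates the node-specific DATA_EVENT entries, assigning each node's rows to batches as it goes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified rows: a captured change (no node information)
// and a node-specific routing entry tied to a batch.
record Data(long dataId, String table) {}
record DataEvent(long dataId, String nodeId, long batchId) {}

final class RouterJob {
    private long nextBatchId = 1;
    private final Map<String, Long> openBatch = new LinkedHashMap<>();   // nodeId -> current batch id
    private final Map<String, Integer> batchSize = new LinkedHashMap<>(); // nodeId -> rows in that batch

    // Runs on a single thread, so it is the only writer of routing entries.
    List<DataEvent> route(List<Data> captured, Function<Data, List<String>> targets, int maxBatchSize) {
        List<DataEvent> events = new ArrayList<>();
        for (Data d : captured) {
            for (String node : targets.apply(d)) {
                Long batch = openBatch.get(node);
                if (batch == null || batchSize.get(node) >= maxBatchSize) {
                    batch = nextBatchId++; // start a new batch for this node
                    openBatch.put(node, batch);
                    batchSize.put(node, 0);
                }
                batchSize.merge(node, 1, Integer::sum);
                events.add(new DataEvent(d.dataId(), node, batch));
            }
        }
        return events;
    }
}
```

Three captured rows routed to two stores yield six routing entries, but application transactions wrote only the three captured rows; the fan-out happens later, on the router thread.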

Because routing no longer takes place in the SymmetricDS database triggers, a new mechanism for routing was needed. In SymmetricDS 1.x, the node_select expression was used for specifying the desired data routing. It was a SQL expression that qualified the insert into DATA_EVENT from the SymmetricDS triggers. In SymmetricDS 2 there is a new extension point called the data router. Data routers are configured in the router table with a router_type and a router_expression. Several different routers have been provided to serve the majority of users' routing needs, but the framework is in place for a SymmetricDS programmer to develop domain- or application-specific routers. See Section 4.6.2, “Router” for a complete list of provided routers.
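A rough sketch of how a router_type could select a routing implementation, with router_expression passed along as its parameter, follows. The DataRouter interface, the type names all and match-column, and the expression syntax (a bare column name) are hypothetical; see Section 4.6.2 for the routers that actually ship with SymmetricDS.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical router contract: given a captured row and candidate nodes, pick targets.
interface DataRouter {
    List<String> route(Map<String, String> row, List<String> nodes, String expression);
}

final class Routers {
    // Send every change to every candidate node.
    static final DataRouter ALL = (row, nodes, expression) -> nodes;

    // Send the change only to the node whose id equals the named column's value.
    // Here the expression is just a column name, e.g. "store_id" (illustrative syntax).
    static final DataRouter MATCH_COLUMN = (row, nodes, expression) -> nodes.stream()
            .filter(node -> node.equals(row.get(expression)))
            .collect(Collectors.toList());

    // Look up the implementation named by a ROUTER row's router_type.
    static DataRouter forType(String routerType) {
        return switch (routerType) {
            case "all" -> ALL;
            case "match-column" -> MATCH_COLUMN;
            default -> throw new IllegalArgumentException("unknown router_type: " + routerType);
        };
    }
}
```

Keeping the type-to-implementation lookup in one place is what lets a programmer add a domain-specific router without touching the capture side.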

Since the routing and capturing of data are now performed by two separate mechanisms, the two concepts have been split into separate configuration tables in the database, with a join table (TRIGGER_ROUTER) specifying the relationships between routing (ROUTER) and capturing of data (TRIGGER). This solves a long-standing issue with some databases that only allow one trigger per table. On those database platforms, we can now route data in multiple directions, since only one SymmetricDS trigger is required to capture data. This also helps performance in those scenarios, since the data is captured once instead of once per routing instance.
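The relationship between the three configuration tables can be modeled in a few lines. The record types below keep only the identifier columns needed for the join, and their field names are assumptions; the point is that one trigger per table can feed several routers through TRIGGER_ROUTER rows.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical rows of the three configuration tables (column sets simplified).
record Trigger(String triggerId, String sourceTable) {}
record Router(String routerId, String targetNodeGroup) {}
record TriggerRouter(String triggerId, String routerId) {}

final class TriggerRouterConfig {
    // For one captured change, consult every router linked to its trigger, so the
    // table needs a single trigger no matter how many directions it routes to.
    static List<String> targetGroups(Trigger trigger, List<Router> routers, List<TriggerRouter> links) {
        return links.stream()
                .filter(link -> link.triggerId().equals(trigger.triggerId()))
                .flatMap(link -> routers.stream().filter(r -> r.routerId().equals(link.routerId())))
                .map(Router::targetNodeGroup)
                .collect(Collectors.toList());
    }
}
```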

As part of the new routing job, we have introduced another new extension point to allow more flexibility in the way data events get batched. A batch is the unit by which captured data is sent and committed on target nodes. In SymmetricDS 2, batching is configured in the channel configuration table. This provides additional flexibility for batching:

  • Batching can have the traditional SymmetricDS 1.x behavior of batching up to a maximum batch size, but never splitting a database transaction across batches.
  • Batching can be completely tied to database transactions: one batch per database transaction.
  • Batching can ignore database transactions altogether and always batch based on a max batch size.
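The three batching behaviors listed above can be compared side by side in a small sketch. The BatchMode names, Row, and Batcher are hypothetical, and the code assumes the rows of each transaction arrive contiguously and in commit order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical names for the three batching behaviors.
enum BatchMode { MAX_SIZE_ON_TX_BOUNDARY, ONE_PER_TRANSACTION, MAX_SIZE_IGNORE_TX }

record Row(String txId, String payload) {}

final class Batcher {
    // Assumes rows of one transaction are contiguous and in commit order.
    static List<List<Row>> batch(List<Row> rows, BatchMode mode, int maxBatchSize) {
        List<List<Row>> batches = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Row row = rows.get(i);
            current.add(row);
            boolean lastRow = i == rows.size() - 1;
            boolean txEnds = lastRow || !Objects.equals(row.txId(), rows.get(i + 1).txId());
            boolean full = current.size() >= maxBatchSize;
            boolean close = switch (mode) {
                case MAX_SIZE_ON_TX_BOUNDARY -> full && txEnds; // may exceed the max to keep a transaction whole
                case ONE_PER_TRANSACTION -> txEnds;
                case MAX_SIZE_IGNORE_TX -> full;
            };
            if (close || lastRow) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        return batches;
    }
}
```

With a maximum batch size of 2 and transactions of 3, 1 and 2 rows, the first mode yields batches of 3 and 3 rows (no transaction is split), the second yields 3, 1 and 2, and the third yields 2, 2 and 2.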

Another significant change in SymmetricDS 2 is the removal of the incoming and outgoing batch history tables. This change was made because, over 95% of the time, the statistics end users truly wanted to see were those for the most recent synchronization attempt; in addition, the outgoing batch history table was difficult to query. The most valuable information in the batch history tables, the batch statistics, has been moved to the batch tables themselves. The statistics in the batch tables now always represent the latest synchronization attempt.