Table of Contents
In the previous Chapter we presented a high level introduction to some basic concepts in SymmetricDS, some of the high-level features, and a tutorial demonstrating a basic, working example of SymmetricDS in action. This chapter will focus on the key considerations and decisions one must make when planning a SymmetricDS implementation. As needed, basic concepts will be reviewed or introduced throughout this Chapter. By the end of the chapter you should be able to proceed forward and implement your planned design. This Chapter will intentionally avoid discussing the underlying database tables that capture the configuration resulting from your analysis and design process. Implementation of your design, along with discussion of the tables backing each concept, is covered in Chapter 4, Configuration .
When needed, we will rely on an example of a typical use of SymmetricDS in retail situations. This example retail deployment of SymmetricDS might include many point-of-sale workstations located at stores that may have intermittent network connection to a central location. These workstations might have point-sale-software that uses a local relational database. The database is populated with items, prices and tax information from a centralized database. The point-of-sale software looks up item information from the local database and also saves sale information to the same database. The persisted sales need to be propagated back to the centralized database.
A node is a single instance of SymmetricDS. It can be thought of as a proxy for a database which manages the synchronization of data to and/or from its database. For our example retail application, the following would be SymmetricDS nodes:
Each node of SymmetricDS can be either embedded in another application, run stand-alone, or even run in the background as a service. If desired, nodes can be clustered to help disperse load if they send and/or receive large volumes of data to or from a large number of nodes.
Individual nodes are easy to identify when planning your implementation. If a database exists in your domain that needs to send or receive data, there needs to be a corresponding SymmetricDS instance (a node) responsible for managing the synchronization for that database.
Nodes in SymmetricDS are organized into an overall node network, with connections based on what data needs to be synchronized where. The exact organization of your nodes will be very specific to your synchronization goals. As a starting point, lay out your nodes in diagram form and draw connections between nodes to represent cases in which data is to flow in some manner. Think in terms of what data is needed at which node, what data is in common to more than one node, etc. If it is helpful, you could also show data flow into and out of external systems. As you will discover later, SymmetricDS can publish data changes from a node as well using JMS.
Our retail example, as shown in Figure 3.1 , represents a tree hierarchy with a single central office node connected by lines to one or more children nodes (the POS workstations). Information flows from the central office node to an individual register and vice versa, but never flows between registers.
More complex organization can also be used. Consider, for example, if the same retail example is expanded to include store servers in each store to perform tasks such as opening the store for the day, reconciling registers, assigning employees, etc. One approach to this new configuration would be to create a three-tier hierarchy (see Figure 3.2 ). The highest tier, the centralized database, connects with each store server's database. The store servers, in turn, communicate with the individual point-of-sale workstations at the store. In this way data from each register could be accumulated at the store server, then sent on to the central office. Similarly, data from the central office can be staged in the store server and then sent on to each register, filtering the register's data based on which register it is.
One final example, show in Figure 3.3 , again extending our original two-tier retail use case, would be to organize stores by "region" in the world. This three tier architecture would introduce new regional servers (and corresponding regional databases) which would consolidate information specific to stores the regional server is responsible for. The tiers in this case are therefore the central office server, regional servers, and individual store registers.
These are just three common examples of how one might organize nodes in SymmetricDS. While the examples above were for the retail industry, the organization, they could apply to a variety of application domains.
Once the organization of your SymmetricDS nodes has been chosen, you will need to group your nodes based on which nodes share common functionality. This is accomplished in SymmetricDS through the concept of a Node Group . Frequently, an individual tier in your network will represent one Node Group. Much of SymmetricDS' functionality is specified by Node Group and not an individual node. For example, when it comes time to decide where to route data captured by SymmetricDS, the routing is configured by Node Group .
For the examples above, we might define Node Groups of:
Considerable thought should be given to how you define the Node Groups. Groups should be created for each set of nodes that synchronize common tables in a similar manner. Also, give your Node Groups meaningful names, as they will appear in many, many places in your implementation of SymmetricDS.
Note that there are other mechanisms in SymmetricDS to route to individual nodes or smaller subsets of nodes within a Node Group, so do not choose Node Groups based on needing only subsets of data at specific nodes. For example, although you could, you would not want to create a Node Group for each store even though different tax rates need to be routed to each store. Each store needs to synchronize the same tables to the same groups, so 'store' would be a good choice for a Node Group.
Now that Node Groups have been chosen, the next step in planning is to document the individual links between Node Groups. These Node Group Links establish a source Node Group, a target Node Group, and a data event action , namely whether the data changes are pushed or pulled . The push method causes the source Node Group to connect to the target, while a pull method causes it to wait for the target to connect to it.
For our retail store example, there are two Node Group Links defined. For the first link, the "store" Node Group pushes data to the "corp" central office Node Group. The second defines a "corp" to "store" link as a pull. Thus, the store nodes will periodically pull data from the central office, but when it comes time to send data to the central office a store node will do a push.
When SymmetricDS captures data changes in the database, the changes are captured in the order in which they occur. In addition, that order is preserved when synchronizing the data to other nodes. Frequently, however, you will have cases where you have different "types" of data with differing priorities. Some data might, for example, need priority for synchronization despite the normal order of events. For example, in a retail environment, users may be waiting for inventory documents to update while a promotional sale event updates a large number of items.
SymmetricDS supports this by allowing tables being synchronized to be grouped together into Channels of data. A number of controls to the synchronization behavior of SymmetricDS are controlled at the Channel level. For example, Channels provide a processing order when synchronizing, a limit on the amount of data that will be batched together, and isolation from errors in other channels. By categorizing data into channels and assigning them to TRIGGER s, the user gains more control and visibility into the flow of data. In addition, SymmetricDS allows for synchronization to be enabled, suspended, or scheduled by Channels as well. The frequency of synchronization can also be controlled at the channel level.
Choosing Channels is fairly straightforward and can be changed over time, if needed. Think about the differing "types" of data present in your application, the volume of data in the various types, etc. What data is considered must-have and can't be delayed due to a high volume load of another type of data? For example, you might place employee-related data, such as clocking in or out, on one channel, but sales transactions on another. We will define which tables belong to which channels in the next sections.
Be sure that, when defining Channels, all tables related by foreign keys are included in the same channel.
Avoid deadlocks! If client database transactions include tables that update common rows along with different rows, then concurrent synchronization can cause database deadlocks. You can avoid this by using channels to segregate those tables that cause the deadlocks.
At this point, you have designed the node-related aspects of your implementation, namely choosing nodes, grouping the nodes based on functionality, defining which node groups send and receive data to which others (and by what method). You have defined data Channels based on the types and priority of data being synchronized. The largest remaining task prior to starting your implementation is to define and document what data changes are to be captured (by defining SymmetricDS Triggers ), to decide to which node(s) the data changes are to be routed to, and to decide which trigger applies to which router and under what conditions. We will also, in this section, discuss the concept of an initial load of data into a SymmetricDS node.
SymmetricDS uses database triggers to capture and record changes to be synchronized to other nodes. Based on the configuration you provide, SymmetricDS creates the needed database triggers automatically for you. There is a great deal of flexibility in terms of defining the exact conditions under which a data change is captured. SymmetricDS triggers are defined in a table named TRIGGER . Each trigger you define is for a particular table associated. Each trigger can also specify:
As you define your triggers, consider which data changes are relevant to your application and which ones ar not. Consider under what special conditions you might want to route data, as well. For our retail example, we likely want to have triggers defined for updating, inserting, and deleting pricing information in the central office so that the data can be routed down to the stores. Similarly, we need triggers on sales transaction tables such that sales information can be sent back to the central office.
The triggers that have been defined in the previous section only define when data changes are to be captured for synchronization. They do not define where the data changes are to be sent to. Routers, plus a mapping between Triggers and Routers ( TRIGGER_ROUTER ), define the process for determining which nodes receive the data changes.
Before we discuss Routers and Trigger Routers, we should probably take a break and discuss the process SymmetricDS uses to keep track of the changes and routing. As we stated, SymmetricDS relies on auto-created database triggers to capture and record relevant data changes into a table, the DATA table. After the data is captured, a background process chooses the nodes that the data will be synchronized to. This is called routing and it is performed by the Routing Job. Note that the Routing Job does not actually send any data. It just organizes and records the decisions on where to send data in a "staging" table called DATA_EVENT and OUTGOING_BATCH .
Now we are ready to discuss Routers. The router itself is what defines the configuration of where to send a data change. Each Router you define can be associated with or assigned to any number of Triggers through a join table that defines the relationship. Routers are defined in the SymmetricDS table named ROUTER . For each router you define, you will need to specify:
For now, do not worry about the specific routing types. They will be covered later. For your design simply make notes of the information needed and decisions to determine the list of nodes to route to. You will find later that there is incredible flexibility and functionality available in routers. For example, you will find you can:
For each of your Triggers (which specify when a data change should be captured), you will need to decide which Router(s) to pair with the Trigger such that the change is routed to the desired target nodes. This needed mapping between Triggers and Routers, found in the table TRIGGER_ROUTER , defines configuration specific to a particular Trigger and Router combination that you need. In addition to defining which triggers map to which routers, the table also has several settings present to define various behaviors, including initial loads and ping back.
SymmetricDS provides the ability to "load" or "seed" a node's database with specific sets of data from its parent node. This concept is known as an Initial Load of data and is used to start off most synchronization scenarios. The Trigger Router mapping defines how initial loads can occur, so now is a good time to plan how your Initial Loads will work. Using our retail example, consider a new store being opened. Initially, you would like to pre-populate a store database with all the item, pricing, and tax data for that specific store. This is achieved through an initial load. As part of your planning, be sure to consider which tables, if any, will need to be loaded initially. SymmetricDS can also perform an initial load on a table with just a subset of data. Initial Loads are further discussed in Section 184.108.40.206, “Initial Loads”.
When routing data, SymmetricDS by default checks each data change and will not route a data change back to a node if it originated the change to begin with. This prevents the possibility of data changes resulting in an infinite loop of changes under certain circumstances. You may find that, for some reason, you need SymmetricDS to go ahead and send the data back to the originating node - a "ping back". As part of the planning process, consider whether you have a special case for needing ping back. Ping Back control is further discussed in Section 220.127.116.11, “Enabling "Ping Back"”.
Our final step in planning an implementation of SymmetricDS involves deciding how a new node is connected to, or registered with a parent node for the first time.
The following are some options on ways you might register nodes:
SymmetricDS also provides the ability to transform synchronized data instead of simply synchronizing it. Your application might, for example require a particular column in your source data to be mapped to two different target tables with possibly different column names. Or, you might need to "merge" one or more columns of data from two indepdentent tables into one table on the target. Or, you may want to set default column values on a target table based on a particular event on the source database. All of these operations, and many more, can be accomplished using SymmetricDS' transformation capabilities.
As you plan your SymmetricDS implementation, make notes of cases where a data transformation is needed. Include details such as when the transformation might occur (is it only on an insert, or a delete?), which tables or columns play a part, etc. Complete details of all the transformation features, including how to configure a transformation, are discussed in Section 4.8, “Transforming Data”.
As a final step to planning an implementation, consider for a moment cases in which the same data may be modified at nearly the same time at more than one node. For example, can data representing a customer be modified at both a central office and a store location? Conflict detection is the act of determining if an insert, update or delete is in "conflict" due to the target data row not being consistent with the data at the source prior to the insert/update/delete. Conflict resolution is the act of figuring out what to do when a conflict is detected. Both detection and resolution behaviour can be configured and customized in a number of ways. For example, a conflict can be "detected" based solely on a single column which has been modified to a different value, or a row can be considered in conflict if any data in the row has been changed from what was expected, even if the column that has been changed was still expected. There are also numerous ways to resolve the conflict, such as referencing a timestamp column and choosing whichever edit was "most recent" or perhaps causing the conflict to cause the channel to go into error until a manual resolution takes place. A set of conflict detection / resolution rules is configured for a given node group link, but you can set the rules to be for a given channel or for a given table in a channel.
For the purpose of planning your implementation, make a list of all tables that could have data being modified at more than one node at the same time. For each table, think through what should happen in each case if such an event occurs. If the tables on a given channel all have the same set of conflict resolution and detection rules, then you might be able to configure the rules for the channel instead of a series of table-level detections and resolutions. Complete details on how to configure conflict resolution and detection are discussed further in Section 4.10, “Conflict Detection and Resolution”.