What is Replicant?

jargonauts
Nov 29, 2017

We gave a teaser of Replicant in our previous post. This post takes a deeper dive.

The salient features of Replicant are as follows:

Real-time continuous replication - Replicant replicates continuously in real time. As soon as a new record is ingested into the source data management system, it is extracted, transformed, and populated in the destination system.
Fully recoverable and highly consistent - The system preserves strong consistency while replicating in real time. If both source and destination systems provide strong immediate consistency semantics, for example when both are RDBMS systems, the system guarantees the same semantics during data replication. The system is also designed to be fully recoverable, i.e., it neither loses data nor populates duplicate data at the destination systems when it restarts after a crash.
Capable of handling relational, semi-structured, and unstructured data - The system has been designed to support replication of purely relational, semi-structured, and unstructured data in multiple formats.
Scalable to high data volumes at high velocity - The system has been built to handle concurrent high-volume, high-velocity data ingestion from multiple sources to multiple destinations.
Designed to handle replication between homogeneous as well as heterogeneous data management systems without compromising on the characteristics mentioned above - The system can replicate between any combination of source and destination data management systems, be it traditional RDBMS, NoSQL, NewSQL systems, distributed file systems, or analytic platforms, whether deployed on premises or on a public cloud.
Last but not least, it does all of the above with consumer-friendly manageability.

The central component of Replicant is the Replicant Transporter. Besides the Transporter, Replicant consists of an Extractor server and an Applier server. As the name suggests, the Extractor server executes multiple instances of the Replicant Extractor, one for each source data management system. Similarly, the Applier server executes multiple instances of the Replicant Applier, one for each destination data management system. The infrastructure of the Extractor and Applier is generic and extends easily to suit the needs of any data management system, be it traditional RDBMS, NoSQL DB, filesystem stores, etc. The details of the Extractor, Transporter, and Applier are as follows:

Replicant Extractor (RE) - The Extractor comes with a collection of change data capture libraries, one implemented for each source data management system supported by Replicant. Each RE instance executes the change data capture library specific to its source data system. For transactional source systems, the RE associates a unique transaction id and a within-transaction sequence id with each change event. The extractor process maintains an in-memory marker (REM) that identifies the position in the source system's change stream up to which the stream is known to have been extracted and successfully submitted to the Transporter. The extractor occasionally persists this marker in an underlying persistent store. When an RE instance crashes, the control brain detects the crash and restarts the instance. The restarted instance reads the last stored REM from the persistent store and uses it to resume reading the change stream from the source database. This crash recovery behavior remains the same irrespective of the source data management system. For non-transactional filesystem stores, the Extractor polls frequently for newly ingested content and applies the same recovery mechanism.
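The marker-based recovery described above can be sketched roughly as follows. This is a minimal illustration, not Replicant's implementation: the MarkerStore class, the extract function, the JSON file used as the persistent store, and the persist_every interval are all assumptions made for the example, and stream positions are simply counted from 1.

```python
import json
import os
import tempfile


class MarkerStore:
    """Hypothetical persistent store for the extractor marker (REM)."""

    def __init__(self, path):
        self.path = path

    def save(self, position):
        # Write to a temp file and rename it, so a crash never leaves a torn marker.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"position": position}, f)
        os.replace(tmp, self.path)

    def load(self):
        # No marker yet means nothing has been extracted.
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            return json.load(f)["position"]


def extract(change_stream, store, submit, persist_every=3):
    """Submit every event after the persisted marker to the transporter."""
    marker = store.load()  # position up to which events are known to be submitted
    for position, event in enumerate(change_stream, start=1):
        if position <= marker:
            continue  # covered by the persisted marker, skip on restart
        submit(position, event)
        marker = position  # in-memory marker (REM)
        if marker % persist_every == 0:
            store.save(marker)  # occasional persistence (assumed policy)
    return marker
```

Because the marker is persisted only occasionally, a restarted extractor in this sketch may resubmit events that were sent after the last persisted position. That is consistent with the applier being prepared to receive changes it has already applied.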

Replicant Transporter (RT) - The RT runs as a service within the Replicant system. It is designed as a set of replicated, time-ordered, persistent queues for the change stream events submitted by the RE. Change events from an RE instance are affined to specific RT queues, with the invariant that events in a single RT queue must be applied in the same sequence in which they were generated. When a user starts a Replicant instance for a given source data management system and a target data management system, a set of RT queues is assigned to the RE and the applier instance. The RT provides an application program interface that lets applier instances poll for events from their assigned queues while maintaining queue-specific event order.
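To make the queue affinity and ordering invariant concrete, here is a small in-memory sketch. It is only an illustration under simplifying assumptions: the Transporter class and its assign, submit, and poll methods are hypothetical names, each extractor is given a single queue rather than a set, and nothing here is replicated or persisted.

```python
from collections import defaultdict


class Transporter:
    """Hypothetical in-memory stand-in for the RT's persistent queues."""

    def __init__(self):
        self.queues = defaultdict(list)  # queue id -> events in arrival order
        self.assignment = {}             # extractor id -> queue id

    def assign(self, extractor_id, queue_id):
        # Done once, when a replication instance is started.
        self.assignment[extractor_id] = queue_id

    def submit(self, extractor_id, event):
        # Events from one extractor always land in its assigned queue,
        # so their relative order is preserved.
        self.queues[self.assignment[extractor_id]].append(event)

    def poll(self, queue_id, offset, max_events=10):
        """Return events starting at offset, in the order they were submitted."""
        return self.queues[queue_id][offset:offset + max_events]
```

An applier in this sketch would keep its own offset per queue and call poll repeatedly, which gives it the events of each queue in generation order.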

Replicant Applier (RA) - The RA instance is responsible for polling unapplied changes from the RT and applying them at the destination data management system. The RA instance accumulates all changes within a transaction boundary (for non-transactional sources, each individual change implies a transaction boundary), constructs the statements that apply those changes, identifies the end of the transaction context, and adds a recovery-related statement at the end of each transaction before executing the commit for that transaction at the destination data management system. The added statement is an insert into a separate Replicant recovery table, which holds the metadata required by Replicant and acts as the marker up to which change events have been successfully applied to the destination data management system.
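The idea of committing the recovery marker in the same transaction as the changes can be sketched with Python's built-in sqlite3 module standing in for the destination. The table names replicant_recovery and accounts, and the functions open_destination, apply_transaction, and last_applied, are assumptions made for this example, not Replicant's actual schema or API.

```python
import sqlite3


def open_destination():
    """Create an in-memory destination with a data table and a recovery table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE replicant_recovery (txn_id INTEGER PRIMARY KEY)")
    conn.commit()
    return conn


def apply_transaction(conn, txn_id, statements):
    """Apply one source transaction and its recovery marker atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        for sql, params in statements:
            conn.execute(sql, params)
        # Recovery statement added at the end of the transaction, before commit.
        conn.execute(
            "INSERT INTO replicant_recovery (txn_id) VALUES (?)", (txn_id,)
        )


def last_applied(conn):
    """Return the highest transaction id recorded in the recovery table, or 0."""
    row = conn.execute("SELECT MAX(txn_id) FROM replicant_recovery").fetchone()
    return row[0] or 0
```

Since the marker row and the data changes share one commit, the destination in this sketch either contains both or neither, so the marker always reflects what has actually been applied.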

The RA process may receive already applied changes from the RT. It uses the marker information to detect and ignore such change events. When the RA process crashes and restarts, it needs to detect the current state of the destination db, i.e., the transaction up to which the changes have been applied. It queries the Replicant recovery table and obtains the marker. The marker allows for ‘exactly-once’ semantics, i.e., it guarantees that a change originating from the source is applied exactly once at the destination. In the case of non-transactional destination systems, the most recent change and its marker may not be applied consistently with each other, so the recovery table also contains a checksum of the before image and the after image of the record or file. For every single record change, the RA instance first updates the Replicant recovery table and then applies the change to the destination db. When the RA restarts, in addition to the recovery steps performed in the generic case, it also needs to identify whether the last change has been applied to the destination table. It uses the row-qualifying column information from the recovery table to fetch the row, computes its checksum, and matches it against the value stored in the recovery table.
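The checksum check for non-transactional destinations could look roughly like the sketch below. Plain dictionaries stand in for the destination and the recovery table, and the function names, the SHA-256 checksum over a JSON encoding, and the layout of the recovery entry are all assumptions made for illustration.

```python
import hashlib
import json


def checksum(record):
    """Stable checksum of a record; None stands for a row that does not exist."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_change(destination, recovery, seq, key, after):
    """Record the change in the recovery entry first, then apply it."""
    recovery["last"] = {
        "seq": seq,
        "key": key,  # row-qualifying information
        "before": checksum(destination.get(key)),
        "after": checksum(after),
    }
    destination[key] = after  # a crash between the two writes is the risky window


def last_change_applied(destination, recovery):
    """After a restart, decide whether the last recorded change reached the row."""
    entry = recovery.get("last")
    if entry is None:
        return True  # nothing was in flight
    current = checksum(destination.get(entry["key"]))
    if current == entry["after"]:
        return True
    if current == entry["before"]:
        return False  # marker written, change not applied: re-apply it
    raise ValueError("row matches neither the before image nor the after image")
```

In this sketch a False result tells the restarted applier to apply the last change again, while True means it can continue with the next event.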
