Author: Harish Vadaparty
Staying abreast of changes within databases is often essential, even non-negotiable, for maintaining data integrity and optimizing business processes. In this post, Hexon Global full-stack engineer Harish Vadaparty explains how and when to implement Change Data Capture (CDC) to keep your data sources in sync.

What Is Change Data Capture?

Change Data Capture tracks changes in a database (inserts, updates, deletes) in real time. It’s essential for systems that rely on up-to-the-minute data or need to replicate data across multiple systems – or even if you simply want to keep things in sync.

Methods To Implement CDC

  • Database Triggers: Custom triggers in database log changes to a separate table.
  • Log-Based CDC: Captures changes from the database’s transaction log.
  • Polling-Based CDC: Regularly checks the database for changes. Not the most efficient but straightforward.
  • ELT/ETL Tools: Some tools offer CDC as part of their data integration features.
  • Database Replication: A form of CDC where data is copied to another database.

When Should One Use CDC?

Change Data Capture is most useful for the following cases:

  • Real-Time Analytics: Keeping data analysis prompt and accurate.
  • Event based systems: Keeping track of events in real time – for example, an action like a user changing their name triggers an action.
  • Data Replication: Syncing data across systems to ensure consistency.
  • Microservices Architectures: Maintaining data integrity across services.

Where Hexon Global Uses CDC most

On a regular basis, Hexon Global’s highest usage of CDC happens in these use cases:

  • Fraud Detection Financial transactions are monitored instantly for anomalies, as we have an app wallet that takes care of all the transactions along with a third-party payment gateway.
  • Inventory Management:Real-time syncing between online and physical store inventories. The majority of sales for one of our clients, Mastbazaar, still happens in stores. In such a case, we use CDC to keep a regular track of inventory.

The Different Ways To Implement CDC

Database Triggers Example

This is a way we implemented the CDC mechanism for tracking changes in stock and price info in our PostgreSQL database

Step 1: Create the Audit Table

First, let’s create an audit table that will store changes made to the products table. This includes stock and price changes among other potential modifications.

Step 2: Create the Trigger Function

The trigger function, audit_products, captures the before and after states of a row for insert, update, and delete operations, and logs this information to the product_changes table.

Step 3: Create Triggers for Each Operation

Finally, set up triggers on the products table to automatically call audit_products upon any insert, update, or delete operation. This ensures that any change to a product’s stock or price is captured.

In this type of setup, every insert, update, or delete operation on the products table will trigger the audit_products function, capturing the details of the operation in the product_changes audit table. This includes all fields of the products table, allowing us to track how stock and price change over time along with any other modifications to product data.

Polling-Based CDC: A Simple Approach

Polling involves scripting a regular check for new or changed data in the database. It’s easy but can be resource-heavy, if not implemented optimally.
Here is a simple approach we used to sync sales orders from one db to another for analytics purposes. This can work both cross-db and cross-server. For instance, you can use it to sync between Oracle and Postgres. This script needs to be run as a cron job or a scheduled task.

Step 1: Database Setup

First, create or modify a table to store the last sync timestamp. This table will hold a single record indicating the last time the sync was successfully completed.

Step 2: Python Script for Syncing Sales Data

This script fetches the last sync timestamp from the sync_log table, uses it to fetch new or updated sales records, processes them, and then updates the sync_log with the current timestamp.

We can ensure minimal data transfer and also do the needed transformations before syncing the data to the other db.

Log-Based CDC With Kafka

Using Kafka Connect for log-based CDC involves less intrusion, directly streaming database changes to Kafka.

Using solutions like Airbyte and PeerDB

If you are doing this for a large amount of data and a lot of tables, you might want to use a solution like Airbyte or PeerDB. These tools are built to handle the complexities of CDC and data replication at scale, so you don’t have to reinvent the wheel.

Do you need a second look at your databases?

Hexon Global has a decade-long track record of helping companies scale software architecture and cloud challenges. If you would like us to identify solutions for your business, reach out to us over here.

Contact Us

Get in touch

Understand how we help our clients in various situations and how we can be of service to you!