Building an app with a change data capture tool

Student: Jorge Enrique Castañeda Centurion

Course: Advanced Database Topics I

Teacher: CUADROS QUIROGA, PATRICK JOSE

Introduction

Change Data Capture (CDC) has become a key technology for companies looking to manage data more efficiently and in real time. This approach tracks and captures specific changes, such as insertions, updates, and deletions, as soon as they occur in a source system. This enables immediate synchronization of data, ensuring that databases are always up to date and available without the need for costly and time-consuming full scans. The benefits of CDC include not only improved data reliability and consistency, but also a positive impact on customer experience, as businesses can respond more quickly to changing market needs and maintain smooth operations.

In 2024, CDC tools offer a wide variety of options, each designed to address different usage scenarios, from data integration between microservices to synchronization with large-scale data warehouses. These tools are essential in environments where a constant flow of up-to-date data is required, allowing companies to avoid performance bottlenecks and optimize their replication processes. By focusing solely on changes, CDC tools minimize the need for full database replications, significantly reducing computational costs and improving operational efficiency. This makes them an essential solution for organizations that depend on fast, accurate and accessible data to make strategic decisions in real time.

Example

First, we will need Docker and Java installed.

We must create a docker-compose.yml file to configure the necessary services.

version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_USER: debezium
      MYSQL_PASSWORD: dbz
      MYSQL_DATABASE: inventory
    ports:
      - "3306:3306"

  debezium:
    # Pinned to a 1.x image so it matches the connector properties used below
    # (database.server.name, database.history.*); Debezium 2.x renamed several of them
    image: debezium/connect:1.9
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: debezium_config
      OFFSET_STORAGE_TOPIC: debezium_offset
      STATUS_STORAGE_TOPIC: debezium_status
    depends_on:
      - kafka
      - mysql

  nifi:
    # Pinned to a 1.x image; recent NiFi releases default to HTTPS on port 8443
    image: apache/nifi:1.19.1
    environment:
      NIFI_WEB_HTTP_PORT: "8080"
    ports:
      - "8080:8080"
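One caveat before registering the connector: the debezium user created by the MySQL container only receives privileges on the inventory database, while Debezium's MySQL connector also needs replication privileges to read the binlog. A minimal sketch of the required grants, run as root inside the running container (for example via docker compose exec mysql mysql -uroot -proot):

```sql
-- Debezium needs to read table data and follow the binary log
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;
```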

Once the MySQL container is running, connect to the inventory database and create a table whose changes will be captured.

CREATE TABLE products (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  price DECIMAL(10,2),
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
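Debezium reads MySQL's binary log. The official mysql:8.0 image has it enabled by default, but it is worth a quick sanity check from the same session:

```sql
-- Should report log_bin = ON with MySQL 8.0 defaults
SHOW VARIABLES LIKE 'log_bin';
```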

Insert some initial data.

INSERT INTO products (name, description, price) VALUES 
('Product1', 'Description1', 10.00),
('Product2', 'Description2', 20.00);

Debezium uses connectors to monitor databases. Use its REST API to register the MySQL connector.

curl -X POST -H "Content-Type: application/json" --data '{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "table.include.list": "inventory.products",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}' http://localhost:8083/connectors

To test the flow end to end, insert or update records in the MySQL products table. Debezium should publish the corresponding change events to the Kafka topic dbserver1.inventory.products (topics are named server.name.database.table).

INSERT INTO products (name, description, price) VALUES ('Product3', 'Description3', 30.00);
UPDATE products SET price = 25.00 WHERE id = 1;
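Each change event Debezium publishes to that topic is a JSON envelope whose payload carries before, after, and op fields. A minimal sketch of decoding one such event in Python; the payload below is a hand-written illustration of the envelope shape (real events also carry a schema section and source metadata), not captured output:

```python
import json

# Hand-written example of a Debezium change event envelope ("u" = update)
event = json.loads("""
{
  "payload": {
    "before": {"id": 1, "name": "Product1", "price": 10.00},
    "after":  {"id": 1, "name": "Product1", "price": 25.00},
    "op": "u"
  }
}
""")

def describe(event):
    """Summarize a Debezium envelope as (operation, row)."""
    payload = event["payload"]
    ops = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}
    op = ops.get(payload["op"], payload["op"])
    # For deletes, "after" is null, so fall back to "before"
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return op, row

op, row = describe(event)
print(op, row["id"], row["price"])  # → update 1 25.0
```

The same decoding logic applies to events consumed from Kafka by NiFi or by a custom consumer; only the transport changes.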

Conclusion

Implementing a Change Data Capture (CDC) solution is an effective strategy for managing and synchronizing data in real time, allowing companies to stay agile and competitive in a data-driven environment. Setting up tools like MySQL, Kafka, Debezium, and NiFi offers a robust infrastructure that detects data changes, processes them efficiently, and synchronizes them with target systems without the need for full replications, saving time and computational resources.

The success of this implementation is measured by verifying each component: from capturing changes in the source database to streaming events through Kafka and integrating with target systems such as data warehouses or custom flows in NiFi. Once change events (inserts, updates, deletions) are reflected in real time on the target systems, you can ensure that the solution is working correctly. This not only ensures data consistency but also improves the reliability of critical business processes.

Overall, CDC tools are indispensable for any organization that values the accuracy and timeliness of its data. By allowing systems to operate without delays or bottlenecks, these technologies strengthen a company's ability to respond to changes in the environment and customer needs. With proper setup and regular testing, businesses can take full advantage of this methodology to optimize their data flows and make informed strategic decisions in real time.
