Building an App with a Change Data Capture Tool
Student: Jorge Enrique Castañeda Centurion
Course: Advancing Database Topics I
Teacher: CUADROS QUIROGA, PATRICK JOSE
Introduction
Change Data Capture (CDC) has become a key technology for companies looking to manage data more efficiently and in real time. This approach tracks and captures specific changes, such as insertions, updates, and deletions, as soon as they occur in a source system. This enables immediate synchronization of data, ensuring that databases are always up to date and available without the need for costly and time-consuming full scans. The benefits of CDC include not only improved data reliability and consistency, but also a positive impact on customer experience, as businesses can respond more quickly to changing market needs and maintain smooth operations.
In 2024, CDC tools offer a wide variety of options, each designed to address different usage scenarios, from data integration between microservices to synchronization with large-scale data warehouses. These tools are essential in environments where a constant flow of up-to-date data is required, allowing companies to avoid performance bottlenecks and optimize their replication processes. By focusing solely on changes, CDC tools minimize the need for full database replications, significantly reducing computational costs and improving operational efficiency. This makes them a key solution for organizations that depend on fast, accurate, and accessible data to make strategic decisions in real time.
Example
First, we will need Docker (with Docker Compose) and Java installed.
We must create a docker-compose.yml file that defines the necessary services: ZooKeeper, Kafka, MySQL, Debezium (running on Kafka Connect), and NiFi. Because image behavior can change between major releases, it is advisable to pin specific versions rather than latest; the latest tags are kept here only to keep the example short.
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_USER: debezium
      MYSQL_PASSWORD: dbz
      MYSQL_DATABASE: inventory
    ports:
      - "3306:3306"
  debezium:
    image: debezium/connect:latest
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: debezium_config
      OFFSET_STORAGE_TOPIC: debezium_offset
      STATUS_STORAGE_TOPIC: debezium_status
    depends_on:
      - kafka
      - mysql
  nifi:
    image: apache/nifi:latest
    ports:
      - "8080:8080"
Once the MySQL container is running, connect to the inventory database (as shown above) and create the table whose changes will be captured.
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
description VARCHAR(255),
price DECIMAL(10,2),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
Insert some initial data.
INSERT INTO products (name, description, price) VALUES
('Product1', 'Description1', 10.00),
('Product2', 'Description2', 20.00);
Debezium uses connectors to monitor databases. Use the Kafka Connect REST API, exposed on port 8083, to register the MySQL connector. Note that the configuration below uses the Debezium 1.x property names; from Debezium 2.0 onward, database.server.name was replaced by topic.prefix and the database.history.* properties by schema.history.internal.*, so either pin a 1.x image in docker-compose.yml or rename those properties accordingly.
curl -X POST -H "Content-Type: application/json" --data '{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "table.include.list": "inventory.products",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}' http://localhost:8083/connectors
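Before testing, it is worth confirming that the connector was registered and is running. Kafka Connect exposes a status endpoint for this, and listing the Kafka topics shows whether Debezium has taken its initial snapshot; the topic name dbserver1.inventory.products follows from database.server.name plus the schema and table names.

curl http://localhost:8083/connectors/mysql-connector/status

docker-compose exec kafka kafka-topics --bootstrap-server kafka:9092 --list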
To test the flow, insert or update records in the MySQL products table:
INSERT INTO products (name, description, price) VALUES ('Product3', 'Description3', 30.00);
UPDATE products SET price = 25.00 WHERE id = 1;
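Each of these statements should produce a change event on the dbserver1.inventory.products topic. A quick way to watch them is the console consumer inside the Kafka container:

docker-compose exec kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic dbserver1.inventory.products --from-beginning

For the Java side mentioned at the beginning, the sketch below is one possible way to read the same events programmatically. It assumes the kafka-clients library is on the classpath and that the broker is reachable from wherever the program runs (inside the Docker network as configured, or from the host if port 9092 is published and the advertised listener adjusted); the class and consumer group names are illustrative.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProductChangeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address inside the Docker network; use localhost:9092 only if
        // the port is published and advertised to the host.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "product-change-reader");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Debezium creates one topic per table: <server-name>.<database>.<table>
            consumer.subscribe(Collections.singletonList("dbserver1.inventory.products"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each value is a JSON change event with "before", "after" and "op" fields
                    System.out.println(record.value());
                }
            }
        }
    }
}

The same topic can also feed the NiFi container through one of its ConsumeKafka processors if the downstream flow is to be built graphically instead.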
Conclusion
Implementing a Change Data Capture (CDC) solution is an effective strategy for managing and synchronizing data in real time, allowing companies to stay agile and competitive in a data-driven environment. Setting up tools like MySQL, Kafka, Debezium, and NiFi offers a robust infrastructure that detects data changes, processes them efficiently, and synchronizes them with target systems without the need for full replications, saving time and computational resources.
The success of this implementation is measured by verifying each component: from capturing changes in the source database to streaming events through Kafka and integrating with target systems such as data warehouses or custom flows in NiFi. Once change events (inserts, updates, deletions) are reflected in real time in the target systems, you can confirm that the solution is working correctly. This not only ensures data consistency but also improves the reliability of critical business processes.
Overall, CDC tools are indispensable for any organization that values the accuracy and timeliness of its data. By allowing systems to operate without delays or bottlenecks, these technologies strengthen a company's ability to respond to changes in the environment and customer needs. With proper setup and regular testing, businesses can take full advantage of this methodology to optimize their data flows and make informed strategic decisions in real time.
Bibliography
14 Best Change Data Capture (CDC) Tools in 2024. (n.d.). Matillion. https://www.matillion.com/learn/blog/change-data-capture-tools