Introduction to Data Engineering with Grafana and Docker
José Rafael Gutierrez
3 months ago
Introduction
In e-commerce environments where thousands of transactions are processed per second, ensuring system performance is crucial. A data pipeline processes these transactions and ensures that information flows smoothly from databases to applications.
To ensure that systems are operating efficiently, it is essential to monitor key performance metrics, such as the Transactions Per Second (TPS), CPU usage, and memory usage. In this article, we will see how to integrate Python, SQL, Prometheus PushGateway, and Grafana using Docker to collect, process, and visualize metrics in real time.
Step 1: Setting Up the MySQL Database with Docker
1.1 Command to Run MySQL in Docker
We start by running a MySQL container with Docker to store transactions:
docker run --name mysql-db -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=transactions -p 3306:3306 -d mysql:latest
Security: In a production environment, it is recommended to use environment variables or a password manager instead of passing the password directly in the command.
1.2 Create the Transactions Table
We create a basic table to store sample transactions:
CREATE TABLE transactions (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT,
product_id INT,
amount DECIMAL(10, 2),
status VARCHAR(20),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Insert sample transactions:
INSERT INTO transactions (user_id, product_id, amount, status)
VALUES (1, 101, 99.99, 'completed'),
(2, 102, 45.50, 'pending'),
(3, 103, 150.00, 'failed');
Step 2: Processing Data with Python
2.1 Install Dependencies
We install the necessary libraries to connect Python with MySQL and handle the data:
pip install mysql-connector-python pandas prometheus_client
2.2 Extract Transactions and Calculate TPS
The following Python script connects to the MySQL database, extracts the completed transactions, and calculates TPS (Transactions Per Second) statistics. It also exposes these metrics to Prometheus PushGateway so they can be monitored by Grafana.
Improved Python Code:
import mysql.connector
import pandas as pd
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
import time
def get_transactions_data():
try:
# MySQL Connection
conn = mysql.connector.connect(
host="localhost",
user="root",
password="secret",
database="transactions"
)
cursor = conn.cursor()
# Query completed transactions
query = "SELECT created_at FROM transactions WHERE status = 'completed';"
cursor.execute(query)
# Fetch results and convert to DataFrame
result = cursor.fetchall()
df = pd.DataFrame(result, columns=['created_at'])
# Calculate Transactions Per Second (TPS)
df['created_at'] = pd.to_datetime(df['created_at'])
df = df.set_index('created_at').resample('S').size()
# Calculate TPS statistics
tps_avg = df.mean()
tps_min = df.min()
tps_max = df.max()
print(f"Average TPS: {tps_avg:.2f}, Min TPS: {tps_min}, Max TPS: {tps_max}")
# Expose TPS metrics to Prometheus PushGateway
registry = CollectorRegistry()
g = Gauge('transactions_per_second', 'Real-time TPS', registry=registry)
g.set(tps_avg)
push_to_gateway('localhost:9091', job='transaction_job', registry=registry)
except mysql.connector.Error as err:
print(f"Error: {err}")
finally:
cursor.close()
conn.close()
# Run function
get_transactions_data()
Code Explanation:
-
TPS Calculation: The script calculates TPS statistics (average, minimum, and maximum) using
pandas
and prints them to the console. -
Prometheus PushGateway: We use
prometheus_client
to send TPS metrics to the Prometheus PushGateway, which exposes them to Prometheus. -
pdb for debugging: If debugging is needed, add
import pdb; pdb.set_trace()
anywhere in the code.
Step 3: System Monitoring with Prometheus and Grafana
3.1 What is Prometheus PushGateway?
Prometheus PushGateway allows applications like Python to send custom metrics (such as TPS) to Prometheus. This is useful for jobs that are not continuously running but need to send metrics at irregular intervals.
Setting Up Prometheus PushGateway with Docker:
docker run -d --name pushgateway -p 9091:9091 prom/pushgateway
3.2 Installing Prometheus and Grafana using Docker
We start containers for Prometheus and Grafana:
# Run Prometheus
docker run -d --name prometheus -p 9090:9090 prom/prometheus
# Run Grafana
docker run -d --name=grafana -p 3000:3000 grafana/grafana
Access Grafana at http://localhost:3000
(credentials: admin/admin).
3.3 Adding Prometheus as a Data Source in Grafana
- Log in to Grafana.
- Go to Configuration > Data Sources.
- Add a new data source and select Prometheus.
- Enter the URL of Prometheus (
http://localhost:9090
). - Click Save & Test.
3.4 Prometheus Queries for Monitoring
Query to monitor TPS from PushGateway:
transactions_per_second
This query returns the value of TPS exposed by Prometheus PushGateway.
Query to monitor Docker memory usage:
container_memory_usage_bytes{container_name="mysql-db"}
Step 4: Creating a Dashboard in Grafana
We will now create a dashboard in Grafana to visualize TPS and memory usage metrics.
4.1 Visualizing Memory Usage:
- Go to Dashboards > New Dashboard.
- Add a new panel and select the Prometheus query to monitor the memory usage of the MySQL container:
container_memory_usage_bytes{container_name="mysql-db"}
4.2 Visualizing TPS from PushGateway:
- Create a new panel in the same dashboard.
- Use the query
transactions_per_second
to visualize TPS in real-time.
Scalability and Performance
Scaling with Docker
To scale the system, you can run multiple instances of MySQL or any other service and distribute the load using load balancers like NGINX or HAProxy.
Check running containers:
docker ps
This command shows all active containers. You can also check logs with:
docker logs <container_name>
Handling Load in Prometheus
Prometheus can handle large volumes of data by distributing the load across multiple scrapers and storing time-series data on fast disks such as SSDs.
Conclusion
Data engineering is not just about moving and processing information; it’s also about ensuring that systems operate efficiently. With this integration of Python, Prometheus PushGateway, and Grafana using Docker, we have created a complete system to monitor TPS and resource usage in real-time.
With these tools, you can optimize system performance and detect issues before they affect critical operations. Additionally, scalability is guaranteed by using Docker containers, which make deploying multiple instances and distributing load easier.
Next Steps:
- Set up automatic alerts in Grafana to receive notifications when TPS or memory usage exceeds established limits.
- Expand monitoring with more custom metrics.