Performance Monitoring
Monitoring system performance and identifying bottlenecks.
System Resources
CPU Monitoring
Real-time monitoring:
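For a single system-wide CPU figure that a script can compare against a threshold, the counters in /proc/stat can be sampled twice and compared. This is a minimal sketch, not a Mailborder tool: the function names cpu_busy_pct and cpu_usage are hypothetical, and it assumes a Linux kernel that exposes the aggregate "cpu" line in /proc/stat.

```shell
#!/bin/bash
# Hypothetical helpers: system-wide CPU busy percentage from /proc/stat.
# The aggregate "cpu" line holds cumulative jiffies:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
# guest/guest_nice are already included in user/nice, so only user..steal
# (array elements 2-9; element 1 is the word "cpu") are summed.
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x); split(b, y)
        for (i = 2; i <= 9 && i <= n; i++) { t1 += x[i]; t2 += y[i] }
        dt = t2 - t1
        didle = (y[5] + y[6]) - (x[5] + x[6])   # idle + iowait
        if (dt <= 0) { print 0; exit }
        printf "%.1f\n", 100 * (dt - didle) / dt
    }'
}

# Take two samples one second apart and print the busy percentage.
cpu_usage() {
    local s1 s2
    s1=$(grep '^cpu ' /proc/stat)
    sleep 1
    s2=$(grep '^cpu ' /proc/stat)
    cpu_busy_pct "$s1" "$s2"
}
```

Run cpu_usage to print the busy percentage over a one-second window. Counting iowait as idle is a design choice; remove element 6 from the idle sum if you want I/O wait to count as busy time.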
CPU usage by process:
Service-specific CPU (the CPU: line reports total CPU time consumed since the service started, not current utilization):
systemctl status mb-rpcd | grep CPU
systemctl status mb-filter | grep CPU
systemctl status clamd@scan | grep CPU
Historical CPU usage:
Memory Monitoring
Current memory usage:
Detailed memory breakdown:
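/proc/meminfo is the kernel's detailed memory breakdown, and its MemAvailable field (kernel 3.14 and later) estimates how much memory can be used without swapping, counting reclaimable cache as available. A minimal sketch of a usage percentage based on it; mem_used_pct is a hypothetical helper name, and the optional file argument exists only so it can be checked against a sample file.

```shell
#!/bin/bash
# Hypothetical helper: percentage of RAM in use, computed as
# (MemTotal - MemAvailable) / MemTotal from a meminfo-format file.
mem_used_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total > 0) printf "%.0f\n", (total - avail) * 100 / total
             else { print "MemTotal not found" > "/dev/stderr"; exit 1 }
         }' "$meminfo"
}
```

Called without arguments, mem_used_pct prints an integer percentage for the running system.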
Top memory consumers:
Service memory usage:
systemctl status mb-rpcd | grep Memory
systemctl status clamd@scan | grep Memory
systemctl status redis-server | grep Memory
Memory pressure:
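# Check for OOM killer activity (reading the kernel log may require root)
sudo dmesg | grep -i "out of memory"
sudo grep -i "killed process" /var/log/syslog
# On systems without rsyslog (no /var/log/syslog), query the journal instead
sudo journalctl -k | grep -i "killed process"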
Disk I/O Monitoring
Disk usage:
Inode usage:
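Both df -P (space) and df -Pi (inodes) print the use percentage in column 5 and the mount point in column 6, so one filter can check either. A minimal sketch; df_over_threshold is a hypothetical name, and mount points containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: read POSIX-format df output (df -P or df -Pi) on stdin
# and print each filesystem whose use percentage (column 5) is at or above
# the threshold. Rows reporting "-" (no inode accounting, e.g. btrfs) are skipped.
df_over_threshold() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5; sub(/%$/, "", pct)
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t + 0) print $6 " " pct "%"
    }'
}
```

df -P | df_over_threshold 90 and df -Pi | df_over_threshold 90 list the filesystems at or above 90% space or inode usage.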
Real-time disk I/O:
Top disk I/O processes:
Disk performance:
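# Test read speed (replace /dev/sda with your disk, e.g. /dev/nvme0n1)
sudo hdparm -t /dev/sda
# Test write speed (1 GiB; /var/tmp is used because /tmp may be a RAM-backed tmpfs)
dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=1024 conv=fdatasync
rm /var/tmp/ddtest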
Network Monitoring
Network interfaces:
Active connections:
Connection by service:
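To see how connections are spread across services, established TCP connections can be grouped by local port from ss output. A minimal sketch with a hypothetical conns_by_port function; it assumes iproute2's ss, which omits the State column when a state filter is given. Outbound connections are counted under their ephemeral local port, so the service ports (25, 443, 3306, 6379 and so on) are the entries of interest.

```shell
#!/bin/bash
# Hypothetical helper: count established TCP connections per local port from
#   ss -tn state established
# With a state filter, column 3 is "Local Address:Port".
conns_by_port() {
    awk '$1 != "Recv-Q" && NF >= 4 {
        port = $3
        sub(/.*:/, "", port)          # strip address, works for [::1]:25 too
        count[port]++
    }
    END { for (p in count) print count[p], p }' | sort -rn
}
```

Usage: ss -tn state established | conns_by_port prints "count port" pairs, busiest first.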
Network throughput:
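The per-interface byte counters in /proc/net/dev can be sampled one second apart to get throughput without installing extra tools. A minimal sketch; iface_bytes and iface_throughput are hypothetical names, and the interface name (for example eth0) must be adjusted to the server.

```shell
#!/bin/bash
# Hypothetical helper: print "rx_bytes tx_bytes" for one interface from
# /proc/net/dev (or a file in the same format). Receive bytes is the first
# counter after "iface:", transmit bytes the ninth.
iface_bytes() {
    local iface="$1" file="${2:-/proc/net/dev}"
    awk -v ifc="$iface" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        if (index(line, ifc ":") == 1) {
            sub(/^[^:]*:/, "", line)
            split(line, f)
            print f[1], f[9]
            found = 1
        }
    } END { if (!found) exit 1 }' "$file"
}

# Hypothetical helper: bytes per second over a one-second window.
iface_throughput() {
    local iface="$1" rx1 tx1 rx2 tx2
    read -r rx1 tx1 < <(iface_bytes "$iface") || return 1
    sleep 1
    read -r rx2 tx2 < <(iface_bytes "$iface") || return 1
    echo "rx $((rx2 - rx1)) B/s, tx $((tx2 - tx1)) B/s"
}
```

iface_throughput eth0 prints received and transmitted bytes per second; it returns non-zero if the interface does not exist.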
Bandwidth usage:
Email Processing Metrics
Queue Monitoring
Queue depth:
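The queue checks in the health check script take field 5 of the last line of postqueue -p. That works because a non-empty queue ends with a summary line such as "-- 12 Kbytes in 3 Requests.", while an empty queue prints "Mail queue is empty". A minimal sketch that handles both cases explicitly; queue_size is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: message count from `postqueue -p` output on stdin.
# Summary line: "-- <size> Kbytes in <count> Request(s)." ; empty queue -> 0.
queue_size() {
    awk '/^-- .* in [0-9]+ Requests?\.$/ { n = $5 } END { print n + 0 }'
}
```

Usage: postqueue -p | queue_size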
Queue age:
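On Postfix 3.1 and later, postqueue -j prints one JSON object per queued message, including arrival_time in Unix epoch seconds, which gives the age of the oldest message. A minimal sketch, assuming that output format and avoiding a jq dependency by matching the field with grep; oldest_queue_age is a hypothetical name, and its optional argument overrides the current time for testing.

```shell
#!/bin/bash
# Hypothetical helper: age in seconds of the oldest message, read from
# `postqueue -j` output on stdin. Prints 0 for an empty queue.
oldest_queue_age() {
    local now="${1:-$(date +%s)}"
    grep -o '"arrival_time": *[0-9]*' |
        awk -F: -v now="$now" '{
            t = $2 + 0
            if (min == "" || t < min) min = t
        }
        END { print (min == "" ? 0 : now - min) }'
}
```

Usage: postqueue -j | oldest_queue_age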
Messages per hour:
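Postfix logs each successful delivery with status=sent, so counting those lines per hour approximates throughput. A minimal sketch, assuming Postfix logs to /var/log/mail.log with either the traditional syslog timestamp or an RFC 3339 timestamp; sent_per_hour is a hypothetical name. Postfix logs one line per recipient, so the counts are deliveries rather than unique messages.

```shell
#!/bin/bash
# Hypothetical helper: deliveries (status=sent) per hour from a Postfix log on stdin.
# Traditional: "Oct  9 14:03:22 host ..."  -> key "Oct 9 14"
# RFC 3339:    "2024-10-09T14:03:22.1+00:00 host ..." -> key "2024-10-09T14"
sent_per_hour() {
    awk '/ status=sent / {
        if ($1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/) hour = substr($1, 1, 13)
        else hour = $1 " " $2 " " substr($3, 1, 2)
        count[hour]++
    }
    END { for (h in count) print h, count[h] }' | sort
}
```

Usage: sudo cat /var/log/mail.log | sent_per_hour. Hours sort as text, which is chronological within a single day's log.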
Processing time:
Filter Performance
Scan times:
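# Assumes the scan time (in ms) is the last field of each scan_time line
sudo tail -n 1000 /var/log/mailborder/mb-filter.log | \
grep "scan_time" | \
awk '{sum+=$NF; count++} END {if (count) print "Average:", sum/count, "ms"; else print "No scan_time entries found"}'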
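An average hides slow outliers, so percentiles are often more useful for scan times. A minimal sketch using the nearest-rank method; percentiles is a hypothetical name, and like the average above it assumes the scan time is the last field of each scan_time line.

```shell
#!/bin/bash
# Hypothetical helper: count, median, 95th percentile and maximum of the
# numbers on stdin (one per line), using nearest rank: rank = ceil(p * N).
percentiles() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) { print "no data"; exit 1 }
            p50 = int(0.50 * NR); if (p50 < 0.50 * NR) p50++
            p95 = int(0.95 * NR); if (p95 < 0.95 * NR) p95++
            printf "count=%d p50=%s p95=%s max=%s\n", NR, v[p50], v[p95], v[NR]
        }'
}
```

Usage: sudo tail -n 1000 /var/log/mailborder/mb-filter.log | grep "scan_time" | awk '{print $NF}' | percentiles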
ClamAV performance:
Rspamd performance:
Processing Statistics
Daily email volume:
Spam detection rate:
Virus detection:
Database Performance
Connection Monitoring
Active connections:
Connection pool status:
Max connections:
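Comparing the highest number of simultaneous connections since the server started (Max_used_connections) with max_connections shows how close the database has come to its limit. A minimal sketch that parses the tab-separated output of the mysql client in batch mode; conn_utilization is a hypothetical name, and the invocation in the comment assumes root can log in through the local socket.

```shell
#!/bin/bash
# Hypothetical helper: peak connection utilization from name<TAB>value rows, e.g.
#   sudo mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';
#                        SHOW GLOBAL VARIABLES LIKE 'max_connections';"
conn_utilization() {
    awk -F'\t' 'tolower($1) == "max_used_connections" { used = $2 }
                tolower($1) == "max_connections" { max = $2 }
        END {
            if (max > 0) printf "%d of %d (%.0f%%)\n", used, max, used * 100 / max
            else { print "max_connections not found"; exit 1 }
        }'
}
```

Pipe the output of the mysql command in the comment into conn_utilization to print the peak as a fraction of the limit.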
Query Performance
Slow queries:
Query cache (not available in MySQL 8.0 and later, where it was removed):
Recent slow queries:
Table Performance
Table sizes:
SELECT
table_name,
ROUND(((data_length + index_length) / 1024 / 1024), 2) AS size_mb,
table_rows,
ROUND((data_length / 1024 / 1024), 2) AS data_mb,
ROUND((index_length / 1024 / 1024), 2) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mailborder'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
Fragmentation:
SELECT
table_name,
ROUND((data_free / 1024 / 1024), 2) AS fragmented_mb
FROM information_schema.TABLES
WHERE table_schema = 'mailborder'
AND data_free > 0
ORDER BY data_free DESC;
Lock Monitoring
Table locks:
Lock waits:
Redis Performance
Memory Usage
Memory stats:
Key count:
Eviction stats:
Performance Metrics
Operations per second:
Hit rate:
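The hit rate is keyspace_hits divided by the sum of keyspace_hits and keyspace_misses from the Redis INFO stats section. A minimal sketch; redis_hit_rate is a hypothetical name. INFO replies use CRLF line endings, which is why carriage returns are removed before parsing. The counters accumulate since Redis started (or since the last CONFIG RESETSTAT).

```shell
#!/bin/bash
# Hypothetical helper: keyspace hit rate from `redis-cli INFO stats` on stdin.
redis_hit_rate() {
    tr -d '\r' | awk -F: '$1 == "keyspace_hits" { h = $2 }
                          $1 == "keyspace_misses" { m = $2 }
        END {
            if (h + m == 0) { print "no lookups yet"; exit 0 }
            printf "%.2f%%\n", h * 100 / (h + m)
        }'
}
```

Usage: redis-cli INFO stats | redis_hit_rate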
Latency:
Slow Commands
Enable slow log:
View slow commands:
Web Interface Performance
Response Times
Test endpoint:
Multiple requests:
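# -k skips certificate verification, since the certificate usually names the server's hostname rather than localhost
for i in {1..10}; do
    curl -k -o /dev/null -s -w '%{time_total}\n' https://localhost/
done | awk '{sum+=$1; count++} END {print "Average:", sum/count, "seconds"}'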
PHP-FPM Metrics
Pool status:
Key metrics:
- Active processes
- Idle processes
- Max children reached
- Slow requests
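Max children reached and slow requests, together with the listen queue length reported on the same status page, are the usual signs that a pool is undersized or stalling. A minimal sketch that flags them from the plain-text status page; fpm_check is a hypothetical name, and it assumes pm.status_path is enabled for the pool and that the page is fetched (for example with curl) from a location your nginx configuration exposes, whose URL is site-specific.

```shell
#!/bin/bash
# Hypothetical helper: read the PHP-FPM text status page on stdin, print a
# warning for each problem indicator, and return 1 if any was found.
fpm_check() {
    awk -F': *' '
        $1 == "listen queue" && $2 > 0 { print "WARNING: " $2 " request(s) waiting in listen queue"; bad = 1 }
        $1 == "max children reached" && $2 > 0 { print "WARNING: pm.max_children reached " $2 " time(s)"; bad = 1 }
        $1 == "slow requests" && $2 > 0 { print "WARNING: " $2 " slow request(s) logged"; bad = 1 }
        END { if (!bad) print "OK"; exit bad }'
}
```

The non-zero exit status on warnings makes the function usable in a cron job or monitoring check.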
Process list:
Nginx Metrics
Active connections:
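nginx reports connection counts through the stub_status module. A minimal sketch that summarizes that page; nginx_stub_summary is a hypothetical name, and it assumes a location with stub_status enabled, reachable at a site-specific URL such as http://127.0.0.1/nginx_status (hypothetical). The difference between accepted and handled connections counts connections dropped, typically because a resource limit such as worker_connections was reached.

```shell
#!/bin/bash
# Hypothetical helper: summarize ngx_http_stub_status_module output on stdin.
nginx_stub_summary() {
    awk '/^Active connections:/ { active = $3 }
         /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { accepts = $1; handled = $2; requests = $3 }
         /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
         END {
             printf "active=%d reading=%d writing=%d waiting=%d dropped=%d requests=%d\n",
                    active, reading, writing, waiting, accepts - handled, requests
         }'
}
```

Usage: curl -s http://127.0.0.1/nginx_status | nginx_stub_summary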
Access log analysis:
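# Requests per hour of day (combined log format: $4 is "[dd/Mon/yyyy:HH:MM:SS")
sudo awk '{print $4}' /var/log/nginx/access.log | \
cut -d: -f2 | sort | uniq -c
# Slowest 100 response times; only valid if log_format ends with $request_time
# (the default "combined" format does not include it)
sudo awk '{print $NF}' /var/log/nginx/access.log | \
sort -n | tail -n 100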
Service Health Checks
Automated Monitoring
Create health check script:
sudo tee /usr/local/bin/mb-health-check.sh << 'EOF'
#!/bin/bash
# Cron's default PATH omits /usr/sbin, where postqueue lives
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Exit codes follow the Nagios plugin convention: 0=OK, 1=WARNING, 2=CRITICAL
STATUS=0
warn() { echo "WARNING: $*"; [ "$STATUS" -eq 2 ] || STATUS=1; }
crit() { echo "CRITICAL: $*"; STATUS=2; }
# Service checks
for service in mb-rpcd mb-filter mb-milter clamd@scan redis-server nginx php8.2-fpm; do
    if ! systemctl is-active --quiet "$service"; then
        crit "$service is not running"
    fi
done
# Disk space (-P keeps each filesystem on a single line)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    warn "Disk usage at ${DISK_USAGE}%"
fi
# Memory
MEM_USAGE=$(free | awk 'NR==2 {printf "%.0f", $3*100/$2}')
if [ "$MEM_USAGE" -gt 90 ]; then
    warn "Memory usage at ${MEM_USAGE}%"
fi
# Queue depth (field 5 is empty when postqueue prints "Mail queue is empty")
QUEUE_SIZE=$(postqueue -p | tail -n 1 | awk '{print $5}')
if [ -n "$QUEUE_SIZE" ] && [ "$QUEUE_SIZE" -gt 100 ]; then
    warn "Mail queue at $QUEUE_SIZE messages"
fi
# Database (mysqladmin ping exits 0 whenever the server is up, even if access is denied)
if ! mysqladmin ping -h localhost &>/dev/null; then
    crit "Database not responding"
fi
# Redis
if ! redis-cli ping &>/dev/null; then
    crit "Redis not responding"
fi
if [ "$STATUS" -eq 0 ]; then
    echo "OK: All checks passed"
fi
exit "$STATUS"
EOF
sudo chmod +x /usr/local/bin/mb-health-check.sh
Run health check:
Schedule periodic checks:
# Append to root's crontab without overwriting existing entries
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/mb-health-check.sh | logger -t mb-health") | sudo crontab -
Performance Alerting
Email Alerts
Configure alerts:
sudo tee /etc/mailborder/alerts.conf << 'EOF'
# Alert thresholds
CPU_THRESHOLD=80
MEM_THRESHOLD=85
DISK_THRESHOLD=90
QUEUE_THRESHOLD=200
# Alert email
ALERT_EMAIL="admin@example.com"
EOF
Alert script:
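sudo tee /usr/local/bin/mb-alert.sh << 'EOF'
#!/bin/bash
# Cron's default PATH omits /usr/sbin, where postqueue lives
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
source /etc/mailborder/alerts.conf
# CPU busy % over a 1-second sample (vmstat column 15 is idle time)
CPU=$(vmstat 1 2 | tail -n 1 | awk '{print 100 - $15}')
if [ "$CPU" -gt "$CPU_THRESHOLD" ]; then
    echo "CPU usage at $CPU%" | mail -s "Mailborder Alert: High CPU" "$ALERT_EMAIL"
fi
MEM=$(free | awk 'NR==2 {printf "%.0f", $3*100/$2}')
if [ "$MEM" -gt "$MEM_THRESHOLD" ]; then
    echo "Memory usage at $MEM%" | mail -s "Mailborder Alert: High Memory" "$ALERT_EMAIL"
fi
DISK=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK" -gt "$DISK_THRESHOLD" ]; then
    echo "Disk usage at $DISK%" | mail -s "Mailborder Alert: Low Disk Space" "$ALERT_EMAIL"
fi
QUEUE=$(postqueue -p | tail -n 1 | awk '{print $5}')
if [ -n "$QUEUE" ] && [ "$QUEUE" -gt "$QUEUE_THRESHOLD" ]; then
    echo "Mail queue at $QUEUE messages" | mail -s "Mailborder Alert: Mail Queue" "$ALERT_EMAIL"
fi
EOF
sudo chmod +x /usr/local/bin/mb-alert.sh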
Integration with Monitoring Systems
Nagios/Icinga:
# Install NRPE plugin
sudo apt install nagios-nrpe-server
# Configure check
sudo tee -a /etc/nagios/nrpe.cfg << 'EOF'
command[check_mailborder]=/usr/local/bin/mb-health-check.sh
EOF
sudo systemctl restart nagios-nrpe-server
Prometheus:
# Install node_exporter
sudo apt install prometheus-node-exporter
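# Custom metrics script (prints Prometheus text format for node_exporter's textfile collector)
sudo tee /usr/local/bin/mb-metrics.sh << 'EOF'
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
echo "# HELP mailborder_queue_size Current mail queue size"
echo "# TYPE mailborder_queue_size gauge"
QUEUE=$(postqueue -p | tail -n 1 | awk '{print $5}')
echo "mailborder_queue_size ${QUEUE:-0}"
EOF
sudo chmod +x /usr/local/bin/mb-metrics.sh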
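node_exporter's textfile collector reads *.prom files from the directory given by its --collector.textfile.directory flag, so the metrics script's output has to be written there, ideally atomically so a scrape never sees a half-written file. A minimal sketch; publish_prom is a hypothetical name, and /var/lib/prometheus/node-exporter is an assumption based on the Debian/Ubuntu package, so check the flag on your system.

```shell
#!/bin/bash
# Hypothetical helper: run a command and publish its output as <dir>/<name>.prom.
# The temp file lives in the same directory (so mv is an atomic rename) and
# does not end in .prom, so the collector ignores it until it is complete.
publish_prom() {
    local dir="$1" name="$2"; shift 2
    local tmp
    tmp=$(mktemp "$dir/.${name}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        chmod 644 "$tmp"              # mktemp creates 0600; the exporter must read it
        mv -f "$tmp" "$dir/$name.prom"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage (for example from root's crontab): publish_prom /var/lib/prometheus/node-exporter mailborder bash /usr/local/bin/mb-metrics.sh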
Performance Baselines
Establish Baselines
Collect baseline metrics:
# CPU baseline (busy % over a 1-second sample; vmstat column 15 is idle time)
echo "$(date -Is) CPU Baseline: $(vmstat 1 2 | tail -n 1 | awk '{print 100 - $15}')%" | sudo tee -a /var/log/mailborder/baseline.log
# Memory baseline (MB used)
echo "$(date -Is) Memory Baseline: $(free -m | awk 'NR==2 {print $3}') MB" | sudo tee -a /var/log/mailborder/baseline.log
# Processing rate
RATE=$(sudo mb-filter-stats --rate)
echo "$(date -Is) Processing Rate: $RATE emails/hour" | sudo tee -a /var/log/mailborder/baseline.log
Compare to baseline:
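Comparing a new measurement with a value recorded in baseline.log comes down to computing the relative change and flagging it when it leaves a tolerance band. A minimal sketch; compare_to_baseline is a hypothetical name, and the 25% default tolerance is an arbitrary starting point, not a recommended value.

```shell
#!/bin/bash
# Hypothetical helper: compare_to_baseline NAME BASELINE CURRENT [TOLERANCE_PCT]
# Prints the relative change; returns 1 if it exceeds the tolerance, 2 if the
# baseline is zero.
compare_to_baseline() {
    local name="$1" baseline="$2" current="$3" tolerance="${4:-25}"
    awk -v n="$name" -v b="$baseline" -v c="$current" -v t="$tolerance" 'BEGIN {
        if (b + 0 == 0) { print n ": baseline is zero, cannot compare"; exit 2 }
        d = (c - b) * 100 / b
        printf "%s: baseline=%s current=%s change=%+.1f%%\n", n, b, c, d
        if (d > t + 0 || d < -t) exit 1
        exit 0
    }'
}
```

For example, compare_to_baseline "Processing rate" 1200 "$(sudo mb-filter-stats --rate)" compares the current rate with a hypothetical recorded baseline of 1200 emails/hour.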
Performance Dashboards
Create Simple Dashboard
Web-based status page:
sudo tee /var/www/html/status.php << 'EOF'
<?php
// Exposes internal system details: restrict access (e.g. to localhost) in the web server config
header('Content-Type: application/json');
// trim() removes the trailing newline included in shell_exec() output
$status = [
    'services' => [
        'mb-rpcd' => trim((string) shell_exec('systemctl is-active mb-rpcd')),
        'mb-filter' => trim((string) shell_exec('systemctl is-active mb-filter')),
        'database' => trim((string) shell_exec('mysqladmin ping 2>&1')),
    ],
    'queue_size' => (int) shell_exec('postqueue -p | tail -n 1 | awk \'{print $5}\''),
    // CPU busy % over a 1-second vmstat sample (column 15 is idle time)
    'cpu_usage' => (int) shell_exec('vmstat 1 2 | tail -n 1 | awk \'{print 100 - $15}\''),
    'memory_pct' => (int) shell_exec('free | awk \'NR==2 {printf "%.0f", $3*100/$2}\''),
    'disk_usage' => (int) shell_exec('df -P / | awk \'NR==2 {print $5}\' | sed \'s/%//\''),
];
echo json_encode($status, JSON_PRETTY_PRINT);
?>
EOF
Access dashboard:
Tuning Recommendations
Based on metrics:
- High CPU usage:
  - Increase worker processes
  - Enable caching
  - Optimize database queries
  - Consider a hardware upgrade
- High memory usage:
  - Reduce cache sizes
  - Limit concurrent processes
  - Tune the mb-rpcd fork limit
  - Add more RAM
- High disk I/O:
  - Move logs to a separate disk
  - Optimize database writes
  - Use an SSD for the database
  - Increase buffer sizes
- Slow email processing:
  - Increase filter workers
  - Tune ClamAV limits
  - Optimize the Rspamd configuration
  - Review policy complexity
- Database bottlenecks:
  - Add indexes
  - Optimize queries
  - Increase the connection pool size
  - Archive old data