Performance Monitoring

Monitoring system performance and identifying bottlenecks.

System Resources

CPU Monitoring

Real-time monitoring:

top -b -n 1 | head -n 20
htop  # Interactive view

CPU usage by process:

ps aux --sort=-%cpu | head -n 10

Service-specific CPU:

systemctl status mb-rpcd | grep CPU
systemctl status mb-filter | grep CPU
systemctl status clamd@scan | grep CPU

CPU usage over time:

sar -u 1 10  # 10 live samples, 1 second apart
sar -u       # Today's recorded history (requires sysstat data collection)
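
Where sysstat is not installed, overall CPU utilization can also be derived from two readings of /proc/stat. The following is a minimal sketch under that assumption; the function name cpu_busy_pct is hypothetical and not part of Mailborder.

```shell
#!/usr/bin/env bash
# cpu_busy_pct BEFORE AFTER
# Each argument is one aggregate "cpu ..." line from /proc/stat.
# Prints the percentage of non-idle time between the two samples.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
      # Guest time (fields 10-11) is already included in user and nice.
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle = $5 + $6
      if (NR == 1) { t0 = total; i0 = idle }
      else         { t1 = total; i1 = idle }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Usage (one-second sampling interval chosen for illustration):
#   before=$(grep '^cpu ' /proc/stat); sleep 1; after=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$before" "$after"
```

Counting iowait as idle is a design choice here; treat it as busy instead if I/O stalls should show up as load.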

Memory Monitoring

Current memory usage:

free -m

Detailed memory breakdown:

cat /proc/meminfo
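
The MemAvailable field in /proc/meminfo is usually a better indicator of memory headroom than the "used" column, because it accounts for reclaimable cache. As a rough sketch (the function name mem_available_pct is hypothetical), it can be expressed as a percentage of total memory:

```shell
# mem_available_pct: reads /proc/meminfo-formatted text on stdin and prints
# MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) printf "%d\n", 100 * avail / total
      else exit 1
    }'
}

# Usage:
#   mem_available_pct < /proc/meminfo
```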

Top memory consumers:

ps aux --sort=-%mem | head -n 10

Service memory usage:

systemctl status mb-rpcd | grep Memory
systemctl status clamd@scan | grep Memory
systemctl status redis-server | grep Memory

Memory pressure:

# Check for OOM killer activity
dmesg | grep -i "out of memory"
grep -i "killed process" /var/log/syslog

Disk I/O Monitoring

Disk usage:

df -h

Inode usage:

df -i
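
To pick out only the filesystems that are close to full, the df output can be filtered against a threshold. This is a sketch, not a Mailborder tool; fs_over_threshold is a hypothetical helper, and it assumes mount points without spaces.

```shell
# fs_over_threshold LIMIT: reads "df -P" (or "df -iP") output on stdin and
# prints "mountpoint percent" for every filesystem at or above LIMIT percent.
fs_over_threshold() {
  awk -v limit="$1" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)
      # Skip pseudo-filesystems that report "-" instead of a number
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, pct
    }'
}

# Usage:
#   df -P  | fs_over_threshold 90   # block usage
#   df -iP | fs_over_threshold 90   # inode usage
```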

Real-time disk I/O:

iostat -x 1 10

Top disk I/O processes:

iotop -o  # Only show active processes

Disk performance:

# Test read speed
sudo hdparm -t /dev/sda

# Test write speed
dd if=/dev/zero of=/tmp/test bs=1M count=1024 conv=fdatasync
rm /tmp/test

Network Monitoring

Network interfaces:

ip -s link

Active connections:

netstat -tunapl | grep ESTABLISHED | wc -l

Connection by service:

ss -tunap | grep ":25 "    # SMTP
ss -tunap | grep ":443 "   # HTTPS
ss -tunap | grep ":11332 " # Rspamd

Network throughput:

iftop -i eth0
nload eth0

Bandwidth usage:

vnstat -l  # Live traffic
vnstat -d  # Daily statistics

Email Processing Metrics

Queue Monitoring

Queue depth:

sudo postqueue -p | tail -n 1

Queue age:

sudo find /var/spool/postfix/deferred -type f -mtime +1 | wc -l

Messages per hour:

sudo grep "status=sent" /var/log/mail.log | \
  grep "$(date +'%b %_d %H')" | wc -l

Processing time (10 longest delivery delays, in seconds):

sudo grep -o "delay=[0-9.]*" /var/log/mail.log | \
  cut -d= -f2 | sort -n | tail -n 10
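
A summary is often easier to track than the raw tail of the distribution. The sketch below assumes the standard Postfix delivery log format, where each record contains a "delay=" field; delay_stats is a hypothetical helper name.

```shell
# delay_stats: reads Postfix log lines on stdin and prints the number of
# delivery records plus the average and maximum "delay=" value in seconds.
delay_stats() {
  grep -o ' delay=[0-9.]*' | cut -d= -f2 | awk '
    { sum += $1; if ($1 > max) max = $1; n++ }
    END {
      if (n == 0) { print "count=0"; exit }
      printf "count=%d avg=%.2f max=%.2f\n", n, sum / n, max
    }'
}

# Usage:
#   sudo cat /var/log/mail.log | delay_stats
```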

Filter Performance

Scan times:

sudo tail -n 1000 /var/log/mailborder/mb-filter.log | \
  grep "scan_time" | \
  awk '{sum+=$NF; count++} END {if (count) print "Average:", sum/count, "ms"}'

ClamAV performance:

sudo grep "clamd" /var/log/mailborder/mb-filter.log | \
  grep "time" | tail -n 20

Rspamd performance:

curl -s http://localhost:11334/stat | jq

Processing Statistics

Daily email volume:

sudo mb-filter-stats --daily

Spam detection rate:

sudo mb-filter-stats --spam-rate --last 24h

Virus detection:

sudo grep "FOUND" /var/log/clamav/clamav.log | wc -l

Database Performance

Connection Monitoring

Active connections:

sudo mysql -e "SHOW PROCESSLIST"

Connection pool status:

sudo mysql -e "SHOW STATUS LIKE 'Threads_%'"

Max connections:

sudo mysql -e "SHOW VARIABLES LIKE 'max_connections'"
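
The thread count and the connection limit are most useful when read together. As a small sketch (conn_usage_pct is a hypothetical helper), the ratio between the two can be computed like this:

```shell
# conn_usage_pct CONNECTED MAX: prints connection usage as a whole percentage.
conn_usage_pct() {
  awk -v c="$1" -v m="$2" 'BEGIN {
    if (m <= 0) exit 1
    printf "%d\n", 100 * c / m
  }'
}

# Usage (-N drops the column headers, -B selects tab-separated batch output):
#   connected=$(sudo mysql -N -B -e "SHOW STATUS LIKE 'Threads_connected'" | awk '{print $2}')
#   max=$(sudo mysql -N -B -e "SHOW VARIABLES LIKE 'max_connections'" | awk '{print $2}')
#   conn_usage_pct "$connected" "$max"
```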

Query Performance

Slow queries:

sudo mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries'"

Query cache:

# Applies to MariaDB and MySQL 5.7 or earlier; MySQL 8.0 removed the query cache
sudo mysql -e "SHOW STATUS LIKE 'Qcache%'"

Recent slow queries:

sudo tail -n 50 /var/log/mysql/slow.log

Table Performance

Table sizes:

SELECT
  table_name,
  ROUND(((data_length + index_length) / 1024 / 1024), 2) AS size_mb,
  table_rows,
  ROUND((data_length / 1024 / 1024), 2) AS data_mb,
  ROUND((index_length / 1024 / 1024), 2) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mailborder'
ORDER BY (data_length + index_length) DESC
LIMIT 10;

Fragmentation:

SELECT
  table_name,
  ROUND((data_free / 1024 / 1024), 2) AS fragmented_mb
FROM information_schema.TABLES
WHERE table_schema = 'mailborder'
  AND data_free > 0
ORDER BY data_free DESC;

Lock Monitoring

Table locks:

sudo mysql -e "SHOW OPEN TABLES WHERE In_use > 0"

Lock waits:

sudo mysql -e "SHOW STATUS LIKE 'Table_locks_waited'"

Redis Performance

Memory Usage

Memory stats:

redis-cli INFO memory

Key count:

redis-cli DBSIZE

Eviction stats:

redis-cli INFO stats | grep evicted

Performance Metrics

Operations per second:

redis-cli INFO stats | grep instantaneous_ops_per_sec

Hit rate:

redis-cli INFO stats | grep keyspace_hits
redis-cli INFO stats | grep keyspace_misses
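
The two counters above only become a hit rate once they are combined as hits / (hits + misses). A minimal sketch of that calculation follows; redis_hit_ratio is a hypothetical helper name.

```shell
# redis_hit_ratio: reads "redis-cli INFO stats" output on stdin and prints the
# keyspace hit ratio as a percentage. INFO lines end in CRLF, so strip "\r".
redis_hit_ratio() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hits / total
    }'
}

# Usage:
#   redis-cli INFO stats | redis_hit_ratio
```

Both counters are cumulative since the last restart or CONFIG RESETSTAT, so the result is a long-run average rather than a current rate.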

Latency:

redis-cli --latency
redis-cli --latency-history

Slow Commands

Enable slow log:

redis-cli CONFIG SET slowlog-log-slower-than 10000  # 10ms
redis-cli CONFIG SET slowlog-max-len 128

View slow commands:

redis-cli SLOWLOG GET 10

Web Interface Performance

Response Times

Test endpoint:

curl -o /dev/null -s -w '%{time_total}\n' https://localhost/

Multiple requests:

for i in {1..10}; do
  curl -o /dev/null -s -w '%{time_total}\n' https://localhost/
done | awk '{sum+=$1; count++} END {print "Average:", sum/count, "seconds"}'
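
An average can hide occasional slow responses, so a high percentile is a useful companion figure. The sketch below uses the nearest-rank method; percentile is a hypothetical helper and expects a whole-number percentile argument.

```shell
# percentile P: reads one number per line on stdin and prints the P-th
# percentile using the nearest-rank method.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage:
#   for i in {1..20}; do
#     curl -o /dev/null -s -w '%{time_total}\n' https://localhost/
#   done | percentile 95
```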

PHP-FPM Metrics

Pool status:

curl -s http://localhost/fpm-status

Key metrics:

- Active processes
- Idle processes
- Max children reached
- Slow requests

Process list:

curl -s "http://localhost/fpm-status?full"
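
For scripted checks, a single value can be pulled out of the plain-text status page, which lists one "label: value" pair per line. This is a sketch only; fpm_field is a hypothetical helper name.

```shell
# fpm_field NAME: reads the plain-text PHP-FPM status page on stdin and prints
# the value of the line labelled NAME (for example "max children reached").
fpm_field() {
  awk -F': *' -v name="$1" '$1 == name { print $2 }'
}

# Usage:
#   curl -s http://localhost/fpm-status | fpm_field "max children reached"
```

A steadily rising "max children reached" value suggests the pool is hitting its pm.max_children limit.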

Nginx Metrics

Active connections:

curl -s http://localhost/nginx_status

Access log analysis:

# Requests per hour
sudo awk '{print $4}' /var/log/nginx/access.log | \
  cut -d: -f2 | sort | uniq -c

# Slowest 100 response times (assumes log_format ends with $request_time)
sudo awk '{print $NF}' /var/log/nginx/access.log | \
  sort -n | tail -n 100
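
A breakdown by HTTP status code helps tell slow responses apart from failing ones. The sketch below assumes the default "combined" log format, where the status code is the ninth whitespace-separated field; status_counts is a hypothetical helper name.

```shell
# status_counts: reads combined-format access log lines on stdin and prints
# "count status" pairs, most frequent first.
status_counts() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { print $9 }' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage:
#   sudo cat /var/log/nginx/access.log | status_counts
```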

Service Health Checks

Automated Monitoring

Create health check script:

sudo tee /usr/local/bin/mb-health-check.sh << 'EOF'
#!/bin/bash
# Exit codes follow the Nagios convention: 0 = OK, 1 = WARNING, 2 = CRITICAL
STATUS=0
warn() { echo "WARNING: $1"; [ "$STATUS" -lt 1 ] && STATUS=1; }
crit() { echo "CRITICAL: $1"; STATUS=2; }

# Service checks
for service in mb-rpcd mb-filter mb-milter clamd@scan redis-server nginx php8.2-fpm; do
  if ! systemctl is-active --quiet "$service"; then
    crit "$service is not running"
  fi
done

# Disk space
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
  warn "Disk usage at ${DISK_USAGE}%"
fi

# Memory
MEM_USAGE=$(free | awk 'NR==2 {printf "%.0f", $3*100/$2}')
if [ "$MEM_USAGE" -gt 90 ]; then
  warn "Memory usage at ${MEM_USAGE}%"
fi

# Queue depth (the summary line has no count when the queue is empty)
QUEUE_SIZE=$(postqueue -p | tail -n 1 | awk '{print $5}')
if [ -n "$QUEUE_SIZE" ] && [ "$QUEUE_SIZE" -gt 100 ]; then
  warn "Mail queue at $QUEUE_SIZE messages"
fi

# Database
if ! mysqladmin ping -h localhost &>/dev/null; then
  crit "Database not responding"
fi

# Redis
if ! redis-cli ping &>/dev/null; then
  crit "Redis not responding"
fi

# Report success only when no check failed
[ "$STATUS" -eq 0 ] && echo "All checks passed"
exit "$STATUS"
EOF

sudo chmod +x /usr/local/bin/mb-health-check.sh

Run health check:

sudo /usr/local/bin/mb-health-check.sh

Schedule periodic checks:

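# Append to root's crontab (piping only the new line into "crontab -" would replace every existing entry)
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/mb-health-check.sh | logger -t mb-health") | sudo crontab -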

Performance Alerting

Email Alerts

Configure alerts:

sudo tee /etc/mailborder/alerts.conf << 'EOF'
# Alert thresholds
CPU_THRESHOLD=80
MEM_THRESHOLD=85
DISK_THRESHOLD=90
QUEUE_THRESHOLD=200

# Alert email
ALERT_EMAIL="admin@example.com"
EOF

Alert script:

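sudo tee /usr/local/bin/mb-alert.sh << 'EOF'
#!/bin/bash
source /etc/mailborder/alerts.conf

# Total CPU busy percentage: 100 minus the idle column of a one-second vmstat sample
CPU=$(vmstat 1 2 | tail -n 1 | awk '{print 100 - $15}')
if [ "$CPU" -gt "$CPU_THRESHOLD" ]; then
  echo "CPU usage at $CPU%" | mail -s "Mailborder Alert: High CPU" "$ALERT_EMAIL"
fi

MEM=$(free | awk 'NR==2 {printf "%.0f", $3*100/$2}')
if [ "$MEM" -gt "$MEM_THRESHOLD" ]; then
  echo "Memory usage at $MEM%" | mail -s "Mailborder Alert: High Memory" "$ALERT_EMAIL"
fi

DISK=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK" -gt "$DISK_THRESHOLD" ]; then
  echo "Disk usage at $DISK%" | mail -s "Mailborder Alert: Low Disk Space" "$ALERT_EMAIL"
fi

# Queue depth (the count is empty when the queue has no messages)
QUEUE=$(postqueue -p | tail -n 1 | awk '{print $5}')
if [ "${QUEUE:-0}" -gt "$QUEUE_THRESHOLD" ]; then
  echo "Mail queue at $QUEUE messages" | mail -s "Mailborder Alert: Large Mail Queue" "$ALERT_EMAIL"
fi
EOF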

sudo chmod +x /usr/local/bin/mb-alert.sh

Integration with Monitoring Systems

Nagios/Icinga:

# Install NRPE plugin
sudo apt install nagios-nrpe-server

# Configure check (NRPE derives the service state from the script's exit code: 0 OK, 1 WARNING, 2 CRITICAL)
sudo tee -a /etc/nagios/nrpe.cfg << 'EOF'
command[check_mailborder]=/usr/local/bin/mb-health-check.sh
EOF

sudo systemctl restart nagios-nrpe-server

Prometheus:

# Install node_exporter
sudo apt install prometheus-node-exporter

# Custom metrics script (prints Prometheus text exposition format to stdout)
sudo tee /usr/local/bin/mb-metrics.sh << 'EOF'
#!/bin/bash
echo "# HELP mailborder_queue_size Current mail queue size"
echo "# TYPE mailborder_queue_size gauge"
QUEUE=$(postqueue -p | tail -n 1 | awk '{print $5}')
echo "mailborder_queue_size ${QUEUE:-0}"
EOF

sudo chmod +x /usr/local/bin/mb-metrics.sh
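
One way to expose this output to Prometheus is node_exporter's textfile collector, which reads *.prom files from a configured directory. The sketch below is an assumption about how the two could be connected, not a documented Mailborder integration; write_prom_file is a hypothetical helper, and /var/lib/prometheus/node-exporter is the directory used by the Debian package defaults, so check your --collector.textfile.directory setting.

```shell
# write_prom_file DIR NAME: reads metrics text on stdin and publishes it as
# DIR/NAME.prom. Writing to a temp file in the same directory and renaming it
# keeps node_exporter from scraping a half-written file.
write_prom_file() {
  local dir=$1 name=$2 tmp
  tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
  # mktemp creates the file with mode 600; make it readable for the exporter
  cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$dir/$name.prom"
}

# Usage (run as root, for example from cron every minute):
#   /usr/local/bin/mb-metrics.sh | write_prom_file /var/lib/prometheus/node-exporter mailborder
```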

Performance Baselines

Establish Baselines

Collect baseline metrics:

# CPU baseline (busy percentage from a one-second vmstat sample)
echo "CPU Baseline: $(vmstat 1 2 | tail -n 1 | awk '{print 100 - $15}')%" | sudo tee -a /var/log/mailborder/baseline.log

# Memory baseline (MB used)
echo "Memory Baseline: $(free -m | awk 'NR==2 {print $3}') MB" | sudo tee -a /var/log/mailborder/baseline.log

# Processing rate
RATE=$(sudo mb-filter-stats --rate)
echo "Processing Rate: $RATE emails/hour" | sudo tee -a /var/log/mailborder/baseline.log

Compare to baseline:

# Current vs baseline comparison
sudo tail -n 20 /var/log/mailborder/baseline.log
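
Reading the log shows the recorded values, but the comparison itself is a percentage change against the baseline figure. A minimal sketch of that arithmetic follows; pct_change is a hypothetical helper, and the numbers in the usage line are made up for illustration.

```shell
# pct_change BASELINE CURRENT: prints the signed percentage change of CURRENT
# relative to BASELINE, to one decimal place.
pct_change() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b == 0) exit 1
    printf "%+.1f\n", 100 * (c - b) / b
  }'
}

# Usage (baseline of 2000 MB used, current reading taken from free):
#   pct_change 2000 "$(free -m | awk 'NR==2 {print $3}')"
```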

Performance Dashboards

Create Simple Dashboard

Web-based status page:

sudo tee /var/www/html/status.php << 'EOF'
<?php
header('Content-Type: application/json');

// Run a shell command and return its output without the trailing newline
$run = fn(string $cmd): string => trim((string) shell_exec($cmd));

$status = [
    'services' => [
        'mb-rpcd' => $run('systemctl is-active mb-rpcd'),
        'mb-filter' => $run('systemctl is-active mb-filter'),
        'database' => $run('mysqladmin ping 2>&1'),
    ],
    'queue_size' => (int) $run('postqueue -p | tail -n 1 | awk \'{print $5}\''),
    // User-mode CPU percentage as reported by top
    'cpu_usage' => (float) $run('top -bn1 | grep "Cpu(s)" | awk \'{print $2}\' | cut -d\'%\' -f1'),
    'memory_pct' => (int) $run('free | awk \'NR==2 {printf "%.0f", $3*100/$2}\''),
    'disk_usage' => (int) $run('df -h / | awk \'NR==2 {print $5}\' | sed \'s/%//\''),
];

echo json_encode($status, JSON_PRETTY_PRINT);
?>
EOF

Access dashboard:

curl -s https://localhost/status.php | jq

Tuning Recommendations

Based on metrics:

  1. High CPU usage:
     - Increase worker processes
     - Enable caching
     - Optimize database queries
     - Consider hardware upgrade

  2. High memory usage:
     - Reduce cache sizes
     - Limit concurrent processes
     - Optimize mb-rpcd fork limit
     - Add more RAM

  3. High disk I/O:
     - Move logs to separate disk
     - Optimize database writes
     - Use SSD for database
     - Increase buffer sizes

  4. Slow email processing:
     - Increase filter workers
     - Tune ClamAV limits
     - Optimize Rspamd configuration
     - Review policy complexity

  5. Database bottlenecks:
     - Add indexes
     - Optimize queries
     - Increase connection pool
     - Archive old data

See Also