Disaster Recovery¶
Comprehensive disaster recovery planning and procedures for Mailborder.
Disaster Recovery Planning¶
Recovery Objectives¶
RTO (Recovery Time Objective): Time within which services must be restored.
- Critical: < 1 hour (email processing)
- High: < 4 hours (web interface, reporting)
- Medium: < 24 hours (historical logs)
- Low: < 72 hours (archived data)
RPO (Recovery Point Objective): Maximum acceptable data loss.
- Critical: < 15 minutes (email queue, database)
- High: < 1 hour (configuration, user data)
- Medium: < 24 hours (logs, statistics)
- Low: < 7 days (archives)
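Each RPO tier above amounts to a maximum age for the newest backup of that data class. A minimal sketch of such a check follows; the function name `backup_within_rpo` is hypothetical, and it assumes GNU `stat` and that a backup file's modification time reflects when the backup finished.

```shell
#!/bin/bash
# backup_within_rpo FILE MAX_MINUTES
# Succeeds when FILE exists and was modified less than MAX_MINUTES minutes ago.
backup_within_rpo() {
    local file="$1" max_minutes="$2"
    local now mtime age_minutes
    [ -f "$file" ] || return 1
    now=$(date +%s)
    # GNU stat: modification time in seconds since the epoch (-L follows symlinks)
    mtime=$(stat -L -c %Y "$file")
    age_minutes=$(( (now - mtime) / 60 ))
    [ "$age_minutes" -lt "$max_minutes" ]
}
```

For the critical tier this could be called as `backup_within_rpo /var/backups/mailborder/db-latest.sql.gz 15 || logger "Database RPO exceeded"`, for example from a monitoring job.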
Disaster Scenarios¶
Hardware Failure:
- Server hardware failure
- Disk failure
- Network failure
- Power outage
Software Failure:
- Database corruption
- Configuration errors
- Service crashes
- Kernel panic
Security Incidents:
- Ransomware attack
- Data breach
- DDoS attack
- Unauthorized access
Natural Disasters:
- Fire, flood, earthquake
- Extended power outage
- Building evacuation
- ISP outage
Human Error:
- Accidental deletion
- Configuration mistakes
- Unauthorized changes
- Data corruption
Backup Strategy¶
Backup Types and Schedule¶
Full System Backup:
Database Backup:
Configuration Backup:
Incremental Backup:
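The commands for the four backup types are not shown here. To illustrate how a full and an incremental backup differ, the sketch below uses GNU tar snapshot files (`--listed-incremental`). It is not a description of how `mb-backup` works; the function names and the `snapshot.snar` file name are hypothetical, and GNU tar is assumed.

```shell
#!/bin/bash
# backup_full SRC_DIR DEST_DIR
# Starts a new backup chain: archives everything and records file state.
backup_full() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # Removing the snapshot file makes tar treat every file as new.
    rm -f "$dest/snapshot.snar"
    tar --create --gzip \
        --listed-incremental="$dest/snapshot.snar" \
        --file="$dest/full-$(date +%Y%m%d-%H%M%S).tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# backup_incremental SRC_DIR DEST_DIR
# Archives only what changed since the previous run recorded in the snapshot.
backup_incremental() {
    local src="$1" dest="$2"
    if [ ! -f "$dest/snapshot.snar" ]; then
        echo "no snapshot found; run backup_full first" >&2
        return 1
    fi
    tar --create --gzip \
        --listed-incremental="$dest/snapshot.snar" \
        --file="$dest/incr-$(date +%Y%m%d-%H%M%S).tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}
```

Restoring such a chain means extracting the full archive first and then each incremental archive in order, so both must be kept together.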
Offsite Backup¶
Remote Backup Server:
# Configure remote backup
sudo tee /etc/mailborder/backup-remote.conf << 'EOF'
[remote]
enabled = true
host = backup.example.com
user = backup
path = /backups/mailborder/
method = rsync
encryption = true
retention_days = 90
EOF
Automated offsite sync:
sudo tee /usr/local/bin/mb-backup-offsite.sh << 'EOF'
#!/bin/bash
SOURCE="/var/backups/mailborder/"
DEST="backup@backup.example.com:/backups/mailborder/"
# Sync over SSH (encrypted in transit; files are stored as-is on the remote host).
# Note: --delete also removes remote files that no longer exist locally.
if rsync -avz --delete \
    -e "ssh -i /root/.ssh/backup_key" \
    "$SOURCE" "$DEST"; then
    logger "Offsite backup successful"
else
    logger "Offsite backup FAILED"
    echo "Offsite backup failed" | mail -s "Backup Alert" admin@example.com
fi
EOF
sudo chmod +x /usr/local/bin/mb-backup-offsite.sh
Schedule offsite backup:
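# Daily at 3 AM (after full backup).
# Append to root's existing crontab; piping a single line into "crontab -" would replace all other entries.
(sudo crontab -l 2>/dev/null; echo "0 3 * * * /usr/local/bin/mb-backup-offsite.sh") | sudo crontab -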
Cloud Backup¶
AWS S3 Backup:
sudo apt install awscli
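# Configure AWS credentials for the account that runs the backup script
# (root here, so the credentials are stored under /root/.aws)
sudo aws configure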
# Backup script
sudo tee /usr/local/bin/mb-backup-s3.sh << 'EOF'
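#!/bin/bash
# Abort on the first failed step so a broken backup is never uploaded
set -euo pipefail
BACKUP_FILE="/tmp/mailborder-$(date +%Y%m%d-%H%M%S).tar.gz"
S3_BUCKET="s3://mailborder-backups"
# Cleanup: remove the temporary files on exit, whether the run succeeded or not
trap 'rm -f "$BACKUP_FILE" "${BACKUP_FILE}.gpg"' EXIT
# Create backup
mb-backup --full --output "$BACKUP_FILE"
# Encrypt to the recipient's public key (writes ${BACKUP_FILE}.gpg)
gpg --encrypt --recipient admin@example.com "$BACKUP_FILE"
# Upload to S3
aws s3 cp "${BACKUP_FILE}.gpg" "$S3_BUCKET/" \
    --storage-class STANDARD_IA
# Lifecycle management (delete after 90 days).
# Re-applying the same policy on every run is harmless; it only needs to succeed once.
aws s3api put-bucket-lifecycle-configuration \
    --bucket mailborder-backups \
    --lifecycle-configuration file:///etc/mailborder/s3-lifecycle.json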
EOF
sudo chmod +x /usr/local/bin/mb-backup-s3.sh
S3 Lifecycle Policy:
{
"Rules": [
{
"ID": "DeleteOldBackups",
"Status": "Enabled",
"Prefix": "",
"Expiration": {
"Days": 90
},
"Transitions": [
{
"Days": 30,
"StorageClass": "GLACIER"
}
]
}
]
}
Backup Verification¶
Automated Verification:
sudo tee /usr/local/bin/mb-backup-verify.sh << 'EOF'
#!/bin/bash
LATEST_BACKUP=$(ls -t /var/backups/mailborder/full-*.tar.gz | head -n 1)
if [ -z "$LATEST_BACKUP" ]; then
echo "ERROR: No backup found" | mail -s "Backup Verification Failed" admin@example.com
exit 1
fi
# Test integrity
if ! tar tzf "$LATEST_BACKUP" > /dev/null 2>&1; then
echo "ERROR: Backup $LATEST_BACKUP is corrupted" | \
mail -s "Backup Verification Failed" admin@example.com
exit 1
fi
# Test restore to temp location
TEMP_DIR=$(mktemp -d)
tar xzf "$LATEST_BACKUP" -C "$TEMP_DIR" etc/mailborder/mailborder.conf
if [ ! -f "$TEMP_DIR/etc/mailborder/mailborder.conf" ]; then
echo "ERROR: Backup incomplete" | mail -s "Backup Verification Failed" admin@example.com
rm -rf "$TEMP_DIR"
exit 1
fi
rm -rf "$TEMP_DIR"
echo "Backup verification successful: $LATEST_BACKUP" | logger
EOF
sudo chmod +x /usr/local/bin/mb-backup-verify.sh
Weekly verification:
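The schedule entry for the weekly run is not shown. One way to install it without overwriting other jobs is an idempotent helper like the sketch below. The helper name `ensure_cron_line`, the file `/etc/cron.d/mb-backup-verify` and the Sunday 04:00 slot are all assumptions made for illustration.

```shell
#!/bin/bash
# ensure_cron_line FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so the helper is safe to run repeatedly.
ensure_cron_line() {
    local file="$1" line="$2"
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run as root). Entries in /etc/cron.d need the extra user field
# ("root"); Sunday 04:00 is an assumed slot that follows the 03:00 offsite sync.
# ensure_cron_line /etc/cron.d/mb-backup-verify \
#     "0 4 * * 0 root /usr/local/bin/mb-backup-verify.sh"
```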
Recovery Procedures¶
Complete System Failure¶
Total system loss - rebuild from scratch:
Step 1: Prepare new server
# Install base OS (Ubuntu/Debian)
# Configure network
# Set hostname
sudo hostnamectl set-hostname mailborder.example.com
# Update system
sudo apt update && sudo apt upgrade -y
Step 2: Install Mailborder
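# Add repository (apt-key is deprecated; store the key in a dedicated keyring instead)
wget -qO- https://repo.mailborder.com/gpg.key | \
    sudo gpg --dearmor -o /usr/share/keyrings/mailborder.gpg
echo "deb [signed-by=/usr/share/keyrings/mailborder.gpg] https://repo.mailborder.com/debian stable main" | \
    sudo tee /etc/apt/sources.list.d/mailborder.list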
# Install
sudo apt update
sudo apt install mailborder
Step 3: Stop services
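The command for this step is not shown. A minimal sketch follows, assuming the unit names that appear in other commands in this guide (`mb-rpcd`, `mb-filter`, `mb-milter`, `nginx`, `php8.2-fpm`, `mysql`, `redis-server`) plus `postfix`; adjust the list to the units actually installed. The function name is hypothetical.

```shell
#!/bin/bash
# Assumed unit names. Order: front end first, database last, so nothing
# keeps writing while its backing services go away.
MB_STOP_ORDER="nginx php8.2-fpm mb-milter mb-filter mb-rpcd postfix redis-server mysql"

# stop_mailborder_services [RUNNER...]
# RUNNER defaults to "sudo systemctl"; pass "echo" to preview the commands.
stop_mailborder_services() {
    local unit
    [ "$#" -gt 0 ] || set -- sudo systemctl
    for unit in $MB_STOP_ORDER; do
        "$@" stop "$unit" || echo "warning: could not stop $unit" >&2
    done
}
```

Running `stop_mailborder_services echo` first prints the sequence without touching any service; `stop_mailborder_services` then performs it.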
Step 4: Restore from backup
# Download latest backup
scp backup@backup.example.com:/backups/mailborder/latest.tar.gz /tmp/
# Or from S3
aws s3 cp s3://mailborder-backups/latest.tar.gz.gpg /tmp/
gpg --decrypt /tmp/latest.tar.gz.gpg > /tmp/latest.tar.gz
# Restore
sudo tar xzf /tmp/latest.tar.gz -C /
# Fix permissions
sudo chown -R mailborder:mailborder /etc/mailborder
sudo chown -R www-data:www-data /srv/mailborder
sudo chown -R clamav:clamav /var/lib/clamav
Step 5: Restore database
# Restore database from backup
gunzip < /tmp/db-backup.sql.gz | sudo mysql mailborder
# Verify
sudo mysql mailborder -e "SELECT COUNT(*) FROM mb_users"
Step 6: Configure network settings
# Update IP addresses if changed
sudo nano /etc/mailborder/mailborder.conf
sudo nano /etc/postfix/main.cf
# Update DNS if hostname changed
# Update SSL certificates if hostname changed
Step 7: Start services
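The command for this step is not shown. The sketch below starts the database and cache before the Mailborder daemons and the web front end, and confirms each unit is active before continuing. The unit names are assumptions taken from other commands in this guide, and the function name is hypothetical.

```shell
#!/bin/bash
# Assumed unit names. Order: database and cache first, web front end last.
MB_START_ORDER="mysql redis-server postfix mb-rpcd mb-filter mb-milter php8.2-fpm nginx"

# start_mailborder_services [SYSTEMCTL...]
# Starts each unit, then checks that it is active; stops at the first failure.
start_mailborder_services() {
    local unit
    [ "$#" -gt 0 ] || set -- sudo systemctl
    for unit in $MB_START_ORDER; do
        "$@" start "$unit" || { echo "failed to start $unit" >&2; return 1; }
        "$@" is-active --quiet "$unit" || { echo "$unit is not active" >&2; return 1; }
    done
}
```

Stopping at the first failure keeps later services from starting against a database that did not come up.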
Step 8: Verify functionality
sudo mb-status
sudo mb-doctor
# Test email flow
echo "Test" | mail -s "Recovery Test" test@example.com
# Test web interface
curl -k https://localhost/
# Check logs
sudo journalctl -u mb-rpcd -n 50
sudo tail -f /var/log/mail.log
Step 9: Update external records
# Update MX records if IP changed
# Update SPF records if needed
# Update monitoring systems
# Notify users of any changes
Database Failure¶
Database corruption or complete loss:
Step 1: Stop services
Step 2: Backup current state (if possible)
Step 3: Remove corrupted database
Step 4: Reinitialize MySQL
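The commands for Steps 1 to 4 are not shown. The following sketch is one possible sequence, assuming MariaDB (so `mariadb-install-db` is available), the default data directory `/var/lib/mysql`, and a `mysql` system user. It prints the commands by default and only executes them when `DRY_RUN=0` is set; the function and variable names are hypothetical.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# reset_database_dir [DATADIR]
reset_database_dir() {
    local datadir="${1:-/var/lib/mysql}"
    local aside="${datadir}.corrupt-$(date +%Y%m%d-%H%M%S)"
    # Step 1: stop the Mailborder daemons, then the database
    run systemctl stop mb-rpcd mb-filter mb-milter
    run systemctl stop mysql
    # Steps 2 and 3: move the damaged data directory aside; this keeps a copy
    # for later analysis and clears the live path in one operation
    run mv "$datadir" "$aside"
    # Step 4: create an empty data directory, initialize it, start the server
    run install -d -o mysql -g mysql -m 0750 "$datadir"
    run mariadb-install-db --user=mysql --datadir="$datadir"
    run systemctl start mysql
}
```

Reviewing the dry-run output before setting `DRY_RUN=0` is a cheap safeguard, since moving the wrong directory would make the outage worse.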
Step 5: Restore database
# Create database and user
# (replace 'password' with the password configured for the Mailborder database user)
sudo mysql -e "CREATE DATABASE mailborder"
sudo mysql -e "CREATE USER 'mailborder'@'localhost' IDENTIFIED BY 'password'"
sudo mysql -e "GRANT ALL ON mailborder.* TO 'mailborder'@'localhost'"
# Restore from backup
gunzip < /var/backups/mailborder/db-latest.sql.gz | sudo mysql mailborder
Step 6: Verify and start services
sudo mysql mailborder -e "SHOW TABLES"
sudo systemctl start mb-rpcd mb-filter mb-milter
sudo mb-status
Configuration Corruption¶
Configuration files corrupted or accidentally deleted:
Step 1: Stop affected services
Step 2: Restore configuration
# Extract from backup
sudo tar xzf /var/backups/mailborder/config-latest.tar.gz \
-C / etc/mailborder/ etc/postfix/ etc/rspamd/
Step 3: Verify configuration
Step 4: Restart services
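The commands for Steps 3 and 4 are not shown. A sketch of the idea, which is to restart only if every restored configuration passes its own syntax check, could look like this. `postfix check` and `rspamadm configtest` are the standard checkers for the two restored third-party configurations; the helper name `run_checks` is hypothetical, and no Mailborder-specific checker is assumed.

```shell
#!/bin/bash
# run_checks CMD...
# Runs each command string, prints OK or FAIL per check, and returns
# non-zero if any check failed.
run_checks() {
    local failed=0 cmd
    for cmd in "$@"; do
        if sh -c "$cmd" > /dev/null 2>&1; then
            echo "OK   $cmd"
        else
            echo "FAIL $cmd"
            failed=1
        fi
    done
    return "$failed"
}

# Example (as root); restart only when every check passes:
# run_checks "postfix check" "rspamadm configtest" \
#     && systemctl restart postfix rspamd mb-rpcd mb-filter mb-milter
```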
Individual Service Failure¶
Single service crashed or corrupted:
mb-rpcd failure:
# Check logs
sudo journalctl -u mb-rpcd -n 100
# Restart service
sudo systemctl restart mb-rpcd
# If still failing, restore binary
sudo apt reinstall mailborder
sudo systemctl restart mb-rpcd
ClamAV failure:
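# Check signature database (prints its metadata and verifies the signature)
sudo sigtool --info /var/lib/clamav/daily.cvd
# Restore signatures from backup
sudo cp /var/backups/clamav/*.cvd /var/lib/clamav/
sudo chown clamav:clamav /var/lib/clamav/*.cvd
# On Debian/Ubuntu the unit is clamav-daemon (clamd@scan is the RHEL-family name)
sudo systemctl restart clamav-daemon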
Postfix failure:
# Check configuration
sudo postfix check
# Restore from backup
sudo tar xzf /var/backups/mailborder/config-latest.tar.gz -C / etc/postfix/
sudo postfix reload
# If queue corrupted
sudo postsuper -r ALL
Ransomware Attack¶
System infected with ransomware:
Step 1: Immediate isolation
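# Run these from the local console: an SSH session drops as soon as the link goes down.
# Disconnect from network (replace eth0 with the actual interface name; list with "ip link")
sudo ip link set eth0 down
# Stop all services (quote the pattern so systemctl, not the shell, expands it)
sudo systemctl stop 'mb-*'
sudo systemctl stop nginx php8.2-fpm mysql redis-server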
Step 2: Assess damage
# Check for encrypted files
find /etc /srv /var -name "*.encrypted" -o -name "*.locked"
# Check for ransom notes
find / -name "README_DECRYPT.txt" -o -name "DECRYPT_INSTRUCTIONS.txt"
# Document everything
ls -lR /etc/mailborder > /tmp/damage-assessment.txt
ls -lR /var/lib/mysql >> /tmp/damage-assessment.txt
Step 3: Wipe and rebuild
# DO NOT PAY RANSOM
# Wipe system completely
# Rebuild from clean OS installation
# Follow "Complete System Failure" procedure above
Step 4: Security audit
# After restoration, identify entry point
# Review access logs, authentication logs
# Update all passwords and keys
# Apply security patches
# Implement additional security measures
Data Center Outage¶
Complete data center failure:
Failover to secondary site:
Step 1: Activate DR site
Step 2: Update DNS
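The commands for these steps are not shown. For the DNS change, one option is a dynamic update with `nsupdate`, as used in the High Availability section of this guide. The sketch below only builds the update batch so it can be reviewed before it is sent; the function name and the TSIG key path are placeholders.

```shell
#!/bin/bash
# dns_failover_commands SERVER NAME TTL NEW_IP
# Prints an nsupdate batch that replaces the A record for NAME with NEW_IP.
dns_failover_commands() {
    local server="$1" name="$2" ttl="$3" new_ip="$4"
    printf 'server %s\n' "$server"
    printf 'update delete %s A\n' "$name"
    printf 'update add %s %s A %s\n' "$name" "$ttl" "$new_ip"
    printf 'send\n'
}

# Example: review the batch, then pipe it to nsupdate with a TSIG key
# (the key path is a placeholder):
# dns_failover_commands dns.example.com mailborder.example.com 300 192.168.1.101 \
#     | nsupdate -k /etc/bind/ddns.key
```

A short TTL such as 300 seconds limits how long resolvers keep returning the failed site's address.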
Step 3: Restore latest data
# Sync latest backup from offsite storage
rsync -avz -e "ssh -i /root/.ssh/backup_key" \
    backup@backup.example.com:/backups/mailborder/latest/ /var/backups/mailborder/
# Restore database to DR site
gunzip < /var/backups/mailborder/db-latest.sql.gz | sudo mysql mailborder
Step 4: Notify users
Step 5: Monitor and maintain
# Monitor DR site
sudo mb-status
watch 'postqueue -p'
# Keep synchronizing data until primary is restored
High Availability Setup¶
Active-Passive Failover¶
Primary server monitoring:
sudo tee /usr/local/bin/mb-ha-monitor.sh << 'EOF'
#!/bin/bash
PRIMARY="192.168.1.100"
SECONDARY="192.168.1.101"
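# Run this from a host other than the primary (the secondary or a monitoring host).
# Marker file that prevents repeated failovers (hypothetical path)
STATE_FILE="/var/run/mb-ha-failover.active"
[ -e "$STATE_FILE" ] && exit 0
# Check primary (ping only proves the host is reachable, not that mail services are healthy)
if ! ping -c 3 "$PRIMARY" > /dev/null 2>&1; then
    # Primary down, activate secondary
    ssh "root@$SECONDARY" "/usr/local/bin/mb-ha-activate.sh"
    # Update DNS
    nsupdate << NSEOF
server dns.example.com
update delete mailborder.example.com A
update add mailborder.example.com 300 A $SECONDARY
send
NSEOF
    # Alert
    echo "Primary failed, switched to secondary" | \
        mail -s "HA Failover" admin@example.com
    # Record the failover so later runs do not trigger it again
    touch "$STATE_FILE"
fi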
EOF
sudo chmod +x /usr/local/bin/mb-ha-monitor.sh
Secondary activation script:
sudo tee /usr/local/bin/mb-ha-activate.sh << 'EOF'
#!/bin/bash
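# Sync latest data from the offsite backup server
# (the primary is unreachable during a failover, so it cannot be the source)
rsync -avz -e "ssh -i /root/.ssh/backup_key" \
    backup@backup.example.com:/backups/mailborder/latest/ /var/backups/mailborder/
# Restore if needed
# tar xzf /var/backups/mailborder/latest.tar.gz -C /
# Start services: database and cache first, then Mailborder, then the web front end
systemctl start mysql redis-server
systemctl start mb-rpcd mb-filter mb-milter
systemctl start nginx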
# Verify
mb-status
EOF
sudo chmod +x /usr/local/bin/mb-ha-activate.sh
Geographic Redundancy¶
Multi-region deployment:
Region 1 (Primary):
- mailborder-us.example.com
- MX Priority: 10
Region 2 (Secondary):
- mailborder-eu.example.com
- MX Priority: 20
Region 3 (Tertiary):
- mailborder-as.example.com
- MX Priority: 30
DNS Configuration:
example.com. MX 10 mailborder-us.example.com.
example.com. MX 20 mailborder-eu.example.com.
example.com. MX 30 mailborder-as.example.com.
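Sending servers try MX hosts in ascending preference order, so the host with value 10 receives mail while it is reachable and the others act as fallbacks. To confirm what is actually published, the records can be listed in delivery order. The helper below is a small sketch with a hypothetical name; it expects the "PREFERENCE HOST" lines that `dig +short MX` prints.

```shell
#!/bin/bash
# mx_by_preference
# Reads "PREFERENCE HOST" lines on stdin and prints the hosts in the order
# sending servers try them (lowest preference value first).
mx_by_preference() {
    sort -n -k1,1 | awk '{ print $2 }'
}

# Example:
# dig +short MX example.com | mx_by_preference
```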
Testing and Drills¶
Regular DR Testing¶
Monthly Test:
# Test backup restoration to isolated environment
# Verify all services start correctly
# Test email flow
# Document any issues
Quarterly Drill:
# Full DR exercise
# Simulate primary site failure
# Activate DR site
# Verify all functionality
# Measure RTO achievement
# Document lessons learned
Annual Exercise:
# Complete disaster simulation
# Include all stakeholders
# Test communication procedures
# Verify documentation accuracy
# Update DR plan based on results
DR Test Checklist¶
Pre-Test:
□ Notify stakeholders
□ Backup current state
□ Prepare test environment
□ Review procedures
□ Assign roles
During Test:
□ Document start time
□ Follow DR procedures
□ Note any deviations
□ Record issues encountered
□ Measure recovery times
Post-Test:
□ Verify functionality
□ Document completion time
□ Assess success/failure
□ Identify improvements
□ Update procedures
□ Generate report
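Recording the start and completion times makes it possible to compare each drill against the RTO tiers defined at the top of this document. A minimal sketch follows, with a hypothetical function name; times are Unix epoch seconds and the objective is given in seconds.

```shell
#!/bin/bash
# rto_report START_EPOCH END_EPOCH RTO_SECONDS
# Prints the elapsed recovery time and whether it stayed under the objective.
# Returns non-zero when the objective was missed.
rto_report() {
    local start="$1" end="$2" rto="$3"
    local elapsed=$(( end - start ))
    local verdict="MET"
    [ "$elapsed" -lt "$rto" ] || verdict="MISSED"
    printf 'elapsed=%dm%02ds rto=%dm result=%s\n' \
        $(( elapsed / 60 )) $(( elapsed % 60 )) $(( rto / 60 )) "$verdict"
    [ "$verdict" = "MET" ]
}
```

In a drill this might be used as `start=$(date +%s)` before the first recovery step and `rto_report "$start" "$(date +%s)" 3600` once email processing is verified, with 3600 matching the one-hour critical tier.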
DR Documentation¶
Maintain Documentation¶
Critical information to document:
Contact Information:
Primary Contact: admin@example.com, +1-555-0100
Secondary Contact: backup@example.com, +1-555-0101
Vendor Support: support@mailborder.com, +1-800-MAILBRD
Data Center: datacenter@provider.com, +1-555-0200
System Information:
Primary Server: 192.168.1.100 (mailborder.example.com)
DR Server: 192.168.1.101 (dr-mailborder.example.com)
Backup Server: backup.example.com
Database: MariaDB 10.11
OS: Ubuntu 24.04 LTS
Access Credentials:
Store securely in password manager:
- Root passwords
- Database passwords
- API keys
- SSL certificates
- Backup encryption keys
Procedures:
- Step-by-step recovery procedures
- Service startup order
- Verification steps
- Rollback procedures
- Escalation paths
Keep Documentation Current¶
Update after:
- Configuration changes
- System updates
- Personnel changes
- Contact updates
- Successful DR tests
- Actual disaster recovery
Review frequency:
- Monthly: Contact information
- Quarterly: Procedures
- Annually: Complete review