Infrastructure Details

Monitoring CPU, RAM, disk, and server performance metrics

Real-time monitoring of your server's health and performance. Track CPU usage, memory consumption, disk space, and network activity at a glance.

Technical Implementation

CtrlOps uses a high-frequency polling mechanism to keep your infrastructure metrics up-to-date without overloading the server or the desktop application.

Data Collection Lifecycle

  1. Rust Backend Polling: The Tauri backend uses the sysinfo crate and specialized shell commands (like top, df -h, and free -m) to gather raw system data.
  2. Periodic Interval: In an active session, metrics are polled every 2-5 seconds (configurable in settings).
  3. IPC Event Emitters: Once polled, the Rust backend emits an event (e.g., infra:update-stats) containing a JSON payload of all CPU, RAM, and Disk metrics.
  4. React Frontend Subscription: The UI listens for these events and updates the dashboard charts in real time, using a virtualized data layer to maintain 60 fps even during high activity.
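
The lifecycle above can be sketched in a few lines. This is an illustration in Python rather than CtrlOps's Rust backend: the functions clamp_interval, build_event, and poll, and the payload field names, are hypothetical. Only the event name infra:update-stats and the 2-5 second range come from the description above.

```python
import json
import time
from typing import Callable, Dict

MIN_INTERVAL_S = 2.0  # lower bound of the documented 2-5 second range
MAX_INTERVAL_S = 5.0  # upper bound of the documented range


def clamp_interval(seconds: float) -> float:
    """Keep a user-configured interval inside the 2-5 second range."""
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, seconds))


def build_event(stats: Dict[str, float]) -> str:
    """Wrap one sample in a JSON message (field names are hypothetical)."""
    return json.dumps({"event": "infra:update-stats", "payload": stats})


def poll(collect: Callable[[], Dict[str, float]],
         emit: Callable[[str], None],
         interval_s: float,
         iterations: int,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Collect a sample, emit it, then wait; repeated a fixed number of times."""
    wait = clamp_interval(interval_s)
    for _ in range(iterations):
        emit(build_event(collect()))
        sleep(wait)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays; a real backend would more likely run on a timer and stop when the session ends.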

Service Health Logic

For services like Nginx, MySQL, or Docker, CtrlOps specifically monitors the systemd status:

  • Active: Service is running and reporting healthy.
  • Inactive/Failed: Service is stopped or has crashed. CtrlOps parses the exit code and journalctl logs to provide immediate troubleshooting context.
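
As a rough sketch of this kind of check, the snippet below maps the state string printed by systemctl is-active onto the two buckets above. How CtrlOps actually queries systemd is not documented here; the function names and the extra "Transitional" bucket are assumptions of this example.

```python
import subprocess

# States printed by `systemctl is-active`; grouping them into the two
# buckets described above is an assumption of this sketch.
HEALTHY = {"active"}
DOWN = {"inactive", "failed"}


def classify(state: str) -> str:
    """Map a systemd active-state string to a dashboard bucket."""
    state = state.strip().lower()
    if state in HEALTHY:
        return "Active"
    if state in DOWN:
        return "Inactive/Failed"
    return "Transitional"  # e.g. activating, deactivating, reloading


def service_status(unit: str) -> str:
    """Ask systemd for a unit's state (requires a host running systemd)."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,
    )
    return classify(result.stdout)
```

For example, service_status("nginx") would return "Active" on a host where the nginx unit is running.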

Dashboard Overview

The Infrastructure panel gives you instant visibility into:

┌─────────────────────────────────────────────────────────────┐
│  Server Health — web-server-01                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  💻 CPU Usage                    🧠 Memory Usage            │
│  █████░░░░░ 45%                  ██████░░░░ 62%             │
│  8 cores active                  4.2 GB / 8 GB              │
│                                                             │
│  💾 Disk Usage                   🌐 Network                 │
│  ████████░░ 78%                  ↓ 2.4 MB/s                 │
│  234 GB / 300 GB                 ↑ 890 KB/s                 │
│                                                             │
│  ⚡ Load Average                🔄 Processes                │
│  0.52 0.48 0.61                  142 running                │
│  (1m 5m 15m)                     3 zombie                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

CPU Monitoring

Understanding CPU Metrics

CPU Usage (%):

  • 0-50%: Healthy, plenty of headroom
  • 50-80%: Moderate load, monitor trends
  • 80-95%: High load, investigate
  • 95%+: Critical, immediate attention needed

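Load Average: Three numbers giving the average number of processes running or waiting to run over the last 1, 5, and 15 minutes. On Linux the count also includes processes blocked in uninterruptible I/O, so a high load average can point to slow disks as well as busy CPUs.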

Rule of thumb:

  • Below 1.0 per CPU core = good
  • Above 1.0 per CPU core = overloaded

Example: On a 4-core server:

  • Load 2.0 = 50% utilized (healthy)
  • Load 4.0 = 100% utilized (busy)
  • Load 8.0 = 200% utilized (overloaded, queue forming)
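
The arithmetic in this example is just load divided by core count. A minimal sketch follows; the function names are made up for illustration, and os.getloadavg is only available on Unix-like systems.

```python
import os


def load_utilization(load: float, cores: int) -> float:
    """Return a load average as a percentage of the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 100.0 * load / cores


def load_status(load: float, cores: int) -> str:
    """Apply the rule of thumb: more than 1.0 per core means overloaded."""
    return "overloaded" if load_utilization(load, cores) > 100.0 else "within capacity"


def current_status() -> str:
    """Check the 1-minute load average of the local machine (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_status(one_minute, os.cpu_count() or 1)
```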

CPU Breakdown

CtrlOps shows:

  • User processes: Your applications
  • System processes: OS kernel tasks
  • I/O wait: CPU idle while waiting for disk I/O to complete
  • Steal time: (VMs only) Time the hypervisor spent running other guests while this VM was ready to run
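
On Linux, these categories correspond to cumulative counters on the first line of /proc/stat, and a percentage breakdown comes from the difference between two samples. The sketch below assumes that source; whether CtrlOps reads /proc/stat directly or goes through the sysinfo crate is not stated here, and the function names are illustrative.

```python
from typing import Dict, List

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> List[int]:
    """Return the first eight counters of the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:1 + len(FIELDS)]]


def breakdown(before: str, after: str) -> Dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}
```

The two input strings would typically be the first line of /proc/stat read a few seconds apart.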

Troubleshooting High CPU

Step 1: Identify the culprit

# Show processes by CPU usage
htop

# Or in CtrlOps terminal
ps aux --sort=-%cpu | head -10

Step 2: Analyze

  • Is it a legitimate process?
  • Has resource usage spiked suddenly?
  • Is it consuming more than expected?

Step 3: Take action

  • Optimize the application
  • Add more CPU resources
  • Kill runaway processes (carefully!)
  • Schedule heavy tasks for off-peak

Memory Monitoring

Memory Metrics Explained

  • Total Memory: Physical RAM installed
  • Used Memory: Currently allocated
  • Free Memory: Completely unused
  • Cached: Frequently accessed data (can be freed)
  • Buffers: Disk cache (can be freed)

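Available Memory ≈ Free + Cached + Buffers (this is what matters!). On current Linux kernels, free reports this directly in its "available" column, which is usually somewhat lower than the plain sum because not all cached memory can be reclaimed.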
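
A minimal sketch of computing memory pressure from available rather than free memory, assuming the data comes from /proc/meminfo on Linux. The function names are illustrative, and the fallback sum is only an approximation for kernels that do not report MemAvailable.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def used_percent(info: Dict[str, int]) -> float:
    """Share of RAM that is not available to new workloads."""
    available = info.get("MemAvailable")
    if available is None:
        # Approximation for kernels that predate the MemAvailable field.
        available = info["MemFree"] + info["Cached"] + info["Buffers"]
    return 100.0 * (info["MemTotal"] - available) / info["MemTotal"]
```

On a live system the input would be the contents of /proc/meminfo, for example used_percent(parse_meminfo(open("/proc/meminfo").read())).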

When to Worry

Memory usage patterns:

Scenario       Status         Action
< 50% used     ✅ Healthy      None needed
50-80% used    ⚠️ Monitor      Watch trends
80-95% used    🔶 Warning      Investigate soon
> 95% used     🔴 Critical     Immediate action

Out of Memory (OOM)

When memory is exhausted:

  • System becomes sluggish
  • Services may crash
  • Linux OOM killer terminates processes

CtrlOps alerts you before this happens!

Finding Memory Hogs

# Top memory consumers
ps aux --sort=-%mem | head -10

# Memory usage by process
pmap <process_id>

Common culprits:

  • Memory leaks in applications
  • Too many worker processes
  • Large database queries
  • Unoptimized code

Disk Usage

Disk Space Metrics

  • Total: Physical disk capacity
  • Used: Space consumed
  • Available: Free space
  • Reserved: Space held back for the root user (5% by default on ext4)

Disk Partitions

Typical layout:

Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda1        30G   12G   16G   43%   /
/dev/sda2       100G   67G   28G   71%   /var
/dev/sdb1       500G   45G  430G   10%   /mnt/data
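
Note that Use% in this output is not Used divided by Size: 12G of 30G is 40%, yet the first row shows 43%. GNU df reports Used / (Used + Avail), rounded up, so reserved space is left out of the denominator. The sketch below reproduces that arithmetic; the function names are illustrative, and the reserved-space calculation relies on how shutil.disk_usage behaves on POSIX systems.

```python
import shutil


def df_use_percent(used: int, avail: int) -> int:
    """Use% as df prints it: used / (used + avail), rounded up."""
    total = used + avail
    return (100 * used + total - 1) // total  # integer ceiling division


def reserved_bytes(path: str = "/") -> int:
    """Space that is neither used nor available to unprivileged users.

    On POSIX, shutil.disk_usage reports 'free' as the space available to
    non-root users, so the remainder is the reserved portion.
    """
    usage = shutil.disk_usage(path)
    return usage.total - usage.used - usage.free
```

Applied to the three rows above, df_use_percent gives 43, 71, and 10, matching the Use% column.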

Key directories:

  • / — Root filesystem (OS, applications)
  • /var — Logs, databases, variable data
  • /home — User files
  • /tmp — Temporary files

Disk Health

CtrlOps monitors:

  • Disk I/O — Read/write operations per second
  • Latency — Time for disk operations
  • Throughput — Data transfer rate
  • Errors — SMART errors, bad sectors

Disk failures are often preceded by increased error rates. Monitor SMART data regularly.

Cleaning Up Disk Space

Find large files:

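# Largest files and directories under /var (-a includes individual files)
sudo du -ah /var | sort -rh | head -n 20

# Log files older than 30 days and larger than 100 MiB
find /var/log -name "*.log" -mtime +30 -size +100M

# Docker cleanup (-a removes all unused images, not just dangling ones)
docker system prune -a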

Usually safe to delete (check retention requirements first):

  • Old logs (> 30 days)
  • Package caches (apt clean)
  • Temporary files
  • Rotated backups

Network Monitoring

Network Metrics

Bandwidth Usage:

  • Incoming (RX): Data downloaded
  • Outgoing (TX): Data uploaded

Measured in:

  • Bytes per second (B/s)
  • Kilobytes per second (KB/s)
  • Megabytes per second (MB/s)

Packet Statistics:

  • Packets received/transmitted
  • Errors
  • Dropped packets
  • Collisions
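
Bandwidth figures like these are derived from cumulative byte counters sampled twice. The sketch below assumes the counters come from /proc/net/dev on Linux, where the first value after the interface name is received bytes and the ninth is transmitted bytes. The function names are illustrative, not part of CtrlOps.

```python
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before: str, after: str, seconds: float) -> Dict[str, Tuple[float, float]]:
    """Bytes per second received and sent by each interface between samples."""
    old, new = parse_net_dev(before), parse_net_dev(after)
    return {
        nic: ((new[nic][0] - old[nic][0]) / seconds,
              (new[nic][1] - old[nic][1]) / seconds)
        for nic in new if nic in old
    }
```

Dividing the result by 1,000 or 1,000,000 gives the KB/s and MB/s figures shown on the dashboard.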

Network Troubleshooting

High latency:

# Check connection to external site (5 probes, then stop)
ping -c 5 google.com

# Trace route
mtr google.com

Slow transfers:

  • Check bandwidth utilization
  • Look for packet loss
  • Verify network interface speed
  • Check for throttling

Bandwidth Monitoring

CtrlOps tracks:

  • Daily/weekly/monthly usage
  • Peak usage times
  • Per-process network usage
  • Unusual traffic patterns

Process Monitoring

Process States

  • Running: Currently executing
  • Sleeping: Waiting for something (I/O, timer)
  • Stopped: Paused (Ctrl+Z)
  • Zombie: Finished but not yet cleaned up by its parent
  • Defunct: Another name for a zombie (ps labels zombies <defunct>)
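
As an illustration of how counts such as "142 running, 3 zombie" can be produced, the sketch below tallies the state codes that ps prints in its STAT column (for example from ps -eo stat=). The mapping covers the common Linux codes, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

# First letter of the ps STAT column on Linux.
STATE_NAMES = {
    "R": "running",
    "S": "sleeping",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
    "I": "idle",
}


def count_states(stat_codes: Iterable[str]) -> Counter:
    """Tally processes by the first letter of their ps STAT code."""
    tally = Counter()
    for code in stat_codes:
        code = code.strip()
        if code:
            tally[STATE_NAMES.get(code[0], "other")] += 1
    return tally
```

Only the first character identifies the state; trailing characters such as s or + are modifiers (session leader, foreground process group).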

Important Processes to Watch

System-critical:

  • sshd — SSH daemon
  • systemd — Init system
  • cron — Scheduled tasks
  • rsyslogd — Logging

Application-specific:

  • Web server (nginx, Apache)
  • Database (MySQL, PostgreSQL)
  • Application workers
  • Background jobs

Kill vs Terminate

Soft kill (SIGTERM):

kill <pid>

Politely asks process to shut down. Allows cleanup.

Hard kill (SIGKILL):

kill -9 <pid>

Forcefully terminates. Use when soft kill fails.

SIGKILL cannot be caught or ignored, so the process gets no chance to clean up. It may leave behind temporary files or partially written data.
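
The usual pattern is to try the soft kill first and escalate only after a grace period. A minimal sketch for a child process started from Python follows; the function name and the 5-second default are assumptions of this example.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Send SIGTERM, wait, and fall back to SIGKILL; return the exit code."""
    proc.terminate()  # SIGTERM on POSIX: lets the process clean up
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
```

On POSIX, a negative return value means the process was ended by that signal number, so -15 indicates SIGTERM was enough and -9 indicates the hard kill was needed.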

Historical Data

Performance Graphs

CtrlOps keeps historical data for:

  • CPU usage (hourly, daily, weekly)
  • Memory consumption
  • Disk I/O
  • Network traffic

Useful for:

  • Identifying trends
  • Capacity planning
  • Troubleshooting past issues
  • Proving SLA compliance

Setting Up Alerts

Configure alerts for:

  • CPU > 80% for 5 minutes
  • Memory > 90%
  • Disk < 10% free
  • Load average > number of cores
  • Network errors increasing
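
A rule such as "CPU > 80% for 5 minutes" needs to track how long the value has stayed above the threshold, not just the latest reading. The sketch below shows one way to evaluate it over timestamped samples. It is not CtrlOps's alert engine, and the function name and sample format are assumptions.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(samples: Iterable[Tuple[float, float]],
                     threshold: float = 80.0,
                     duration_s: float = 300.0) -> bool:
    """True if the value stayed above threshold for at least duration_s.

    samples: (timestamp in seconds, value) pairs in chronological order.
    """
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # a new breach begins here
            if timestamp - breach_start >= duration_s:
                return True
        else:
            breach_start = None  # any dip resets the timer
    return False
```

Requiring a sustained breach avoids alerting on short spikes, at the cost of reacting a few minutes later.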

Notification channels:

  • Email
  • Slack
  • SMS (critical alerts)
  • Webhook

Performance Optimization

CPU Optimization

  1. Identify inefficient processes
  2. Optimize database queries
  3. Use caching (Redis, Memcached)
  4. Scale horizontally (add servers)
  5. Upgrade hardware (more/faster cores)

Memory Optimization

  1. Find memory leaks (valgrind, profiling)
  2. Reduce buffer sizes
  3. Add swap space (not ideal, but buys time)
  4. Add more RAM
  5. Optimize application code

Disk Optimization

  1. Use SSD instead of HDD
  2. Implement RAID for redundancy/speed
  3. Partition wisely
  4. Monitor inode usage
  5. Schedule regular cleanup

Network Optimization

  1. Enable compression
  2. Use CDN for static content
  3. Implement caching
  4. Upgrade bandwidth
  5. Optimize payload sizes

Reporting & Analytics

Weekly Reports

Automated email with:

  • Average resource usage
  • Peak usage times
  • Trends vs previous week
  • Recommendations

Capacity Planning

Based on trends, predict when you'll need:

  • More CPU cores
  • Additional RAM
  • Larger disks
  • More bandwidth

Example:

"Current growth rate suggests disk will be full in 45 days. Consider archiving old logs or expanding storage."
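
A forecast like this can come from a simple linear extrapolation. The sketch below uses the average daily growth between the first and last reading; how CtrlOps actually computes its predictions is not documented here, and the function name and inputs are illustrative.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Days until capacity is reached at the average observed growth rate.

    daily_used_gb: one usage reading per day, oldest first.
    Returns None when usage is flat or shrinking.
    """
    if len(daily_used_gb) < 2:
        raise ValueError("need at least two daily readings")
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    return max(0.0, (capacity_gb - daily_used_gb[-1]) / growth_per_day)
```

For instance, a 300 GB disk at 255 GB that grows by 1 GB per day has 45 days left. Real usage is rarely this linear, so treat the result as a rough planning figure.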

Summary

Monitor these key metrics:

  • CPU Load Average — Should be < number of cores
  • Memory Available — Don't let it hit zero
  • Disk Free Space — Keep 15%+ free
  • Network Latency — Should be stable
  • Process Health — Watch for zombies, high CPU/memory

Proactive monitoring catches issues before they become outages. Set alert thresholds below the critical level so you have time to act.
