Linux Troubleshooting and Debugging Guide
Table of Contents
- System Monitoring Commands
- CPU Issues
- Memory Issues
- Disk I/O Issues
- Disk Space Issues
- Networking
- Boot and Startup Issues
- Services and Processes
- Storage and Filesystems
- Permissions and Access
- Log Analysis
- Package Management
- Network Interfaces -- Full Setup Reference
- Quick Reference: Symptom to Command
- Essential Command Reference
- Production Debugging Tips
- Resources
A comprehensive Linux troubleshooting guide covering boot issues, CPU and memory problems, disk I/O, networking, services, permissions, and every essential debugging command with practical examples.
Most Linux problems are not mysterious. They are misconfigurations, resource exhaustion, or broken dependencies, and they leave evidence in logs, metrics, and system state. This guide gives you the tools and mental model to find that evidence systematically.
The troubleshooting loop:
OBSERVE -- what is the current state?
EVENTS -- what changed? (logs, dmesg, journal)
DIAGNOSE -- form a hypothesis
TEST -- verify it (commands, isolation)
RESOLVE -- apply fix
VERIFY -- confirm it worked
DOCUMENT -- write it down
- Always check the obvious first: is the service running? Is the disk full? Is the port already in use?
- Change one thing at a time. Multiple simultaneous changes make it impossible to know what fixed the problem.
- Read the error message carefully. Linux error messages are precise. "No space left on device" and "Permission denied" mean exactly what they say.
- Test your fix before declaring success. Restart the service. Reproduce the original failure scenario. Confirm it no longer happens.
- Write down what you did. Incidents repeat. The runbook you write today saves you hours at 3am next month.
System Monitoring Commands
These are your first tools every time something feels wrong.
top and htop
# Real-time process monitor
top
# Inside top:
# P = sort by CPU
# M = sort by memory
# 1 = toggle per-core CPU view
# k = kill process by PID
# q = quit
# Better version with colors and mouse support
htop
# F6 = sort by column
# F9 = kill process
# F5 = tree view
# Sort by CPU from CLI
ps aux --sort=-%cpu | head -20
# Sort by memory
ps aux --sort=-%mem | head -20
Load Average
uptime
# 15:23:42 up 12 days, 3:41, 2 users, load average: 1.23, 0.98, 0.87
#                                                   1min  5min  15min
# Rule of thumb: sustained load above the number of CPU cores = overloaded
nproc # how many CPU cores you have
vmstat -- virtual memory, CPU, I/O overview
vmstat 1 # refresh every 1 second
vmstat 1 10 # 10 samples then exit
# Key columns:
# r = processes waiting for CPU (runqueue)
# b = processes blocked on I/O
# si = swap in (KB/s) -- bad if nonzero
# so = swap out (KB/s) -- bad if nonzero
# wa = CPU time waiting on I/O (%)
# us = user CPU %
# sy = kernel CPU %
mpstat -- per-CPU statistics
mpstat -P ALL 1 # all CPUs, 1 second interval
# Shows per-core breakdown -- useful to spot single-threaded bottlenecks
CPU Issues
Diagnosing High CPU
# Step 1 -- find the offending process
top # press P to sort by CPU
ps aux --sort=-%cpu | head -10
# Step 2 -- check if it is real CPU work or I/O wait
# In top header line:
# %Cpu(s): 12.5 us, 3.2 sy, 0.0 ni, 5.3 id, 78.9 wa
#                                           ↑ wa = I/O wait
# High wa means disk is the bottleneck, not CPU
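To read the I/O wait figure without eyeballing the header, the value can be pulled out of top's batch output. This is a minimal sketch: `cpu_iowait` is a hypothetical helper name, and it assumes the procps-style `%Cpu(s):` summary line shown above, with a dot as the decimal separator.

```shell
# cpu_iowait: print the "wa" percentage from a top-style CPU summary line on stdin.
# Hypothetical helper, not a standard tool.
cpu_iowait() {
    awk -F'[:,]' '/Cpu\(s\)/ {
        for (i = 2; i <= NF; i++) {
            n = split($i, part, " ")          # e.g. " 78.9 wa" -> "78.9", "wa"
            if (n == 2 && part[2] == "wa") print part[1]
        }
    }'
}

# Usage on a live system (batch mode, one iteration):
#   top -bn1 | cpu_iowait
```

Other top implementations or locales may format the line differently, so treat the parsing as an assumption to verify on your system.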
# Step 3 -- investigate what a process is doing
strace -p <PID> # system calls (adds overhead, use briefly)
strace -p <PID> -c # summary of syscalls (less overhead)
lsof -p <PID> # files the process has open
cat /proc/<PID>/status # detailed process info
cat /proc/<PID>/cmdline # exact command including args
# Step 4 -- check load vs CPU count
uptime
nproc
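Step 4 compares the load average with the core count. A minimal sketch of that ratio follows; `load_per_core` is a hypothetical helper name, and the usage line assumes `/proc/loadavg` is available, as it is on Linux.

```shell
# load_per_core LOAD CORES -> prints LOAD / CORES with two decimals.
# Hypothetical helper for the load-vs-cores comparison.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live system, pass the 1-minute load average and the core count:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```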
# If load/nproc > 2, system is significantly overloaded
Fixing High CPU
# Lower priority of a running process (nice: -20 = highest, 19 = lowest)
renice +10 <PID> # make less aggressive
renice -n 19 -p <PID> # minimum priority
# Start a new command at low priority
nice -n 19 command # lowest priority
ionice -c 3 nice -n 19 command # low CPU and low I/O
# Limit CPU usage to a percentage
cpulimit -p <PID> -l 50 # limit to 50% of one core
# Kill gracefully then force if needed
kill <PID>
sleep 10
kill -9 <PID>
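The fixed `sleep 10` above waits the full ten seconds even if the process exits at once. A sketch of the same graceful-then-force sequence that returns as soon as the process is gone follows; `stop_process` and its default timeout are illustrative choices, not an existing tool.

```shell
# stop_process PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls once per second, and sends SIGKILL only if the
# process is still present after TIMEOUT seconds (default 10).
stop_process() {
    pid=$1
    timeout=${2:-10}
    kill "$pid" 2>/dev/null || return 0          # nothing to stop
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -9 "$pid" 2>/dev/null || true           # last resort
    return 0
}

# Usage:
#   stop_process 12345 15
```

`kill -0` sends no signal; it only reports whether the PID can still be signalled.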
# Kill all instances by name
killall process-name
pkill -f "pattern in command"
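# Check for cryptominers or malware
# (kworker is a legitimate kernel thread name that malware sometimes imitates;
# real kernel threads are shown in [brackets] by ps)
ps aux | grep -E '(xmrig|miner|kworker)'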
crontab -l
sudo crontab -l
ls -la /etc/cron.*
systemctl list-unit-files --state=enabled | grep -v systemd
Memory Issues
Diagnosing Memory Problems
# Overview
free -h
#        total   used    free    shared  buff/cache  available
# Mem:   15Gi    12Gi    500Mi   1Gi     3Gi         2.5Gi
# Swap:  8.0Gi   7.0Gi   1.0Gi
# "available" is what matters -- not "free"
# High swap usage = memory pressure = performance problem
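Since "available" is the figure that matters, it can help to express it as a share of total memory. The sketch below reads the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`; `mem_available_pct` is a hypothetical helper name, and the optional file argument exists only to make it easy to try on sample data.

```shell
# mem_available_pct [MEMINFO_FILE]
# Prints MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Usage:
#   mem_available_pct        # reads /proc/meminfo
```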
# Top memory consumers
ps aux --sort=-%mem | head -15
# Check if OOM killer fired
dmesg | grep -i "out of memory"
dmesg | grep -i "killed process"
journalctl -xe | grep -i oom
# Watch a process for memory leaks
watch -n 2 'ps -p <PID> -o pid,rss,vsz,cmd'
# RSS (resident set size) growing steadily = likely leak
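Watching by eye works for short checks. For a longer observation, the growth can be measured directly. The following is a rough sketch that assumes a Linux `/proc` filesystem and a userspace process (kernel threads have no `VmRSS` line); `rss_kb` and `rss_growth` are hypothetical helper names.

```shell
# rss_kb PID -> resident set size in kB, read from /proc/PID/status
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# rss_growth PID SAMPLES INTERVAL
# Takes SAMPLES readings INTERVAL seconds apart and prints
# "FIRST LAST DELTA" in kB.
rss_growth() {
    first=$(rss_kb "$1")
    last=$first
    n=1
    while [ "$n" -lt "$2" ]; do
        sleep "$3"
        last=$(rss_kb "$1")
        n=$((n + 1))
    done
    echo "$first $last $((last - first))"
}

# Usage: sample once a minute for an hour
#   rss_growth 12345 60 60
```

A positive delta in one run is not proof of a leak; a delta that keeps growing across repeated runs under steady load is the stronger signal.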
# Detailed process memory map
pmap -x <PID>
cat /proc/<PID>/smaps | grep -i total
# Check swap devices and usage
swapon --show
cat /proc/sys/vm/swappiness # 60 = default, 10 = prefer RAM
Fixing Memory Issues
# Drop page cache (safe -- kernel rebuilds it)
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
# 1 = page cache, 2 = dentries/inodes, 3 = all
# Refresh swap (moves swapped pages back into RAM -- needs enough free RAM, may stall briefly)
sudo swapoff -a && sudo swapon -a
# Reduce swappiness -- prefer RAM over swap
sudo sysctl vm.swappiness=10
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
# Add a swap file if none exists
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Protect a critical process from OOM killer
echo -1000 | sudo tee /proc/<PID>/oom_score_adj
# -1000 = never kill, 0 = default, 1000 = kill first
# In a systemd service unit file
sudo systemctl edit service-name
# Add to the override file that opens:
[Service]
OOMScoreAdjust=-1000
MemoryMax=2G
Disk I/O Issues
Diagnosing Disk Bottlenecks
# Is it actually I/O wait?
top
# Look for high "wa" in CPU line -- anything above 30% is significant
# Which disk is saturated?
iostat -x 1 5
# Key columns:
# %util -- time disk was busy (>80% = saturated)
# await -- average request wait time in ms (>20ms for SSD, >50ms HDD = slow)
# r/s -- reads per second
# w/s -- writes per second
# rkB/s -- read throughput KB/s
# wkB/s -- write throughput KB/s
# Which processes are doing the most I/O?
sudo iotop -o # -o = only show active processes
sudo pidstat -d 1 # per-process disk I/O stats every second
# Which files are being accessed?
sudo lsof /mount/point # all open files on the filesystem mounted there
sudo inotifywait -m -r /path/ # watch for file system events
# Check disk health
sudo smartctl -H /dev/sda # quick health check
sudo smartctl -a /dev/sda # full SMART report
# Watch for: Reallocated_Sector_Ct > 0 = bad sectors
# Watch for: Current_Pending_Sector > 0 = failing sectors
# Check kernel for disk errors
dmesg | grep -i "i/o error"
dmesg | grep -i "ata.*error"
dmesg | grep -i "blk_update_request"
Fixing Disk I/O
# Lower I/O priority of a process
sudo ionice -c 3 -p <PID> # idle class
sudo ionice -c 2 -n 7 -p <PID> # best effort, lowest
# In systemd service
sudo systemctl edit service-name
# Add to the override file that opens:
[Service]
IOSchedulingClass=idle
Nice=19
# Use tmpfs for temp files to avoid disk I/O
sudo mount -t tmpfs -o size=2G tmpfs /var/cache/app
# Permanent via /etc/fstab:
# tmpfs /var/cache/app tmpfs size=2G 0 0
# Tune I/O scheduler
cat /sys/block/sda/queue/scheduler
# For SSD (no scheduling needed)
echo none | sudo tee /sys/block/sda/queue/scheduler
# For HDD with mixed workloads
echo bfq | sudo tee /sys/block/sda/queue/scheduler
# Make scheduler change permanent via udev rule
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"' \
| sudo tee /etc/udev/rules.d/60-io-scheduler.rules
# Flush write cache (safe before disk work)
sync
Disk Space Issues
# Check disk usage by filesystem
df -h
df -i # check inode usage -- can be "full" even with space left
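Both checks above can be reduced to "which filesystems are over a threshold". The sketch below filters POSIX-format `df -P` (space) or `df -Pi` (inodes) output; `over_threshold` is a hypothetical helper name, and it assumes mount points contain no spaces.

```shell
# over_threshold LIMIT
# Reads `df -P` or `df -Pi` output on stdin and prints "MOUNT USE%" for
# every filesystem whose use percentage is at or above LIMIT.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        # Skip entries such as "-" that carry no percentage
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P  | over_threshold 90    # disk space
#   df -Pi | over_threshold 90    # inodes
```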
# Find where space is going
sudo du -sh /* 2>/dev/null | sort -rh | head -15
sudo du -sh /var/log/* 2>/dev/null | sort -rh | head -10
sudo du -sh /var/cache/* 2>/dev/null | sort -rh | head -10
sudo du -sh /home/* 2>/dev/null | sort -rh | head -10
# Find files larger than 100MB
sudo find / -type f -size +100M -exec du -h {} + 2>/dev/null | sort -rh | head -20
# Find files deleted but still held open by processes (still consuming space)
sudo lsof +L1
sudo lsof | grep deleted | sort -nrk 7 | head -15
# Quick cleanup
sudo apt clean && sudo apt autoremove --purge # Debian/Ubuntu
sudo dnf clean all # RHEL/Fedora/AlmaLinux
sudo journalctl --vacuum-size=500M # trim systemd journal
sudo journalctl --vacuum-time=7d
# Truncate a log file without breaking the running process
sudo truncate -s 0 /var/log/large-file.log
# Remove old Docker objects
docker system prune -a
# Remove old snap revisions
snap list --all | awk '/disabled/{print $1, $3}' | \
while read name rev; do sudo snap remove "$name" --revision="$rev"; done
Log Rotation
# Check existing config
cat /etc/logrotate.conf
ls /etc/logrotate.d/
# Create custom rotation
sudo nano /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0640 www-data adm
}
sudo logrotate -d /etc/logrotate.d/myapp # dry run
sudo logrotate -f /etc/logrotate.d/myapp # force immediately
Networking
Interface and IP Diagnostics
# Show all interfaces and their state
ip link show
# Look for UP vs DOWN vs NO-CARRIER
# Show IP addresses
ip addr show
ip a # short form
ip a show eth0 # specific interface
# Show routing table
ip route show
ip r # short form
# Need: "default via <gateway-ip> dev <interface>"
# Test gateway
ping -c 3 $(ip r | awk '/default/{print $3}')
# Physical link check (wired)
ethtool eth0 | grep "Link detected"
# WiFi status
iwconfig wlan0
nmcli device wifi list
rfkill list # check if radio is blocked
sudo rfkill unblock wifi
Connectivity Testing
# ICMP connectivity
ping -c 4 8.8.8.8 # test internet
ping -c 4 192.168.1.1 # test gateway
# DNS test
nslookup google.com
dig google.com
dig google.com @8.8.8.8 # test against specific DNS server
host google.com
# Port test
telnet server-ip 80
nc -zv server-ip 80 # netcat port test
nc -zvu server-ip 53 # UDP port test
curl -v http://server-ip:80 # HTTP test with details
# Trace the route
traceroute google.com
mtr google.com # continuous traceroute (better)
# Test multiple ports at once
nmap -p 22,80,443 server-ip
Ports and Listening Services
# Show all listening TCP/UDP ports
sudo ss -tulnp
# -t = TCP, -u = UDP, -l = listening, -n = numeric, -p = show process
sudo netstat -tulnp # older alternative to ss
# What is using port 80?
sudo ss -tlnp | grep :80
sudo lsof -i :80
sudo fuser 80/tcp
# All connections including established
sudo ss -tap
sudo netstat -tap
# Check if port is open from remote
nc -zv remote-host 22
DNS Diagnostics
# Check current DNS config
cat /etc/resolv.conf
resolvectl status # systemd-resolved
# Test DNS resolution
nslookup google.com
nslookup google.com 8.8.8.8 # against specific server
dig google.com
dig @1.1.1.1 google.com # against Cloudflare
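When several resolvers are configured, it can be useful to test each one in turn rather than only the first. A minimal sketch, assuming the classic `nameserver <address>` lines in `/etc/resolv.conf`; `list_nameservers` is a hypothetical helper name. On systemd-resolved hosts the file may list only the local stub address.

```shell
# list_nameservers [FILE] -> one resolver address per line
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query every configured resolver in turn (needs dig and network access):
#   for ns in $(list_nameservers); do
#       echo "== $ns"
#       dig +short +time=2 +tries=1 @"$ns" google.com
#   done
```

A resolver that prints nothing or times out while the others answer is the one to investigate.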
# Flush DNS cache
sudo resolvectl flush-caches # systemd-resolved
sudo systemd-resolve --flush-caches # older systemd releases without resolvectl
sudo service nscd restart # if using nscd
# Fix DNS
sudo nano /etc/resolv.conf
# nameserver 8.8.8.8
# nameserver 1.1.1.1
# Persistent via NetworkManager
sudo nmcli con mod "connection-name" ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli con up "connection-name"
# Persistent via systemd-resolved
sudo nano /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 1.1.1.1
FallbackDNS=1.0.0.1
sudo systemctl restart systemd-resolved
Packet Capture
# Capture all traffic on eth0
sudo tcpdump -i eth0
# Capture HTTP traffic
sudo tcpdump -i eth0 port 80
# Capture DNS queries
sudo tcpdump -i eth0 port 53
# Save to file for analysis in Wireshark
sudo tcpdump -i eth0 -w capture.pcap
# Read saved file
tcpdump -r capture.pcap
# Print packet payloads as ASCII (-A)
sudo tcpdump -i eth0 -A port 80
# Filter by host
sudo tcpdump -i eth0 host 192.168.1.100
Firewall Diagnostics
# Which firewall is active?
sudo iptables -L -n -v
sudo systemctl status firewalld
sudo ufw status verbose
sudo nft list ruleset
# Allow port 80 -- iptables
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables-save | sudo tee /etc/iptables/rules.v4
# Allow service -- firewalld
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-all
# Allow port -- UFW
sudo ufw allow 80/tcp
sudo ufw allow from 192.168.1.0/24 to any port 22
sudo ufw enable
sudo ufw status numbered
# Temporarily test without firewall (caution on remote servers)
# Set the policy to ACCEPT before flushing, so a DROP policy cannot lock you out
sudo iptables -P INPUT ACCEPT && sudo iptables -F
Boot and Startup Issues
Boot Diagnostics
# After boot -- analyze startup time
systemd-analyze
systemd-analyze blame # slowest services first
systemd-analyze critical-chain # critical path
systemd-analyze plot > boot.svg # visual graph
# Check what failed
systemctl --failed
systemctl list-jobs # pending jobs
# Boot messages
dmesg | tail -50
dmesg -T | grep -i error # errors with timestamps
dmesg -T | grep -i "fail"
journalctl -b # all logs from current boot
journalctl -b -p err # errors from current boot
journalctl -b -1 # previous boot
journalctl -b -2 # two boots ago
GRUB Recovery
From the GRUB rescue prompt:
ls # list partitions
ls (hd0,gpt2)/ # check partition contents
set root=(hd0,gpt2)
set prefix=(hd0,gpt2)/boot/grub
insmod normal
normal
After booting into the system:
sudo grub-install /dev/sda # reinstall GRUB (whole disk, not partition)
sudo update-grub # regenerate grub.cfg
# For UEFI systems
sudo grub-install --target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=ubuntu
sudo update-grub
# Check EFI boot entries
efibootmgr -v
fstab Issues
# Test all fstab mounts without rebooting
sudo mount -a
# If this fails, the system will fail to boot
# Find device UUIDs (use these in fstab, not /dev/sda names)
blkid
lsblk -f
# Verify fstab syntax
cat /etc/fstab
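Reading the file by eye misses small mistakes. The sketch below flags two of the problems named in this section: entries with the wrong number of fields and devices referenced by `/dev/sdX` name instead of UUID. `check_fstab` is a hypothetical helper, and the 4 to 6 field rule is a simplification of the fstab format, so treat its output as hints rather than a full validation.

```shell
# check_fstab [FILE]
# Prints one warning per suspicious entry and exits non-zero if any was found.
check_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: expected 4-6 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $1 ~ /^\/dev\/sd[a-z]/ {
            printf "line %d: %s may change between boots, prefer UUID=\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "${1:-/etc/fstab}"
}

# Usage:
#   check_fstab && echo "no obvious problems"
```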
# Network mounts must include _netdev or they block boot
# //server/share /mnt/share cifs credentials=/etc/.creds,_netdev 0 0
Services and Processes
Service Diagnostics
# Check status
systemctl status service-name
systemctl is-active service-name
systemctl is-enabled service-name
# Full logs for a service
journalctl -u service-name
journalctl -u service-name -f # follow
journalctl -u service-name -n 100 # last 100 lines
journalctl -u service-name --since "1 hour ago"
journalctl -u service-name -p err # errors only
# Check service unit file
systemctl cat service-name
# Check dependencies
systemctl list-dependencies service-name
# Check what is blocking boot
systemd-analyze critical-chain service-name.service
# Test configs before restarting
nginx -t
apache2ctl configtest
sshd -t
mysqld --validate-config
Service Management
# Start / stop / restart
sudo systemctl start service-name
sudo systemctl stop service-name
sudo systemctl restart service-name
sudo systemctl reload service-name # reload config without restart
# Enable / disable on boot
sudo systemctl enable service-name
sudo systemctl disable service-name
sudo systemctl enable --now service-name # enable and start immediately
# Override service settings (creates /etc/systemd/system/service-name.service.d/override.conf)
sudo systemctl edit service-name
# After editing unit files
sudo systemctl daemon-reload
# Emergency: mask a broken service (can't be started even manually)
sudo systemctl mask service-name
sudo systemctl unmask service-name
Process Investigation
# Find process by name
pgrep nginx # returns PIDs
pgrep -a nginx # PIDs and full command
pidof nginx
# Process tree
pstree -p # all processes with PIDs
pstree -p <PID> # subtree from PID
# Detailed process info
cat /proc/<PID>/status
cat /proc/<PID>/cmdline | tr '\0' ' ' # full command with args
cat /proc/<PID>/environ | tr '\0' '\n' # environment variables
ls /proc/<PID>/fd | wc -l # number of open file descriptors
# What files does a process have open?
lsof -p <PID>
# What processes have a file open?
lsof /var/log/syslog
fuser /var/log/syslog
# What is holding a port?
sudo lsof -i :80
sudo fuser 80/tcp
sudo ss -tlnp | grep :80
Resource Limits
# Check limits for a process
cat /proc/<PID>/limits
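A limit only matters relative to current usage. The sketch below puts the open descriptor count next to the soft `nofile` limit for one process. `fd_usage` is a hypothetical helper; it assumes a Linux `/proc` filesystem and enough privilege to read the target's `fd` directory (your own processes, or any process via sudo).

```shell
# fd_usage PID -> prints "OPEN / SOFT_LIMIT" for that process
fd_usage() {
    open=$(ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' ')
    # "Max open files" row: field 4 is the soft limit, field 5 the hard limit
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")
    echo "$open / $limit"
}

# Usage:
#   fd_usage "$(pidof -s nginx)"
```

A count close to the limit usually precedes "Too many open files" errors.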
# View systemd limits for a service
systemctl show service-name | grep -i limit
# Set per-user limits
sudo nano /etc/security/limits.conf
# username type item value
nginx soft nofile 65536
nginx hard nofile 65536
* soft nproc 4096
# Set limits in systemd service
sudo systemctl edit service-name
[Service]
LimitNOFILE=65536
LimitNPROC=4096
Storage and Filesystems
Disk and Partition Info
# List block devices
lsblk
lsblk -f # include filesystem type and UUID
# List partitions
sudo fdisk -l
sudo parted -l
# Filesystem usage
df -h # human readable
df -i # inode usage
df -Th # with filesystem type
# Show UUIDs
blkid
blkid /dev/sda1
# Check physical device info
sudo hdparm -I /dev/sda # ATA device info
sudo smartctl -i /dev/sda # SMART device info
Filesystem Operations
# Mount / unmount
sudo mount /dev/sda1 /mnt
sudo mount -t ext4 /dev/sda1 /mnt
sudo umount /mnt
sudo umount -l /mnt # lazy unmount (if device busy)
# Bind mount (mount directory to another location)
sudo mount --bind /source /destination
# Check if filesystem is mounted
mount | grep sda1
findmnt /mnt
# Show open files on a filesystem
lsof +D /mnt # useful before unmounting
fuser -vm /mnt
Filesystem Check and Repair
# Check BEFORE mounting or on unmounted filesystem
sudo fsck /dev/sda1
sudo fsck -y /dev/sda1 # answer yes to all repairs automatically
sudo fsck -n /dev/sda1 # dry run (no changes)
# ext4 specific
sudo e2fsck -f /dev/sda1 # force check even if clean
sudo tune2fs -l /dev/sda1 # show filesystem parameters
# XFS (filesystem must be unmounted for repair)
sudo xfs_repair /dev/sda1 # repair
sudo xfs_repair -n /dev/sda1 # check only, no changes (replaces the deprecated xfs_check)
# Check SMART health
sudo smartctl -H /dev/sda
sudo smartctl -t short /dev/sda && sleep 120 && sudo smartctl -a /dev/sda
LVM (Logical Volume Manager)
# List volumes
sudo pvs # physical volumes
sudo vgs # volume groups
sudo lvs # logical volumes
# Extend a logical volume and filesystem
sudo lvextend -L +10G /dev/vg/lv-name
sudo resize2fs /dev/vg/lv-name # ext4
sudo xfs_growfs /mount/point # xfs
# Create snapshot
sudo lvcreate -L 5G -s -n snapshot /dev/vg/lv-name
# Remove old snapshot
sudo lvremove /dev/vg/snapshot
Permissions and Access
Reading Permissions
ls -la /path/to/file
# -rw-r--r-- 1 alice developers 1234 Jan 1 12:00 file.txt
# │└┬┘└┬┘└┬┘
# │ │  │  └── others: r-- (read only)
# │ │  └───── group: r-- (read only)
# │ └──────── owner: rw- (read + write)
# └────────── type: - = file, d = dir, l = symlink
# Permission bits: r=4, w=2, x=1
# 755 = rwxr-xr-x (owner full, others read+execute)
# 644 = rw-r--r-- (owner read+write, others read)
# 600 = rw------- (owner only)
# 700 = rwx------ (owner only, executable)
Fixing Permissions
# Change permissions
chmod 644 file.txt
chmod 755 directory/
chmod -R 755 /path/dir/ # recursive (be careful)
chmod u+x script.sh # add execute for owner
chmod go-w file.txt # remove write from group and others
chmod a+r file.txt # add read for all
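To check what an octal mode will produce before applying it, the r=4, w=2, x=1 arithmetic can be spelled out. A minimal sketch for plain three-digit modes follows; `mode_to_symbolic` is a hypothetical helper and does not handle setuid, setgid, or sticky bits.

```shell
# mode_to_symbolic MODE -> e.g. 644 becomes rw-r--r--
mode_to_symbolic() {
    echo "$1" | awk '{
        # Index n+1 holds the rwx string for octal digit n (0..7)
        split("--- --x -w- -wx r-- r-x rw- rwx", perm, " ")
        out = ""
        for (i = 1; i <= 3; i++) out = out perm[substr($1, i, 1) + 1]
        print out
    }'
}

# Usage:
#   mode_to_symbolic 755    # prints rwxr-xr-x
```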
# Change ownership
sudo chown user file.txt
sudo chown user:group file.txt
sudo chown -R user:group /path/dir/
# Change group only
sudo chgrp group file.txt
Special Attributes
# Check for special flags (immutable, append-only, etc.)
lsattr file.txt
lsattr -d /path/directory/
# Common flags:
# i = immutable (cannot modify or delete)
# a = append-only (can only add data)
# Remove immutable flag
sudo chattr -i file.txt
# Set append-only (useful for log files)
sudo chattr +a /var/log/app.log
ACLs (Access Control Lists)
# View ACLs
getfacl file.txt
# Add user ACL
setfacl -m u:username:rw file.txt
# Add group ACL
setfacl -m g:groupname:r file.txt
# Set default ACL on directory (inherits to new files)
setfacl -d -m u:username:rw /path/dir/
# Remove all ACLs
setfacl -b file.txt
SELinux
# Check SELinux mode
getenforce
sestatus
# Check file security context
ls -Z /path/to/file
# Check recent SELinux denials
sudo ausearch -m avc -ts recent
sudo sealert -a /var/log/audit/audit.log
# Temporarily disable for testing
sudo setenforce 0 # permissive mode
sudo setenforce 1 # back to enforcing
# Fix file context
sudo restorecon -v /path/to/file
sudo restorecon -Rv /path/dir/ # recursive
# Allow a port for a service
sudo semanage port -a -t http_port_t -p tcp 8080
# Apply a suggested policy from audit log
sudo ausearch -m avc -ts recent | audit2allow -M mymodule
sudo semodule -i mymodule.pp
Log Analysis
systemd Journal
# Most useful commands
journalctl -xe # recent errors with explanations
journalctl -f # follow live
journalctl -b # current boot
journalctl -b -1 # previous boot
journalctl -p err # errors only
journalctl -p err -b # errors from current boot
# Filter by service
journalctl -u nginx
journalctl -u nginx -f
journalctl -u nginx --since "1 hour ago"
# Filter by time
journalctl --since "2026-01-01 10:00" --until "2026-01-01 11:00"
journalctl --since "30 minutes ago"
# Filter by PID
journalctl _PID=1234
# Show kernel messages
journalctl -k # kernel messages only
# Disk usage of journal
journalctl --disk-usage
# Clean old journal entries
sudo journalctl --vacuum-size=500M
sudo journalctl --vacuum-time=30d
Traditional Log Files
# Real-time monitoring
tail -f /var/log/syslog
tail -f /var/log/nginx/error.log
# Search in log files
grep "ERROR" /var/log/app.log
grep -i "error\|fail\|warn" /var/log/syslog | tail -50
grep -B 5 -A 5 "fatal" /var/log/app.log # context before/after
# Search in compressed logs
zgrep "error" /var/log/syslog.2.gz
zcat /var/log/syslog.1.gz | grep error
# Kernel messages
dmesg
dmesg -T # with human-readable timestamps
dmesg | tail -30
dmesg -T | grep -i error
dmesg -T | grep -i "fail\|error\|warn" | tail -20
Log Locations Reference
Log             Location (Debian/Ubuntu)  Location (RHEL/AlmaLinux)
General system  /var/log/syslog           /var/log/messages
Authentication  /var/log/auth.log         /var/log/secure
Kernel          /var/log/kern.log         via dmesg / journalctl -k
Boot            /var/log/boot.log         /var/log/boot.log
Cron            /var/log/syslog           /var/log/cron
Nginx           /var/log/nginx/           /var/log/nginx/
Apache          /var/log/apache2/         /var/log/httpd/
MySQL           /var/log/mysql/           /var/log/mariadb/
PostgreSQL      /var/log/postgresql/      /var/log/postgresql/
SSH             /var/log/auth.log         /var/log/secure
Package Management
Debian / Ubuntu (apt)
# Update and install
sudo apt update
sudo apt install package-name
# Fix broken dependencies
sudo apt --fix-broken install
sudo dpkg --configure -a
# Remove package cleanly
sudo apt remove package-name
sudo apt purge package-name # also removes config files
sudo apt autoremove # remove unused deps
# If apt is locked
sudo lsof /var/lib/dpkg/lock-frontend
# Kill the locking process or wait
# If stuck after crash:
sudo rm /var/lib/dpkg/lock-frontend
sudo rm /var/lib/apt/lists/lock
sudo dpkg --configure -a
# Find what package owns a file
dpkg -S /usr/bin/nginx
# List files in a package
dpkg -L nginx
# Check package info
apt show nginx
apt-cache policy nginx # shows installed vs available version
RHEL / AlmaLinux / Fedora (dnf/yum)
sudo dnf install package-name
sudo dnf remove package-name
sudo dnf update
sudo dnf clean all
# Find what package owns a file
rpm -qf /usr/bin/nginx
dnf provides /usr/bin/nginx
# List files in package
rpm -ql nginx
Network Interfaces -- Full Setup Reference
netplan (Ubuntu 20.04+)
sudo nano /etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: true
      nameservers:
        addresses: [8.8.8.8, 1.1.1.1]
sudo netplan apply
sudo netplan try # apply with automatic rollback after 120s
NetworkManager
# List connections
nmcli con show
# Connect to WiFi
nmcli device wifi connect "SSID" password "password"
# Edit DNS on a connection
sudo nmcli con mod "eth0" ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli con up "eth0"
# Set static IP
sudo nmcli con mod "eth0" ipv4.method manual \
ipv4.addresses "192.168.1.100/24" \
ipv4.gateway "192.168.1.1"
sudo nmcli con up "eth0"
Quick Reference: Symptom to Command
System is Slow
top # 1. check CPU and load
free -h # 2. check memory -- is available near zero?
vmstat 1 5 # 3. check si/so (swap) and wa (I/O wait)
iostat -x 1 3 # 4. check disk -- %util column
df -h # 5. check disk space
Cannot Reach a Service
ping server-ip # 1. basic connectivity
ping hostname # 2. if the IP answers but the name fails = DNS issue
nslookup hostname # 3. DNS debug
sudo ss -tlnp | grep :port # 4. is service listening?
curl -v http://server:port # 5. detailed HTTP test
sudo iptables -L -n | head # 6. firewall check
Service Won't Start
systemctl status service # 1. status and recent log lines
journalctl -u service -n 50 # 2. more logs
systemctl cat service # 3. unit file
journalctl -b -p err # 4. all boot errors
Cannot Write File
df -h . # 1. disk full?
df -i . # 2. inodes exhausted?
ls -la file # 3. check permissions and owner
lsattr file # 4. immutable flag?
id # 5. who am I?
Something Used All Disk Space
df -h # which filesystem?
sudo du -sh /var/log/* | sort -rh | head # logs?
sudo lsof +L1 # deleted files still open?
sudo find / -size +500M -type f 2>/dev/null # big files?
Essential Command Reference
System Monitoring
Command          What it shows
top              Processes, CPU, memory live
htop             Same but better
uptime           Load averages
vmstat 1         CPU, memory, swap, I/O overview
mpstat -P ALL 1  Per-core CPU usage
free -h          Memory and swap
iostat -x 1      Disk I/O per device
iotop -o         Disk I/O per process
dstat            Everything at once
Process
Command                 What it does
ps aux                  All processes
pgrep -a name           Find PID by name
lsof -p <PID>           Files open by process
strace -p <PID>         System calls made by process
cat /proc/<PID>/status  Detailed process info
kill <PID>              Graceful stop
kill -9 <PID>           Force kill
renice +10 <PID>        Lower process priority
ionice -c 3 -p <PID>    Lower I/O priority
Disk and Filesystem
Command               What it does
df -h                 Filesystem usage
df -i                 Inode usage
du -sh /path          Directory size
lsblk -f              Block devices with filesystems
blkid                 UUIDs and filesystem types
mount / umount        Mount/unmount
fsck /dev/sda1        Filesystem check
smartctl -H /dev/sda  Disk health check
lsof +L1              Deleted files still open
Network
Command            What it does
ip a               IP addresses
ip r               Routes
ip link            Interface status
ss -tulnp          Listening ports
ping / traceroute  Connectivity
dig / nslookup     DNS lookup
tcpdump -i eth0    Packet capture
nc -zv host port   Port test
mtr host           Continuous traceroute
Logs
Command                         What it does
journalctl -xe                  Recent errors with context
journalctl -u svc               Service logs
journalctl -b -p err            Boot errors
dmesg -T                        Kernel messages with timestamp
tail -f /var/log/syslog         Follow syslog
grep -i error /var/log/app.log  Search logs
Services
Command                           What it does
systemctl status svc              Status + recent logs
systemctl start/stop/restart svc  Control service
systemctl enable/disable svc      Boot behavior
systemctl --failed                All failed units
systemd-analyze blame             Slowest startup services
Production Debugging Tips
Resources