🖥️ Cheat Sheet for SysAdmins: The Ultimate Daily Checklist

Being a System Administrator is like being the unseen pilot of a jet: everything must run smoothly, securely, and efficiently while users barely notice. ✈️ But with multiple servers, logs, security alerts, and users, it’s easy to miss something critical. That’s where a Daily SysAdmin Checklist comes in handy.

📖 Why a Checklist Matters
• Ensures consistency across operations
• Helps catch issues early, before they become outages
• Saves time by making priorities explicit
• Builds a habit of discipline in managing critical systems

✅ SysAdmin Daily Checklist

🔹 1. System Health Check
• Check CPU, memory, and disk usage (top, htop, df -h)
• Verify system load averages and uptime

🔹 2. Logs Review
• Scan /var/log/ for unusual activity
• Look for repeated failed login attempts or service errors

🔹 3. Security Checks
• Verify firewall rules are intact
• Ensure intrusion detection alerts (IDS/IPS) are reviewed
• Run last and who to track user logins

🔹 4. Services & Processes
• Confirm critical services (DB, web server, etc.) are running
• Restart failed services immediately

🔹 5. Backups
• Check that scheduled backups completed successfully
• Test a restore of a sample file or database weekly

🔹 6. Updates & Patches
• Monitor pending OS and application updates
• Schedule patching during maintenance windows

🔹 7. Networking
• Check connectivity (ping, traceroute)
• Monitor bandwidth usage and abnormal spikes

🔹 8. Automation Jobs
• Review cron jobs and automation scripts for success/failure
• Fix any misfired or skipped jobs

⚡ Pro Tips & Tricks
💡 Use monitoring tools like Nagios, Zabbix, or Prometheus for proactive alerts.
💡 Automate log checks with logwatch or SIEM tools.
💡 Maintain a runbook so your team has documented procedures.
💡 Tag systems with priority levels (critical, staging, dev) for smarter monitoring.

🚀 Quick Guide: Handy Commands

# Check system health
top
df -h
uptime

# Review logs (Debian/Ubuntu path; RHEL-family systems use /var/log/messages)
tail -f /var/log/syslog

# Check failed login attempts (Debian/Ubuntu path; RHEL-family systems use /var/log/secure)
grep "Failed password" /var/log/auth.log

🔎 Takeaway: Being a SysAdmin isn’t just about fixing issues; it’s about building a safety net of habits. A structured daily checklist helps you stay proactive, reduce downtime, and keep systems secure.

💬 Do you follow a daily SysAdmin routine? What’s the first thing you check every morning? Share your tips below ⬇️

🧠 Read more: https://lnkd.in/g3iFbzdg

#SysAdmin #DevOps #Linux #BestPractices #ITOps
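The grep for "Failed password" tells you that logins failed, but not who is hammering the box. Here is a minimal sketch, assuming the standard OpenSSH log wording, that tallies failures per source address. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the usual OpenSSH wording, for example:
#   "Failed password for invalid user admin from 203.0.113.5 port 4022 ssh2"
FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def count_failed_logins(lines):
    """Return a Counter mapping source address -> number of failed logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1  # group 2 is the source address
    return counts


# Illustrative sample lines (not real log data).
SAMPLE = [
    "Jan 10 03:12:01 web1 sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4022 ssh2",
    "Jan 10 03:12:04 web1 sshd[811]: Failed password for root from 203.0.113.5 port 4023 ssh2",
    "Jan 10 03:15:40 web1 sshd[902]: Accepted publickey for deploy from 198.51.100.7 port 5122 ssh2",
]

if __name__ == "__main__":
    # Print the noisiest sources first.
    for address, hits in count_failed_logins(SAMPLE).most_common():
        print(address, hits)
```

To run it against a real log, pass an open file object instead of SAMPLE, using whichever auth log path your distribution writes to.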
SysAdmin Daily Checklist: A Guide to Smooth Operations
-
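Windows Server Scenario-Based Q&A: Disaster Recovery and Backup

👉 Scenario: “Your Hyper-V host cluster is running out of storage, and one VM is critical. How do you migrate it with minimal downtime?”
Answer / Approach:
✅ If the VM is clustered, live-migrate it to a node with enough capacity, and use Storage Live Migration (Move-VMStorage) to move its virtual disks to a volume with free space while it keeps running.
✅ If the target is a standalone host with no shared storage, use Shared Nothing Live Migration.
✅ If live migration is not possible, copy in the background (Robocopy, mirror, or backup/restore) and plan a short final cutover, since the VM has to be offline for the last sync.
✅ Use a checkpoint (if policy allows) only as a short-lived rollback point, and merge it once the move is validated.
✅ Ensure network bandwidth is adequate, and monitor during the migration.
✅ Validate VM functionality post-migration, then clean up the old storage and rebalance.

👉 Scenario: “Cluster fails after restoring nodes from backup.”
✅ Likely cause: cluster metadata mismatch. The restored nodes carry a stale copy of the cluster database.
✅ Restore the cluster configuration consistently (an authoritative restore of the cluster database), or evict the restored node and rejoin it, rather than restoring nodes piecemeal.

👉 Scenario: “Authoritative vs Non-Authoritative AD Restore: what is the difference?”
✅ Non-authoritative: the DC is restored from backup and then brought up to date by replication from other DCs.
✅ Authoritative: specific objects are restored and marked as newer (with ntdsutil), so they replicate out and override the copies on other DCs.

👉 Scenario: “DC restore causes USN rollback.”
✅ Occurs when a DC is rolled back by an unsupported method, such as reverting a VM snapshot or restoring a disk image.
✅ Prevent it by never reverting DC snapshots. On Windows Server 2012 and later, hypervisors that support VM-GenerationID let the DC detect the rollback.
✅ Use AD-aware backup/restore tools (system state backup).

👉 Scenario: “Hyper-V host crash requires bare-metal recovery.”
✅ Use Windows Server Backup or DPM with a bare-metal recovery (BMR) backup.
✅ Boot from recovery media → restore the BMR image, which includes system state.

👉 Scenario: “SQL Server VM restored but AD authentication fails.”
✅ Most likely a broken secure channel: the computer account password inside the restored VM no longer matches AD. A duplicate machine SID is rarely the real cause.
✅ Reset the computer account (for example Test-ComputerSecureChannel -Repair) or rejoin the domain.

👉 Scenario: “VM checkpoint left running in production, now corruption.”
✅ Never leave production VMs running on checkpoints long-term.
✅ Merge the checkpoint by deleting it in Hyper-V Manager; if the disk chain is already corrupt, restore from backup.

👉 Scenario: “Backups too slow on file server.”
✅ Use a VSS hardware provider if the storage array offers one.
✅ Run backup traffic over SMB Direct (RDMA) where the hardware supports it.
✅ Use deduplication-aware backup software.

👉 Scenario: “DFS namespace deleted accidentally. How to recover?”
✅ Restore from an AD DS backup (a domain-based namespace is stored in AD; a standalone namespace lives in the namespace server's registry).
✅ Or rebuild the namespace manually from the shares.

👉 Scenario: “Tape backup server can’t back up CSV volumes.”
✅ Use a CSV-aware backup agent.
✅ Or back up from the cluster node that currently owns the volume.

👉 Scenario: “VM replica site disaster recovery test fails.”
✅ Likely cause: replication health was not monitored.
✅ Run test failovers regularly.
✅ On clustered hosts, configure the Hyper-V Replica Broker role so replication keeps working as VMs move between nodes.

#WindowsServer #InterviewPreparation #ScenarioBased #InterviewQ&A #Backup #DisasterRecovery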
-
🛡️ SSH Cheat Sheet: Secure Connections, Keys & Config (Free Resource) 🧠💻

Whether you’re managing servers, automating deployments, or securing remote access, mastering SSH (Secure Shell) is essential for every security engineer, sysadmin, or DevOps professional.

This visual SSH Cheat Sheet from Ethical Hackers Academy brings together the most-used commands and secure configuration practices, all in one place.

📘 What’s Inside:

🔹 SSH Connections
→ Connect to servers (default port 22 or custom)
→ Run remote scripts or compress/download data securely
→ Specify private keys for multi-host environments

🔹 SSH Keys
→ Generate and manage RSA keys
→ Copy keys to remote servers for passwordless access
→ Convert RSA keys to PPK for use with tools like PuTTY

🔹 SSH Configurations
→ Change the default port
→ Disable root & password logins
→ Enforce public key authentication
→ Restrict specific users & limit concurrent sessions
→ Enable logging & disable risky options like port forwarding

🔹 SCP (Secure Copy)
→ Transfer files between systems securely
→ Copy folders recursively
→ Use compression & verbose transfer mode for efficiency

💡 Why It’s Useful:
✅ Strengthens SSH hardening and remote access security
✅ Simplifies file transfer and automation
✅ Provides quick reference for daily operations

Perfect for:
✔️ DevOps Engineers
✔️ System Administrators
✔️ Penetration Testers
✔️ Blue Team Analysts

📥 Want a copy of the “SSH Common Commands & Secure Config” Cheat Sheet? Drop a 🔐 in the comments or DM me and I’ll share the file.

#SSH #CyberSecurity #Linux #DevOps #SystemAdmin #SecureShell #BlueTeam #EthicalHacking #ServerSecurity #InfoSec #CommandLine #NetworkSecurity #CheatSheet
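The hardening items under SSH Configurations (no root login, no password login, public keys only, no port forwarding) are easy to list and easy to forget to verify. Below is a rough sketch of an audit helper for the text of an sshd_config file. The EXPECTED policy is an assumption made for this example, not an official baseline, and the parser deliberately ignores Match blocks and Include files.

```python
# Hypothetical policy for this sketch: directive (lowercased) -> hardened value.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "allowtcpforwarding": "no",
}


def parse_sshd_config(text):
    """Return {directive: value} for the global section of an sshd_config.

    sshd uses the first value it sees for a keyword, so later duplicates
    are ignored. Parsing stops at the first Match block.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        settings.setdefault(key, value)
    return settings


def audit(text):
    """Return (directive, found, expected) for every deviation from EXPECTED.

    A directive that is not set explicitly is reported with found=None.
    """
    settings = parse_sshd_config(text)
    findings = []
    for key, expected in EXPECTED.items():
        found = settings.get(key)
        if found is None or found.lower() != expected:
            findings.append((key, found, expected))
    return findings
```

Calling audit(open("/etc/ssh/sshd_config").read()) on a server returns an empty list when every directive in the policy is set explicitly to the expected value. On a live host, sshd -T prints the effective configuration and is the more authoritative check.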
-
Having backups does not guarantee successful recovery. A recent report revealed that 49% of companies failed to recover most of their servers after an incident. The real challenge is not storing data but restoring business operations quickly and accurately. Recovery processes often fail due to unknown dependencies and flawed orchestration. Discover the critical practices that can strengthen your backup and recovery strategy. #DataRecovery #BusinessContinuity
-
Backups are supposed to be our safety net. But what happens when the safety net has holes? Let me tell you one interesting story around backups: In 2017, a routine maintenance task at GitLab turned into a nightmare. A single command, intended for a harmless replica, accidentally struck the primary production database. Hours of critical user data - gone in seconds. “No problem,” you’d think. Just restore from backups. Except… the backups weren’t there. At least, not in the way anyone expected. One by one, their five backup methods revealed a brutal truth: • The main system hadn’t been syncing properly. • Others were outdated. • Some had silently failed without notice. What should have been a simple restore turned into the realization that six hours of user data were lost forever - issues, merge requests, accounts, comments, snippets… gone. The world saw GitLab’s public apology and postmortem, but behind it was a lesson many ignore: 👉 Backups that are never tested aren’t really backups at all. Do you also wonder why backups fail when we actually need them? I’ve written a detailed article on my Substack blog - covering best practices for backups and the common mistakes you must avoid. https://lnkd.in/gUUKUjdS
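One way to act on "untested backups aren't backups" is to restore a sample into a scratch directory and compare it against the source. The sketch below assumes the sample is static (files that change after the backup will show up as mismatches), and the function names are invented for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir with its restored copy.

    Returns a sorted list of relative paths that are missing from
    restored_dir or whose contents differ.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return sorted(problems)
```

An empty list means the restored sample matches. Anything else is a reason to look at the backup job before the day you actually need it.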
-
A new vulnerability, CVE-2025-39953, has been identified and resolved in the Linux kernel. This issue was related to the cgroup system, where a hung task could occur during specific testing scenarios. The problem arose when repeatedly mounting and unmounting certain controllers, leading to a hang during root destruction. This was a significant concern for those relying on Linux for critical operations.

For security teams, this vulnerability highlights the importance of staying updated with kernel patches and understanding the intricacies of system operations. Business leaders should recognize the potential impact such vulnerabilities can have on system stability and, consequently, on business continuity. Everyday professionals using Linux systems should be aware of these updates to ensure their systems remain secure and efficient.

The solution implemented involves splitting cgroup_destroy_wq into three separate workqueues. This change helps manage different tasks more effectively, preventing the blocking issues that were previously occurring. By doing so, the system can handle CSS offline operations, resource release, and memory deallocation without interference, ensuring smoother operations.

It's crucial for all professionals involved with Linux systems to apply these updates promptly. Regularly reviewing and updating system components can prevent potential disruptions and enhance overall security.

How do you ensure your systems are always up-to-date with the latest security patches? https://lnkd.in/dxEGYgfq
-
⚙️ STOP Drag-and-Drop Nightmares: Your Secret Weapon for File Migration is Robocopy

After receiving a flood of DMs about slow, failing file copies, it's time to talk about the tool built right into Windows that every SysAdmin, IT Pro, and DevOps engineer needs to master: Robocopy (Robust File Copy).

If you've ever wished for a copy process that was faster, smarter, and could survive a network drop, Robocopy is your silent powerhouse. It handles massive folders, mirroring, and automated backups with zero drama.

The 4 Essential Robocopy Commands & Use Cases

1. Basic Copy (The Starter)
* Goal: To copy everything from a source to a destination, including all files and subfolders, even if they are empty.
* Use Case: Initial one-time migrations of a directory structure where you need every file and folder in the source to exist in the destination.
* Command: robocopy "C:\Source" "D:\Backup" /E
(The /E switch is key here: it copies subfolders, including empty ones.)

2. Mirror Mode (The Synchronizer)
* Goal: To make the destination exactly match the source.
* Use Case: Setting up disaster recovery sites or continuous backup jobs where files that are deleted in the source must also be deleted from the destination to save space and maintain accuracy.
* Command: robocopy "C:\Source" "D:\Mirror" /MIR
(/MIR is /E plus /PURGE: it deletes anything in the destination that is not in the source. Use with caution, and add /L first to list what would happen without changing anything.)

3. Incremental Backup (The Scheduler)
* Goal: To speed up daily or scheduled backups by copying only files that are new or have been modified since the last run.
* Use Case: Running a nightly backup job. You don't want to recopy GBs of data that haven't changed. Robocopy already skips files whose size and timestamp match the destination; /XO additionally skips source files that are older than the copy already in the destination.
* Command: robocopy "C:\Source" "D:\Backup" /E /XO
(The /XO switch means "eXclude Older." /E is included so subfolders are covered.)

4. Retry + Logging (The Peace of Mind Command)
* Goal: To ensure copy reliability over an unstable network and create an audit log for compliance or troubleshooting.
* Use Case: Migrating data over a WAN or network share where momentary disconnects are possible. This command will retry and record every action.
* Command: robocopy "C:\Data" "E:\Backup" /E /R:3 /W:5 /LOG:BackupLog.txt
(/R:3 = retries each failed copy 3 times; /W:5 = waits 5 seconds between retries; /LOG: writes the output to the file, overwriting it on each run, while /LOG+: appends. The defaults are 1,000,000 retries with a 30-second wait, so always set these explicitly.)

⚡ Pro Tip for Automation
Don't run these commands manually! Integrate your Robocopy script with Windows Task Scheduler. This lets you handle those massive nightly or weekly migrations silently and on a set schedule.

What's your go-to Robocopy switch for complex migrations? Share your favorite argument in the comments! 👇

#Robocopy #Windows #Automation #SysAdmin #DevOps #ITPro #TechTips #Scripting #DataMigration #Productivity
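One thing that trips up scheduled Robocopy jobs: the exit code is a bitmask, and a successful copy usually returns 1, not 0. Schedulers and wrappers that treat any non-zero code as an error will flag healthy runs. Here is a small sketch of how a wrapper script might interpret the code. The helper names are made up, and the bit meanings follow Microsoft's documented return codes.

```python
# Robocopy exit codes are bit flags that can be combined (for example 3 = 1 + 2).
FLAGS = {
    1: "one or more files were copied",
    2: "extra files or directories exist in the destination",
    4: "mismatched files or directories were detected",
    8: "some files or directories could not be copied",
    16: "serious error, no files were copied",
}


def describe_exit_code(code):
    """Translate a Robocopy exit code into human-readable messages."""
    if code == 0:
        return ["no files were copied and nothing failed"]
    return [text for bit, text in FLAGS.items() if code & bit]


def is_failure(code):
    """Codes of 8 and above mean at least one copy failed."""
    return code >= 8
```

In a wrapper you would pass the returncode of subprocess.run(["robocopy", ...]) to is_failure and alert only when it returns True.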
-
Fortify Your Code: 10 Security Practices Every Developer and Builder Must Get Right

In today's complex software landscape, a "secure-by-design" approach is non-negotiable. Protecting confidentiality, integrity, and availability requires embedding security throughout the entire lifecycle. Here’s a concise guide to the top five practices for both developers and builders.

For Developers: Building Security In

1. Application IAM: Move beyond reusable passwords. Enforce Multi-Factor Authentication (MFA), especially for privileged accounts. Leverage industry standards (OIDC, SAML, OAuth) to centralize access management and maintain detailed audit trails for all activity.

2. Code Repository Security: Protect your crown jewels. Use trusted repositories, enforce strict least-privilege access, and ensure rapid access revocation. Mandate code reviews and auditing workflows for sensitive changes, especially in production code.

3. Secrets Management: Never hardcode credentials. Eliminate default passwords and replace long-lived secrets with temporary ones. Implement a secure, centralized "secrets vault" for storage, rotation, and auditing of all API keys, database passwords, and tokens.

4. Open-Source Dependencies: Proactively manage risk. Maintain an inventory of all open-source components and their known vulnerabilities using automated tools. Define a clear process for patching based on risk and ensure license compliance to avoid legal issues.

5. Static Code Analysis: Find vulnerabilities before they go live. Integrate SAST tools (like AWS CodeGuru) into your CI/CD pipeline to automatically scan code. Triage results to prioritize high-impact fixes and maintain a managed vulnerability inventory.

For Builders: Securing the Foundation

1. Infrastructure IAM: Guard the keys to the kingdom. Secure all infrastructure access with MFA and restrict logins to known corporate IPs. Implement a "break-glass" process for production and maintain comprehensive audit trails (e.g., with AWS CloudTrail).

2. CI/CD Pipeline Security: Protect your software supply chain. Apply least-privilege access to the pipeline, secure code integrations with signed commits, and never store secrets within the pipeline; use a dedicated secrets manager. Enable logging and approval workflows for production deployments.

(Continue in 1st comment)

By integrating these practices, we shift security left and build resilient systems by design. Let's commit to building securely from the first line of code to the final deployment.

Transform Partner – Your Strategic Champion for Digital Transformation

Image Source: Microsoft
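As a tiny illustration of "never hardcode credentials": the sketch below assumes the deployment platform or a secrets manager injects secrets into the process environment at runtime. The variable name DB_PASSWORD and the helper are hypothetical, and a real setup would more likely call the vault's own client library.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided at runtime."""


def get_secret(name, environ=None):
    """Fetch a secret by name from the environment instead of source code.

    `environ` can be any mapping, which keeps the function easy to test.
    Fails loudly rather than falling back to a default password.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


# Example call (DB_PASSWORD is a made-up name for this sketch):
# password = get_secret("DB_PASSWORD")
```

Failing at startup when a secret is missing is a deliberate choice here: it avoids the default-password fallback that the post warns against.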
-
Advanced Linux and High Availability (HA) interview questions.

🔹1. Are you aware of the configuration of RAID?
Yes. RAID (Redundant Array of Independent Disks) can be configured via:
• Software RAID: Using the mdadm tool (common in Linux).
• Example: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sd[b-c]
• Hardware RAID: Configured via RAID controller BIOS or vendor-specific tools.

🔹2. Upgrade RHEL 7.9 to RHEL 9.4
A direct in-place upgrade from RHEL 7.9 to 9.4 is not supported. You need to:
• Back up everything first.
• Either upgrade in two hops with Leapp, which supports only one major version at a time (7 to 8, then 8 to 9).
• Or do a clean install of RHEL 9.4 and restore data/configurations post-install.

🔹3 & 4. App running on DC A, stopped due to power: how to run on DC B?
Set up High Availability (HA) and Disaster Recovery (DR):
• Use cluster tools (e.g. Pacemaker + Corosync, the stack behind the RHEL High Availability Add-On that replaced Red Hat Cluster Suite).
• Use shared storage or replication (e.g., DRBD or rsync with cron).
• Use a VIP (Virtual IP) that floats between DC A and DC B.
• Automate failover with the cluster manager, and configure fencing (STONITH) so a failed node cannot corrupt shared data.

🔹5. Which HA tools have you worked on?
Examples:
• Pacemaker + Corosync: resource failover.
• Keepalived: VIP management.
• HAProxy: load balancing + failover.
• DRBD: block-level replication.
• GlusterFS: distributed filesystem.

🔹6. Server pings but SSH fails
Common causes:
• SSH service down: systemctl status sshd
• Port blocked by firewall: check firewalld or iptables rules
• SSHD misconfig: Check /etc/ssh/sshd_config
• Host key issue: Remove the old entry from ~/.ssh/known_hosts (ssh-keygen -R <host>)
• SELinux: May block access, for example after moving sshd to a non-default port.

🔹7. df -kh hangs
Reasons:
• NFS mount is unresponsive
• Dead disk/LUN or bad block
• Stale mount
• Use: timeout 5 df -kh so the shell is not blocked, or df -hlT to list local filesystems only
• Use mount | grep -v '^/' or lsblk to narrow the cause.

🔹8. Do you know about Daemon services?
Yes.
• Daemons are background processes, often started at boot (e.g., sshd, httpd).
• Managed using systemd in RHEL 7+.

🔹9. If Daemon service file is corrupted, how to recover?
• Restore from backup.
• Reinstall the package providing the service:
rpm -qf /usr/lib/systemd/system/sshd.service   # find the owning package
yum reinstall openssh-server
• Or recreate the unit file manually with a proper ExecStart, then run systemctl daemon-reload.

🔹10. Troubleshooting daemon service
• Check status: systemctl status <service>
• Logs: journalctl -u <service>
• Validate service file: systemd-analyze verify <unit file>
• Restart & test: systemctl restart <service>

🔹11. Types of special permissions
• SUID (Set User ID): An executable runs with the file owner’s privileges.
• SGID (Set Group ID): An executable runs with the file group’s privileges; on a directory, new files inherit the directory's group.
• Sticky Bit: In a directory, only a file's owner, the directory's owner, or root can delete or rename the file (used on /tmp).

🔹12. What is UUID and GUID?
• UUID (Universally Unique Identifier): 128-bit identifier, often used for identifying filesystems or disks.
• GUID is essentially a Microsoft synonym for UUID.

#happylearning #linux
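For question 11, it can help to see the three special bits as what they are: flags in the file mode. This is a minimal sketch using Python's standard stat module. The function name is invented for illustration.

```python
import stat


def special_bits(mode):
    """Return the special permission bits set in a file mode, in a fixed order."""
    names = []
    if mode & stat.S_ISUID:   # 0o4000: run with the file owner's privileges
        names.append("suid")
    if mode & stat.S_ISGID:   # 0o2000: run with the group's privileges / inherit group
        names.append("sgid")
    if mode & stat.S_ISVTX:   # 0o1000: sticky bit, restricts deletion in a directory
        names.append("sticky")
    return names


# Usage on a real path: special_bits(os.stat("/tmp").st_mode), after `import os`.
```

The octal values in the comments line up with the leading digit you pass to chmod, so chmod 4755 sets SUID and chmod 1777 sets the sticky bit.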