Senior Site Reliability Engineer (SRE / Infrastructure)
Role Overview
We’re hiring a Senior SRE to build and scale the infrastructure behind a high-growth production system. You’ll ensure reliability, performance, and scalability as the platform grows from early traction to large-scale usage.
This role focuses on designing resilient systems, improving observability, and automating operations so engineering teams can move quickly and safely.
What You’ll Do
Own reliability, scalability, and performance of production systems
Build and manage cloud infrastructure (primarily AWS or GCP, on Linux)
Design and operate Kubernetes clusters and containerized workloads
Improve CI/CD pipelines and deployment workflows
Lead incident response, on-call practices, and root cause analysis
Build observability systems (monitoring, logging, alerting)
Partner with engineers to design resilient systems (databases, pipelines, async systems)
Automate infrastructure and operational workflows using Infrastructure as Code (IaC)
Requirements
5+ years in SRE, DevOps, or infrastructure-focused engineering
Strong experience with cloud platforms (AWS/GCP) and Infrastructure as Code (e.g., Terraform)
Production experience with Kubernetes
Experience with monitoring and observability tools (e.g., Prometheus, the ELK Stack, Datadog)
Strong understanding of distributed systems, networking, and reliability best practices
Comfortable writing code and scripts (e.g., Python or Go)
Nice to Have
Experience scaling high-availability systems
Familiarity with CI/CD and modern deployment strategies (canary, blue/green)
Background in data pipelines, async systems, or large-scale applications
Exposure to Go, Rust, C++, or TypeScript
Interest in applying AI to infrastructure or operations
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Information Technology
Industries
Staffing and Recruiting
Referrals increase your chances of interviewing at The Cypress Group by 2x