Vimo

Senior Site Reliability Engineer

Vimo California, United States


Vimo provided pay range

This range is provided by Vimo. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$150,000.00/yr - $200,000.00/yr


Vimo® started as the “Expedia” of health insurance and has evolved into a leader in transforming government IT infrastructure with its proven SaaS and AI technology. Our innovative approach to health insurance shopping and enrollment has expanded beyond exchanges, and we are now reinventing how states administer safety net programs such as Medicaid, SNAP (food stamps), child care, and unemployment insurance. With our cutting-edge technology, we are helping agencies serve more people, faster, and transforming healthcare service delivery as we know it.


We are looking for a Senior Site Reliability Engineer (SRE) to join our Vimo team.


About The Role

As a Senior Site Reliability Engineer, you will be at the intersection of software engineering and systems operations, ensuring that Vimo’s production platform is reliable, performant, and scalable. Our systems power health insurance exchanges, Medicaid enrollment, and other safety-net programs for state governments—meaning the services you keep running directly affect millions of people’s access to critical benefits. You will design and build automation, define and enforce SLOs, respond to and learn from incidents, and continuously reduce operational toil. You’ll work closely with development, infrastructure, and security teams to embed reliability into every layer of the stack.


Responsibilities

  • Design, build, and maintain the tooling, automation, and infrastructure that keeps Vimo’s production services highly available and performant.
  • Define, implement, and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical services.
  • Build and improve CI/CD pipelines to enable safe, fast, and repeatable deployments with automated rollback capabilities.
  • Develop and operate comprehensive observability solutions—monitoring, logging, tracing, and alerting—using tools such as Loki, Prometheus, Grafana, ELK/OpenSearch, and PagerDuty.
  • Lead incident response efforts: triage production issues in real time, coordinate cross-team resolution, and author thorough blameless postmortems with actionable follow-ups.
  • Identify and eliminate toil through automation; build self-healing mechanisms and runbook-driven remediation.
  • Manage and optimize cloud infrastructure on AWS (EC2, EKS, RDS, S3, VPC, CloudFront, Route 53, Lambda) with a focus on automation, cost efficiency, and security.
  • Implement and maintain infrastructure-as-code using Terraform, and manage container orchestration with Kubernetes/EKS.
  • Perform capacity planning and load testing to ensure systems can handle peak enrollment periods and traffic surges.
  • Collaborate with application engineering teams on architecture reviews, resilience patterns (circuit breakers, retries, graceful degradation), and production readiness reviews.
  • Contribute to disaster recovery planning and testing, including automated failover and multi-region strategies.
  • Support compliance and security requirements (HIPAA, FedRAMP, SOC 2) by ensuring infrastructure controls are in place and auditable.
  • Participate in a 24/7 on-call rotation and continuously improve on-call processes to reduce alert fatigue and mean time to resolution (MTTR).

Qualifications

    Basic Qualifications/Skills

    • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
    • 5+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or a related systems-focused role.
    • Strong software engineering skills in at least one language (Python, Go, Java, or Bash) with the ability to write production-quality automation and tooling.
    • Deep hands-on experience with AWS cloud services (EC2, EKS, RDS, S3, VPC, IAM, Lambda, CloudWatch).
    • Proficiency with container technologies (Docker) and orchestration platforms (Kubernetes/EKS).
    • Solid experience with infrastructure-as-code tools, particularly Terraform.
    • Strong understanding of CI/CD principles and tools (Jenkins, GitLab CI, GitHub Actions, ArgoCD, or similar).
    • Experience with observability and monitoring platforms (Loki, Prometheus, VictoriaMetrics, Grafana, ELK/OpenSearch, PagerDuty).
    • Solid understanding of networking fundamentals: TCP/IP, DNS, load balancing, CDN, TLS/SSL, and firewall configuration.
    • Experience with incident management processes, on-call rotations, and blameless postmortem culture.
    • Strong Linux/Unix systems administration and troubleshooting skills.

    Preferred Qualifications/Skills

    • Experience in healthcare technology, government IT, or benefits administration platforms.
    • Familiarity with compliance frameworks such as HIPAA, FedRAMP, or SOC 2 and their impact on infrastructure operations.
    • Experience with PostgreSQL, MongoDB, and MySQL administration, performance tuning, and high-availability configurations.
    • Hands-on experience with chaos engineering practices and tools (Gremlin, Litmus, or equivalent).
    • Experience with GitOps workflows and tools (ArgoCD, Flux).
    • Knowledge of service mesh technologies (Istio, Linkerd) and API gateway patterns.
    • Experience with configuration management tools (Ansible, Chef, or Puppet).
    • Familiarity with FinOps principles and cloud cost optimization strategies.
    • Experience with load testing and performance benchmarking tools (k6, Locust, JMeter).

    Strong Plus

    • AWS certifications (Solutions Architect, DevOps Engineer, or SysOps Administrator)
    • Experience with AWS security implementation through IaC/automation


    Compensation and Benefits

    Competitive compensation: all-in range of $150,000 to $200,000. (Please note that compensation may vary based on factors such as skills, experience, performance, and location.)

    We offer a comprehensive benefits package, including but not limited to:

    • Health, Dental, Life, Disability, and Vision insurance
    • Healthcare spending or reimbursement accounts (HSA/FSA)
    • Retirement benefits (401k)
    • Paid time off
    • Holidays: 13 paid days per year
    • Education assistance or tuition reimbursement
    • Employee discounts for gym memberships and commuting/travel assistance


    Our Values

    • We believe that working hard, when it is imbued with purpose, can and should be fun.
    • You'll find we are a "can do" place where people work together and roll up their sleeves to get the job done.
    • Everyone has a voice; everyone's ideas count, and everyone is respected.
    • We have built a company, as well as a community of friends and colleagues, with respect for each other.

    Seniority level: Mid-Senior level
    Employment type: Full-time
    Job function: Information Technology, Strategy/Planning, and Project Management
    Industries: Technology, Information and Internet, IT Services and IT Consulting, and Insurance
