DevOps and SRE Engineer
5+ Years
Kubernetes (EKS/GKE/AKS), Terraform, Helm, GitOps, CI/CD Pipelines, Prometheus, Grafana, OpenTelemetry, Cloud Security, Networking
Full Time
Remote
Job Description
The DevOps and SRE Engineer will be responsible for building and maintaining highly reliable, scalable, and cost-efficient cloud infrastructure. This role requires expertise in Kubernetes administration, Infrastructure as Code (IaC), observability, and CI/CD automation. The ideal candidate will meet SLOs, manage error budgets, and optimize cloud costs without compromising reliability, while driving an automation-first approach across environments.
Responsibilities
- Design and maintain CI/CD pipelines with progressive delivery.
- Operate and scale EKS, GKE, or AKS clusters with strong multi-tenancy.
- Instrument systems using Prometheus, Grafana, and OpenTelemetry.
- Run incident response, postmortems, and capacity planning.
- Harden networking, IAM, and secret management.
- Deliver automated, repeatable environments using GitOps and IaC.
- Define clear SLOs, back them with meaningful alerting, and manage error budgets.
- Reduce cloud cost per transaction while maintaining reliability.
Requirements
- Bachelor’s degree in Computer Science or a related field.
- 5+ years of experience in DevOps/SRE roles supporting high-availability SaaS.
- Proven expertise in Kubernetes administration (EKS, GKE, or AKS).
- Strong experience with Terraform, Helm, and GitOps pipelines.
- Skilled in CI/CD pipeline design and maintenance.
- Knowledge of monitoring, alerting, and logging (Prometheus, Grafana, OpenTelemetry).
- Strong fundamentals in cloud networking and security.
- A calm, methodical, automation-first mindset.
Nice to Haves
- Experience in cloud cost optimization, including AI workloads.
- Familiarity with multi-tenant SaaS environments.
- Hands-on practice with disaster recovery, backup, and restore drills.
- Knowledge of compliance frameworks (SOC 2, ISO, FedRAMP).