Senior DevOps Engineer - AI Healthcare Leader

Curb

Software Engineering, Data Science

New York, NY, USA

Posted on May 29, 2026

Staff DevOps Engineer - Cloud Infrastructure, Kubernetes & AI Platform Operations

This opportunity is with a client of Andiamo. The client is an innovative healthcare technology organization building AI-driven digital platforms that support patients, providers, and enterprise healthcare systems at scale.

About The Opportunity

We are seeking a highly experienced Staff DevOps Engineer to help lead the evolution of a modern cloud infrastructure environment powering mission-critical healthcare and AI applications. This is a senior-level engineering role designed for someone who thrives in complex distributed systems, enjoys solving large-scale operational challenges, and wants meaningful ownership over platform reliability, scalability, and infrastructure strategy.

You will play a central role in designing and operating cloud-native infrastructure across both internal platforms and enterprise partner environments. The ideal candidate combines deep Kubernetes expertise with strong cloud engineering capabilities, infrastructure-as-code experience, and a passion for building resilient, secure, and highly automated systems.

This role also offers the opportunity to work at the intersection of DevOps, AI infrastructure, platform reliability, and healthcare technology in a highly collaborative and fast-moving environment.

What You’ll Be Responsible For

Cloud Infrastructure & Platform Engineering

Lead the design, implementation, and ongoing optimization of Kubernetes-based infrastructure environments supporting large-scale production applications and enterprise integrations.

Architect and maintain cloud-native systems across multi-cloud environments, ensuring scalability, reliability, security, and operational efficiency.

Develop and enhance reusable infrastructure-as-code modules using Terraform across cloud providers and supporting services.

Drive improvements to deployment pipelines, automation frameworks, and platform tooling that enable engineering teams to ship software efficiently and safely.

CI/CD, Automation & Developer Enablement

Design and maintain enterprise-grade CI/CD workflows and reusable pipeline frameworks that support secure and scalable software delivery.

Support GitOps-based deployment strategies and operational workflows across engineering teams.

Own and maintain critical infrastructure services running within Kubernetes environments, including deployment automation, ingress systems, observability tooling, and operational support platforms.

Continuously improve developer productivity, deployment reliability, and operational visibility through automation and platform enhancements.

Security, Compliance & Reliability

Implement and support infrastructure security controls, secrets management strategies, container security scanning, and software supply chain protections.

Partner with internal teams to support compliance initiatives for regulated environments, including healthcare regulations and security-focused operational standards.

Lead disaster recovery readiness initiatives including failover testing, operational runbooks, resiliency planning, and recovery validation exercises.

Monitor, troubleshoot, and improve production reliability while participating in operational incident response and daytime on-call rotations.

AI Infrastructure & Operational Innovation

Contribute to the development of next-generation AI-powered operational tooling and intelligent infrastructure automation.

Help evaluate and implement emerging technologies that improve observability, operational scalability, and platform intelligence.

Support environments involving AI workloads, high-performance infrastructure, and advanced cloud orchestration patterns.

Leadership & Cross-Functional Collaboration

Mentor engineers across the DevOps and infrastructure organization while helping establish operational standards and engineering best practices.

Partner closely with software engineering, security, product, and platform teams to drive infrastructure initiatives and long-term technical strategy.

Provide technical leadership on complex platform projects spanning cloud architecture, reliability engineering, automation, and enterprise integrations.

What You Bring

Required Qualifications

  • 5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering
  • Deep expertise with Kubernetes and cloud-native operational tooling
  • Strong hands-on experience with Helm, ArgoCD, Helmfile, cert-manager, Kyverno, NGINX Ingress, and related Kubernetes ecosystem technologies
  • Extensive experience designing and operating infrastructure on Google Cloud Platform including GKE, IAM, Cloud SQL, storage services, and identity management
  • Advanced Terraform experience including modular infrastructure design, multi-environment deployments, and infrastructure testing practices
  • Strong experience with GitLab CI/CD pipelines, GitOps methodologies, and deployment automation
  • Programming proficiency in Python and/or Go
  • Experience supporting infrastructure security, secrets management, and compliance-focused operational environments
  • Strong troubleshooting, monitoring, and production operations experience
  • Ability to lead complex infrastructure initiatives across multiple engineering teams

Preferred Qualifications

  • Advanced scripting experience using Bash or similar tooling
  • Experience with Vault, Akeyless, or enterprise secrets management platforms
  • Operational experience with PostgreSQL, Redis, or MongoDB administration and disaster recovery planning
  • Experience with observability and monitoring platforms such as Datadog
  • Hands-on experience managing Cloudflare services including DNS, CDN, and security policies
  • Experience designing and executing disaster recovery and failover testing programs
  • Background working in highly regulated environments including HIPAA or SOC 2
  • Experience with GPU clusters, HPC infrastructure, or AI-focused operational environments
  • Familiarity with AI agents, intelligent automation tooling, or agentic infrastructure systems
  • Experience with AWS and hybrid cloud environments
  • Strong communication and mentorship skills

Why This Role Is Unique

This position offers the opportunity to work on highly scalable cloud infrastructure supporting AI-powered healthcare systems with real-world impact.

You’ll help shape the operational foundation of modern healthcare technology platforms while working on challenging problems involving Kubernetes, cloud reliability, security, automation, AI infrastructure, and enterprise-scale DevOps practices.

You’ll also have significant ownership, direct technical influence, and the ability to help define the next generation of platform engineering standards inside a rapidly evolving technology environment.

Work Environment & Benefits

  • Hybrid work model based in New York City with collaborative in-office engagement
  • Competitive compensation package including salary, equity, and comprehensive healthcare benefits
  • 401(k) program and commuter benefits
  • Paid parental leave
  • Generous PTO, company holidays, sick time, and personal days
  • Collaborative team culture with regular company events and social programming
  • Opportunities for technical growth, mentorship, and long-term career advancement

If you are passionate about cloud infrastructure, operational excellence, Kubernetes ecosystems, and building resilient systems that power meaningful healthcare innovation, this role offers the chance to make a significant technical and organizational impact.

About Andiamo

Talent Partners for the AI Revolution. As a globally recognized staffing and consulting firm, we specialize in placing the top 2% of technology and go-to-market professionals with the world’s largest and most well-known companies.

For over 20 years, we've been a tier-one vendor for firms such as Palantir, Amazon, Fluidstack, Bloomberg, Relativity Space, Firefly, Mastercard, Visa, Two Sigma, and Citadel, as well as other major financial services firms, elite hedge funds, Google-backed tech start-ups, and major software firms.

Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO). Find out more at www.andiamogo.com