Site Reliability Engineer (m/f/d)

Remote, USA Full-time Posted 2026-06-06

About the position

As a Site Reliability Engineer in our Platform Squad, you will be a key player in keeping Flip's infrastructure fast, resilient and ready to scale. You'll shape the reliability culture, tooling and practices that allow our engineering teams to ship with confidence - at scale and without compromising availability. This role is perfect for an engineer who is passionate about building high-throughput, highly available systems and who wants to shape how a fast-growing SaaS platform runs in production.

Responsibilities

  • Expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters - designed for high throughput and high availability - to support Flip's rapid growth across the globe.
  • Design and implement zero-downtime deployments, rollback mechanisms and disaster-recovery strategies that keep our platform available around the clock.
  • Evolve our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility they need - and use it to define and optimize our SLOs.
  • Design, develop and optimize infrastructure as code with Pulumi in Go, eliminating toil and making our platform self-service for engineering teams.
  • Promote CI/CD best practices, incident management, post-mortems and developer experience across the entire engineering organization.
  • Collaborate with your squad and engineering leadership to define the platform's direction - from scalable, high-throughput systems and cost optimization to security posture and compliance.

Requirements

  • 1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
  • Experience operating and scaling cloud infrastructure (Azure, GCP or AWS).
  • Deep knowledge of Kubernetes and container orchestration in production environments.
  • Hands-on experience with modern observability stacks (e.g. Prometheus, Mimir, Loki, ELK) and confidence in defining and operating SLOs and error budgets.
  • Solid software development skills in Go (preferred, since our IaC runs on Pulumi in Go), Python or Kotlin.
  • Hands-on experience with infrastructure as code (e.g. Pulumi, OpenTofu, Terraform) and configuration tooling (e.g. Ansible, Chef).
  • A collaborative mindset, strong communication skills and business-fluent English.
  • Willingness to participate in on-call rotations to ensure the reliability of our platform.

Nice-to-haves

  • Experience building and operating high-throughput, highly available systems in production.
  • Experience with Azure Kubernetes Service (AKS) specifically.
  • Experience with Kubernetes Gateway API and Envoy Gateway.
  • Familiarity with GitOps workflows and CI/CD pipeline design.
  • Knowledge of service mesh technologies (e.g. Linkerd, Istio).
  • Experience with Kubernetes Operators (e.g. Strimzi, CNPG).
  • Experience operating highly available PostgreSQL.

Benefits

  • Flexibility to work from home
  • Occasional team events, workshops, or meetings in our Berlin or Stuttgart offices
  • Costs of your E-Gym-Wellpass membership covered
  • Job bike leasing
  • Regular team events and culture days
  • Opportunity to work abroad in the European Union
