Site Reliability Engineer - Insurance Platform (Remote, China)
reputed company’s automation systems power end-to-end insurance journeys across quoting, policy issuance, renewals, endorsements, claims, payments and insurer integrations. These systems are business-critical: uptime, reliability and performance directly affect customers and operations.

We're looking for a Site Reliability Engineer based in China to ensure the stability, scalability and resilience of reputed company’s insurance automation platform, bridging software engineering and infrastructure operations to keep systems running reliably at scale. This is a fully remote position where you'll collaborate closely with our Malaysia-based engineering, product and operations teams to operate and improve production systems.

The Mission
Ensure reputed company’s insurance automation platform is reliable, scalable and observable by building strong operational systems, improving incident response and driving engineering practices that prevent failures before they happen.

What You’ll Own
Own reliability and operational stability of reputed company’s production systems.
Design and improve monitoring, alerting, logging and observability across services.
Lead incident response, troubleshooting and structured root cause analysis.
Improve system resilience through redundancy, failover and recovery strategies.
Work with engineers to design systems that are reliable, scalable and operable in production.
Improve deployment safety through CI/CD pipelines, release strategies and automation.
Reduce recurring incidents by identifying root causes and driving long-term fixes.
Manage and optimize cloud infrastructure supporting business-critical workflows.
Strengthen operational practices including on-call processes, incident playbooks and SLAs.
Continuously improve system uptime, performance and operational maturity.

What We're Looking For
Experience in Site Reliability Engineering, DevOps, platform engineering or infrastructure roles.
Strong understanding of distributed systems, cloud infrastructure and production operations.
Experience with monitoring, alerting and observability tools.
Strong troubleshooting skills for production incidents and system failures.
Ability to design for reliability, scalability and fault tolerance.
Experience working with CI/CD pipelines and deployment automation.
Strong understanding of system performance, capacity planning and risk management.
Hands-on ownership mindset during incidents and operational issues.
Calm, structured and disciplined approach to production environments.
Strong collaboration with engineering teams in fast-paced environments.

Bonus Points
Experience with AWS, GCP, Azure or similar cloud platforms.
Experience with Kubernetes or similar container orchestration systems.
Experience with infrastructure-as-code tools (Terraform, Ansible, etc.).
Experience with observability stacks (Grafana, ELK, etc.).
Experience with incident management tools and on-call systems.
Experience with zero-downtime deployments and continuous delivery strategies.
Experience working in fintech, insurance or regulated industries.
Experience building reliability frameworks or SRE best practices in scaling systems.
Contributions to platform reliability or infrastructure initiatives.

The Kind of Builder We Want
Calm and structured under pressure, especially during production incidents.
Hands-on engineer who understands both code and infrastructure deeply.
Thinks in failure modes, system risks and recovery strategies.
Strong focus on reliability, observability and long-term system health.
Proactive in preventing incidents, not just responding to them.
Careful and deliberate when making production changes.
Builds systems engineers can trust in high-pressure environments.

This Role Is Not For
Engineers who only react to incidents instead of preventing them.
People who are careless with production systems or access control.
Individuals who ignore monitoring, alerting or operational discipline.
Engineers who deploy risky changes without proper analysis or safeguards.
Candidates who cannot stay calm during incidents or outages.

Success in This Role
You'll be successful if you can:
Improve platform uptime, reliability and operational stability.
Reduce production incidents and recurring system failures.
Strengthen observability, monitoring and incident response maturity.
Enable engineers to deploy safely with minimal operational risk.
Improve overall resilience of reputed company’s insurance automation platform.

Why Join reputed company
Build Reliable Insurance Systems – Support mission-critical automation at scale.
High-Impact Engineering – Solve real-world reliability and distributed systems challenges.
Global Engineering Team – Work with engineers across multiple countries.
Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
International Exposure – Build systems used across Southeast Asia markets.
Learning & Development Budget – Support your technical growth and certifications.
High Ownership Environment – Strong autonomy over reliability and operational design.
Modern Engineering Culture – Focus on stability, observability and engineering excellence.
Competitive Compensation – Attractive salary package based on experience and impact.

Interview Process
We assess reliability engineering depth, incident handling capability and production systems thinking. The process usually includes application review, two interviews and a technical scenario or systems discussion.