[Remote] Principal DevOps Engineer
Note: This is a remote position open to candidates in the USA. The hiring company is a subsidiary of one of the world's leading media and entertainment companies. They are seeking a Principal DevOps Engineer to architect and evolve the platform that powers NBC’s broadcast production environments, focusing on designing a Kubernetes-native platform and automating infrastructure at enterprise scale.
Responsibilities
- Architect a Kubernetes-native platform that models broadcast infrastructure as custom resources
- Lead the technical strategy, leveraging Crossplane compositions and custom Go functions to automate provisioning across multi-account AWS environments and on-prem control rooms
- Design, build, and maintain production-grade Kubernetes operators, controllers, and internal platform APIs in Go
- Actively develop custom Crossplane providers to deeply integrate external platforms (such as NRCS and Venafi) into our control plane, managing resource lifecycles and approval workflows
- Lead the design of networking, DNS strategies, and cross-account connectivity across hybrid environments, automating VPC topology and dynamic network routing
- Partner closely with broadcast systems engineers, system integrators, and external vendors to bridge the gap between broadcast hardware and automated infrastructure
- Lead efforts to 'Puppet-ize' bare-metal compute configurations and integrate proprietary vendor solutions into our configuration-as-code ecosystem
- Serve as a technical authority for the team
- Write RFCs, drive architectural decisions, mentor engineers, and establish high-confidence CI/CD pipelines, testing strategies, and GitHub Actions automation
- Own the platform's authorization model, designing hierarchical RBAC systems, resource identifier schemes, and identity integrations that enforce fine-grained access control
- Drive GitOps-based continuous delivery (with tools such as Flux and Kustomize) and manage configuration-as-code for compute fleets using Puppet
- Ensure deep operational visibility by designing comprehensive observability and alerting stacks
- Lead the integration of remote desktop/VDI connectivity solutions, focusing on session authentication, credential management, and gateway routing
Skills
- 10+ years of experience designing, building, and operating production infrastructure and cloud-native platforms at enterprise scale
- Strong proficiency in Go (systems-level programming, API servers) and deep experience building Kubernetes controllers/operators using patterns like controller-runtime and kubebuilder
- Expert-level knowledge of the Kubernetes ecosystem, including CRD/XRD design, operators, informers, admission webhooks, and RBAC
- Deep production experience with Crossplane, including composite resources, composition functions, and specifically developing custom Crossplane providers in Go to integrate external platforms
- Extensive production experience with AWS multi-account architectures, cross-account networking patterns, and identity federation. Requires depth across EKS, EC2, VPC, IAM, STS, SSM, Secrets Manager, Route 53, and S3
- Production experience with GitOps tooling, specifically Flux (HelmRelease, Kustomization) or ArgoCD for continuous delivery on Kubernetes
- Hands-on experience with Puppet, including module development, PuppetDB, Hiera, and r10k
- Experience designing REST APIs with middleware patterns and modern authentication (OAuth/JWT). Keen eye for information security, including cross-account IAM trust chains, least-privilege policies, JWT lifecycles, and secrets abstraction
- Strong background in designing telemetry platforms using Grafana, Prometheus/Mimir, Loki, OpenTelemetry, and metrics collection agents (Alloy, Prometheus Node Exporter)
- Working knowledge of PostgreSQL, SQLite or similar relational databases, encompassing schema design, migrations, and query optimization
- Excellent problem-solving skills with a proven ability to present architectural decisions to executives, engage with vendors, and write clear technical documentation
- Familiarity with broadcast/media production workflows and the strict operational constraints of live production environments
- Experience with the Crossplane function SDK for building custom composition functions in Go, and operating in Kubernetes disaster recovery situations (Velero cluster restoration, backups)
- Familiarity with VDI solutions (Amazon DCV, Leostream, PCoIP, etc.), machine identity workflows, and PKI certificate management (Venafi or similar)
- Experience with hybrid DNS architectures, software-defined networking (VPC peering, Transit Gateway, Direct Connect, Cloud WAN), and the Kubernetes Gateway API
- Familiarity with advanced testing frameworks (k6, KUTTL, etc.), SOPS for encrypted GitOps configurations, and local development workflows (such as colima)
- Ability to script routine tasks in Bash and PowerShell
- Active contributions to open-source projects, particularly within the CNCF/Kubernetes ecosystem
Benefits
- Company-sponsored benefits, including medical, dental, and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks
- Bonus eligible