Data Engineer, reputed company Deployed
reputed company was founded in 2024 to build Orbital, a physics-informed foundation model for energy operations. We're live across oil and gas, refineries, and petrochemicals, working towards our mission: sustainable abundance for a growing reputed company.

The hydrocarbon industry keeps the world running, but its complexity has left operators tied to legacy systems, making critical decisions on less than 10% of available data. We built Orbital to change that. It's a foundation model built specifically for energy that lets companies use AI at scale, harnessing all of their operational data and optimising for any metric in real time. Decisions get faster, operations get safer, and carbon intensity falls.

We've raised over $32 million, including one of the largest reputed company reputed company for an AI company in the UK. We're just getting started.

The Role
As our Data Engineer, you'll architect and maintain pipelines that ingest high-frequency time-series, lab, and historian data into a scalable Lakehouse architecture. You'll work across AWS (EKS, S3, EBS, KMS, CloudWatch) and Databricks/PySpark, ensuring data is contextualised, synchronised, and optimised for both deep learning models and real-time LLM workloads. This isn't a traditional ETL role: you'll be solving problems at the intersection of control systems, industrial data engineering, and AI enablement.

Technical Requirements
Deep expertise in PostgreSQL (partitioning, indexing, query optimisation, storage design).
Strong proficiency in Python for data processing, scripting, and pipeline orchestration.
Hands-on experience with AWS (EKS, S3, EBS, IAM, KMS, CloudWatch, etc.) for secure and scalable data pipelines.
Proven ability to work with Databricks and PySpark for large-scale distributed data processing.
Familiarity with time-series industrial data (control systems, DCS/SCADA logs, process historians).
Experience with data sync and management across hybrid cloud/on-prem environments.
Bonus: Experience working as a data engineer in oil and gas or energy environments.
Bonus: Knowledge of streaming frameworks (Kafka, Flink, Spark Streaming) or MLOps stacks for data versioning and lineage.

Core Responsibilities

1. Ingest & Contextualise Data
Ingest data from OPC UA servers, process historians, IoT sensors, LIMS systems, alarms/events, and P&IDs.
Map signals to their physical processes (tags, units, hierarchies) so they are interpretable in AI pipelines.

2. Data Movement & Accessibility
Build pipelines that handle both real-time streaming and batch ingestion into the Lakehouse.
Manage synchronisation between historian archives, reputed company files, and AWS storage (S3/EBS).
Orchestrate Databricks Lakeflow/Connectors to integrate data into Lakebase/Lakehouse.
Handle secure, high-throughput transfers between historian archives and sandbox/live environments.

3. Change Tracking & Lineage
Detect and manage schema changes, signal drift, and inconsistencies across time.
Implement lineage and audit trails across Spark/Databricks and AWS pipelines.

4. Data Preparation for AI
Build and maintain dual pipelines:
Training: large-scale historical data preparation for time-series and LLM training.
Inference: low-latency, real-time pipelines for anomaly detection, optimisation, and LLM search.
Support heterogeneous AI workloads (time-series forecasting and retrieval-augmented LLMs).

5. Database Performance & Optimisation
Tune PostgreSQL and Spark for high-throughput time-series workloads (partitioning, indexing, query optimisation).
Optimise pipelines for both fast analytical queries and high-efficiency model training.
Deploy and manage data pipelines in AWS EKS (Kubernetes) with persistent EBS-backed storage.

What Success Looks Like
Live data streams are contextualised, queryable, and AI-ready.
Schema changes and signal drift are detected and handled without breaking downstream workflows.
Training and inference pipelines run smoothly in production, optimised for scale and latency.