Production Support Engineer

Remote, USA Full-time Posted 2026-06-06

Professional Services – Support Practice
Full-time · Remote (India / US / EU)
Experience: 2–5 years

About Lyzr

Lyzr.ai's agentic AI platform powers intelligent, autonomous workflows for enterprise clients. Production Support Engineers are the front line that keeps those workflows healthy. They triage incidents, resolve tickets, dig into logs, and escalate the right issues to the right teams before clients feel the pain.

This role suits someone who thrives in a fast-paced technical environment, takes ownership seriously, and genuinely enjoys the detective work of diagnosing why something broke in production. You will work within a global follow-the-sun support model, reporting to the Production Support Lead.

What you'll do

Incident response & triage
Monitor production dashboards and alerts; acknowledge, classify (P1–P3), and triage incoming incidents within SLA response windows.
Perform first-level diagnosis using logs, traces, and monitoring tools (Datadog / Grafana / CloudWatch) to isolate the root cause or rule out environmental issues.
Execute approved runbook steps to resolve known issues independently; escalate novel or high-severity issues to the Lead with a clear diagnostic summary.
Maintain accurate, time-stamped ticket updates throughout the incident lifecycle so clients and internal stakeholders always have visibility.

Service request fulfilment
Handle client service requests (configuration changes, access provisioning, agent re-deployments, and data queries) within approved change-management guardrails.
Validate and document completed requests, ensuring audit trails are maintained in the ticketing system.
Identify recurring requests that could be automated or self-served, and flag them to the Lead for process improvement.

Monitoring & proactive health checks
Run scheduled health checks on production agent pipelines, API integrations, and data connectors; raise pre-emptive alerts for degradation trends.
Maintain and update monitoring dashboards; propose new alert thresholds based on observed patterns.
Participate in post-mortems and contribute findings to the known-error database and runbooks.

Knowledge & collaboration
Document solutions to new issues in the internal knowledge base; keep existing runbooks accurate and up to date.
Collaborate with Engineering, Platform, and Customer Success teams during handoffs, providing clear reproduction steps and log artefacts.
Participate in the shift-based on-call rotation; you are expected to be available for P1 escalations during your assigned windows.

What you bring

Experience: 2–5 years in application / production support or a NOC environment
Domain: SaaS or cloud-hosted platform support; AI/ML familiarity a strong plus
Technical: Log analysis, API debugging, SQL queries, basic Python / shell scripting
Monitoring: Datadog, Grafana, CloudWatch, or equivalent observability tools
Ticketing: Jira Service Management, ServiceNow, or Zendesk
Cloud basics: AWS / GCP / Azure fundamentals; Docker / Kubernetes awareness

Additionally, you will have:
A methodical, structured approach to troubleshooting: you document what you tried, not just what worked.
Clear written communication in ticket updates, client-facing messages, and handover notes that leave no ambiguity.
Comfort working across time zones and collaborating asynchronously with distributed teams.
Bonus: exposure to LLM-based or agentic AI systems, prompt engineering, or RAG pipelines in production.
Bonus: ITIL Foundation certification or equivalent incident management training.

Similar Jobs

Salesforce AI / Agentforce Developer

Remote, USA Full-time

Talent Acquisition Specialist, Philippines

Remote, USA Full-time

Character Concept Artist

Remote, USA Full-time

Inside Sales Associate (m/f/d)

Remote, USA Full-time

Professional Services Consultant

Remote, USA Full-time

Procurement Lead

Remote, USA Full-time

Forward Deployed Software Engineer

Remote, USA Full-time

Field Sales Associate (m/f/d)

Remote, USA Full-time

Job Title: Experienced Data Entry Specialist – Remote Work Opportunity with arenaflex

Remote, USA Full-time

Remote Data Entry Specialist – Work From Home | Flexible Hours, Competitive Pay & Career Advancement Opportunities

Remote, USA Full-time

Experienced Remote Customer Service Specialist – Delivering Exceptional Arenaflex Experiences

Remote, USA Full-time

Experienced Full Stack Software Engineer – Web & Cloud Application Development for Regulatory Gen AI Chat Application at arenaflex

Remote, USA Full-time

Presales Architect

Remote, USA Full-time

Senior Game Designer - Mobile Entertainment

Remote, USA Full-time

Senior Internal Auditor (CPA, CIA, CFE, CCEP certs preferred)

Remote, USA Full-time

Oncology Account Executive - Detroit South

Remote, USA Full-time

Part‑Time Remote Virtual Assistant & Data Entry Specialist – E‑Commerce Store Management for arenaflex

Remote, USA Full-time

Workforce Engagement Management, Verint DPA Consultant

Remote, USA Full-time