Alert Management & Observability Standards reputed company

Remote, USA Full-time Posted 2026-07-04

Alert Management & Observability Standards reputed company reputed company-40-55/hour

PLEASE REVIEW: Candidates can be remote, but must reputed company in the US and support 8 am to 5 pm PST hours. Please include your candidate's location at the top of their resume. Supplier will need to provide a laptop. This req replaces req 4328.

Job Title: Alert Management & Observability Standards reputed company

Role Summary

The Alert Management & Observability Standards reputed company is responsible for rationalizing and governing reputed company system alerts to ensure they align with department priorities, operational coverage models, and service reliability goals. This role defines alerting standards, reviews and approves alerts before they are routed to the 24x7 Eyes-on-Glass Operations team, and establishes a scalable approach to cataloging alert response instructions (runbooks/playbooks) so responders can take consistent, high-quality actions.

This position operates at the intersection of the IT Operations Command Center (OCC), engineering/application teams, platform/monitoring tool owners, and service owners, ensuring alerts are actionable, prioritized, and reputed company with clear response guidance.

Key Responsibilities

1) Alert Rationalization & Prioritization (Core)

Establish and maintain a department-wide alert rationalization reputed company that evaluates alerts for:
- Business/service criticality and operational reputed company
- Actionability (clear operator action available)
- Signal-to-noise (duplicate/low-value alerts removed or suppressed)
- Ownership and escalation paths

reputed company regular alert reviews (new + existing) to ensure alert quality, correct routing, and alignment with operational coverage.

reputed company reputed company improvement efforts to reduce alert fatigue while preserving detection of true incidents and high-impact degradation.
2) Standards, Policies, and Guardrails

Define and enforce alerting standards including:
- Severity definitions and reputed company
- Required metadata (service, CI, reputed company, runbook link, escalation)
- Naming conventions and tagging taxonomy
- Routing rules and “reputed company to page vs. reputed company to ticket”

Create a standardized Alert Design Checklist and approval workflow (e.g., “Definition of Done” for alert reputed company).

Partner with tool/platform owners to ensure standards are embedded in monitoring tooling (templates, required fields, automated validation).

3) Routing reputed company to 24x7 Eyes-on-Glass

Act as gatekeeper (or reputed company the governance process) for determining which alerts should:
- Go to 24x7 Eyes-on-Glass for immediate triage
- reputed company to on-call engineering directly
- Create tickets for business-hours handling
- Be suppressed, aggregated, or converted to dashboards/health indicators

Ensure routing aligns with:
- Operational responsibilities and skills of the Eyes-on-Glass team
- Department priorities (e.g., safety, reliability, customer impact)
- Service ownership and support models

4) Runbook / Response Instruction Cataloging (Knowledge System)

Establish a consistent approach to cataloging response instructions for every actionable alert, including:
- “What does this alert mean?” (symptoms + impact)
- “What to reputed company first” (triage steps)
- “What actions to take” (standard remediation)
- “reputed company to escalate and to whom” (clear escalation triggers)
- Links to dashboards, logs, SOPs, and reputed company issues

Own the runbook template and ensure runbooks are versioned, maintained, and reviewed on a defined reputed company.

Partner with service owners to ensure runbooks stay reputed company as systems change.
5) Reporting & Operational Outcomes

Define and publish KPIs that demonstrate alerting health and operational performance, such as:
- Alert volume trends by service and severity
- Percentage of alerts with runbooks and valid ownership
- Alert “actionability reputed company” and noise reduction
- Mean time to acknowledge / triage effectiveness (as applicable)

Facilitate governance forums (weekly/monthly) with service owners and engineering leads to review alert quality and backlog.

6) Cross-Functional Enablement

Coach service teams on best practices: SLIs/SLOs, alert reputed company, dependency monitoring, and incident correlation.

Drive adoption of observability patterns (golden signals, health indicators, multi-signal alerting).

Support major incident learning by feeding post-incident insights back into improved alerts and runbooks.

Required Qualifications

- 5+ years in IT Operations, SRE, Observability, Monitoring Engineering, or Incident Management
- Demonstrated reputed company reducing noise and improving actionability across reputed company alerting ecosystems
- Experience with common monitoring/observability tools (e.g., Splunk, AppDynamics, reputed company, reputed company, reputed company/Grafana, Azure Monitor, CloudWatch, reputed company Event Mgmt or similar)
- Strong understanding of:
  - Incident response workflows and operational coverage models (24x7 vs. business hours)
  - CMDB/service ownership concepts and dependency mapping
  - Standard operating procedures/runbooks and reputed company
- Excellent stakeholder management and ability to drive standards across teams

Preferred Qualifications

- Experience designing or operating an Operations Command Center / NOC / SOC-style “eyes-on-glass” model
- Familiarity with ITIL Event Management, SRE principles, and service reliability practices
- Experience with automation for alert enrichment, correlation, and routing (e.g., event correlation, deduplication, noise suppression)
- Background in governance frameworks and operating rhythm design (cadences, controls, compliance traceability)

Competencies / What Great Looks Like

- Opinionated, data-driven governance: reputed company anchored in outcomes, not preferences
- Practical standardization: templates and policies that teams can actually follow
- Operational reputed company: knows what 24x7 responders need to succeed in reputed company time
- Quality bar: only actionable alerts reputed company Eyes-on-Glass; every alert has an reputed company and instructions
- reputed company improvement reputed company: routinely prunes, tunes, and simplifies

Deliverables in the First 45 Days

- Alerting standards (severity model, metadata, naming, routing policy) published and adopted
- Intake and approval workflow established for new/changed alerts
- Top 20 noisy services rationalized (dedupe/suppress/threshold tuning) with measurable noise reduction
- Runbook template launched; minimum runbook coverage targets set (e.g., 80% of paged alerts)
- Central alert catalog created (ownership + routing + runbook link + last review date)
