AIOps & Reliability Engineer

Congregate Technologies

35 LPA

Location: Hyderabad

Posted: February 10, 2026

Posted By: System Administrator

Job Description

AIOps & Reliability Engineer

Experience: 8 to 14 years
Location: Hyderabad, India
Role Overview
We are seeking a Senior AIOps & Reliability Engineer to design and build AI- and ML-driven operational intelligence systems. This role focuses on proactive reliability, observability, and intelligent automation across cloud-native platforms, enabling early risk detection, faster incident resolution, and self-healing systems.
Key Responsibilities
AI & Machine Learning-Driven Operations Intelligence
- Design and implement ML models for anomaly detection, predictive incident detection, failure forecasting, and root cause analysis.
- Apply AI-assisted analysis for incident summarization, classification, and remediation recommendations.
- Engineer data pipelines converting observability telemetry into ML-ready datasets.
- Continuously evaluate, retrain, and improve models using production feedback.
Shift-Left AIOps & Reliability Engineering
- Implement shift-left AIOps initiatives to surface risks early in the SDLC.
- Apply ML to code changes, Terraform diffs, and deployment metadata to predict operational risk.
- Embed ML-driven risk scoring into Azure DevOps CI/CD and PR workflows.
- Partner with engineering teams to validate observability-first development practices.
AI-Powered Operational Intelligence
- Design AI-driven incident summarization, AI-assisted runbooks, and guided remediation.
- Build human-in-the-loop decision systems for high-impact incidents.
- Balance AI, ML, and deterministic automation with a focus on explainability and trust.
Observability & Telemetry Engineering
- Instrument applications using OpenTelemetry (OTEL).
- Normalize and correlate metrics, logs, and traces.
- Integrate telemetry pipelines with New Relic.
- Define and monitor SLIs, SLOs, and operational health signals.
Cloud, Kubernetes & Platform Operations
- Design and operate workloads on Microsoft Azure.
- Manage Azure Kubernetes Service (AKS) clusters.
- Deploy containerized .NET 8 services using Helm.
ML-Enabled DevOps, Infrastructure & Automation
- Build Azure DevOps pipelines for application, infrastructure, and ML deployments.
- Manage source control using Azure DevOps Repos.
- Implement Infrastructure as Code using Terraform.
- Automate workflows using Ansible.
Intelligent Automation & Self-Healing Systems
- Build closed-loop automation triggered by ML predictions.
- Reduce alert fatigue using intelligent correlation.
- Develop self-healing systems to reduce MTTR.
Required Skills & Experience
- Strong ML fundamentals including anomaly detection and time-series analysis.
- Experience applying AI/LLM systems to operational workflows.
- Hands-on Microsoft Azure and AKS experience.
- Proficiency with Kubernetes, Helm, Azure DevOps, Terraform, and Ansible.
- Experience with OpenTelemetry and New Relic.
Nice to Have
- MLOps or ML lifecycle management experience.
- Python for ML experimentation or AI prototyping.
- Familiarity with SRE principles.
What Success Looks Like
- Operational risks identified before production.
- Early incident prediction through ML and AI insights.
- Reduced alert noise and faster incident resolution.
- Continuous improvement in platform reliability.


Technical Skills

Azure, AKS, Kubernetes, Helm, .NET 8, OpenTelemetry, New Relic,
AIOps, MLOps, Machine Learning, Anomaly Detection, Time-Series Analysis,
LLMs, Predictive Analytics, Root Cause Analysis,
Azure DevOps, CI/CD, Terraform, Ansible,
Python, Infrastructure as Code, Observability, SRE,
Incident Management, Self-Healing Systems

About Company
Congregate Technologies

Looking for talented professionals to join our team.
