We are seeking a talented Observability Engineer with hands-on Datadog experience to enhance infrastructure and application monitoring. The role involves designing and implementing observability solutions across cloud-native environments.
Key Responsibilities
Build and maintain Datadog-based monitoring for applications, infrastructure, and cloud services
Create dashboards, monitors, and alerts to proactively detect issues
Collaborate with DevOps, SRE, and application teams to define SLOs, SLIs, and KPIs
Integrate Datadog with AWS, Kubernetes, CI/CD tools, and logging systems
Conduct performance tuning and root cause analysis
Automate observability configuration using infrastructure as code (IaC) and scripting (Terraform, Python)
Must-Have Skills
6+ years of experience in monitoring/observability
4+ years of hands-on experience with Datadog (APM, infrastructure monitoring, custom metrics)
Experience with AWS, GCP, or Azure
Kubernetes and microservices monitoring
Log management, distributed tracing, and alert tuning
Scripting (Python, Shell) and IaC (Terraform preferred)
Solid grasp of DevOps/SRE practices
Nice-to-Have Skills
Datadog certifications
Experience integrating Datadog with CI/CD, ticketing, and ChatOps tools
Familiarity with other observability tools such as Prometheus, Grafana, and Splunk
Experience with performance testing tools (JMeter, k6)