Prometheus Chaos Edition -

Run this between Prometheus and your real exporters. Watch Prometheus log parse error and target down – then verify your alerts fire correctly.

@app.route('/metrics') def metrics(): if random.random() < 0.2: # 20% of the time return "malformed_metric{ invalid syntax", 200 return Response(real_metrics(), mimetype='text/plain')

What happens when your Prometheus server runs out of memory? What if a metric scrape takes 30 seconds because a target is thrashing? What if your alerting rules become corrupt?

# Inject 5s latency into 50% of scrape requests for 2 minutes curl -X POST http://localhost:9091/inject/latency \ -d '"duration":"2m","percent":50,"delay":"5s"' If you run Prometheus Operator, pair it with Chaos Mesh (CNCF project) and a NetworkChaos experiment:

Prometheus Chaos Edition turns the old monitoring paradox on its head. Instead of trusting your monitoring blindly, you break it on purpose – gently, repeatedly, and observably.

Prometheus Chaos Edition -

Run this between Prometheus and your real exporters. Watch Prometheus log parse error and target down – then verify your alerts fire correctly.

@app.route('/metrics') def metrics(): if random.random() < 0.2: # 20% of the time return "malformed_metric{ invalid syntax", 200 return Response(real_metrics(), mimetype='text/plain') prometheus chaos edition

What happens when your Prometheus server runs out of memory? What if a metric scrape takes 30 seconds because a target is thrashing? What if your alerting rules become corrupt? Run this between Prometheus and your real exporters

Prometheus Chaos Edition turns the old monitoring paradox on its head. Instead of trusting your monitoring blindly, you break it on purpose – gently, repeatedly, and observably.