Service availability monitoring/flapping services

schizo@forum.uncomfortable.business · 10 months ago

It’s hilariously annoying, but to address your points:

There’s nothing in any of the service logs.
It’s notifications from services that have external monitoring, but it’s not always the same service.
The local monitoring (which uses the same DNS records for resolution, and uses the same reverse proxy to connect) doesn’t flap at all.
It’s sites behind Cloudflare, ones not behind Cloudflare, and one served via one of their Argo tunnels, so it doesn’t seem specific to CF.

The 503s are coming from Cloudflare, indicating it can’t connect to the back end, which makes me think network issue again. Non-CF sites just show timeout errors.

I don’t think it’s resource related; it’s a 10850K with 64 GB of RAM, and it’s currently using 3% CPU and about 15 GB of RAM, so there’s more than enough idle capacity to handle even a substantial spike in traffic (and I don’t see any indication of one in the logs).

It’s gotta be some incredibly transient network issue, but it’s so transient that I’m not sure how to work out what actually happens when it breaks, since it has “fixed itself” by the time I can get near enough to something to take a look.
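
One way to catch something that fixes itself before anyone can look is to leave a small probe running and have it record which stage failed at the moment of failure. The following is only a rough sketch, not something from my actual setup: the function names `probe` and `watch`, the timeout, and the interval are all made up for illustration, and it assumes the script runs from a machine outside the network, since the local monitoring never sees the problem. It separates a DNS failure from a TCP connect failure and timestamps each one so it can be lined up against the external monitor's alerts.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=3.0):
    """Resolve host, then attempt a TCP connect; report which stage failed."""
    result = {
        "time": datetime.now(timezone.utc).isoformat(),
        "stage": "ok",   # "ok", "dns", or "connect"
        "error": None,
        "ms": None,      # elapsed time for attempts that got past DNS
    }
    start = time.monotonic()
    try:
        # Stage 1: name resolution
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result.update(stage="dns", error=str(exc))
        return result
    family, socktype, proto, _, addr = infos[0]
    try:
        # Stage 2: TCP handshake against the first resolved address
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:  # covers timeouts and refused connections
        result.update(stage="connect", error=str(exc))
    result["ms"] = round((time.monotonic() - start) * 1000, 1)
    return result


def watch(host, port, interval=5.0, count=None, log=print):
    """Probe repeatedly, logging and collecting only the failures."""
    failures = []
    attempts = 0
    while count is None or attempts < count:
        outcome = probe(host, port)
        if outcome["stage"] != "ok":
            failures.append(outcome)
            log(outcome)
        attempts += 1
        if count is None or attempts < count:
            time.sleep(interval)
    return failures
```

Calling something like `watch("example.com", 443)` (placeholder hostname) would run until interrupted and print one line per failed attempt. If the failures all land in the "connect" stage while DNS keeps resolving, that would point toward the path to the reverse proxy rather than name resolution, though this only tests reachability and says nothing about what the proxy or the services behind it are doing.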