API Health CLI: An HTTP Endpoint Monitor in Go
A command-line tool for monitoring the health of HTTP APIs in SRE and DevOps environments. Built in Go to leverage the language’s native concurrency model.
Full source code is available on GitHub: https://github.com/enriquevaldivia1988/api-health-cli
The Problem
Infrastructure teams need to quickly check the status of multiple endpoints — during deployments, incidents, or as part of CI/CD pipelines. Existing tools are either too heavy or lack flexible configuration. This CLI addresses:
- Concurrent checks: verify N endpoints in parallel, not sequentially
- Declarative configuration: YAML file with per-endpoint headers, timeouts, and status codes
- Retries with backoff: avoid false positives from transient timeouts
- CI mode: exit code 1 on any failure, ideal for pipelines
Architecture
The project follows a layered architecture with well-defined packages:
cmd/              → CLI commands (cobra): check and watch
internal/
    checker/      → Concurrent HTTP check logic with retries
    config/       → YAML parser with environment variable expansion
    notifier/     → Slack webhook notifier
    output/       → Colored terminal result rendering
Stack
| Component | Technology |
|---|---|
| Language | Go 1.22 |
| CLI Framework | Cobra |
| Colored output | fatih/color |
| Config | gopkg.in/yaml.v3 |
| Tests | stdlib (net/http/httptest) |
Design Decisions
1. Concurrency with goroutines and WaitGroup
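Each check runs in its own goroutine and writes its result into a pre-allocated slice at its own index. Because no two goroutines touch the same element, no mutex is needed, and the results come back in the same order as the input endpoints. The only synchronization point is wg.Wait(), which guarantees every write has finished before the slice is returned: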
func (hc *HealthChecker) CheckAll(endpoints []Endpoint) []Result {
	results := make([]Result, len(endpoints))
	var wg sync.WaitGroup
	for i, ep := range endpoints {
		wg.Add(1)
		go func(idx int, endpoint Endpoint) {
			defer wg.Done()
			results[idx] = hc.Check(endpoint)
		}(i, ep)
	}
	wg.Wait()
	return results
}
2. Retries with exponential backoff
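On failure, the checker retries with exponential backoff: the delay starts at 500ms and doubles before each subsequent retry (500ms, 1s, 2s, ...), which avoids hammering a target server that may already be struggling: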
for attempt := 0; attempt <= retries; attempt++ {
	if attempt > 0 {
		backoff := time.Duration(math.Pow(2, float64(attempt-1))) * 500 * time.Millisecond
		time.Sleep(backoff)
	}
	lastResult = hc.doCheck(ep.URL, method, ep.Headers, expectedStatus)
	if lastResult.Healthy {
		return lastResult
	}
}
3. Environment variable expansion in config
Sensitive tokens and URLs stay out of configuration files — they’re injected as environment variables:
endpoints:
  - url: https://api.example.com/health
    headers:
      Authorization: "Bearer ${API_TOKEN}"
notifications:
  slack:
    webhook_url: "${SLACK_WEBHOOK_URL}"
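One way to implement this kind of expansion is to run the raw file contents through the standard library's os.Expand before handing them to the YAML parser. The sketch below assumes that approach; expandConfig is a hypothetical helper and the project's config package may do it differently.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} and $VAR references in raw config text
// using lookup. It is a hypothetical helper; with os.Getenv as the lookup,
// unset variables expand to an empty string.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	os.Setenv("API_TOKEN", "s3cr3t") // demo value only
	raw := `Authorization: "Bearer ${API_TOKEN}"`
	fmt.Println(expandConfig(raw, os.Getenv))
}
```

Taking the lookup as a parameter keeps the helper easy to test without touching the real environment. One caveat of expanding before parsing: any literal $ in the file is also treated as the start of a variable reference.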
4. CI mode with exit code
In CI/CD pipelines, the --ci flag makes the tool return exit code 1 if any endpoint fails:
healthcheck check https://api.example.com/health --ci
# Returns exit 1 if endpoint is down → pipeline fails correctly
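The decision behind the flag can be sketched as a small pure function. This is an illustration under assumptions, not the project's code: the trimmed-down Result type and the exitCode helper are hypothetical, and the sketch assumes the tool exits 0 when --ci is not set.

```go
package main

import "fmt"

// Result is a hypothetical, trimmed-down check result.
type Result struct {
	URL     string
	Healthy bool
}

// exitCode returns 1 when ci is set and at least one result is unhealthy,
// and 0 otherwise (assumed behavior without the flag).
func exitCode(results []Result, ci bool) int {
	if !ci {
		return 0
	}
	for _, r := range results {
		if !r.Healthy {
			return 1
		}
	}
	return 0
}

func main() {
	results := []Result{
		{URL: "https://api.example.com/health", Healthy: true},
		{URL: "https://internal.example.com/readiness", Healthy: false},
	}
	// A real command would pass this value to os.Exit.
	fmt.Println("exit code:", exitCode(results, true))
}
```

Keeping the decision separate from the os.Exit call makes it testable, since os.Exit would terminate the test process.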
Usage
# Quick URL check
healthcheck check https://api.example.com/health https://status.example.com
# With config file
healthcheck check --config endpoints.yaml
# With timeout and retries
healthcheck check https://api.example.com/health --timeout 3s --retries 2
# Continuous monitoring (interval defaults to 30s; overridden to 10s here)
healthcheck watch --config endpoints.yaml --interval 10s
# With Slack notifications
healthcheck check --config endpoints.yaml --slack "$SLACK_WEBHOOK_URL"
# CI mode
healthcheck check --config endpoints.yaml --ci
Sample output
  ENDPOINT                                   STATUS          TIME
──────────────────────────────────────────────────────────────────
✔ https://api.example.com/health             200 OK          124ms
✔ https://status.example.com/ping            200 OK          89ms
✘ https://internal.example.com/readiness     503 Service U   2.1s
2/3 healthy — total time 2.3s
Testing
The project has 25 tests across three packages:
- checker (10 tests): healthy/unhealthy checks, custom status codes, headers, default method, concurrent result ordering, and User-Agent header
- config (9 tests): YAML parsing, defaults, format errors, env var expansion, multiple endpoints, file loading
- notifier (6 tests): payload delivery, empty list skips request, 500 error handling, unreachable webhook, multiple failures batched in one request
func TestCheckAll_ReturnsResultsInOrder(t *testing.T) {
	statuses := []int{200, 503, 200, 404, 200}
	// ... creates one server per status code
	results := hc.CheckAll(endpoints)
	for i, result := range results {
		expectedHealthy := statuses[i] == 200
		if result.Healthy != expectedHealthy {
			t.Errorf("result[%d]: expected healthy=%v, got healthy=%v", i, expectedHealthy, result.Healthy)
		}
	}
}
Coverage: checker 91.8% · config 100% · notifier 95.7%
How to Run
git clone https://github.com/enriquevaldivia1988/api-health-cli.git
cd api-health-cli
# Build
make build
# Run tests
make test
# Install globally
make install
# Use directly with go run
go run . check https://httpbin.org/status/200 https://httpbin.org/status/503
Conclusion
This project demonstrates how to build a production-grade CLI tool in Go, combining goroutine-based concurrency, declarative YAML configuration, tests that run against local httptest servers, and a polished terminal user experience.
Enrique Valdivia