Metrics reference

TAG exposes Prometheus metrics at the /metrics endpoint.

Accessing metrics

# Local
curl http://localhost:8080/metrics

# Kubernetes (port-forward)
kubectl port-forward svc/tag 8080:8080
curl http://localhost:8080/metrics

Request metrics

tag_requests_total

Type: Counter

Total number of requests processed by TAG.

Label	Description
`operation`	S3 operation: `GetObject`, `PutObject`, `DeleteObject`, `HeadObject`
`status`	Result: `success`, `error`, `auth_error`, `range_not_satisfiable`

# Request rate by operation
rate(tag_requests_total[5m])

# Error rate
sum(rate(tag_requests_total{status="error"}[5m])) / sum(rate(tag_requests_total[5m]))

# GetObject success rate
rate(tag_requests_total{operation="GetObject",status="success"}[5m]) /
rate(tag_requests_total{operation="GetObject"}[5m])

tag_request_duration_seconds

Type: Histogram

Request duration in seconds.

Label	Description
`operation`	S3 operation

# P50 latency
histogram_quantile(0.5, rate(tag_request_duration_seconds_bucket[5m]))

# P99 latency by operation
histogram_quantile(0.99, sum(rate(tag_request_duration_seconds_bucket[5m])) by (operation, le))

Cache metrics

tag_cache_hits_total

Type: Counter — total number of cache hits.

tag_cache_misses_total

Type: Counter — total number of cache misses.

tag_cache_operations_total

Type: Counter

Label	Description
`operation`	Operation type: `get`, `put`, `delete`
`result`	Result: `hit`, `miss`, `success`, `error`

# Cache hit ratio
rate(tag_cache_hits_total[5m]) /
(rate(tag_cache_hits_total[5m]) + rate(tag_cache_misses_total[5m]))

# Cache operation breakdown
sum by (operation, result) (rate(tag_cache_operations_total[5m]))

tag_range_from_cache_hits_total

Type: Counter — number of range requests served from cached full objects.

Broadcast metrics

tag_broadcast_shared_total

Type: Counter — requests that joined an existing broadcast stream.

tag_broadcast_fetches_total

Type: Counter — upstream fetches (broadcast initiators).

tag_broadcast_slow_consumers_total

Type: Counter — listeners disconnected for being too slow.

tag_active_broadcasts

Type: Gauge — currently active broadcast streams.

# Coalescing ratio (higher is better)
rate(tag_broadcast_shared_total[5m]) /
(rate(tag_broadcast_shared_total[5m]) + rate(tag_broadcast_fetches_total[5m]))

Background fetch metrics

tag_background_fetches_triggered_total

Type: Counter — background full-object fetches triggered by range requests.

tag_background_fetches_succeeded_total

Type: Counter — background fetches completed successfully.

tag_background_fetches_failed_total

Type: Counter — background fetches that failed.

tag_active_background_fetches

Type: Gauge — currently active background fetches.

# Background fetch success rate
rate(tag_background_fetches_succeeded_total[5m]) /
rate(tag_background_fetches_triggered_total[5m])

Revalidation metrics

tag_revalidations_triggered_total

Type: Counter — cache revalidation attempts (conditional GET/HEAD to upstream).

tag_revalidations_not_modified_total

Type: Counter — revalidations where upstream returned 304 Not Modified.

tag_revalidations_updated_total

Type: Counter — revalidations where upstream returned 200 with new data.

tag_revalidations_failed_total

Type: Counter — revalidations that failed due to errors.

tag_revalidations_stale_served_total

Type: Counter — times stale cached data was served because revalidation failed.

# Revalidation 304 ratio (higher = better cache freshness)
rate(tag_revalidations_not_modified_total[5m]) /
rate(tag_revalidations_triggered_total[5m])

# Stale serve ratio (should be low)
rate(tag_revalidations_stale_served_total[5m]) /
rate(tag_revalidations_triggered_total[5m])

Upstream metrics

tag_upstream_request_duration_seconds

Type: Histogram — upstream (Tigris) request duration in seconds.

Label	Description
`method`	HTTP method: `GET`, `PUT`, `DELETE`, `HEAD`

tag_upstream_errors_total

Type: Counter — total upstream errors.

Label	Description
`method`	HTTP method

Authentication metrics

tag_auth_failures_total

Type: Counter

Label	Description
`reason`	Failure reason: `invalid_signature`, `unknown_key`, `expired`

tag_local_auth_validations_total

Type: Counter — local authentication validation attempts in transparent proxy mode.

Label	Description
`result`	Validation result: `success`, `missing_auth`, `parse_error`, `unknown_key`, `signature_mismatch`, `authz_expired`

# Local auth success rate
rate(tag_local_auth_validations_total{result="success"}[5m]) /
sum(rate(tag_local_auth_validations_total[5m]))

# Auth failure breakdown by reason
sum by (result) (rate(tag_local_auth_validations_total{result!="success"}[5m]))

tag_derived_key_store_size

Type: Gauge — number of derived signing keys currently stored. TAG learns signing keys from Tigris responses and caches them for local SigV4 validation. A value of 0 after receiving requests indicates key learning is not working.

tag_authz_cache_size

Type: Gauge — number of active per-bucket authorization cache entries (accessKey × bucket pairs). Each entry represents a client that has been granted access to a specific bucket.

tag_proxy_signing_keys_received_total

Type: Counter — number of signing key sets received from Tigris responses. Incremented each time Tigris returns an X-Tigris-Proxy-Signing-Keys header that TAG uses to enable local validation.

# Rate of new key learning events
rate(tag_proxy_signing_keys_received_total[5m])

Connection metrics

tag_active_connections

Type: Gauge — number of active connections.

tag_bytes_transferred_total

Type: Counter — total bytes transferred.

Label	Description
`direction`	Transfer direction: `in`, `out`

# Throughput (bytes/sec)
rate(tag_bytes_transferred_total[5m])

# Outbound throughput
rate(tag_bytes_transferred_total{direction="out"}[5m])

Prometheus scrape configuration

scrape_configs:
  - job_name: "tag"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: tag
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "8080"

Accessing metrics​

Request metrics​

tag_requests_total​

tag_request_duration_seconds​

Cache metrics​

tag_cache_hits_total​

tag_cache_misses_total​

tag_cache_operations_total​

tag_range_from_cache_hits_total​

Broadcast metrics​

tag_broadcast_shared_total​

tag_broadcast_fetches_total​

tag_broadcast_slow_consumers_total​

tag_active_broadcasts​

Background fetch metrics​

tag_background_fetches_triggered_total​

tag_background_fetches_succeeded_total​

tag_background_fetches_failed_total​

tag_active_background_fetches​

Revalidation metrics​

tag_revalidations_triggered_total​

tag_revalidations_not_modified_total​

tag_revalidations_updated_total​

tag_revalidations_failed_total​

tag_revalidations_stale_served_total​

Upstream metrics​

tag_upstream_request_duration_seconds​

tag_upstream_errors_total​

Authentication metrics​

tag_auth_failures_total​

tag_local_auth_validations_total​

tag_derived_key_store_size​

tag_authz_cache_size​

tag_proxy_signing_keys_received_total​

Connection metrics​

tag_active_connections​

tag_bytes_transferred_total​

Prometheus scrape configuration​

Accessing metrics

Request metrics

tag_requests_total

tag_request_duration_seconds

Cache metrics

tag_cache_hits_total

tag_cache_misses_total

tag_cache_operations_total

tag_range_from_cache_hits_total

Broadcast metrics

tag_broadcast_shared_total

tag_broadcast_fetches_total

tag_broadcast_slow_consumers_total

tag_active_broadcasts

Background fetch metrics

tag_background_fetches_triggered_total

tag_background_fetches_succeeded_total

tag_background_fetches_failed_total

tag_active_background_fetches

Revalidation metrics

tag_revalidations_triggered_total

tag_revalidations_not_modified_total

tag_revalidations_updated_total

tag_revalidations_failed_total

tag_revalidations_stale_served_total

Upstream metrics

tag_upstream_request_duration_seconds

tag_upstream_errors_total

Authentication metrics

tag_auth_failures_total

tag_local_auth_validations_total

tag_derived_key_store_size

tag_authz_cache_size

tag_proxy_signing_keys_received_total

Connection metrics

tag_active_connections

tag_bytes_transferred_total

Prometheus scrape configuration