# Metrics reference

TAG exposes Prometheus metrics at the `/metrics` endpoint.

## Accessing metrics

```bash
# Local
curl http://localhost:8080/metrics

# Kubernetes: forward the service port (the command blocks, so run it in a separate terminal)
kubectl port-forward svc/tag 8080:8080
curl http://localhost:8080/metrics
```

## Request metrics

### tag\_requests\_total

**Type:** Counter

Total number of requests processed by TAG.

| Label       | Description                                                          |
| ----------- | -------------------------------------------------------------------- |
| `operation` | S3 operation: `GetObject`, `PutObject`, `DeleteObject`, `HeadObject` |
| `status`    | Result: `success`, `error`, `auth_error`, `range_not_satisfiable`    |

```promql
# Request rate by operation
sum by (operation) (rate(tag_requests_total[5m]))

# Error rate
sum(rate(tag_requests_total{status="error"}[5m])) / sum(rate(tag_requests_total[5m]))

# GetObject success rate (both sides are summed so the status label does not break vector matching)
sum(rate(tag_requests_total{operation="GetObject",status="success"}[5m])) /
sum(rate(tag_requests_total{operation="GetObject"}[5m]))
```

### tag\_request\_duration\_seconds

**Type:** Histogram

Request duration in seconds.

| Label       | Description  |
| ----------- | ------------ |
| `operation` | S3 operation |

```promql
# P50 latency across all operations
histogram_quantile(0.5, sum by (le) (rate(tag_request_duration_seconds_bucket[5m])))

# P99 latency by operation
histogram_quantile(0.99, sum by (operation, le) (rate(tag_request_duration_seconds_bucket[5m])))
```

## Cache metrics

### tag\_cache\_hits\_total

**Type:** Counter — total number of cache hits.

### tag\_cache\_misses\_total

**Type:** Counter — total number of cache misses.

### tag\_cache\_operations\_total

**Type:** Counter

Total number of cache operations, labeled by operation type and result.

| Label       | Description                               |
| ----------- | ----------------------------------------- |
| `operation` | Operation type: `get`, `put`, `delete`    |
| `result`    | Result: `hit`, `miss`, `success`, `error` |

```promql
# Cache hit ratio
rate(tag_cache_hits_total[5m]) /
(rate(tag_cache_hits_total[5m]) + rate(tag_cache_misses_total[5m]))

# Cache operation breakdown
sum by (operation, result) (rate(tag_cache_operations_total[5m]))
```

### tag\_range\_from\_cache\_hits\_total

**Type:** Counter — number of range requests served from cached full objects.
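
A possible way to watch this counter, following the query style used for the other cache metrics. The 5-minute and 1-hour windows are arbitrary choices, and reading the second query as a payoff measure for background fetches is an interpretation, not something TAG documents.

```promql
# Range requests served from cached full objects, per second
sum(rate(tag_range_from_cache_hits_total[5m]))

# Range hits per background full-object fetch triggered over the last hour
sum(increase(tag_range_from_cache_hits_total[1h])) /
sum(increase(tag_background_fetches_triggered_total[1h]))
```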

## Broadcast metrics

### tag\_broadcast\_shared\_total

**Type:** Counter — requests that joined an existing broadcast stream.

### tag\_broadcast\_fetches\_total

**Type:** Counter — upstream fetches (broadcast initiators).

### tag\_broadcast\_slow\_consumers\_total

**Type:** Counter — listeners disconnected for being too slow.

### tag\_active\_broadcasts

**Type:** Gauge — currently active broadcast streams.

```promql
# Coalescing ratio (higher is better)
rate(tag_broadcast_shared_total[5m]) /
(rate(tag_broadcast_shared_total[5m]) + rate(tag_broadcast_fetches_total[5m]))
```
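
The slow-consumer counter and the active-broadcast gauge can be queried in the same style. This is a sketch only: the window lengths are arbitrary, and whether a given value is a problem depends on your workload.

```promql
# Listeners disconnected for being too slow, per second
sum(rate(tag_broadcast_slow_consumers_total[5m]))

# Active broadcast streams across all scraped instances
sum(tag_active_broadcasts)

# Peak number of active broadcast streams per instance over the last hour
max_over_time(tag_active_broadcasts[1h])
```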

## Background fetch metrics

### tag\_background\_fetches\_triggered\_total

**Type:** Counter — background full-object fetches triggered by range requests.

### tag\_background\_fetches\_succeeded\_total

**Type:** Counter — background fetches completed successfully.

### tag\_background\_fetches\_failed\_total

**Type:** Counter — background fetches that failed.

### tag\_active\_background\_fetches

**Type:** Gauge — currently active background fetches.

```promql
# Background fetch success rate
rate(tag_background_fetches_succeeded_total[5m]) /
rate(tag_background_fetches_triggered_total[5m])
```
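
The failure counter and the in-flight gauge can be used the same way. The queries below are a sketch; note that fetches still in flight are counted as triggered but not yet as succeeded or failed, so short windows can make the ratios look off.

```promql
# Background fetch failure ratio
sum(rate(tag_background_fetches_failed_total[5m])) /
sum(rate(tag_background_fetches_triggered_total[5m]))

# Background fetches currently in flight across all scraped instances
sum(tag_active_background_fetches)
```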

## Revalidation metrics

### tag\_revalidations\_triggered\_total

**Type:** Counter — cache revalidation attempts (conditional GET/HEAD to upstream).

### tag\_revalidations\_not\_modified\_total

**Type:** Counter — revalidations where upstream returned 304 Not Modified.

### tag\_revalidations\_updated\_total

**Type:** Counter — revalidations where upstream returned 200 with new data.

### tag\_revalidations\_failed\_total

**Type:** Counter — revalidations that failed due to errors.

### tag\_revalidations\_stale\_served\_total

**Type:** Counter — times stale cached data was served because revalidation failed.

```promql
# Revalidation 304 ratio (higher = better cache freshness)
rate(tag_revalidations_not_modified_total[5m]) /
rate(tag_revalidations_triggered_total[5m])

# Stale serve ratio (should be low)
rate(tag_revalidations_stale_served_total[5m]) /
rate(tag_revalidations_triggered_total[5m])
```
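
The remaining two outcomes, updated and failed, can be expressed as ratios in the same way. This is a sketch with an arbitrary 5-minute window; it assumes nothing beyond the counters listed in this section.

```promql
# Fraction of revalidations where upstream returned new data
sum(rate(tag_revalidations_updated_total[5m])) /
sum(rate(tag_revalidations_triggered_total[5m]))

# Fraction of revalidations that failed due to errors
sum(rate(tag_revalidations_failed_total[5m])) /
sum(rate(tag_revalidations_triggered_total[5m]))
```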

## Upstream metrics

### tag\_upstream\_request\_duration\_seconds

**Type:** Histogram — upstream (Tigris) request duration in seconds.

| Label    | Description                                 |
| -------- | ------------------------------------------- |
| `method` | HTTP method: `GET`, `PUT`, `DELETE`, `HEAD` |

### tag\_upstream\_errors\_total

**Type:** Counter — total upstream errors.

| Label    | Description |
| -------- | ----------- |
| `method` | HTTP method |
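
Example queries for the two upstream metrics, written as a sketch in the style of the request metrics. The `_bucket` and `_count` series are the standard ones Prometheus exposes for any histogram. The last query assumes the duration histogram records every upstream request, including failed ones, which this page does not state.

```promql
# P99 upstream latency by HTTP method
histogram_quantile(0.99, sum by (method, le) (rate(tag_upstream_request_duration_seconds_bucket[5m])))

# Upstream errors per second by HTTP method
sum by (method) (rate(tag_upstream_errors_total[5m]))

# Upstream error ratio by HTTP method (assumes failed requests are also observed by the histogram)
sum by (method) (rate(tag_upstream_errors_total[5m])) /
sum by (method) (rate(tag_upstream_request_duration_seconds_count[5m]))
```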

## Authentication metrics

### tag\_auth\_failures\_total

**Type:** Counter

Total number of authentication failures, labeled by failure reason.

| Label    | Description                                                   |
| -------- | ------------------------------------------------------------- |
| `reason` | Failure reason: `invalid_signature`, `unknown_key`, `expired` |
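
A sketch of how this counter might be queried, together with the `auth_error` status already exposed on `tag_requests_total`. The 5-minute window is an arbitrary choice.

```promql
# Authentication failures per second by reason
sum by (reason) (rate(tag_auth_failures_total[5m]))

# Share of all requests that ended with an auth error
sum(rate(tag_requests_total{status="auth_error"}[5m])) /
sum(rate(tag_requests_total[5m]))
```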

### tag\_local\_auth\_validations\_total

**Type:** Counter — local authentication validation attempts in transparent proxy mode.

| Label    | Description                                                                                                       |
| -------- | ----------------------------------------------------------------------------------------------------------------- |
| `result` | Validation result: `success`, `missing_auth`, `parse_error`, `unknown_key`, `signature_mismatch`, `authz_expired` |

```promql
# Local auth success rate (both sides are summed so their label sets match)
sum(rate(tag_local_auth_validations_total{result="success"}[5m])) /
sum(rate(tag_local_auth_validations_total[5m]))

# Local auth failure breakdown by result
sum by (result) (rate(tag_local_auth_validations_total{result!="success"}[5m]))
```

### tag\_derived\_key\_store\_size

**Type:** Gauge — number of derived signing keys currently stored. TAG learns signing keys from Tigris responses and caches them for local SigV4 validation. A value of 0 after receiving requests indicates key learning is not working.
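
The "0 after receiving requests" condition can be written as a query. The sketch below assumes the default `instance` label that Prometheus attaches to every scraped target, and it treats any non-zero request rate over the last 5 minutes as "receiving requests"; both are choices made for illustration.

```promql
# Instances that are serving requests but have no derived signing keys stored
(sum by (instance) (tag_derived_key_store_size) == 0)
and on (instance)
(sum by (instance) (rate(tag_requests_total[5m])) > 0)
```

An empty result means no instance currently matches the condition.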

### tag\_authz\_cache\_size

**Type:** Gauge — number of active per-bucket authorization cache entries (`accessKey × bucket` pairs). Each entry represents a client that has been granted access to a specific bucket.

### tag\_proxy\_signing\_keys\_received\_total

**Type:** Counter — number of signing key sets received from Tigris responses. Incremented each time Tigris returns an `X-Tigris-Proxy-Signing-Keys` header that TAG uses to enable local validation.

```promql
# Rate of new key learning events
rate(tag_proxy_signing_keys_received_total[5m])
```

## Connection metrics

### tag\_active\_connections

**Type:** Gauge — number of active connections.

### tag\_bytes\_transferred\_total

**Type:** Counter — total bytes transferred.

| Label       | Description                     |
| ----------- | ------------------------------- |
| `direction` | Transfer direction: `in`, `out` |

```promql
# Throughput by direction (bytes/sec)
sum by (direction) (rate(tag_bytes_transferred_total[5m]))

# Outbound throughput (bytes/sec)
sum(rate(tag_bytes_transferred_total{direction="out"}[5m]))
```

## Prometheus scrape configuration

```yaml
scrape_configs:
  - job_name: "tag"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: tag
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "8080"
```
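
Once Prometheus has loaded this configuration, the built-in `up` series offers a quick check that targets are being scraped. This is a sketch that relies only on the `job_name` shown above; `up` is generated by Prometheus itself, not by TAG.

```promql
# 1 for every TAG pod scraped successfully, 0 for every failed scrape
up{job="tag"}

# Number of TAG pods currently being scraped successfully
count(up{job="tag"} == 1)
```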
