Benchmark

The Benchmark resource provides test performance analytics across your Glow instance. Use it to review eval scores, track test history, and export data for reporting. Benchmark is an API-only resource with no CLI commands.

What is Benchmark?

Benchmark aggregates test and eval performance data into a single analytics surface. It collects scores, pass rates, invocation counts, and test history across departments and date ranges. Administrators and instructors can use benchmark data to compare eval performance over time, identify trends, and export results for institutional reporting.

Each benchmark response includes eval cards (high-level performance summaries), paginated test history, department filters, and inline analytics facets for client-side rendering.

Quick Start

API

Fetch benchmark data for the current semester:

The calls below use the $GLOW_INSTANCE_URL and $GLOW_TOKEN environment variables. See Authentication to export them once before running the examples.


curl -X POST $GLOW_INSTANCE_URL/test/benchmark \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2026-01-15",
    "end_date": "2026-05-15"
  }'

The response includes evals (eval performance cards), departments, history (paginated test runs), and analytics (filter facets).
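For scripted access, the same call can be made with Python's standard library. The sketch below is a minimal, hypothetical client: the helper names `build_request` and `glow_post` are illustrative and not part of Glow, and it assumes each endpoint returns a JSON object as described in this guide. Like the curl examples, it reads the instance URL and token from the environment.

```python
import json
import os
import urllib.request
from typing import Optional


def build_request(base_url: str, token: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated POST request for a Glow endpoint (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # Mirror the curl examples: a JSON body with an explicit Content-Type.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # Avoid a double slash when the base URL ends with "/".
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def glow_post(path: str, body: Optional[dict] = None) -> dict:
    """Send the request using the same environment variables as the curl examples."""
    req = build_request(os.environ["GLOW_INSTANCE_URL"], os.environ["GLOW_TOKEN"], path, body)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `glow_post("/test/benchmark", {"start_date": "2026-01-15", "end_date": "2026-05-15"})` would return the parsed response, whose `evals`, `departments`, `history`, and `analytics` fields can then be read as ordinary dictionary keys. Omitting `body` sends a bodiless POST, matching the refresh and export calls later in this guide.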

Filter by department:


curl -X POST $GLOW_INSTANCE_URL/test/benchmark \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2026-01-15",
    "end_date": "2026-05-15",
    "department_ids": ["dept-nursing-101"]
  }'
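When the request body is built programmatically, it can help to validate the date range before sending it. The following sketch assumes dates use the `YYYY-MM-DD` form shown above and that `department_ids` may simply be omitted when no department filter is wanted, as in the first example. The start/end ordering check is a client-side guard added here for illustration, not documented API behavior, and `build_benchmark_payload` is a hypothetical name.

```python
from datetime import date
from typing import Optional, Sequence


def build_benchmark_payload(start_date: str, end_date: str,
                            department_ids: Optional[Sequence[str]] = None) -> dict:
    """Return a JSON-ready body for POST /test/benchmark (hypothetical helper)."""
    # date.fromisoformat raises ValueError for strings it cannot parse as dates.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    # Client-side guard (assumption): reject ranges that end before they start.
    if end < start:
        raise ValueError(f"end_date {end_date} is before start_date {start_date}")
    # isoformat() normalizes both dates to YYYY-MM-DD.
    payload = {"start_date": start.isoformat(), "end_date": end.isoformat()}
    # Only send department_ids when a filter is requested; the unfiltered
    # example omits the field entirely.
    if department_ids:
        payload["department_ids"] = list(department_ids)
    return payload
```

Passing the result as the body of a `/test/benchmark` request reproduces the curl examples above, with or without the department filter.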


Searching Test History

Use POST /test/search to paginate and filter test history independently of the full benchmark payload:


curl -X POST $GLOW_INSTANCE_URL/test/search \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "history_page": 1,
    "history_page_size": 25,
    "history_search": "midterm",
    "history_sort_by": "created_at",
    "history_sort_order": "desc"
  }'

The response returns data (an array of BenchmarkHistoryItem objects), total_count, pagination fields, and eval_options for dropdown filters.
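To walk the full history rather than a single page, a client can keep requesting pages until it has received `total_count` items. This sketch assumes pages are 1-indexed (as in the example above) and that `total_count` is the number of matching items across all pages. `iter_history` and `fetch_page` are hypothetical: `fetch_page` stands for any callable that sends one `/test/search` request and returns the parsed JSON, which keeps the loop testable without a network.

```python
from typing import Callable, Iterator


def iter_history(fetch_page: Callable[[dict], dict], page_size: int = 25,
                 **filters) -> Iterator[dict]:
    """Yield every history item matching the given history_* filters."""
    page = 1  # assumption: pages are 1-indexed, as in the curl example
    seen = 0
    while True:
        body = dict(filters, history_page=page, history_page_size=page_size)
        resp = fetch_page(body)
        items = resp.get("data", [])
        yield from items
        seen += len(items)
        # Stop on an empty page (defensive) or once every matching item was returned.
        if not items or seen >= resp.get("total_count", 0):
            return
        page += 1
```

A caller would pass the same filters used in the curl example, for instance `iter_history(fetch, history_search="midterm", history_sort_by="created_at", history_sort_order="desc")`. Stopping on an empty page as well as on `total_count` guards against an endless loop if the count changes between requests.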

Filtering by Eval and Archive Status

Restrict history to specific evals and return only non-archived tests:

curl -X POST $GLOW_INSTANCE_URL/test/search \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "history_eval_ids": ["eval-abc-123"],
    "history_archived": false,
    "history_page": 1,
    "history_page_size": 50
  }'
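Because every history filter shares the `history_` prefix, a small builder can assemble the `/test/search` body and drop filters that were not set. The keyword names below are hypothetical. Accepting `asc` as a sort order is an assumption (only `desc` appears in this guide), as is the idea that omitting a filter leaves it unapplied.

```python
from typing import Optional, Sequence


def build_search_body(page: int = 1, page_size: int = 25,
                      search: Optional[str] = None,
                      eval_ids: Optional[Sequence[str]] = None,
                      archived: Optional[bool] = None,
                      sort_by: Optional[str] = None,
                      sort_order: Optional[str] = None) -> dict:
    """Build a POST /test/search body, omitting filters that were not set (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Assumption: "asc" is accepted alongside the documented "desc".
    if sort_order is not None and sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    fields = {
        "page": page,
        "page_size": page_size,
        "search": search,
        "eval_ids": list(eval_ids) if eval_ids else None,
        "archived": archived,
        "sort_by": sort_by,
        "sort_order": sort_order,
    }
    # Filter on "is not None" so archived=False survives as a real filter value.
    return {f"history_{k}": v for k, v in fields.items() if v is not None}
```

For example, `build_search_body(page_size=50, eval_ids=["eval-abc-123"], archived=False)` produces the same body as the curl request above.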

Refreshing and Exporting

Refresh the materialized views behind benchmark data so that results reflect the latest test runs. The call invalidates cached responses and rebuilds the aggregated views:


curl -X POST $GLOW_INSTANCE_URL/test/refresh \
  -H "Authorization: Bearer $GLOW_TOKEN"

The response contains success, refreshed_views, and invalidated_tags.
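A script that refreshes before querying may want to fail loudly if the refresh did not succeed. This sketch assumes `success` is a boolean and that `refreshed_views` and `invalidated_tags` are lists of names; their exact types are not specified in this guide, and `check_refresh` is a hypothetical helper.

```python
from typing import List, Tuple


def check_refresh(resp: dict) -> Tuple[List[str], List[str]]:
    """Return (refreshed_views, invalidated_tags), raising if success is not true."""
    # Assumption: success is a JSON boolean; anything else counts as failure.
    if resp.get("success") is not True:
        raise RuntimeError(f"benchmark refresh did not succeed: {resp!r}")
    # Treat absent or null lists as empty rather than as errors.
    views = list(resp.get("refreshed_views") or [])
    tags = list(resp.get("invalidated_tags") or [])
    return views, tags
```

Checking the result before calling `/test/benchmark` or `/test/search` avoids silently reporting on stale aggregates.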

Export benchmark data as a denormalized ZIP file for offline analysis or institutional reporting:


curl -X POST $GLOW_INSTANCE_URL/test/export \
  -H "Authorization: Bearer $GLOW_TOKEN"

The response contains content (base64-encoded ZIP), file_name, mime_type, and row_count.
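Because the ZIP arrives base64-encoded inside a JSON response, a client has to decode it before opening it. The sketch below decodes `content`, checks that the bytes form a valid ZIP archive, and writes it under `file_name`. The helper names are hypothetical, and nothing is assumed about the files inside the archive.

```python
import base64
import io
import zipfile
from pathlib import Path


def decode_export(resp: dict) -> bytes:
    """Decode the base64 `content` field of a /test/export response into ZIP bytes."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(resp["content"], validate=True)
    if not zipfile.is_zipfile(io.BytesIO(raw)):
        raise ValueError("export content is not a ZIP archive")
    return raw


def save_export(resp: dict, out_dir: str = ".") -> Path:
    """Write the archive to out_dir using the server-provided file_name."""
    raw = decode_export(resp)
    # Keep only the final path component so file_name cannot escape out_dir.
    target = Path(out_dir) / Path(resp["file_name"]).name
    target.write_bytes(raw)
    return target


def list_members(raw: bytes) -> list:
    """Return the names of the files inside the decoded archive."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return zf.namelist()
```

Comparing `row_count` against the rows actually read from the extracted files is a cheap sanity check for institutional reports.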

Common Operations

Operation	Method	Endpoint
Get benchmark data	`POST`	`/test/benchmark`
Search test history	`POST`	`/test/search`
Refresh views	`POST`	`/test/refresh`
Export data (ZIP)	`POST`	`/test/export`
Get documentation	`POST`	`/benchmark/docs`
