Media

Every file that flows through Glow — audio attempts, scenario illustrations, document PDFs, scenario videos, raw text uploads — funnels through a single upload_id primitive and a small grid of per-modality routes that hang off the artifact each upload belongs to.

The upload_id pattern

Every byte that lands in Glow is registered as a single uploads_entry row and surfaced to clients as an opaque upload_id UUID. Three things use it:

  1. Upload endpoints return it after multipart ingest (or after promoting a pre-reserved upload — see attempt audio below).
  2. Download / preview endpoints take it back in a JSON request body and stream the bytes (or a rendered preview) out.
  3. Message items reference it inside <kind>_upload_ids arrays on GroupDetailMessageItem, so an assistant turn that produced an audio clip + an image carries both audio_upload_ids and image_upload_ids for the client to fetch.

The same upload_id is valid across every download surface that serves its kind — see the matrix below for which artifacts host which kinds. When in doubt, hit /system/<kind>_download with the id; it’s the canonical mirror of every download family.


Uploading

Uploads are multipart/form-data POSTs against the artifact that owns the eventual binding. The route name is always <kind>_upload:

curl -X POST $GLOW_INSTANCE_URL/document/file_upload \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -F "file=@./paper.pdf"

The response carries the upload_id plus an artifact-side wrapper id (files_id, images_id, audios_id, …) for the resource ladder.

CLI ergonomics fan out to the same endpoints:

glow documents upload file ./paper.pdf
glow scenarios upload image ./classroom.png
glow scenarios upload video ./role-play.mp4
glow documents upload text ./transcript.txt
glow attempts upload audio ./reply.webm

Per-modality content-type allowlists are enforced before any bytes are written:

| Upload | Accepted MIME types |
| --- | --- |
| audio_upload | audio/{mpeg,mp3,wav,ogg,webm,flac,aac,x-m4a,mp4,x-wav} |
| image_upload | image/{png,jpeg,gif,svg+xml,webp,bmp,tiff} |
| video_upload | video/* |
| file_upload | any (catch-all binary) |
| text_upload | text/{plain,html,csv,markdown,xml}, application/{json,xml} |

Codec parameters (audio/webm;codecs=opus from MediaRecorder) are stripped before the check; the bare type is persisted.
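As a rough illustration of the check described above, the following Python sketch strips parameters from a Content-Type value and tests the bare type against the per-route allowlists. The allowlists are transcribed from the table; the helper names (`bare_type`, `is_accepted`) and the lowercasing step are assumptions for illustration, not Glow's actual implementation.

```python
def _expand(prefix: str, subtypes: str) -> set:
    """Turn ("audio", "mpeg,mp3") into {"audio/mpeg", "audio/mp3"}."""
    return {f"{prefix}/{s}" for s in subtypes.split(",")}


# Allowlists copied from the table above.
ALLOWED = {
    "audio_upload": _expand("audio", "mpeg,mp3,wav,ogg,webm,flac,aac,x-m4a,mp4,x-wav"),
    "image_upload": _expand("image", "png,jpeg,gif,svg+xml,webp,bmp,tiff"),
    "text_upload": _expand("text", "plain,html,csv,markdown,xml")
    | _expand("application", "json,xml"),
}


def bare_type(content_type: str) -> str:
    """Drop parameters such as ';codecs=opus'.

    Lowercasing is an assumption (MIME types are case-insensitive).
    """
    return content_type.split(";", 1)[0].strip().lower()


def is_accepted(route: str, content_type: str) -> bool:
    """Return True if the bare MIME type is allowed on the given upload route."""
    mime = bare_type(content_type)
    if route == "file_upload":
        return True  # catch-all binary
    if route == "video_upload":
        return mime.startswith("video/")  # video/*
    return mime in ALLOWED.get(route, set())
```

With this sketch, `is_accepted("audio_upload", "audio/webm;codecs=opus")` is true because the codec parameter is removed before the lookup, and `"audio/webm"` is the value that would be persisted.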

Attempt audio — promoting a pre-reserved upload

The attempt audio route is the only upload route that accepts more than a plain multipart file: pass ?upload_id=<uuid> to promote an existing raw upload (typically captured by the realtime adapter) into the full audio chain. That gives three request shapes:

  • file only — write bytes + build the chain on top
  • upload_id only — reuse an existing upload, stack resource + junctions
  • upload_id + file — client pre-reserved a slot via /attempt/audio/new and is filling it

Response is always {audio_id, audios_id, upload_id}.
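The three shapes above can be told apart from just two inputs: whether a file part is present and whether an upload_id was passed. The Python sketch below shows one way a client might classify and address such a request. The shape labels, the function names, the `/attempt/audio_upload` path (inferred from the `<kind>_upload` naming rule) and the error raised when neither input is given are all assumptions.

```python
from typing import Optional
from urllib.parse import urlencode


def attempt_audio_shape(has_file: bool, upload_id: Optional[str]) -> str:
    """Name the request shape; the labels are hypothetical."""
    if has_file and upload_id:
        return "fill_reserved"      # upload_id + file: fill a pre-reserved slot
    if upload_id:
        return "promote_existing"   # upload_id only: reuse an existing upload
    if has_file:
        return "new_upload"         # file only: write bytes and build the chain
    # Assumption: the source does not say how an empty request is handled.
    raise ValueError("attempt audio upload needs a file, an upload_id, or both")


def attempt_audio_url(base_url: str, upload_id: Optional[str] = None) -> str:
    """Build the upload URL, adding ?upload_id=<uuid> when promoting or filling."""
    url = base_url.rstrip("/") + "/attempt/audio_upload"  # path is inferred
    if upload_id:
        url += "?" + urlencode({"upload_id": upload_id})
    return url
```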


Downloading

Downloads are POSTs (id in the JSON body, not the URL) and return bytes via StreamingResponse with full HTTP Range support — so audio scrubbing, video seeking, and partial PDF loads work without a custom protocol:

curl -X POST $GLOW_INSTANCE_URL/system/audio_download \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Range: bytes=0-65535" \
  -d '{"audio_id": "<upload_id>"}' \
  --output chunk.webm

Range semantics:

  • No Range header → 200 OK, full body, Accept-Ranges: bytes.
  • Range: bytes=START-END → 206 Partial Content, Content-Range: bytes START-END/TOTAL, Content-Length: N.
  • Out-of-bounds ends clamp to file_size - 1; out-of-bounds starts reset to 0 (no 416 today).

Content-Disposition is always inline with a percent-encoded filename.
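The range rules above can be sketched as a small resolver. This is a minimal Python illustration, not the server's code: `resolve_range` is a hypothetical name, and the handling of open-ended ranges (`bytes=START-`), unparseable headers, and reversed ranges is an assumption, since the source only describes the `START-END` form.

```python
import re
from typing import Dict, Optional, Tuple

_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def resolve_range(header: Optional[str], file_size: int) -> Tuple[int, int, int, Dict[str, str]]:
    """Return (status, first_byte, last_byte, headers) for a non-empty file."""
    last = file_size - 1
    full = (200, 0, last, {"Accept-Ranges": "bytes"})
    if not header:
        return full  # no Range header: 200 with the full body
    match = _RANGE.match(header.strip())
    if match is None:
        return full  # assumption: unparseable or multi-range headers get the full body
    start = int(match.group(1))
    # Assumption: an open-ended range runs to the end of the file.
    end = int(match.group(2)) if match.group(2) else last
    if start > last:
        start = 0  # out-of-bounds start resets to 0 (no 416)
    end = min(end, last)  # out-of-bounds end clamps to file_size - 1
    if end < start:
        return full  # assumption: reversed ranges fall back to the full body
    headers = {
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
    }
    return 206, start, end, headers
```

For a 1000-byte file, a request for `bytes=0-65535` resolves to a 206 covering bytes 0 to 999, so a client can ask for a fixed-size first chunk without knowing the file size in advance.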

System fallback

Every download family — audio, image, video, file, text, call — exists on /system as the canonical mirror. Per-artifact routes are convenience aliases so audit logs attribute the read to the artifact context (attempt / scenario / document). The wire shape is identical:

POST /attempt/file_download   { "file_id": "<id>" }   # audited as attempt
POST /system/file_download    { "file_id": "<id>" }   # canonical

Body is always {"<kind>_id": "<upload_id>"}, where the key is one of audio_id, file_id, image_id, video_id, text_id, call_id.
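Because the wire shape is the same on every artifact, a client can build any download request from three values: the artifact, the kind, and the upload_id. The Python sketch below uses only the standard library. The function name, the `Content-Type: application/json` header, and the `byte_range` parameter are assumptions for illustration. The request is constructed but not sent.

```python
import json
import urllib.request
from typing import Optional, Tuple

DOWNLOAD_KINDS = ("audio", "image", "video", "file", "text", "call")


def build_download_request(
    base_url: str,
    token: str,
    kind: str,
    upload_id: str,
    artifact: str = "system",  # canonical mirror by default
    byte_range: Optional[Tuple[int, int]] = None,
) -> urllib.request.Request:
    """Build POST /<artifact>/<kind>_download with {"<kind>_id": "<upload_id>"}."""
    if kind not in DOWNLOAD_KINDS:
        raise ValueError(f"unknown download kind: {kind}")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",  # assumption: sent because the body is JSON
    }
    if byte_range is not None:
        headers["Range"] = f"bytes={byte_range[0]}-{byte_range[1]}"
    body = json.dumps({f"{kind}_id": upload_id}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{artifact}/{kind}_download"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would perform the download. Switching `artifact` from `"system"` to `"attempt"` changes only the path, which is what lets the audit log attribute the read.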


File preview

POST /<art>/file_preview returns a PNG rendering of the first page of a previewable upload — today, PDFs. Useful for document tiles and inline message previews.

curl -X POST $GLOW_INSTANCE_URL/system/file_preview \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id": "<upload_id>"}' \
  --output preview.png

Response is image/png with Cache-Control: private, max-age=3600, must-revalidate. Hosted on system, attempt, scenario, and document — the four artifacts that actually carry file uploads.


How media attaches to messages

Inside a GroupDetailMessageItem (see Group for the full shape), uploads are referenced by one array per kind:

{
  "id": "<message_id>",
  "role": "user",
  "text_upload_ids": ["<upload_id>"],
  "audio_upload_ids": ["<upload_id>"],
  "image_upload_ids": [],
  "video_upload_ids": [],
  "file_upload_ids": ["<upload_id>"]
}

When a client sends a chat turn that includes media, it uploads first, collects the upload_ids, and passes them on the chat_message request. Clients render media by walking each array and issuing a <kind>_download (or file_preview) per id.
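The rendering walk described above could look like the following Python sketch, which flattens a message item into (kind, upload_id) pairs that a client can then feed to the matching `<kind>_download` route. The function name `media_refs` is hypothetical, and treating a missing or null array as empty is an assumption.

```python
from typing import Any, Dict, List, Tuple

# The five per-kind arrays shown in the message item above.
MESSAGE_KINDS = ("text", "audio", "image", "video", "file")


def media_refs(message: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return one (kind, upload_id) pair per referenced upload, in array order."""
    refs: List[Tuple[str, str]] = []
    for kind in MESSAGE_KINDS:
        # Assumption: a missing or null array is treated as empty.
        for upload_id in message.get(f"{kind}_upload_ids") or []:
            refs.append((kind, upload_id))
    return refs
```

Each pair maps directly onto a request: `("audio", id)` becomes a POST to an `audio_download` route with `{"audio_id": id}`.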


Per-artifact matrix

Which media operations exist on which artifact. D = download, U = upload, P = file_preview, blank = not hosted.

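| Artifact | audio | image | video | file | text | call | preview |
| --- | --- | --- | --- | --- | --- | --- | --- |
| system (canonical) | D | D | D | D | D | D | P |
| attempt | D + U | D | D | D | D | D | P |
| scenario | | D + U | D + U | D | D | D | P |
| document | | | | D + U | D + U | D | P |
| every other artifact* | | | | D | D | D | |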

* agent, auth, cohort, department, eval, field, model, parameter, persona, profile, provider, rubric, setting, simulation, test, tool — all expose the cross-cutting trio (call_download, file_download, text_download) so server-rendered exports, generation call replays, and inline text reads work consistently. Uploads for those kinds go through attempt / scenario / document.

Reading the canonical rows:

  • system is the canonical full-download set + preview — hit it when you have an upload_id and don’t care about audit attribution.
  • attempt is the only artifact that hosts audio_upload — that’s where learner replies land.
  • scenario owns the visual upload surface (image_upload, video_upload).
  • document is the text/file shop — both upload kinds plus full text/file downloads.
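For clients that want to pick a route programmatically, the matrix can be encoded as a lookup table. The Python sketch below is one reading of the matrix and the notes around it: the placement of blank cells is inferred from the surrounding text, the function names are hypothetical, and treating any unlisted artifact name as an "every other artifact" row is an assumption.

```python
# "D" = download, "U" = upload, "P" = file_preview; a missing key means not hosted.
MATRIX = {
    "system": {"audio": "D", "image": "D", "video": "D", "file": "D",
               "text": "D", "call": "D", "preview": "P"},
    "attempt": {"audio": "DU", "image": "D", "video": "D", "file": "D",
                "text": "D", "call": "D", "preview": "P"},
    "scenario": {"image": "DU", "video": "DU", "file": "D",
                 "text": "D", "call": "D", "preview": "P"},
    "document": {"file": "DU", "text": "DU", "call": "D", "preview": "P"},
}
# Cross-cutting trio exposed by every other artifact.
OTHER = {"file": "D", "text": "D", "call": "D"}


def hosts(artifact: str, kind: str, op: str) -> bool:
    """Return True if the artifact hosts operation op ("D", "U" or "P") for kind."""
    row = MATRIX.get(artifact, OTHER)  # assumption: unlisted names use the OTHER row
    return op in row.get(kind, "")


def download_artifact(preferred: str, kind: str) -> str:
    """Use the preferred artifact when it hosts the download, else fall back to system."""
    return preferred if hosts(preferred, kind, "D") else "system"
```

Falling back to `system` trades audit attribution for availability, which matches the guidance above to use it when the upload_id is all you have.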

Related

  • Streaming: SSE / WebSocket surfaces for generation events; realtime adapter audio chunks ride alongside the upload_id flow above.
  • Chat: where audio uploads attach to a live attempt turn.
  • Group: GroupDetailMessageItem shape and where <kind>_upload_ids arrays show up on the wire.
  • API Reference: POST /system/audio_download, the schema for the canonical download.
  • Documents: owns file + text upload UX.
  • Scenarios: owns image + video upload UX.