Media
Every file that flows through Glow — audio attempts, scenario
illustrations, document PDFs, scenario videos, raw text uploads —
funnels through a single upload_id primitive and a small grid of
per-modality routes that hang off the artifact each upload belongs to.
The upload_id pattern
Every file that lands in Glow is registered as a single uploads_entry
row and surfaced to clients as an opaque upload_id UUID. Three
things use it:
- Upload endpoints return it after multipart ingest (or after promoting a pre-reserved upload — see attempt audio below).
- Download / preview endpoints take it back in a JSON request body and stream the bytes (or a rendered preview) out.
- Message items reference it inside <kind>_upload_ids arrays on GroupDetailMessageItem, so an assistant turn that produced an audio clip + an image carries both audio_upload_ids and image_upload_ids for the client to fetch.
The same upload_id is valid across every download surface that
serves its kind — see the matrix below for which artifacts host
which kinds. When in doubt, hit /system/<kind>_download with the
id; it’s the canonical mirror of every download family.
Uploading
Uploads are multipart/form-data POSTs against the artifact that
owns the eventual binding. The route name is always <kind>_upload:
curl -X POST $GLOW_INSTANCE_URL/document/file_upload \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -F "file=@./paper.pdf"

The response carries the upload_id plus an artifact-side wrapper id
(files_id, images_id, audios_id, …) for the resource ladder.
CLI ergonomics fan out to the same endpoints:
glow documents upload file ./paper.pdf
glow scenarios upload image ./classroom.png
glow scenarios upload video ./role-play.mp4
glow documents upload text ./transcript.txt
glow attempts upload audio ./reply.webm

Per-modality content-type allowlists are enforced before any bytes are written:
| Upload | Accepted MIME types |
|---|---|
| audio_upload | audio/{mpeg,mp3,wav,ogg,webm,flac,aac,x-m4a,mp4,x-wav} |
| image_upload | image/{png,jpeg,gif,svg+xml,webp,bmp,tiff} |
| video_upload | video/* |
| file_upload | any (catch-all binary) |
| text_upload | text/{plain,html,csv,markdown,xml}, application/{json,xml} |
Codec parameters (audio/webm;codecs=opus from MediaRecorder) are
stripped before the check; the bare type is persisted.
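The allowlist check and codec stripping described above can be sketched on the client side as follows. This is a minimal illustration, not the server's implementation: the names `ALLOWED`, `bare_type`, and `is_accepted` are hypothetical, and lowercasing the type is an assumption the text does not state.

```python
# Hypothetical client-side mirror of the allowlist table above.
# The server's real check is not shown in this document.
ALLOWED = {
    "audio_upload": {f"audio/{s}" for s in (
        "mpeg", "mp3", "wav", "ogg", "webm", "flac", "aac", "x-m4a", "mp4", "x-wav")},
    "image_upload": {f"image/{s}" for s in (
        "png", "jpeg", "gif", "svg+xml", "webp", "bmp", "tiff")},
    "text_upload": {f"text/{s}" for s in ("plain", "html", "csv", "markdown", "xml")}
    | {"application/json", "application/xml"},
}


def bare_type(content_type: str) -> str:
    """Drop parameters such as ';codecs=opus' and keep only the bare type.

    Lowercasing is an assumption made for this sketch.
    """
    return content_type.split(";", 1)[0].strip().lower()


def is_accepted(route: str, content_type: str) -> bool:
    """Return True if the bare MIME type passes the allowlist for `route`."""
    bare = bare_type(content_type)
    if route == "file_upload":      # catch-all binary
        return True
    if route == "video_upload":     # video/*
        return bare.startswith("video/")
    return bare in ALLOWED.get(route, set())
```

Running a check like this before the POST lets a client reject an unsupported recording locally instead of waiting for the server to refuse it.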
Attempt audio — promoting a pre-reserved upload
The attempt audio route is the only upload route that accepts more than a
plain multipart file: pass ?upload_id=<uuid> to promote an existing raw upload
(typically captured by the realtime adapter) into the full audio
chain. Three shapes total:
- file only: write bytes + build the chain on top
- upload_id only: reuse an existing upload, stack resource + junctions
- upload_id + file: client pre-reserved a slot via /attempt/audio/new and is filling it
Response is always {audio_id, audios_id, upload_id}.
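As a rough sketch of how a client might pick between the three shapes, the helper below classifies a request and builds its URL. The path `/attempt/audio_upload` is inferred from the `<kind>_upload` naming rule rather than quoted from the API reference, and the function names and shape labels are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def attempt_audio_shape(has_file: bool, upload_id: Optional[str]) -> str:
    """Name which of the three request shapes a call corresponds to."""
    if has_file and upload_id:
        return "fill_reserved"      # upload_id + file
    if upload_id:
        return "promote_existing"   # upload_id only
    if has_file:
        return "file_only"          # file only
    raise ValueError("need a file, an upload_id, or both")


def attempt_audio_url(base_url: str, upload_id: Optional[str] = None) -> str:
    """Build the upload URL; the route path is an assumption (see lead-in)."""
    url = f"{base_url.rstrip('/')}/attempt/audio_upload"
    if upload_id:
        url += "?" + urlencode({"upload_id": upload_id})
    return url
```

Whichever shape is used, the caller can expect the same `{audio_id, audios_id, upload_id}` response, so downstream code does not need to branch on it.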
Downloading
Downloads are POSTs (id in the JSON body, not the URL) and return
bytes via StreamingResponse with full HTTP Range support — so audio
scrubbing, video seeking, and partial PDF loads work without a custom
protocol:
curl -X POST $GLOW_INSTANCE_URL/system/audio_download \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -H "Range: bytes=0-65535" \
  -d '{"audio_id": "<upload_id>"}' \
  --output chunk.webm

Range semantics:
- No Range header → 200 OK, full body, Accept-Ranges: bytes.
- Range: bytes=START-END → 206 Partial Content, Content-Range: bytes START-END/TOTAL, Content-Length: N.
- Out-of-bounds ends clamp to file_size - 1; out-of-bounds starts reset to 0 (no 416 today).
Content-Disposition is always inline with a percent-encoded
filename.
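The range rules and the Content-Disposition header above can be modelled with a small sketch, which helps when writing client tests against expected statuses. It is an approximation under stated assumptions: `resolve_range` and `inline_disposition` are hypothetical names, unparseable Range values fall back to a full 200 response, a start beyond the clamped end counts as out of bounds, `file_size` is assumed to be positive, and the `filename*=UTF-8''` form is a guess at how the percent-encoded filename is carried.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import quote

# Only the START-END and START- forms are handled in this sketch.
_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def resolve_range(
    range_header: Optional[str], file_size: int
) -> Tuple[int, int, int, Dict[str, str]]:
    """Return (status, start, end, headers) for a download of `file_size` bytes."""
    match = _RANGE.match(range_header.strip()) if range_header else None
    if match is None:
        # No (or unrecognized) Range header: full body.
        return 200, 0, file_size - 1, {"Accept-Ranges": "bytes"}

    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else file_size - 1
    end = min(end, file_size - 1)   # out-of-bounds ends clamp to file_size - 1
    if start > end:                 # out-of-bounds starts reset to 0 (no 416)
        start = 0
    headers = {
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
    }
    return 206, start, end, headers


def inline_disposition(filename: str) -> str:
    """Build an inline Content-Disposition value with a percent-encoded name."""
    return "inline; filename*=UTF-8''" + quote(filename, safe="")
```

For example, `resolve_range("bytes=0-65535", 100000)` yields a 206 with `Content-Range: bytes 0-65535/100000`, matching the curl request shown earlier in this section.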
System fallback
Every download family — audio, image, video, file, text,
call — exists on /system as the canonical mirror. Per-artifact
routes are convenience aliases so audit logs attribute the read to
the artifact context (attempt / scenario / document). The wire shape
is identical:
POST /attempt/file_download { "file_id": "<id>" } # audited as attempt
POST /system/file_download { "file_id": "<id>" } # canonical

The body is always {"<kind>_id": "<upload_id>"}: audio_id, file_id,
image_id, video_id, text_id, call_id.
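Because the path and body follow one pattern across every family, a client can derive both from the kind. The helper below is a minimal sketch of that idea; `download_request` is a hypothetical name, and it only builds the request pieces without sending anything.

```python
import json
from typing import Tuple

# Download families listed in this section.
DOWNLOAD_KINDS = ("audio", "image", "video", "file", "text", "call")


def download_request(
    kind: str, upload_id: str, artifact: str = "system"
) -> Tuple[str, bytes]:
    """Return (path, JSON body) for a <kind>_download call.

    `artifact` defaults to the canonical /system mirror; pass e.g. "attempt"
    to have the read audited under that artifact instead.
    """
    if kind not in DOWNLOAD_KINDS:
        raise ValueError(f"unknown download kind: {kind!r}")
    path = f"/{artifact}/{kind}_download"
    body = json.dumps({f"{kind}_id": upload_id}).encode("utf-8")
    return path, body
```

Note that this sketch does not check whether a given artifact actually hosts the requested kind; the per-artifact matrix further down is the source of truth for that.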
File preview
POST /<art>/file_preview returns a PNG rendering of the first page
of a previewable upload — today, PDFs. Useful for document tiles and
inline message previews.
curl -X POST $GLOW_INSTANCE_URL/system/file_preview \
  -H "Authorization: Bearer $GLOW_TOKEN" \
  -d '{"file_id": "<upload_id>"}' \
  --output preview.png

Response is image/png with Cache-Control: private, max-age=3600, must-revalidate. Hosted on system, attempt, scenario, and
document, the four artifacts that actually carry file uploads.
How media attaches to messages
Inside a GroupDetailMessageItem (see Group for the full
shape), uploads are referenced by one array per kind:
{
"id": "<message_id>",
"role": "user",
"text_upload_ids": ["<upload_id>"],
"audio_upload_ids": ["<upload_id>"],
"image_upload_ids": [],
"video_upload_ids": [],
"file_upload_ids": ["<upload_id>"]
}

When a client sends a chat turn that includes media, it uploads first,
collects the upload_ids, and passes them on the chat_message
request. Clients render media by walking each array and issuing a
<kind>_download (or file_preview) per id.
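The rendering walk described above might look like the sketch below, which turns one message item into a list of download calls. `media_refs` and `download_plan` are hypothetical helper names, and the sketch assumes missing arrays should be treated as empty.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Kinds that appear as <kind>_upload_ids arrays in the example message.
MESSAGE_KINDS = ("text", "audio", "image", "video", "file")


def media_refs(message: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, upload_id) for every id referenced by the message."""
    for kind in MESSAGE_KINDS:
        for upload_id in message.get(f"{kind}_upload_ids") or []:
            yield kind, upload_id


def download_plan(
    message: Dict[str, Any], artifact: str = "system"
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (path, body) pairs, one <kind>_download call per referenced id."""
    return [
        (f"/{artifact}/{kind}_download", {f"{kind}_id": upload_id})
        for kind, upload_id in media_refs(message)
    ]
```

Keeping the walk separate from the HTTP call makes it straightforward to swap a file download for file_preview when only a thumbnail is needed.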
Per-artifact matrix
Which media operations exist on which artifact. D = download, U = upload, P = file_preview, blank = not hosted.
| Artifact | audio | image | video | file | text | call | preview |
|---|---|---|---|---|---|---|---|
| system (canonical) | D | D | D | D | D | D | P |
| attempt | D + U | D | D | D | D | D | P |
| scenario | | D + U | D + U | D | D | D | P |
| document | | | | D + U | D + U | D | P |
| every other artifact* | | | | D | D | D | |
* agent, auth, cohort, department, eval, field, model,
parameter, persona, profile, provider, rubric, setting,
simulation, test, tool — all expose the cross-cutting trio
(call_download, file_download, text_download) so server-rendered
exports, generation call replays, and inline text reads work
consistently. Uploads for those kinds go through attempt /
scenario / document.
Reading the canonical rows:
- system is the canonical full-download set plus preview: hit it when you have an upload_id and don’t care about audit attribution.
- attempt is the only artifact that hosts audio_upload; that’s where learner replies land.
- scenario owns the visual upload surface (image_upload, video_upload).
- document is the text/file shop: both upload kinds plus full text/file downloads.
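The upload ownership summarized in these notes can be captured in a small lookup so a client never has to guess which artifact to POST to. This is a sketch only: `UPLOAD_OWNER` and `upload_route` are hypothetical names, and the `/<artifact>/<kind>_upload` path shape is inferred from the `/document/file_upload` example rather than confirmed for every kind.

```python
# Which artifact hosts the upload route for each kind, per the notes above.
UPLOAD_OWNER = {
    "audio": "attempt",
    "image": "scenario",
    "video": "scenario",
    "file": "document",
    "text": "document",
}


def upload_route(kind: str) -> str:
    """Return the assumed upload path for a kind, e.g. /document/file_upload."""
    try:
        artifact = UPLOAD_OWNER[kind]
    except KeyError:
        # For example "call": downloadable, but no upload route is listed.
        raise ValueError(f"no upload route for kind {kind!r}") from None
    return f"/{artifact}/{kind}_upload"
```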
Related
- Streaming: SSE / WebSocket surfaces for generation events; realtime adapter audio chunks ride alongside the upload_id flow above.
- Chat: where audio uploads attach to a live attempt turn.
- Group: GroupDetailMessageItem shape and where <kind>_upload_ids arrays show up on the wire.
- API Reference: POST /system/audio_download, schema for the canonical download.
- Documents: owns file + text upload UX.
- Scenarios: owns image + video upload UX.