Content Parts

Message content is an array of content parts. Each part has a `type` field plus fields specific to that type.

Content Part Types

Type	Description	Required Fields
`text`	Plain text	`text`
`reasoning`	Chain-of-thought	`text`
`json`	Structured data	`data`
`image`	Image reference	`ref`
`audio`	Audio reference	`ref`
`video`	Video reference	`ref`
`document`	Document reference	`ref`
`tool_call`	Tool invocation	`name`, `call_id`, `arguments`
`tool_result`	Tool output	`name`, `call_id`, `result`
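
The table above can be expressed as a lookup for checking that a part carries its required fields. This is a minimal sketch, not part of the format itself: the names `REQUIRED_FIELDS` and `missing_fields` are hypothetical, and the check covers only field presence, not value types.

```python
# Required fields per content part type, copied from the table above.
REQUIRED_FIELDS = {
    "text": ("text",),
    "reasoning": ("text",),
    "json": ("data",),
    "image": ("ref",),
    "audio": ("ref",),
    "video": ("ref",),
    "document": ("ref",),
    "tool_call": ("name", "call_id", "arguments"),
    "tool_result": ("name", "call_id", "result"),
}


def missing_fields(part: dict) -> list[str]:
    """Return the required fields that are absent from a content part.

    Raises ValueError when the part has no recognized "type".
    """
    part_type = part.get("type")
    if part_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown content part type: {part_type!r}")
    return [field for field in REQUIRED_FIELDS[part_type] if field not in part]
```

For example, `missing_fields({"type": "tool_call", "name": "calculator"})` reports that `call_id` and `arguments` are missing.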

Text

Plain text content:

{ "type": "text", "text": "Hello, world!" }

Reasoning

Chain-of-thought or scratchpad content. The `reasoning_policy` setting in `defaults` controls how reasoning parts are treated during training.

{
  "type": "reasoning",
  "text": "Let me think through this step by step. First, I need to..."
}

JSON

Structured data output:

{
  "type": "json",
  "data": {
    "name": "John",
    "age": 30,
    "city": "New York"
  }
}

Media Types

Images, audio, video, and documents reference external files:

{
  "type": "image",
  "ref": { "asset_id": "img-001" },
  "mime_type": "image/jpeg"
}

Or with a direct URI:

{
  "type": "audio",
  "ref": { "uri": "s3://bucket/audio/clip.wav" },
  "mime_type": "audio/wav",
  "sha256": "abc123...",
  "bytes": 123456
}

Reference Options

Field	Description
`ref.asset_id`	Reference to an entry in `assets.jsonl` (preferred)
`ref.uri`	Direct URI to the file
`mime_type`	MIME type of the content
`sha256`	SHA-256 hash for verification
`bytes`	File size in bytes
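
The `sha256` and `bytes` fields can be used to check a downloaded copy of a referenced file. The sketch below assumes the file has already been fetched to a local path and that `sha256` is a hex-encoded digest; the format description does not state the encoding, so treat that as an assumption. The function name `verify_media_file` is hypothetical.

```python
import hashlib
from pathlib import Path


def verify_media_file(part: dict, local_path: str) -> bool:
    """Check a local copy of a media file against the part's
    "sha256" and "bytes" fields. Absent fields are skipped."""
    data = Path(local_path).read_bytes()

    # Compare the file size first, since it is the cheaper check.
    if "bytes" in part and part["bytes"] != len(data):
        return False

    # Assumption: "sha256" holds a hex digest (case-insensitive).
    if "sha256" in part:
        digest = hashlib.sha256(data).hexdigest()
        if digest != part["sha256"].lower():
            return False

    return True
```

Reading the whole file into memory keeps the sketch short; for large video files, hashing in chunks would be the more practical choice.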

Tool Call

Invoke a tool:

{
  "type": "tool_call",
  "name": "calculator",
  "call_id": "calc-001",
  "arguments": { "expression": "2 + 2" }
}

Field	Required	Description
`name`	Yes	Tool name (must match a tool in the toolset)
`call_id`	Yes	Unique identifier for this call
`arguments`	Yes	Tool input arguments

Tool Result

The output returned by a tool execution:

{
  "type": "tool_result",
  "name": "calculator",
  "call_id": "calc-001",
  "result": { "value": 4 }
}

Field	Required	Description
`name`	Yes	Tool name
`call_id`	Yes	Must match the `call_id` of the corresponding `tool_call`
`result`	Yes	Tool output
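
Because a `tool_result` is tied to its `tool_call` through `call_id`, a consistency check can compare the two sets of IDs. The following is a sketch under assumptions: `unmatched_call_ids` is a hypothetical helper, `parts` is taken to be a flat list of content parts collected from a conversation, and the format description does not say whether every call must have a result, so the function only reports mismatches.

```python
def unmatched_call_ids(parts: list[dict]) -> tuple[set[str], set[str]]:
    """Return (calls without a result, results without a call)."""
    call_ids = {p["call_id"] for p in parts if p.get("type") == "tool_call"}
    result_ids = {p["call_id"] for p in parts if p.get("type") == "tool_result"}
    return call_ids - result_ids, result_ids - call_ids
```

Two empty sets mean every call has a result and every result refers to a known call.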

Mixed Content

Messages can contain multiple content parts:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe this image:" },
    { "type": "image", "ref": { "asset_id": "img-001" } }
  ]
}

{
  "role": "assistant",
  "content": [
    { "type": "reasoning", "text": "I should analyze the visual elements..." },
    { "type": "text", "text": "The image shows a sunset over the ocean." }
  ]
}
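
When a consumer only needs the user-facing text of a mixed message, it can filter parts by `type`. This is an illustrative sketch; `visible_text` is a hypothetical helper, and skipping `reasoning` parts here is a display choice for the example, not a rule of the format.

```python
def visible_text(message: dict) -> str:
    """Join the "text" parts of a message, skipping reasoning,
    media, and tool parts."""
    return "\n".join(
        part["text"]
        for part in message["content"]
        if part.get("type") == "text"
    )
```

Applied to the assistant message above, this returns only the sentence about the sunset and leaves out the reasoning part.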

Metadata

Any content part can include a `metadata` object:

{
  "type": "text",
  "text": "Hello!",
  "metadata": {
    "source": "human",
    "confidence": 0.95
  }
}