Multimodal Data

LLM Training Data records can carry images, audio, video, and documents alongside text. Each item is a typed content part in a message's content array.

Supported Media Types

Type        Content Part   Common Formats
Images      image          JPEG, PNG, WebP, GIF
Audio       audio          WAV, MP3, FLAC, OGG
Video       video          MP4, WebM, AVI
Documents   document       PDF, DOCX

Basic Multimodal Record

{
  "id": "mm-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image", "ref": { "asset_id": "img-001" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "The image shows a golden retriever playing in a park." }
      ]
    }
  ]
}
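
A record like the one above can be sanity-checked before training. The following is a minimal sketch, not an official validator: the function name validate_record and the rules it enforces (a known type, a string text field on text parts, and a ref with either asset_id or uri on media parts) are assumptions inferred from the examples on this page.

```python
# Part types taken from the "Supported Media Types" table, plus plain text.
SUPPORTED_TYPES = {"text", "image", "audio", "video", "document"}

def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if none).

    The checks are assumptions inferred from the examples, not a formal schema.
    """
    problems = []
    for m_idx, message in enumerate(record.get("messages", [])):
        for p_idx, part in enumerate(message.get("content", [])):
            where = f"messages[{m_idx}].content[{p_idx}]"
            kind = part.get("type")
            if kind not in SUPPORTED_TYPES:
                problems.append(f"{where}: unknown type {kind!r}")
            elif kind == "text":
                if not isinstance(part.get("text"), str):
                    problems.append(f"{where}: text part needs a string 'text'")
            else:
                ref = part.get("ref")
                if not isinstance(ref, dict) or not ("asset_id" in ref or "uri" in ref):
                    problems.append(f"{where}: media part needs ref.asset_id or ref.uri")
    return problems
```

Calling validate_record on the mm-001 record should return an empty list, while an image part with no ref yields one problem string naming its position.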

Asset Management

For large datasets, list each media file once in assets.jsonl and reference it from records by asset_id, so repeated media is described only once:

assets.jsonl:

{"asset_id":"img-001","uri":"s3://bucket/images/dog.jpg","mime_type":"image/jpeg","sha256":"abc...","bytes":123456}
{"asset_id":"img-002","uri":"s3://bucket/images/cat.png","mime_type":"image/png","sha256":"def...","bytes":234567}

Content part referencing an asset:

{
  "type": "image",
  "ref": { "asset_id": "img-001" }
}
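
To illustrate how a consumer might follow an asset_id back to its file, here is a small sketch that indexes assets.jsonl and expands a reference into the direct-URI form described in the next section. The helper names load_assets and resolve_part are hypothetical, and copying mime_type, sha256, and bytes onto the part is an assumption about how a loader could behave, not documented behavior.

```python
import json

def load_assets(lines):
    """Index assets.jsonl entries by asset_id.

    `lines` is any iterable of JSON strings, for example an open file.
    """
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        asset = json.loads(line)
        index[asset["asset_id"]] = asset
    return index

def resolve_part(part, assets):
    """Expand an asset_id reference into a part that carries the URI directly."""
    ref = part.get("ref", {})
    if "asset_id" not in ref:
        return dict(part)  # already a direct URI reference
    asset = assets[ref["asset_id"]]  # KeyError signals a dangling reference
    resolved = {"type": part["type"], "ref": {"uri": asset["uri"]}}
    for key in ("mime_type", "sha256", "bytes"):
        if key in asset:
            resolved[key] = asset[key]
    return resolved
```

A typical call would be resolve_part(part, load_assets(open("assets.jsonl"))). Letting an unknown asset_id raise KeyError is a deliberate choice here, so that dangling references surface early.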

Direct URI References

For one-off media, put the URI directly in ref and carry mime_type, sha256, and bytes on the content part itself:

{
  "type": "image",
  "ref": { "uri": "s3://bucket/unique-image.jpg" },
  "mime_type": "image/jpeg",
  "sha256": "ghi...",
  "bytes": 345678
}
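
The metadata fields on a direct reference can be computed from a local copy of the file before upload. This is a sketch under a few assumptions: the helper name build_direct_part is hypothetical, sha256 is assumed to be a lowercase hex digest, and the MIME type is guessed from the file extension with Python's standard mimetypes module.

```python
import hashlib
import mimetypes
import os

def build_direct_part(path, uri, part_type="image"):
    """Build a direct-URI content part for the local file at `path`.

    `uri` is where the file will live (for example an s3:// location);
    this function does not upload anything.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large media files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "type": part_type,
        "ref": {"uri": uri},
        "mime_type": mime_type or "application/octet-stream",
        "sha256": digest.hexdigest(),
        "bytes": os.path.getsize(path),
    }
```

Guessing by extension is only a convenience. If extensions are unreliable in a given dataset, the MIME type is better supplied explicitly.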

Audio Transcription Example

{
  "id": "audio-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Transcribe this audio clip:" },
        { "type": "audio", "ref": { "asset_id": "aud-001" }, "mime_type": "audio/wav" }
      ]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "The speaker says: 'Hello and welcome to our podcast.'" }
      ]
    }
  ]
}

Video Understanding Example

{
  "id": "video-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe what happens in this video:" },
        { "type": "video", "ref": { "asset_id": "vid-001" }, "mime_type": "video/mp4" }
      ]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "The video shows a chef preparing a salad. First, they wash the lettuce, then chop tomatoes and cucumbers, and finally toss everything with dressing." }
      ]
    }
  ]
}

Document Analysis Example

{
  "id": "doc-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is the main topic of this document?" },
        { "type": "document", "ref": { "asset_id": "doc-001" }, "mime_type": "application/pdf" }
      ]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "This document is a quarterly financial report covering revenue, expenses, and profit margins for Q3 2024." }
      ]
    }
  ]
}

Multiple Media in One Message

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Compare these two images:" },
    { "type": "image", "ref": { "asset_id": "img-before" } },
    { "type": "image", "ref": { "asset_id": "img-after" } }
  ]
}

Interleaved Text and Media

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Here's the first chart:" },
    { "type": "image", "ref": { "asset_id": "chart-1" } },
    { "type": "text", "text": "And here's the second:" },
    { "type": "image", "ref": { "asset_id": "chart-2" } },
    { "type": "text", "text": "What trends do you see?" }
  ]
}
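
Because content is an ordered array, the position of each media part relative to the surrounding text carries meaning. The sketch below makes that order visible by flattening a content array into a single string. The function name and the <type:id> placeholder syntax are purely illustrative and not part of the format.

```python
def render_with_placeholders(content):
    """Flatten a content array into one string, preserving part order.

    Media parts become <type:asset_id> (or <type:uri>) placeholders;
    this placeholder syntax is hypothetical.
    """
    pieces = []
    for part in content:
        if part["type"] == "text":
            pieces.append(part["text"])
        else:
            ref = part.get("ref", {})
            target = ref.get("asset_id") or ref.get("uri", "?")
            pieces.append(f"<{part['type']}:{target}>")
    return " ".join(pieces)
```

Applied to the interleaved message above, this yields the two chart placeholders between the three text fragments, in the same order as the array.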

Best Practices

  1. Optimize file sizes - Resize and compress media appropriately
  2. Use consistent formats - Standardize on a few formats per media type
  3. Include MIME types - Set mime_type so processors can handle files correctly
  4. Add checksums - Record sha256 so file integrity can be verified
  5. Deduplicate - Use assets.jsonl and asset_id references for repeated media
  6. Balance modalities - Ensure good coverage across media types
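
Two of these practices, deduplication and modality balance, lend themselves to a quick audit script. The following is a rough sketch with hypothetical helper names. It assumes records and asset entries have already been parsed into dictionaries shaped like the examples on this page, and that every asset entry has a sha256 field.

```python
from collections import Counter, defaultdict

def modality_counts(records):
    """Count content parts by type across all records."""
    counts = Counter()
    for record in records:
        for message in record.get("messages", []):
            for part in message.get("content", []):
                counts[part.get("type", "unknown")] += 1
    return counts

def duplicate_assets(assets):
    """Group asset_ids that share a sha256, keeping only groups with duplicates."""
    by_hash = defaultdict(list)
    for asset in assets:
        by_hash[asset["sha256"]].append(asset["asset_id"])
    return {digest: ids for digest, ids in by_hash.items() if len(ids) > 1}
```

A heavily skewed result from modality_counts suggests adding examples for the rarer media types, and any group returned by duplicate_assets is a candidate for merging into a single asset entry.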