Multimodal SFT Example

Multimodal SFT Example

This example shows an SFT dataset with images, audio, and documents.

File Structure

multimodal-sft/
  metadata.json
  assets.jsonl
  data/
    train.jsonl

metadata.json

{
  "schema_version": "llm-training-data/v1",
  "dataset": {
    "name": "multimodal-sft-example",
    "description": "Example multimodal SFT dataset with images and audio"
  },
  "defaults": {
    "system": [
      { "type": "text", "text": "You are a helpful assistant that can analyze images, audio, and documents." }
    ]
  },
  "asset_index": {
    "path": "assets.jsonl",
    "format": "jsonl"
  },
  "files": [
    {
      "split": "train",
      "objective": "sft",
      "shards": [
        { "path": "data/train.jsonl" }
      ]
    }
  ]
}

assets.jsonl

{"asset_id":"img-sunset","uri":"s3://bucket/images/sunset.jpg","mime_type":"image/jpeg","sha256":"aaa...","bytes":245000}
{"asset_id":"img-chart","uri":"s3://bucket/images/sales-chart.png","mime_type":"image/png","sha256":"bbb...","bytes":128000}
{"asset_id":"aud-speech","uri":"s3://bucket/audio/speech.wav","mime_type":"audio/wav","sha256":"ccc...","bytes":1024000}
{"asset_id":"doc-report","uri":"s3://bucket/docs/quarterly-report.pdf","mime_type":"application/pdf","sha256":"ddd...","bytes":2048000}

data/train.jsonl

Image Description

{
  "id": "mm-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe what you see in this image." },
        { "type": "image", "ref": { "asset_id": "img-sunset" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "The image shows a beautiful sunset over the ocean. The sky is painted in vibrant shades of orange, pink, and purple. The sun is partially visible on the horizon, casting a golden reflection across the calm water. A few clouds are scattered across the sky, catching the last rays of sunlight."
        }
      ]
    }
  ]
}

Chart Analysis

{
  "id": "mm-002",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Analyze this sales chart and summarize the trends." },
        { "type": "image", "ref": { "asset_id": "img-chart" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Based on the sales chart:\n\n**Key Trends:**\n- Q1: Steady growth of 15% over the previous quarter\n- Q2: Significant spike, reaching the year's peak\n- Q3: Slight decline, likely seasonal\n- Q4: Recovery trend beginning\n\n**Insights:**\n- The company shows overall positive growth trajectory\n- Q2 performance suggests successful marketing campaigns\n- Q3 dip is consistent with industry patterns\n\nRecommendation: Focus on maintaining Q2-level engagement strategies."
        }
      ]
    }
  ]
}

Audio Transcription

{
  "id": "mm-003",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Please transcribe this audio and summarize the main points." },
        { "type": "audio", "ref": { "asset_id": "aud-speech" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "**Transcription:**\n\"Good morning everyone. Today I'll be discussing our new product launch strategy. We've identified three key markets: North America, Europe, and Asia Pacific. Our goal is to achieve 20% market penetration in the first year.\"\n\n**Summary:**\n- Topic: New product launch strategy\n- Target markets: North America, Europe, Asia Pacific\n- Goal: 20% market penetration in year one"
        }
      ]
    }
  ]
}

Document Analysis

{
  "id": "mm-004",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What are the key financial metrics from this report?" },
        { "type": "document", "ref": { "asset_id": "doc-report" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "**Key Financial Metrics from Q3 Report:**\n\n| Metric | Value | YoY Change |\n|--------|-------|------------|\n| Revenue | $45.2M | +12% |\n| Gross Margin | 68% | +3% |\n| Operating Income | $8.1M | +18% |\n| Net Income | $6.2M | +15% |\n\n**Highlights:**\n- Strong revenue growth driven by enterprise segment\n- Improved margins from operational efficiency\n- Cash position remains healthy at $28M"
        }
      ]
    }
  ]
}

Multiple Images

{
  "id": "mm-005",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Compare these two images:" },
        { "type": "image", "ref": { "asset_id": "img-sunset" } },
        { "type": "image", "ref": { "asset_id": "img-chart" } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "These images are quite different in nature:\n\n**Image 1 (Sunset):**\n- Type: Photograph\n- Subject: Natural landscape\n- Mood: Peaceful, aesthetic\n- Use: Art, decoration, relaxation\n\n**Image 2 (Chart):**\n- Type: Data visualization\n- Subject: Business metrics\n- Mood: Professional, analytical\n- Use: Reports, presentations, decision-making\n\nThe contrast highlights how visual content serves vastly different purposes."
        }
      ]
    }
  ]
}

Full JSONL File

{"id":"mm-001","messages":[{"role":"user","content":[{"type":"text","text":"Describe what you see in this image."},{"type":"image","ref":{"asset_id":"img-sunset"}}]},{"role":"assistant","content":[{"type":"text","text":"The image shows a beautiful sunset over the ocean..."}]}]}
{"id":"mm-002","messages":[{"role":"user","content":[{"type":"text","text":"Analyze this sales chart."},{"type":"image","ref":{"asset_id":"img-chart"}}]},{"role":"assistant","content":[{"type":"text","text":"Based on the sales chart..."}]}]}
{"id":"mm-003","messages":[{"role":"user","content":[{"type":"text","text":"Transcribe this audio."},{"type":"audio","ref":{"asset_id":"aud-speech"}}]},{"role":"assistant","content":[{"type":"text","text":"Transcription: Good morning everyone..."}]}]}