Multimodal SFT Example
Multimodal SFT Example
This example shows an SFT dataset with images, audio, and documents.
File Structure
multimodal-sft/
metadata.json
assets.jsonl
data/
train.jsonl
metadata.json
{
"schema_version": "llm-training-data/v1",
"dataset": {
"name": "multimodal-sft-example",
"description": "Example multimodal SFT dataset with images and audio"
},
"defaults": {
"system": [
{ "type": "text", "text": "You are a helpful assistant that can analyze images, audio, and documents." }
]
},
"asset_index": {
"path": "assets.jsonl",
"format": "jsonl"
},
"files": [
{
"split": "train",
"objective": "sft",
"shards": [
{ "path": "data/train.jsonl" }
]
}
]
}
assets.jsonl
{"asset_id":"img-sunset","uri":"s3://bucket/images/sunset.jpg","mime_type":"image/jpeg","sha256":"aaa...","bytes":245000}
{"asset_id":"img-chart","uri":"s3://bucket/images/sales-chart.png","mime_type":"image/png","sha256":"bbb...","bytes":128000}
{"asset_id":"aud-speech","uri":"s3://bucket/audio/speech.wav","mime_type":"audio/wav","sha256":"ccc...","bytes":1024000}
{"asset_id":"doc-report","uri":"s3://bucket/docs/quarterly-report.pdf","mime_type":"application/pdf","sha256":"ddd...","bytes":2048000}
data/train.jsonl
Image Description
{
"id": "mm-001",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Describe what you see in this image." },
{ "type": "image", "ref": { "asset_id": "img-sunset" } }
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "The image shows a beautiful sunset over the ocean. The sky is painted in vibrant shades of orange, pink, and purple. The sun is partially visible on the horizon, casting a golden reflection across the calm water. A few clouds are scattered across the sky, catching the last rays of sunlight."
}
]
}
]
}
Chart Analysis
{
"id": "mm-002",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Analyze this sales chart and summarize the trends." },
{ "type": "image", "ref": { "asset_id": "img-chart" } }
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "Based on the sales chart:\n\n**Key Trends:**\n- Q1: Steady growth of 15% over the previous quarter\n- Q2: Significant spike, reaching the year's peak\n- Q3: Slight decline, likely seasonal\n- Q4: Recovery trend beginning\n\n**Insights:**\n- The company shows overall positive growth trajectory\n- Q2 performance suggests successful marketing campaigns\n- Q3 dip is consistent with industry patterns\n\nRecommendation: Focus on maintaining Q2-level engagement strategies."
}
]
}
]
}
Audio Transcription
{
"id": "mm-003",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Please transcribe this audio and summarize the main points." },
{ "type": "audio", "ref": { "asset_id": "aud-speech" } }
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "**Transcription:**\n\"Good morning everyone. Today I'll be discussing our new product launch strategy. We've identified three key markets: North America, Europe, and Asia Pacific. Our goal is to achieve 20% market penetration in the first year.\"\n\n**Summary:**\n- Topic: New product launch strategy\n- Target markets: North America, Europe, Asia Pacific\n- Goal: 20% market penetration in year one"
}
]
}
]
}
Document Analysis
{
"id": "mm-004",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What are the key financial metrics from this report?" },
{ "type": "document", "ref": { "asset_id": "doc-report" } }
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "**Key Financial Metrics from Q3 Report:**\n\n| Metric | Value | YoY Change |\n|--------|-------|------------|\n| Revenue | $45.2M | +12% |\n| Gross Margin | 68% | +3% |\n| Operating Income | $8.1M | +18% |\n| Net Income | $6.2M | +15% |\n\n**Highlights:**\n- Strong revenue growth driven by enterprise segment\n- Improved margins from operational efficiency\n- Cash position remains healthy at $28M"
}
]
}
]
}
Multiple Images
{
"id": "mm-005",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Compare these two images:" },
{ "type": "image", "ref": { "asset_id": "img-sunset" } },
{ "type": "image", "ref": { "asset_id": "img-chart" } }
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "These images are quite different in nature:\n\n**Image 1 (Sunset):**\n- Type: Photograph\n- Subject: Natural landscape\n- Mood: Peaceful, aesthetic\n- Use: Art, decoration, relaxation\n\n**Image 2 (Chart):**\n- Type: Data visualization\n- Subject: Business metrics\n- Mood: Professional, analytical\n- Use: Reports, presentations, decision-making\n\nThe contrast highlights how visual content serves vastly different purposes."
}
]
}
]
}
Full JSONL File
{"id":"mm-001","messages":[{"role":"user","content":[{"type":"text","text":"Describe what you see in this image."},{"type":"image","ref":{"asset_id":"img-sunset"}}]},{"role":"assistant","content":[{"type":"text","text":"The image shows a beautiful sunset over the ocean..."}]}]}
{"id":"mm-002","messages":[{"role":"user","content":[{"type":"text","text":"Analyze this sales chart."},{"type":"image","ref":{"asset_id":"img-chart"}}]},{"role":"assistant","content":[{"type":"text","text":"Based on the sales chart..."}]}]}
{"id":"mm-003","messages":[{"role":"user","content":[{"type":"text","text":"Transcribe this audio."},{"type":"audio","ref":{"asset_id":"aud-speech"}}]},{"role":"assistant","content":[{"type":"text","text":"Transcription: Good morning everyone..."}]}]}