Preference Training Example

Preference Training Example

This example shows a preference training dataset for DPO or RLHF.

File Structure

preference-dataset/
  metadata.json
  data/
    pref-train.jsonl

metadata.json

{
  "schema_version": "llm-training-data/v1",
  "dataset": {
    "name": "preference-training-example",
    "description": "Example dataset for DPO/preference training"
  },
  "defaults": {
    "system": [
      { "type": "text", "text": "You are a helpful, harmless, and honest assistant." }
    ]
  },
  "files": [
    {
      "split": "train",
      "objective": "preference",
      "shards": [
        { "path": "data/pref-train.jsonl" }
      ]
    }
  ]
}

data/pref-train.jsonl

{"id":"pref-001","messages":[{"role":"user","content":[{"type":"text","text":"How do I make a good first impression?"}]}],"candidates":[{"label":"chosen","messages":[{"role":"assistant","content":[{"type":"text","text":"Here are some tips for making a good first impression:\n\n1. **Smile and make eye contact** - Shows confidence and friendliness\n2. **Listen actively** - Show genuine interest in others\n3. **Use a firm handshake** - Conveys confidence\n4. **Remember names** - People appreciate being remembered\n5. **Be positive** - Avoid complaining or negativity\n\nFirst impressions form quickly, so be mindful of your body language and tone."}]}]},{"label":"rejected","messages":[{"role":"assistant","content":[{"type":"text","text":"Just be yourself I guess. Smile or whatever."}]}]}]}
{"id":"pref-002","messages":[{"role":"user","content":[{"type":"text","text":"Explain blockchain technology."}]}],"candidates":[{"label":"chosen","messages":[{"role":"assistant","content":[{"type":"text","text":"Blockchain is a distributed digital ledger technology. Here's how it works:\n\n**Core concepts:**\n- **Blocks**: Groups of transactions bundled together\n- **Chain**: Blocks linked cryptographically in sequence\n- **Distributed**: Copies exist across many computers\n- **Immutable**: Once recorded, data is very hard to change\n\n**Key benefits:**\n- Transparency - Anyone can verify transactions\n- Security - Cryptographic protection\n- Decentralization - No single point of failure\n\nBlockchain powers cryptocurrencies like Bitcoin, but is also used for supply chain tracking, voting systems, and more."}]}]},{"label":"rejected","messages":[{"role":"assistant","content":[{"type":"text","text":"Blockchain is what Bitcoin uses. It's like a database but different. Very technical stuff."}]}]}]}

Pretty-Printed Record

{
  "id": "pref-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "How do I make a good first impression?" }
      ]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "Here are some tips for making a good first impression:\n\n1. **Smile and make eye contact**..."
            }
          ]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [
            { "type": "text", "text": "Just be yourself I guess. Smile or whatever." }
          ]
        }
      ]
    }
  ]
}

Multi-Turn Context

Preference with conversation history:

{
  "id": "pref-003",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "I'm planning a trip to Japan." }]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "That sounds exciting! When are you planning to go?" }]
    },
    {
      "role": "user",
      "content": [{ "type": "text", "text": "In April. What should I see?" }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "April is perfect for cherry blossom season! Here are must-see destinations:\n\n**Tokyo**: Ueno Park, Shinjuku Gyoen for sakura viewing\n**Kyoto**: Philosopher's Path, Maruyama Park\n**Osaka**: Osaka Castle Park\n\nBook accommodations early as it's peak season. Would you like specific recommendations for any city?"
            }
          ]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Tokyo and Kyoto are popular." }]
        }
      ]
    }
  ]
}

Ranked Preferences

With multiple candidates and rankings:

{
  "id": "pref-ranked",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Write a short poem about the ocean." }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "rank": 1,
      "score": 0.95,
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "Waves dance beneath the silver moon,\nSalt air carries an ancient tune,\nDepths hold secrets yet untold,\nStories of sailors, brave and bold."
            }
          ]
        }
      ]
    },
    {
      "label": "neutral",
      "rank": 2,
      "score": 0.70,
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "The ocean is blue and wide,\nWith many fish inside,\nWaves crash on the shore,\nI'd like to see some more."
            }
          ]
        }
      ]
    },
    {
      "label": "rejected",
      "rank": 3,
      "score": 0.30,
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Ocean poem: water, fish, waves, blue. The end." }]
        }
      ]
    }
  ]
}

Requirements

  • At least one candidate with label: "chosen"
  • At least one candidate with label: "rejected"
  • Minimum 2 candidates total