Preference Records

Preference records are used for Direct Preference Optimization (DPO), RLHF, and reward model training. Each record contains a prompt context (the `messages` array) plus two or more candidate responses, each with a preference label.

Requirements

A `candidates` array with at least 2 candidates
At least one chosen candidate
At least one rejected candidate
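
The three requirements above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `validate_preference_record` and the error strings are hypothetical, and it checks only the rules listed here.

```python
from typing import Any


def validate_preference_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors: list[str] = []
    candidates = record.get("candidates")

    # Requirement 1: a candidates array with at least 2 entries.
    if not isinstance(candidates, list) or len(candidates) < 2:
        errors.append("candidates must be an array with at least 2 entries")
        return errors

    labels = [c.get("label") for c in candidates if isinstance(c, dict)]

    # Requirements 2 and 3: at least one chosen and one rejected candidate.
    if "chosen" not in labels:
        errors.append("at least one candidate must be labeled 'chosen'")
    if "rejected" not in labels:
        errors.append("at least one candidate must be labeled 'rejected'")
    return errors
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a record at once.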

Basic Example

{
  "id": "pref-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Explain quantum computing in simple terms." }]
    }
  ],
  "candidates": [
    {
      "candidate_id": "chosen-1",
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "Quantum computing uses quantum bits (qubits) that can be 0, 1, or both at once. This lets quantum computers solve certain problems much faster than regular computers by exploring many possibilities simultaneously."
            }
          ]
        }
      ]
    },
    {
      "candidate_id": "rejected-1",
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "Quantum computing is really complicated. It uses physics stuff to do calculations. You probably wouldn't understand it."
            }
          ]
        }
      ]
    }
  ]
}

Multi-Turn Context

The `messages` array provides the shared conversation context, and each candidate provides a competing continuation of it:

{
  "id": "pref-multi-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "I need help writing an email to my boss about taking time off." }]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "I'd be happy to help! What dates are you planning to take off, and what's the reason?"
        }
      ]
    },
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Next Friday through the following Monday. It's for my sister's wedding." }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "Here's a professional email:\n\nSubject: Time Off Request - [Date Range]\n\nDear [Boss's Name],\n\nI am writing to request time off from Friday, [Date] through Monday, [Date] to attend my sister's wedding.\n\nI will ensure all my responsibilities are covered before I leave and will be available for any urgent matters.\n\nThank you for considering my request.\n\nBest regards,\n[Your Name]"
            }
          ]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Just tell them you need off for a wedding, they'll understand." }]
        }
      ]
    }
  ]
}
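
Many DPO training pipelines expect flat (context, chosen, rejected) examples rather than nested records. The sketch below shows one way to derive them from a record shaped like the examples above. The helper names `text_of` and `to_preference_pairs` and the output keys are assumptions made for illustration, and only `text` content parts are carried over.

```python
from typing import Any


def text_of(message: dict[str, Any]) -> str:
    """Concatenate the text parts of one message, ignoring non-text parts."""
    return "".join(
        part["text"] for part in message["content"] if part.get("type") == "text"
    )


def response_text(candidate: dict[str, Any]) -> str:
    """Join the non-empty assistant texts of a candidate with newlines."""
    texts = [text_of(m) for m in candidate["messages"] if m["role"] == "assistant"]
    return "\n".join(t for t in texts if t)


def to_preference_pairs(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair every chosen candidate with every rejected one over the shared context."""
    context = [{"role": m["role"], "text": text_of(m)} for m in record["messages"]]
    chosen = [c for c in record["candidates"] if c["label"] == "chosen"]
    rejected = [c for c in record["candidates"] if c["label"] == "rejected"]
    return [
        {"context": context, "chosen": response_text(c), "rejected": response_text(r)}
        for c in chosen
        for r in rejected
    ]
```

Candidates labeled `neutral` are skipped here because they carry no direct chosen-versus-rejected signal.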

Candidate Structure

Each candidate has:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `candidate_id` | string | No | Unique identifier |
| `label` | string | Yes | `chosen`, `rejected`, or `neutral` |
| `rank` | integer | No | Ranking (1 = best) |
| `score` | number | No | Numeric score |
| `messages` | array | Yes | Candidate response messages |
| `metadata` | object | No | Custom metadata |
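
The field table translates directly into per-candidate type checks. This is a sketch under assumptions: `check_candidate` and its messages are hypothetical names, and it checks only the types and required fields listed in the table, not the contents of `messages`.

```python
from typing import Any

ALLOWED_LABELS = {"chosen", "rejected", "neutral"}


def check_candidate(candidate: dict[str, Any]) -> list[str]:
    """Check one candidate against the field table; return any problems found."""
    problems: list[str] = []

    # Required fields.
    if candidate.get("label") not in ALLOWED_LABELS:
        problems.append("label must be one of: chosen, rejected, neutral")
    if not isinstance(candidate.get("messages"), list):
        problems.append("messages must be an array")

    # Optional fields are type-checked only when present.
    if "candidate_id" in candidate and not isinstance(candidate["candidate_id"], str):
        problems.append("candidate_id must be a string")
    rank = candidate.get("rank")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if "rank" in candidate and (isinstance(rank, bool) or not isinstance(rank, int)):
        problems.append("rank must be an integer")
    score = candidate.get("score")
    if "score" in candidate and (
        isinstance(score, bool) or not isinstance(score, (int, float))
    ):
        problems.append("score must be a number")
    if "metadata" in candidate and not isinstance(candidate["metadata"], dict):
        problems.append("metadata must be an object")
    return problems
```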

Labels

chosen - Preferred response
rejected - Non-preferred response
neutral - Neither preferred nor rejected (for ranking data)

Ranked Preferences

For records with more than two candidates, use `rank` to order them (1 = best):

{
  "id": "pref-ranked-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What's the best way to learn a new language?" }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "rank": 1,
      "messages": [
        {
          "role": "assistant",
          "content": [
            {
              "type": "text",
              "text": "The most effective approach combines immersion, consistent practice, and structured learning..."
            }
          ]
        }
      ]
    },
    {
      "label": "neutral",
      "rank": 2,
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "You should practice every day and maybe use some apps..." }]
        }
      ]
    },
    {
      "label": "rejected",
      "rank": 3,
      "messages": [
        { "role": "assistant", "content": [{ "type": "text", "text": "Just watch movies in that language." }] }
      ]
    }
  ]
}
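
Pairwise trainers can consume ranked records by expanding them into (better, worse) pairs. The sketch below is one way to do that, assuming a lower `rank` means a better response as stated in the field table. The function name `ranked_pairs` is hypothetical, and skipping candidates with equal ranks is a design choice of this sketch, not a rule of the format.

```python
from itertools import combinations
from typing import Any


def ranked_pairs(
    candidates: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Return (better, worse) pairs for every two candidates with different ranks."""
    # Candidates without a rank cannot be ordered, so they are left out.
    ranked = sorted((c for c in candidates if "rank" in c), key=lambda c: c["rank"])
    # After sorting, the first element of each combination has the lower
    # (better) or equal rank; equal ranks carry no preference and are dropped.
    return [(a, b) for a, b in combinations(ranked, 2) if a["rank"] < b["rank"]]
```

A record with three distinctly ranked candidates, like the example above, yields three pairs: 1 over 2, 1 over 3, and 2 over 3.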

With Tool Use

A candidate's `messages` can include tool interactions, expressed as `tool_call` and `tool_result` content parts:

{
  "id": "pref-tool-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What's the weather like in Tokyo?" }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "tool_call", "name": "weather", "call_id": "w1", "arguments": { "city": "Tokyo" } }]
        },
        {
          "role": "tool",
          "content": [
            {
              "type": "tool_result",
              "name": "weather",
              "call_id": "w1",
              "result": { "temp": 22, "condition": "sunny" }
            }
          ]
        },
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "It's currently 22°C and sunny in Tokyo." }]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "I don't have access to real-time weather data." }]
        }
      ]
    }
  ]
}
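
In the example above, the `tool_result` repeats the `call_id` of the `tool_call` it answers. A sanity check for that pairing might look like the sketch below. Whether the format strictly requires every call to have a result is an assumption, and `unmatched_tool_calls` is a hypothetical helper name.

```python
from typing import Any


def unmatched_tool_calls(messages: list[dict[str, Any]]) -> list[str]:
    """Return the call_ids of tool calls that never receive a later tool_result."""
    pending: list[str] = []
    for message in messages:
        for part in message.get("content", []):
            if part.get("type") == "tool_call":
                pending.append(part["call_id"])
            elif part.get("type") == "tool_result" and part.get("call_id") in pending:
                # A result only resolves a call that appeared earlier.
                pending.remove(part["call_id"])
    return pending
```

Running this over each candidate's `messages` before training can catch truncated tool exchanges.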

Best Practices

Clear preference signal - Ensure chosen is clearly better than rejected
Meaningful differences - Avoid trivial distinctions
Diverse rejection types - Include various failure modes
Consistent evaluation - Apply the same standards across records
Annotator agreement - For human-labeled data, track inter-annotator agreement
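
For the annotator-agreement practice, one common measure for two annotators is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a plain-Python illustration, not a prescribed metric for this format. It assumes each annotator's decisions are given as parallel lists, for example the `candidate_id` each one picked as chosen for each record.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1.0 means perfect agreement and 0.0 means agreement no better than chance.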