Preference Records

Preference Records

Preference records are used for Direct Preference Optimization (DPO), RLHF, and reward model training. They contain a prompt context plus multiple candidate responses with preference labels.

Requirements

  • candidates array with at least 2 candidates
  • At least one chosen candidate
  • At least one rejected candidate

Basic Example

{
  "id": "pref-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Explain quantum computing in simple terms." }]
    }
  ],
  "candidates": [
    {
      "candidate_id": "chosen-1",
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Quantum computing uses quantum bits (qubits) that can be 0, 1, or both at once. This lets quantum computers solve certain problems much faster than regular computers by exploring many possibilities simultaneously." }]
        }
      ]
    },
    {
      "candidate_id": "rejected-1",
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Quantum computing is really complicated. It uses physics stuff to do calculations. You probably wouldn't understand it." }]
        }
      ]
    }
  ]
}

Multi-Turn Context

The messages array provides context, then candidates provide competing continuations:

{
  "id": "pref-multi-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "I need help writing an email to my boss about taking time off." }]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "I'd be happy to help! What dates are you planning to take off, and what's the reason?" }]
    },
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Next Friday through the following Monday. It's for my sister's wedding." }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Here's a professional email:\n\nSubject: Time Off Request - [Date Range]\n\nDear [Boss's Name],\n\nI am writing to request time off from Friday, [Date] through Monday, [Date] to attend my sister's wedding.\n\nI will ensure all my responsibilities are covered before I leave and will be available for any urgent matters.\n\nThank you for considering my request.\n\nBest regards,\n[Your Name]" }]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "Just tell them you need off for a wedding, they'll understand." }]
        }
      ]
    }
  ]
}

Candidate Structure

Each candidate has:

FieldTypeRequiredDescription
candidate_idstringNoUnique identifier
labelstringYeschosen, rejected, or neutral
rankintegerNoRanking (1 = best)
scorenumberNoNumeric score
messagesarrayYesCandidate response messages
metadataobjectNoCustom metadata

Labels

  • chosen - Preferred response
  • rejected - Non-preferred response
  • neutral - Neither preferred nor rejected (for ranking data)

Ranked Preferences

For datasets with more than two candidates, use rank:

{
  "id": "pref-ranked-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What's the best way to learn a new language?" }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "rank": 1,
      "messages": [{ "role": "assistant", "content": [{ "type": "text", "text": "The most effective approach combines immersion, consistent practice, and structured learning..." }] }]
    },
    {
      "label": "neutral",
      "rank": 2,
      "messages": [{ "role": "assistant", "content": [{ "type": "text", "text": "You should practice every day and maybe use some apps..." }] }]
    },
    {
      "label": "rejected",
      "rank": 3,
      "messages": [{ "role": "assistant", "content": [{ "type": "text", "text": "Just watch movies in that language." }] }]
    }
  ]
}

With Tool Use

Candidates can include tool interactions:

{
  "id": "pref-tool-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What's the weather like in Tokyo?" }]
    }
  ],
  "candidates": [
    {
      "label": "chosen",
      "messages": [
        {
          "role": "assistant",
          "content": [
            { "type": "tool_call", "name": "weather", "call_id": "w1", "arguments": { "city": "Tokyo" } }
          ]
        },
        {
          "role": "tool",
          "content": [
            { "type": "tool_result", "name": "weather", "call_id": "w1", "result": { "temp": 22, "condition": "sunny" } }
          ]
        },
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "It's currently 22°C and sunny in Tokyo." }]
        }
      ]
    },
    {
      "label": "rejected",
      "messages": [
        {
          "role": "assistant",
          "content": [{ "type": "text", "text": "I don't have access to real-time weather data." }]
        }
      ]
    }
  ]
}

Best Practices

  1. Clear preference signal - Ensure chosen is clearly better than rejected
  2. Meaningful differences - Avoid trivial distinctions
  3. Diverse rejection types - Include various failure modes
  4. Consistent evaluation - Apply the same standards across records
  5. Annotator agreement - For human-labeled data, track inter-annotator agreement