RFT Records

Reinforcement fine-tuning (RFT) records include reference information for automated grading, enabling training with verifiers.

Requirements

  • A reference object that holds the grading criteria, the expected answer(s), or both

Basic Example

{
  "id": "rft-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What is 15 + 27?" }]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "42" }]
    }
  ],
  "reference": {
    "grading": { "type": "exact_match" },
    "answer": "42"
  }
}

Reference Structure

The reference object is flexible but typically contains:

Field               Description
grading             Grading method configuration
answer / answers    Expected answer(s)
rubric              Rubric criteria for complex tasks
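
Because the reference object is flexible, it can help to check its shape before training. The sketch below is only an illustration: the function name reference_problems and the rule that every reference needs a grading.type plus at least one of answer, answers, or rubric are assumptions drawn from the examples on this page, not a documented schema.

```python
from typing import Any

# Keys taken from the field list above. Treating them as the complete set
# of "expected answer or criteria" fields is an assumption of this sketch.
ANSWER_KEYS = ("answer", "answers", "rubric")


def reference_problems(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks OK."""
    reference = record.get("reference")
    if not isinstance(reference, dict):
        return ["missing or non-object 'reference'"]

    problems = []
    grading = reference.get("grading")
    if not isinstance(grading, dict) or "type" not in grading:
        problems.append("'grading' should be an object with a 'type'")
    if not any(key in reference for key in ANSWER_KEYS):
        problems.append("expected one of: " + ", ".join(ANSWER_KEYS))
    return problems
```

Running this over every record before a training job gives a quick list of records whose reference would leave a grader with nothing to compare against.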

Grading Types

Exact Match

{
  "reference": {
    "grading": { "type": "exact_match" },
    "answer": "42"
  }
}
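
As a rough sketch of how a verifier might consume this reference, the hypothetical grade_exact_match function below compares a model response with reference.answer. Stripping surrounding whitespace before comparing is an assumption; the actual grader may be stricter or more lenient.

```python
def grade_exact_match(response_text: str, reference: dict) -> float:
    """Return 1.0 when the response equals the expected answer, else 0.0."""
    # Assumption: leading/trailing whitespace is not significant.
    expected = str(reference["answer"]).strip()
    return 1.0 if response_text.strip() == expected else 0.0
```

Under these assumptions, a response of " 42" earns full reward against the reference above, while "42." does not, which is why exact match suits short, canonical answers.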

Numeric Tolerance

{
  "reference": {
    "grading": { "type": "numeric", "tolerance": 0.01 },
    "answer": 3.14159
  }
}
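
One plausible reading of tolerance is an absolute difference, and the sketch below grades on that basis. This is an assumption: the page does not say whether tolerance is absolute or relative, and grade_numeric is a hypothetical name.

```python
import math


def grade_numeric(response_text: str, reference: dict) -> float:
    """Return 1.0 when |response - answer| <= tolerance, else 0.0."""
    # Assumption: a missing tolerance means the values must match exactly.
    tolerance = reference["grading"].get("tolerance", 0.0)
    try:
        predicted = float(response_text.strip())
    except ValueError:
        return 0.0  # the response is not a plain number
    if not math.isfinite(predicted):
        return 0.0  # reject "nan" and "inf", which float() accepts
    return 1.0 if abs(predicted - reference["answer"]) <= tolerance else 0.0
```

With the reference above, "3.14" passes because it differs from 3.14159 by about 0.0016, while "3.2" fails.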

Multiple Correct Answers

{
  "reference": {
    "grading": { "type": "any_of" },
    "answers": ["Paris", "paris", "PARIS"]
  }
}
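
The answers list above spells out each casing variant, which suggests the comparison is case-sensitive. The sketch below assumes that behavior by default and offers an ignore_case switch as an alternative; both the function name grade_any_of and the switch are hypothetical.

```python
def grade_any_of(response_text: str, reference: dict, ignore_case: bool = False) -> float:
    """Return 1.0 when the response matches any listed answer, else 0.0."""
    candidate = response_text.strip()
    answers = list(reference["answers"])
    if ignore_case:
        # casefold() is a more aggressive lower() intended for comparisons.
        candidate = candidate.casefold()
        answers = [answer.casefold() for answer in answers]
    return 1.0 if candidate in answers else 0.0
```

With a case-sensitive grader, a response such as "pArIs" is rejected unless it is listed explicitly, so normalizing inside the grader can keep the answers list short.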

Rubric-Based

{
  "reference": {
    "grading": { "type": "rubric" },
    "rubric": {
      "must_have": ["mentions Paris", "correct country"],
      "nice_to_have": ["includes population", "mentions landmarks"],
      "must_not_have": ["incorrect facts", "harmful content"]
    }
  }
}
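
Rubric criteria are written in natural language, so something has to decide whether a response satisfies each one. The sketch below leaves that decision to a caller-supplied judge function (for example, a model-based or keyword-based check) and only shows one way to combine the results. The scoring rule is an assumption: any missing must_have or any triggered must_not_have scores 0, and nice_to_have items scale the score between 0.5 and 1.0.

```python
from typing import Callable


def grade_rubric(
    response_text: str,
    rubric: dict,
    judge: Callable[[str, str], bool],
) -> float:
    """Combine per-criterion judgments into a score in [0, 1].

    judge(response_text, criterion) should return True when the response
    satisfies (or, for must_not_have, exhibits) the criterion.
    """
    if not all(judge(response_text, c) for c in rubric.get("must_have", [])):
        return 0.0
    if any(judge(response_text, c) for c in rubric.get("must_not_have", [])):
        return 0.0
    nice = rubric.get("nice_to_have", [])
    if not nice:
        return 1.0
    met = sum(judge(response_text, c) for c in nice)
    # Meeting the hard requirements earns 0.5; nice-to-haves fill the rest.
    return 0.5 + 0.5 * met / len(nice)
```

Keeping the judge separate from the aggregation makes it possible to swap in a different checker without touching the rubric records themselves.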

Multi-Step Problems

For problems that require multiple steps, the reference can record intermediate answers alongside the final one:

{
  "id": "rft-multi-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "A store sells apples for $2 each. If you buy 5 apples and pay with a $20 bill, how much change do you get?" }]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "reasoning", "text": "Cost = 5 * $2 = $10. Change = $20 - $10 = $10." },
        { "type": "text", "text": "$10" }
      ]
    }
  ],
  "reference": {
    "grading": { "type": "exact_match", "format": "currency" },
    "answers": {
      "final": "$10",
      "intermediate": {
        "cost": "$10"
      }
    }
  }
}
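
One way a grader could use the intermediate answers is to give partial credit for correct steps once the final answer is right. The sketch below assumes the caller has already extracted the final answer and any named intermediate values from the response; that extraction step, the function name grade_multi_step, and the 0.8 / 0.2 weighting are all assumptions, not documented behavior.

```python
def grade_multi_step(
    final_answer: str,
    intermediate: dict,
    reference: dict,
    final_weight: float = 0.8,
) -> float:
    """Score the final answer first, then reward matching intermediate steps."""
    expected = reference["answers"]
    if final_answer.strip() != expected["final"]:
        return 0.0  # a wrong final answer earns nothing in this sketch

    expected_steps = expected.get("intermediate", {})
    if not expected_steps:
        return 1.0
    matched = sum(
        1
        for name, value in expected_steps.items()
        if intermediate.get(name, "").strip() == value
    )
    return final_weight + (1.0 - final_weight) * matched / len(expected_steps)
```

Gating on the final answer keeps the reward from favoring responses that show plausible working but reach the wrong result.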

With Tool Use

{
  "id": "rft-tool-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Calculate the compound interest on $1000 at 5% for 3 years." }]
    },
    {
      "role": "assistant",
      "content": [
        { "type": "tool_call", "name": "calculator", "call_id": "c1", "arguments": { "expression": "1000 * (1.05 ^ 3)" } }
      ]
    },
    {
      "role": "tool",
      "content": [
        { "type": "tool_result", "name": "calculator", "call_id": "c1", "result": { "value": 1157.625 } }
      ]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "The total after 3 years is $1,157.63, so the compound interest earned is $157.63." }]
    }
  ],
  "reference": {
    "grading": { "type": "numeric", "tolerance": 0.01 },
    "answers": {
      "total": 1157.63,
      "interest": 157.63
    }
  }
}
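
The reference values in this record can be re-derived to confirm that they agree with the tool result. The sketch below assumes annual compounding, which matches the calculator expression above, and uses decimal arithmetic so that rounding to cents is exact; compound_total is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def compound_total(principal: str, rate: str, years: int) -> Decimal:
    """principal * (1 + rate) ** years, compounded once per year."""
    return Decimal(principal) * (Decimal(1) + Decimal(rate)) ** years


total = compound_total("1000", "0.05", 3)                 # 1157.625
rounded_total = total.quantize(CENT, rounding=ROUND_HALF_UP)
interest = (total - Decimal("1000")).quantize(CENT, rounding=ROUND_HALF_UP)

# The unrounded tool result sits within the 0.01 tolerance of the reference.
within_tolerance = abs(total - Decimal("1157.63")) <= Decimal("0.01")
```

The unrounded total of 1157.625 rounds half-up to 1157.63 and the interest to 157.63, matching the reference, and the 0.005 gap between the raw tool result and the rounded reference is inside the stated tolerance.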

Best Practices

  1. Clear grading criteria - Specify exactly how answers should be evaluated
  2. Handle edge cases - Include acceptable variations in answers
  3. Intermediate steps - For multi-step problems, track intermediate answers
  4. Consistent format - Use the same reference structure across similar tasks
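
The consistency practice can be checked mechanically. As a minimal sketch, the hypothetical helpers below group records by grading type and flag any type whose references use more than one set of top-level keys. Treating the grading type as the definition of "similar tasks" is an assumption; a real dataset may need a finer grouping.

```python
from collections import defaultdict


def reference_shapes(records: list) -> dict:
    """Map each grading type to the distinct sets of reference keys seen."""
    shapes = defaultdict(set)
    for record in records:
        reference = record.get("reference", {})
        grading_type = reference.get("grading", {}).get("type", "<missing>")
        shapes[grading_type].add(frozenset(reference))
    return dict(shapes)


def inconsistent_types(records: list) -> list:
    """Return grading types whose references do not all share one key set."""
    shapes = reference_shapes(records)
    return sorted(name for name, seen in shapes.items() if len(seen) > 1)
```

A flagged type is not necessarily an error, since a single-answer task and a multi-answer task may legitimately differ, but it is a useful prompt to review those records.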