Records Overview
Each line in a JSONL shard file is a record. The record structure depends on the training objective specified in the file configuration.
Base Record Structure
All records share these common fields:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Unique record identifier |
| messages | array | Yes | Conversation messages |
| toolset_id | string | No | Override file/default toolset |
| metadata | object | No | Custom record metadata |
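As a rough illustration of the table above, the following Python sketch checks one JSONL line against the base fields. The field names, types, and required flags come from the table; the helper name `check_base_fields` and the wording of its messages are hypothetical and not part of any official tooling.

```python
import json

# Field name -> (expected Python type, required?), taken from the table above.
BASE_FIELDS = {
    "id": (str, False),
    "messages": (list, True),
    "toolset_id": (str, False),
    "metadata": (dict, False),
}


def check_base_fields(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record must be a JSON object"]

    problems = []
    for name, (expected_type, required) in BASE_FIELDS.items():
        if name not in record:
            if required:
                problems.append(f"missing required field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} must be of type {expected_type.__name__}")
    return problems
```

This only covers the shared fields; objective-specific requirements are checked separately.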
Record Types
| Training Objective | Record Schema | Key Requirement |
|---|---|---|
| SFT | sft_record | At least one assistant message |
| Preference | preference_record | Candidates with chosen/rejected labels |
| RFT | rft_record | Reference object with grading criteria |
| CPT | cpt_record | Document-only, no assistant/tool messages |
Message Structure
Each message has:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "name": "optional-name",
  "annotations": {}
}
Roles
| Role | Description | Used In |
|---|---|---|
| system | System instructions | All objectives |
| user | User input | All objectives |
| assistant | Model response | SFT, Preference, RFT |
| tool | Tool execution result | SFT, Preference, RFT |
| document | Document context | CPT |
Content Parts
Content is always an array of content parts. See Content Parts Reference.
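The message structure, roles, and content-array rule described above can be combined in a small builder. This is a minimal sketch: the function `text_message` is a hypothetical helper, and it only produces the single text part shown in the example message, not the other content part types.

```python
from typing import Optional

# Roles listed in the Roles table above.
ROLES = {"system", "user", "assistant", "tool", "document"}


def text_message(role: str, text: str, name: Optional[str] = None) -> dict:
    """Build a message whose content is a one-element array with a text part."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # Content is always an array, even for a single text part.
    message = {"role": role, "content": [{"type": "text", "text": text}]}
    if name is not None:
        message["name"] = name
    return message
```

Calling `text_message("user", "Hello!")` yields the example message above without the optional name and annotations fields.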
Validation
Records are validated against objective-specific schemas:
- SFT: Must have at least one message with role: "assistant"
- Preference: Must have candidates array with chosen and rejected labels
- RFT: Must have reference object
- CPT: Must NOT have any assistant or tool messages
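The four rules above can be sketched as a single pre-flight check. Several things here are assumptions: the function name `validate_record`, the lowercase objective strings ("sft", "preference", "rft", "cpt"), and the error wording. Because this section does not specify how chosen/rejected labels are laid out inside candidates, the sketch only checks that a non-empty candidates array is present.

```python
def validate_record(record: dict, objective: str) -> list[str]:
    """Return objective-specific problems for one parsed record."""
    roles = [message.get("role") for message in record.get("messages", [])]
    errors = []

    if objective == "sft":
        if "assistant" not in roles:
            errors.append("SFT record needs at least one assistant message")
    elif objective == "preference":
        candidates = record.get("candidates")
        # Label layout is not specified here, so only presence is checked.
        if not isinstance(candidates, list) or not candidates:
            errors.append("Preference record needs a candidates array")
    elif objective == "rft":
        if not isinstance(record.get("reference"), dict):
            errors.append("RFT record needs a reference object")
    elif objective == "cpt":
        forbidden = sorted({r for r in roles if r in ("assistant", "tool")})
        if forbidden:
            errors.append(
                f"CPT record must not contain {', '.join(forbidden)} messages"
            )
    else:
        raise ValueError(f"unknown objective: {objective!r}")
    return errors
```

A check like this does not replace validation against the actual schemas; it is only a quick way to catch the most common mistakes before submitting a shard.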
Record ID Best Practices
While id is optional, it’s recommended for:
- Debugging and error tracking
- Deduplication
- Reproducibility
Use meaningful, unique IDs:
{"id": "math-001", "messages": [...]}
{"id": "code-review-042", "messages": [...]}
{"id": "uuid-550e8400-e29b-41d4-a716-446655440000", "messages": [...]}
Metadata
The metadata field can store custom information:
{
  "id": "example",
  "messages": [...],
  "metadata": {
    "source": "human-annotation",
    "annotator_id": "ann-123",
    "quality_score": 0.95
  }
}
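One possible use of metadata is filtering records while preparing a dataset. The sketch below keeps records whose quality_score meets a threshold. The helper `keep_high_quality` and the default threshold of 0.9 are illustrative assumptions, and quality_score is a custom field from the example above rather than a field the format defines.

```python
def keep_high_quality(records, threshold: float = 0.9) -> list[dict]:
    """Keep records whose metadata.quality_score is at least the threshold."""
    kept = []
    for record in records:
        score = record.get("metadata", {}).get("quality_score")
        # Records without a numeric score are dropped in this sketch.
        if isinstance(score, (int, float)) and score >= threshold:
            kept.append(record)
    return kept
```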