Toolsets

Toolsets define the tools available to the model during training. Define each toolset once at the dataset level, then reference it by id from the dataset defaults, from individual files, or from individual records.

Structure

{
  "toolsets": [
    {
      "id": "my-tools",
      "tools": [
        {
          "name": "tool_name",
          "description": "What the tool does",
          "input_schema": { ... },
          "output_schema": { ... }
        }
      ]
    }
  ]
}

Tool Definition

Field          Required  Description
name           Yes       Tool function name
description    No        Human-readable description
input_schema   No        JSON Schema for inputs
output_schema  No        JSON Schema for outputs
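
The field requirements above can be checked mechanically before a dataset is used. The following is a minimal sketch rather than official tooling: check_tool is a hypothetical helper, it assumes a tool definition has been loaded as a plain Python dict, and it assumes schemas are JSON objects as in the examples on this page.

```python
from typing import Any

# Field names come from the Tool Definition table; only "name" is required.
REQUIRED_FIELDS = {"name"}
SCHEMA_FIELDS = ("input_schema", "output_schema")


def check_tool(tool: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one tool definition (hypothetical helper)."""
    problems = []
    for field in sorted(REQUIRED_FIELDS - set(tool)):
        problems.append(f"missing required field: {field}")
    if "name" in tool and (not isinstance(tool["name"], str) or not tool["name"]):
        problems.append("name must be a non-empty string")
    if "description" in tool and not isinstance(tool["description"], str):
        problems.append("description must be a string")
    for field in SCHEMA_FIELDS:
        # Assumption: schemas are written as JSON objects, as in this page's examples.
        if field in tool and not isinstance(tool[field], dict):
            problems.append(f"{field} must be a JSON object")
    return problems
```

A definition with only a name passes, since every other field is optional; a definition without a name is reported.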

Example Toolset

{
  "toolsets": [
    {
      "id": "math-tools",
      "tools": [
        {
          "name": "calculator",
          "description": "Evaluate a mathematical expression",
          "input_schema": {
            "type": "object",
            "properties": {
              "expression": {
                "type": "string",
                "description": "Math expression to evaluate"
              }
            },
            "required": ["expression"]
          },
          "output_schema": {
            "type": "object",
            "properties": {
              "value": { "type": "number" }
            },
            "required": ["value"]
          }
        },
        {
          "name": "unit_converter",
          "description": "Convert between units",
          "input_schema": {
            "type": "object",
            "properties": {
              "value": { "type": "number" },
              "from_unit": { "type": "string" },
              "to_unit": { "type": "string" }
            },
            "required": ["value", "from_unit", "to_unit"]
          }
        }
      ]
    }
  ]
}

Using Toolsets

Default Toolset

Set a default toolset in defaults:

{
  "defaults": {
    "toolset_id": "math-tools"
  }
}

File-Level Toolset

Override the default for specific files by setting toolset_id on the file entry:

{
  "files": [
    {
      "split": "train",
      "objective": "sft",
      "toolset_id": "advanced-tools",
      "shards": [...]
    }
  ]
}

Record-Level Toolset

Override the file or default toolset for specific records by setting toolset_id on the record:

{
  "id": "record-001",
  "toolset_id": "special-tools",
  "messages": [...]
}

Priority Order

When toolset_id is set at more than one level, the most specific level wins:

  1. Record toolset_id (highest)
  2. File toolset_id
  3. Default toolset_id (lowest)
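
The priority order above can be sketched as a small lookup. This is an illustration only: resolve_toolset_id is a hypothetical function, and it assumes the record, its file entry, and the defaults block are available as plain Python dicts shaped like the JSON examples on this page.

```python
from typing import Any, Optional


def resolve_toolset_id(
    record: dict[str, Any],
    file_entry: dict[str, Any],
    defaults: dict[str, Any],
) -> Optional[str]:
    """Pick the toolset_id for a record: record first, then file, then defaults."""
    for scope in (record, file_entry, defaults):
        toolset_id = scope.get("toolset_id")
        if toolset_id is not None:
            return toolset_id
    return None  # no toolset configured at any level


defaults = {"toolset_id": "math-tools"}
file_entry = {"split": "train", "objective": "sft", "toolset_id": "advanced-tools"}

# The record's own toolset_id takes precedence over the file and the defaults.
print(resolve_toolset_id({"id": "record-001", "toolset_id": "special-tools"}, file_entry, defaults))
# A record without one falls back to its file.
print(resolve_toolset_id({"id": "record-002"}, file_entry, defaults))
```

The two calls print special-tools and advanced-tools respectively. What happens when no level sets a toolset_id is not specified here; the sketch simply returns None in that case.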

Multiple Toolsets

Define multiple toolsets for different scenarios:

{
  "toolsets": [
    {
      "id": "basic-tools",
      "tools": [
        { "name": "calculator", ... }
      ]
    },
    {
      "id": "web-tools",
      "tools": [
        { "name": "web_search", ... },
        { "name": "fetch_url", ... }
      ]
    },
    {
      "id": "all-tools",
      "tools": [
        { "name": "calculator", ... },
        { "name": "web_search", ... },
        { "name": "fetch_url", ... }
      ]
    }
  ]
}
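
In the example above, the tools in all-tools repeat definitions from the other two toolsets. If the dataset configuration is generated by a script (an assumption, not something this page requires), each tool can be defined once and reused. A minimal sketch, with tool definitions reduced to the required name field:

```python
import json

# Only "name" is required; the other fields are left out for brevity.
calculator = {"name": "calculator"}
web_search = {"name": "web_search"}
fetch_url = {"name": "fetch_url"}

basic_tools = [calculator]
web_tools = [web_search, fetch_url]

config = {
    "toolsets": [
        {"id": "basic-tools", "tools": basic_tools},
        {"id": "web-tools", "tools": web_tools},
        # The combined set reuses the same definitions, so the copies cannot drift apart.
        {"id": "all-tools", "tools": basic_tools + web_tools},
    ]
}

print(json.dumps(config, indent=2))
```

The printed JSON has the same shape as the hand-written example, with every tool definition coming from a single source.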

Best Practices

  1. Clear descriptions - State what each tool does and when to use it, so the model can choose between tools
  2. Strict schemas - Set additionalProperties: false so that arguments not declared in the schema fail validation
  3. Consistent naming - Use snake_case for tool names (for example, unit_converter)
  4. Required fields - List every mandatory input in the schema's required array
  5. Examples - Include example calls in descriptions where they help
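
Several of these practices can be checked automatically. The following is a rough sketch, not an official linter: lint_tool and the SNAKE_CASE pattern are hypothetical, and the checks cover only input_schema on object-typed schemas.

```python
import re
from typing import Any

# Lowercase words separated by single underscores, e.g. "unit_converter".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_tool(tool: dict[str, Any]) -> list[str]:
    """Return best-practice warnings for one tool definition (hypothetical helper)."""
    warnings = []
    name = tool.get("name", "")
    if not isinstance(name, str) or not SNAKE_CASE.match(name):
        warnings.append(f"name {name!r} is not snake_case")
    if not tool.get("description"):
        warnings.append("description is missing or empty")
    schema = tool.get("input_schema")
    if isinstance(schema, dict) and schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            warnings.append("input_schema does not set additionalProperties: false")
        if schema.get("properties") and not schema.get("required"):
            warnings.append("input_schema declares properties but no required list")
    return warnings


calculator = {
    "name": "calculator",
    "description": "Evaluate a mathematical expression",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
        "additionalProperties": False,
    },
}
print(lint_tool(calculator))  # prints []
```

Removing additionalProperties from the schema above would produce a single warning, which is the case for the calculator definition in the Example Toolset section.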