Tool Use Datasets

Tool Use Datasets

This guide covers creating training data for teaching models to use tools (function calling).

Defining Tools

Define tools in toolsets:

{
  "toolsets": [
    {
      "id": "web-tools",
      "tools": [
        {
          "name": "web_search",
          "description": "Search the web for information",
          "input_schema": {
            "type": "object",
            "properties": {
              "query": { "type": "string", "description": "Search query" }
            },
            "required": ["query"]
          }
        },
        {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "input_schema": {
            "type": "object",
            "properties": {
              "location": { "type": "string" },
              "units": { "type": "string", "enum": ["celsius", "fahrenheit"] }
            },
            "required": ["location"]
          }
        }
      ]
    }
  ],
  "defaults": {
    "toolset_id": "web-tools"
  }
}

Basic Tool Call

{
  "id": "tool-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What's the weather in Tokyo?" }]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_call",
          "name": "get_weather",
          "call_id": "weather-1",
          "arguments": { "location": "Tokyo", "units": "celsius" }
        }
      ]
    },
    {
      "role": "tool",
      "content": [
        {
          "type": "tool_result",
          "name": "get_weather",
          "call_id": "weather-1",
          "result": { "temperature": 22, "condition": "sunny", "humidity": 65 }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "It's currently 22°C and sunny in Tokyo with 65% humidity." }]
    }
  ]
}

Multiple Tool Calls

{
  "id": "tool-002",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Compare the weather in Tokyo and New York" }]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_call",
          "name": "get_weather",
          "call_id": "weather-tokyo",
          "arguments": { "location": "Tokyo" }
        },
        {
          "type": "tool_call",
          "name": "get_weather",
          "call_id": "weather-nyc",
          "arguments": { "location": "New York" }
        }
      ]
    },
    {
      "role": "tool",
      "content": [
        {
          "type": "tool_result",
          "name": "get_weather",
          "call_id": "weather-tokyo",
          "result": { "temperature": 22, "condition": "sunny" }
        },
        {
          "type": "tool_result",
          "name": "get_weather",
          "call_id": "weather-nyc",
          "result": { "temperature": 15, "condition": "cloudy" }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "Tokyo is warmer at 22°C and sunny, while New York is 15°C and cloudy." }]
    }
  ]
}

Sequential Tool Use

{
  "id": "tool-003",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Find recent news about AI and summarize the top story" }]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_call",
          "name": "web_search",
          "call_id": "search-1",
          "arguments": { "query": "AI news today" }
        }
      ]
    },
    {
      "role": "tool",
      "content": [
        {
          "type": "tool_result",
          "name": "web_search",
          "call_id": "search-1",
          "result": {
            "results": [
              { "title": "New AI Model Released", "url": "https://..." },
              { "title": "AI in Healthcare", "url": "https://..." }
            ]
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_call",
          "name": "fetch_url",
          "call_id": "fetch-1",
          "arguments": { "url": "https://..." }
        }
      ]
    },
    {
      "role": "tool",
      "content": [
        {
          "type": "tool_result",
          "name": "fetch_url",
          "call_id": "fetch-1",
          "result": { "content": "A new AI model was released today..." }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "The top AI news today is about a new model release..." }]
    }
  ]
}

When NOT to Use Tools

Include examples where tools aren’t needed:

{
  "id": "no-tool-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "What is photosynthesis?" }]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "Photosynthesis is the process by which plants convert sunlight into energy..." }]
    }
  ]
}

Tool Errors

Include examples of handling tool errors:

{
  "id": "tool-error-001",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "Get weather for Atlantis" }]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_call",
          "name": "get_weather",
          "call_id": "weather-err",
          "arguments": { "location": "Atlantis" }
        }
      ]
    },
    {
      "role": "tool",
      "content": [
        {
          "type": "tool_result",
          "name": "get_weather",
          "call_id": "weather-err",
          "result": { "error": "Location not found" }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "I couldn't find weather data for Atlantis. This might be a fictional or unrecognized location. Could you provide a real city name?" }]
    }
  ]
}

Best Practices

  1. Unique call_ids - Each tool call needs a unique identifier
  2. Match call_ids - Tool results must reference the correct call_id
  3. Vary complexity - Include simple and complex tool interactions
  4. Include failures - Teach graceful error handling
  5. Balance with non-tool examples - Not every query needs tools
  6. Realistic arguments - Use plausible input values