PAPER 28 · Bonus · tool use

Toolformer

Schick et al. 2023 Paper

A model learns to call tools through self-supervised API-use examples.

Core concept

Toolformer teaches a model to decide when to call external tools by generating its own tool-use training examples.

Why it mattered

It points toward LLMs as coordinators of calculators, search, translation, calendars, and other APIs.

Visual shortcut · Learn when to call tools

Toolformer teaches the model the decision to use a tool, not just the syntax of using one.

How it works

Insert possible tool calls into text.

Check whether the call and its result make the following text easier to predict.

Keep useful examples.

Fine-tune the model to call tools when helpful.
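
The four steps above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `toy_loss` is a hypothetical stand-in for a language model's loss (it just counts continuation words missing from the prefix), `calculator` is a tiny made-up tool, and the threshold `tau` and the candidate sentences are assumptions chosen for the example.

```python
import operator
import re
from typing import Callable, List, Optional, Tuple

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def calculator(expr: str) -> str:
    """Toy tool: evaluate 'a op b' for non-negative integers a and b."""
    match = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    left, op, right = match.groups()
    return str(OPS[op](int(left), int(right)))

def toy_loss(prefix: str, continuation: str) -> float:
    """Hypothetical stand-in for LM loss: continuation words absent from the prefix."""
    seen = set(re.findall(r"\w+", prefix))
    return float(sum(tok not in seen for tok in re.findall(r"\w+", continuation)))

def render_call(arg: str, result: Optional[str]) -> str:
    """Write a call as inline text, with or without its result."""
    if result is None:
        return f"[Calculator({arg})]"
    return f"[Calculator({arg}) -> {result}]"

def build_examples(
    candidates: List[Tuple[str, str, str]],
    tau: float = 0.5,
    loss: Callable[[str, str], float] = toy_loss,
) -> List[str]:
    """Keep only calls whose result lowers the loss on the continuation by at least tau."""
    kept = []
    for prefix, arg, continuation in candidates:
        result = calculator(arg)  # step 1: insert and execute a candidate call
        no_call = loss(prefix, continuation)
        call_only = loss(f"{prefix} {render_call(arg, None)}", continuation)
        with_result = loss(f"{prefix} {render_call(arg, result)}", continuation)
        # step 2: compare against the better of the two baselines
        if min(no_call, call_only) - with_result >= tau:
            # step 3: keep the text with the call written inline
            kept.append(f"{prefix} {render_call(arg, result)} {continuation}")
    return kept

# (prefix, calculator argument, continuation) triples; both are invented examples.
candidates = [
    ("The store sold 12 boxes of 7 apples, a total of", "12*7", "84 apples."),
    ("The room has 3 doors and 4 windows, which", "3+4", "is plenty of light."),
]
fine_tuning_texts = build_examples(candidates)  # step 4 would fine-tune on these
print(fine_tuning_texts)
```

Only the first candidate survives, because the calculator result "84" appears in the continuation and so lowers the toy loss. The second call returns "7", which does not help predict what follows, so it is dropped. In a real setup the loss would come from the model being trained rather than a word-overlap count.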

The quick digest

The model starts with ordinary text and tries inserting API calls. If a call helps predict the continuation, that example is kept. Over many examples, the model learns not just how to call a tool, but when the call is worth making.
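
"Helps predict the continuation" can be made precise. The block below paraphrases the filtering rule as described in the paper; the symbols are worth checking against the source before relying on them. Here `c_i` is a candidate call at position `i`, `r_i` its result, `e(·)` the call written out as text, `ε` an empty string, `w` a set of weights over the following tokens, and `τ_f` a filtering threshold.

```latex
% Weighted loss over the tokens after position i, given a prefix z
L_i(\mathbf{z}) = -\sum_{j=i}^{n} w_{j-i}\,\log p_M\!\left(x_j \mid \mathbf{z},\, x_{1:j-1}\right)

% Loss with the call and its result, versus the better of two baselines:
% no call at all, or the call without its result
L_i^{+} = L_i\big(e(c_i, r_i)\big), \qquad
L_i^{-} = \min\big(L_i(\varepsilon),\; L_i(e(c_i, \varepsilon))\big)

% Keep the call only if it reduces the loss by at least the threshold
\text{keep } c_i \iff L_i^{-} - L_i^{+} \ge \tau_f
```

The second baseline matters: a call is kept only if its result helps, not merely because the call text itself hints at the answer.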

This is different from simply giving a model an API. Toolformer makes tool use part of the model’s learned behavior. It can learn that arithmetic should use a calculator, factual questions might use search, and some tasks need no tool at all.

The paper is an early bridge from language modeling to tool-using agents. Modern function calling is more structured, but the central question is the same: when should the model stop talking and use another system?

What to remember

One-liner

The model learns when to use tools.

Why it matters

Calling the tool is part of the policy.

Builder instinct

Tool use turns LLMs into coordinators.

Read it like this

First pass: Study the self-labeling pipeline.
Second pass: Ask what makes a tool call useful enough to keep.
Then build taste: Compare with ReAct and modern function calling.

Build instinct

Create examples where a model should call a calculator or retrieval tool, then evaluate unnecessary versus useful calls.
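
One way to start on that exercise is a small scoring harness. This is a sketch under assumptions: `heuristic_policy` is a hypothetical stand-in for a model's call decision (it calls the calculator whenever a digit appears), and the four labeled prompts are invented for illustration.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Tuple

def heuristic_policy(prompt: str) -> bool:
    """Hypothetical stand-in for a model's decision: call the tool if any digit appears."""
    return re.search(r"\d", prompt) is not None

def score_calls(
    examples: Iterable[Tuple[str, bool]],
    policy: Callable[[str], bool],
) -> Counter:
    """Sort each decision into useful, unnecessary, missed, or correctly skipped."""
    counts: Counter = Counter()
    for prompt, should_call in examples:
        did_call = policy(prompt)
        if did_call and should_call:
            counts["useful"] += 1
        elif did_call:
            counts["unnecessary"] += 1
        elif should_call:
            counts["missed"] += 1
        else:
            counts["correctly_skipped"] += 1
    return counts

# (prompt, should the calculator be called?) pairs; labels are hand-assigned.
examples = [
    ("What is 417 * 23?", True),
    ("Summarize the plot of Hamlet in one sentence.", False),
    ("Chapter 3 was boring. Rewrite this politely.", False),
    ("What is twelve times nineteen?", True),
]
counts = score_calls(examples, heuristic_policy)

# Share of calls that were warranted, and share of warranted calls that were made.
precision = counts["useful"] / (counts["useful"] + counts["unnecessary"])
recall = counts["useful"] / (counts["useful"] + counts["missed"])
print(dict(counts), precision, recall)
```

The digit heuristic fails in both directions: it fires on "Chapter 3" and misses arithmetic written in words. Swapping in a real model's decisions for `heuristic_policy` gives the same four-way breakdown.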
