Getting Started

What is TOON?

Token-Oriented Object Notation (TOON) is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It's intended for LLM input as a drop-in, lossless representation of your existing JSON.

TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.

Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.

Why TOON?

Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:

json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

YAML already reduces some redundancy with indentation instead of braces:

yaml
users:
  - id: 1
    name: Alice
    role: admin
  - id: 2
    name: Bob
    role: user

TOON goes further by declaring fields once and streaming data as rows:

yaml
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

The [2] declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The {id,name,role} declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
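
To make the "declare structure once, stream data compactly" pattern concrete, here is a minimal sketch that renders a uniform array by hand. The `toTabular` helper is hypothetical and is not how the library is implemented; it assumes every row has the same keys and that no value needs quoting or escaping.

```typescript
type Primitive = string | number | boolean | null
type Row = Record<string, Primitive>

// Hypothetical helper: renders a uniform array of flat objects as a TOON-style table.
// Assumes identical keys in every row and values that need no quoting or escaping.
function toTabular(key: string, rows: Row[]): string {
  const first = rows[0]
  if (first === undefined) throw new Error('sketch only handles non-empty arrays')

  const fields = Object.keys(first)
  // Header: key, declared length, and field names, stated once.
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  // Rows: two-space indent, values in header order, comma-separated.
  const lines = rows.map(row => '  ' + fields.map(f => String(row[f])).join(','))
  return [header, ...lines].join('\n')
}

const table = toTabular('users', [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
])
console.log(table)
```

For real data, use a TOON implementation, which applies the format's quoting and delimiter rules.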

For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:

json
{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}

The same data encoded as TOON:

yaml
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

Notice how TOON combines YAML's indentation for the context object with an inline format for the primitive friends array and a tabular format for the structured hikes array. Each format is chosen automatically based on the data structure.
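
The automatic choice of layout can be approximated as follows. This is a sketch under assumptions: `chooseLayout` and its layout names are hypothetical, and the authoritative rules are in the specification, not in this classifier.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue }
type Layout = 'primitive' | 'object' | 'inline-array' | 'tabular-array' | 'other-array'

const isPrimitive = (v: JsonValue): boolean => v === null || typeof v !== 'object'

// True when every item is a plain object with the same keys and only primitive values.
function isUniformTable(items: JsonValue[]): boolean {
  const keySets: string[] = []
  for (const item of items) {
    if (item === null || typeof item !== 'object' || Array.isArray(item)) return false
    if (!Object.values(item).every(isPrimitive)) return false
    keySets.push(JSON.stringify(Object.keys(item).sort()))
  }
  return keySets.every(k => k === keySets[0])
}

// Hypothetical classifier: guesses which layout fits a value.
// An empty array is reported as 'inline-array' in this sketch.
function chooseLayout(value: JsonValue): Layout {
  if (Array.isArray(value)) {
    if (value.every(isPrimitive)) return 'inline-array'
    if (isUniformTable(value)) return 'tabular-array'
    return 'other-array'
  }
  if (typeof value === 'object' && value !== null) return 'object'
  return 'primitive'
}

// A trimmed version of the hikes dataset above.
const sample: { [key: string]: JsonValue } = {
  context: { task: 'Our favorite hikes together', location: 'Boulder' },
  friends: ['ana', 'luis', 'sam'],
  hikes: [
    { id: 1, name: 'Blue Lake Trail', distanceKm: 7.5 },
    { id: 2, name: 'Ridge Overlook', distanceKm: 9.2 },
  ],
}
const layouts = Object.entries(sample).map(([key, value]) => `${key}: ${chooseLayout(value)}`)
console.log(layouts.join('\n'))
```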

Design Goals

TOON is optimized for specific use cases. It aims to:

  • Make uniform arrays of objects as compact as possible by declaring structure once and streaming data.
  • Stay fully lossless and deterministic – round-trips preserve all data and structure.
  • Keep parsing simple and robust for both LLMs and humans through explicit structure markers.
  • Provide validation guardrails (array lengths, field counts) that help detect truncation and malformed output.
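
As an illustration of the validation guardrails, the sketch below checks a single tabular block against its declared array length and field count. `checkTable` is a hypothetical helper, not part of the library; it assumes comma delimiters and no quoted values, which a real decoder handles according to the spec.

```typescript
interface TableCheck {
  ok: boolean
  problems: string[]
}

// Hypothetical validator for one tabular block such as `users[2]{id,name,role}:`.
// Assumes comma delimiters and no quoted values.
function checkTable(block: string): TableCheck {
  const [header = '', ...rows] = block.split('\n').filter(line => line.trim() !== '')
  const match = /^\s*[^\[\]{}]+\[(\d+)\]\{([^}]*)\}:\s*$/.exec(header)
  if (!match) return { ok: false, problems: ['header is not a tabular array header'] }

  const declaredLength = Number(match[1])
  const fieldCount = (match[2] ?? '').split(',').length
  const problems: string[] = []

  // Guardrail 1: the number of rows must match the declared [N].
  if (rows.length !== declaredLength) {
    problems.push(`declared ${declaredLength} rows, found ${rows.length}`)
  }
  // Guardrail 2: every row must carry one value per declared field.
  rows.forEach((row, i) => {
    const cells = row.trim().split(',').length
    if (cells !== fieldCount) {
      problems.push(`row ${i + 1} has ${cells} values, expected ${fieldCount}`)
    }
  })
  return { ok: problems.length === 0, problems }
}

// Output cut off mid-row: one row is missing and the last row is incomplete.
const truncated = checkTable('users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob')
console.log(truncated.problems)
```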

When to Use TOON

TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths ([N]) and field headers ({fields}) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.

TIP

The TOON format is stable, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the spec or sharing feedback.

When Not to Use TOON

TOON is not always the best choice. Consider alternatives in these cases:

  • Deeply nested or non-uniform structures (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
  • Semi-uniform arrays (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
  • Pure tabular data: CSV is smaller than TOON for flat tables. TOON adds minimal overhead (~5–10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
  • Latency-critical applications: Benchmark on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite TOON's lower token count.

NOTE

For data-driven comparisons across different structures, see benchmarks. When optimizing for latency, measure TTFT, tokens/sec, and total time for both TOON and JSON-compact and use whichever performs better in your specific environment.
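
For a quick sanity check on your own data before running real benchmarks, you can compare the serialized sizes of a flat table in compact JSON, CSV, and a TOON-style table. The sketch below uses hand-written, hypothetical renderers with no quoting or escaping, and synthetic rows. Character counts are only a crude proxy for tokens, so treat the result as a rough indication.

```typescript
type Cell = string | number | boolean
type FlatRow = Record<string, Cell>

// Hypothetical renderers; neither handles quoting or escaping.
function asCsv(rows: FlatRow[], fields: string[]): string {
  const lines = rows.map(r => fields.map(f => String(r[f])).join(','))
  return [fields.join(','), ...lines].join('\n')
}

function asToonTable(key: string, rows: FlatRow[], fields: string[]): string {
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  const lines = rows.map(r => '  ' + fields.map(f => String(r[f])).join(','))
  return [header, ...lines].join('\n')
}

// Synthetic rows, generated only to exercise the renderers.
const sampleFields = ['id', 'name', 'role']
const sampleRows: FlatRow[] = Array.from({ length: 100 }, (_, i) => ({
  id: i + 1,
  name: `user${i + 1}`,
  role: i % 10 === 0 ? 'admin' : 'user',
}))

const sizes = {
  jsonCompact: JSON.stringify({ users: sampleRows }).length,
  csv: asCsv(sampleRows, sampleFields).length,
  toon: asToonTable('users', sampleRows, sampleFields).length,
}
console.log(sizes)
```

To measure tokens instead of characters, swap `.length` for a count from the tokenizer your model uses.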

Installation

TypeScript Library

Install the library via your preferred package manager:

bash
npm install @toon-format/toon

bash
pnpm add @toon-format/toon

bash
yarn add @toon-format/toon

CLI

The CLI can be used without installation via npx, or installed globally:

bash
npx @toon-format/cli input.json -o output.toon

bash
npm install -g @toon-format/cli

bash
pnpm add -g @toon-format/cli

bash
yarn global add @toon-format/cli

For full CLI documentation, see the CLI reference.

Media Type & File Extension

TOON files conventionally use the .toon extension. For HTTP transmission, the provisional media type is text/toon, always with UTF-8 encoding. While you may specify charset=utf-8 explicitly, it's optional – UTF-8 is the default assumption. This follows the registration process outlined in spec §18.2.
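
As a small illustration, a server that serves files by extension might map .toon to the provisional media type like this. `contentTypeFor` is a hypothetical helper and not part of any TOON package; the charset parameter is included only on request, since it is optional.

```typescript
const TOON_MEDIA_TYPE = 'text/toon'
const TOON_EXTENSION = '.toon'

// Hypothetical helper: returns a Content-Type value for .toon files, undefined otherwise.
function contentTypeFor(filename: string, explicitCharset = false): string | undefined {
  if (!filename.toLowerCase().endsWith(TOON_EXTENSION)) return undefined
  // UTF-8 is the default assumption, so the charset parameter is optional.
  return explicitCharset ? `${TOON_MEDIA_TYPE}; charset=utf-8` : TOON_MEDIA_TYPE
}

console.log(contentTypeFor('output.toon'))
console.log(contentTypeFor('output.toon', true))
```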

Your First Example

The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.

Let's encode a simple dataset:

ts
import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

console.log(encode(data))

Output:

yaml
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Decoding Back to JSON

Decoding is just as simple:

ts
import { decode } from '@toon-format/toon'

const toon = `
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
`

const data = decode(toon)
console.log(JSON.stringify(data, null, 2))

Output:

json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

Round-tripping is lossless: decode(encode(x)) is always deeply equal to x (after normalization of non-JSON types such as Date and NaN).
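
One way to test this property in your own code is sketched below. Two things here are assumptions: normalization is approximated with a plain JSON round-trip (a Date becomes an ISO string, NaN becomes null), which may not match the library for every type, and `Codec`, `normalize`, and `roundTrips` are hypothetical names. A JSON-based stand-in codec keeps the sketch runnable without dependencies.

```typescript
type JsonRoot = Record<string, unknown> | unknown[]

interface Codec {
  encode: (value: unknown) => string
  decode: (text: string) => unknown
}

// Approximates normalization with a JSON round-trip: Date -> ISO string, NaN -> null.
// Assumption: the library's own normalization may differ for other types.
function normalize(value: JsonRoot): unknown {
  return JSON.parse(JSON.stringify(value))
}

// Compares serialized forms, so this check is sensitive to key order.
function roundTrips(value: JsonRoot, codec: Codec): boolean {
  const expected = JSON.stringify(normalize(value))
  const actual = JSON.stringify(codec.decode(codec.encode(value)))
  return expected === actual
}

// Stand-in codec so the sketch runs on its own.
const jsonCodec: Codec = {
  encode: value => JSON.stringify(value),
  decode: text => JSON.parse(text),
}

const record = { id: 1, createdAt: new Date(0), score: NaN }
console.log(normalize(record))
console.log(roundTrips(record, jsonCodec))
```

To check the real library, pass an object built from the `encode` and `decode` functions imported from '@toon-format/toon' in place of `jsonCodec`.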

Where to Go Next

Now that you've seen your first TOON document, read the Format Overview for complete syntax details (objects, arrays, quoting rules, key folding), then explore Using TOON with LLMs to see how to use it effectively in prompts. For implementation details, check the API Reference (TypeScript) or the Specification (language-agnostic normative rules).