---
title: "LLM 0.32a0 is a major backwards-compatible refactor"
source_name: "Simon Willison's Weblog"
original_url: "https://simonwillison.net/2026/Apr/29/llm/#atom-everything"
canonical_url: "https://www.traeai.com/articles/da0e311b-f1fb-440f-863d-f16fdcd6ecbe"
content_type: "article"
language: "English"
score: 8.5
tags: ["LLM","Python","natural language processing"]
published_at: "2026-04-29T19:01:47+00:00"
created_at: "2026-04-30T03:53:16.280935+00:00"
---

# LLM 0.32a0 is a major backwards-compatible refactor

Canonical URL: https://www.traeai.com/articles/da0e311b-f1fb-440f-863d-f16fdcd6ecbe
Original source: https://simonwillison.net/2026/Apr/29/llm/#atom-everything

## Summary

LLM 0.32a0 refactors the library's input and output model: prompts can be expressed as sequences of messages and responses as streams of differently typed parts, to match the diversity of modern large language models.

## Key Takeaways

- The new version supports representing model input as a sequence of messages.
- Model responses can be composed of multiple parts of different types.
- Compatibility with existing APIs (such as OpenAI's chat completions) is improved.

## Content


29th April 2026

I just released [LLM 0.32a0](https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28), an alpha release of my [LLM](https://llm.datasette.io/) Python library and CLI tool for accessing LLMs, with some consequential changes that I’ve been working towards for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.

```python
import llm

model = llm.get_model("gpt-5.5")
response = model.prompt("Capital of France?")
print(response.text())
```
This made sense when I started working on the library back in April 2023. A lot has changed since then!

LLM provides an abstraction over thousands of different models via its [plugin system](https://llm.datasette.io/en/stable/plugins/index.html). The original abstraction—of text input that returns text output—was no longer able to represent everything I needed it to.

Over time LLM itself has grown [attachments](https://simonwillison.net/2024/Oct/29/llm-multi-modal/) to handle image, audio, and video input, then [schemas](https://simonwillison.net/2025/Feb/28/llm-schemas/) for outputting structured JSON, then [tools](https://simonwillison.net/2025/May/27/llm-tools/) for executing tool calls. Meanwhile LLMs kept evolving, adding reasoning support and the ability to return images and all kinds of other interesting capabilities.

LLM needs to evolve to better handle the diversity of input and output types that can be processed by today’s frontier models.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.

#### Prompts as a sequence of messages

LLMs accept input as text, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns.

The first turn might look like this:

```
user: Capital of France?
assistant:
```

(The model then gets to fill out the reply from the assistant.)

But each subsequent turn needs to replay the entire conversation up to that point, as a sort of screenplay:

```
user: Capital of France?
assistant: Paris
user: Germany?
assistant:
```
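In code, replaying the conversation just means appending each new turn to a growing list of role/content messages before every request. A minimal sketch of that bookkeeping with plain dicts (no particular library assumed):

```python
def add_turn(messages, user_text, assistant_text=None):
    """Append a user turn, and the assistant's reply once it arrives."""
    messages = messages + [{"role": "user", "content": user_text}]
    if assistant_text is not None:
        messages = messages + [{"role": "assistant", "content": assistant_text}]
    return messages

# First turn, then the replayed "screenplay" sent for the second turn:
history = add_turn([], "Capital of France?", "Paris")
history = add_turn(history, "Germany?")
assert [m["role"] for m in history] == ["user", "assistant", "user"]
```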

Most of the JSON APIs from the major vendors follow this pattern. Here’s what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:

```shell
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {
        "role": "user",
        "content": "Capital of France?"
      },
      {
        "role": "assistant",
        "content": "Paris"
      },
      {
        "role": "user",
        "content": "Germany?"
      }
    ]
  }'
```

Prior to 0.32, LLM modeled these as conversations:

```python
model = llm.get_model("gpt-5.5")

conversation = model.conversation()
r1 = conversation.prompt("Capital of France?")
print(r1.text())
# Outputs "Paris"

r2 = conversation.prompt("Germany?")
print(r2.text())
# Outputs "Berlin"
```
This worked if you were building a conversation with the model from scratch, but it didn’t provide a way to feed in a previous conversation from the start. This made tasks like building an emulation of the OpenAI chat completions API much harder than they should have been.

The `llm` CLI tool worked around this through a custom mechanism for persisting and inflating conversations using SQLite, but that never became a stable part of the LLM API—and there are many places you might want to use the Python library without committing to SQLite as the storage layer.

The new alpha now supports this:

```python
import llm
from llm import user, assistant

model = llm.get_model("gpt-5.5")

response = model.prompt(messages=[
    user("Capital of France?"),
    assistant("Paris"),
    user("Germany?"),
])
print(response.text())
```
The `llm.user()` and `llm.assistant()` functions are new builder functions designed to be used within that `messages=[]` array.

The previous `prompt=` option still works, but LLM upgrades it to a single-item messages array behind the scenes.
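That upgrade can be pictured as a small normalization step. The following is an illustrative sketch of the idea, not the library's actual internals:

```python
def normalize_input(prompt=None, messages=None):
    """Upgrade a bare prompt= string into a one-item messages list.
    (Illustrative sketch - not LLM's real implementation.)"""
    if messages is not None:
        return messages
    return [{"role": "user", "content": prompt}]

assert normalize_input(prompt="Capital of France?") == [
    {"role": "user", "content": "Capital of France?"}
]
```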

You can also now _reply_ to a response, as an alternative to building a conversation:

```python
response2 = response.reply("How about Hungary?")
print(response2)  # Default __str__() calls .text()
```

#### Streaming parts

The other major new interface in the alpha concerns streaming results back from a prompt.

Previously, LLM supported streaming like this:

```python
response = model.prompt("Generate an SVG of a pelican riding a bicycle")
for chunk in response:
    print(chunk, end="")
```
Or this async variant:

```python
import asyncio
import llm

model = llm.get_async_model("gpt-5.5")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")

async def run():
    async for chunk in response:
        print(chunk, end="", flush=True)

asyncio.run(run())
```
Many of today’s models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.

Some models can even execute tools on the server-side, for example OpenAI’s [code interpreter tool](https://developers.openai.com/api/docs/guides/tools-code-interpreter?lang=curl) or Anthropic’s [web search](https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool). This means the results from the model can combine text, tool calls, tool outputs and other formats.

Multi-modal output models are starting to emerge too, which can return images or even [snippets of audio](https://developers.openai.com/api/docs/guides/audio#add-audio-to-your-existing-application) intermixed into that streaming response.

The new LLM alpha models these as a stream of typed message parts. Here’s what that looks like as a Python API consumer:

```python
import asyncio
import llm

model = llm.get_model("gpt-5.5")
prompt = "invent 3 cool dogs, first talk about your motivations"

def describe_dog(name: str, bio: str) -> str:
    """Record the name and biography of a hypothetical dog."""
    return f"{name}: {bio}"

def sync_example():
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    for event in response.stream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\n Tool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

async def async_example():
    model = llm.get_async_model("gpt-5.5")
    response = model.prompt(
        prompt,
        tools=[describe_dog],
    )
    async for event in response.astream_events():
        if event.type == "text":
            print(event.chunk, end="", flush=True)
        elif event.type == "tool_call_name":
            print(f"\n Tool call: {event.chunk}(", end="", flush=True)
        elif event.type == "tool_call_args":
            print(event.chunk, end="", flush=True)

sync_example()
asyncio.run(async_example())
```
Sample output (from just the first sync example):

> `My motivation: create three memorable dogs with distinct “cool” styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.`
> 
> `Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."}`
> 
> `Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."}`
> 
> `Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks. Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."}`

At the end of the response you can call `response.execute_tool_calls()` to actually run the functions that were requested, or send a `response.reply()` to have those tools called and their return values sent back to the model:

```python
print(response.reply("Tell me about the dogs"))
```
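Conceptually, executing the requested tool calls is a dispatch step: map each call's name back to the registered Python function, parse the JSON arguments, and invoke it. A self-contained sketch of that idea (not the library's internals):

```python
import json

def describe_dog(name: str, bio: str) -> str:
    """Record the name and biography of a hypothetical dog."""
    return f"{name}: {bio}"

# Registry mapping tool names back to their Python implementations.
TOOLS = {"describe_dog": describe_dog}

def run_tool_calls(calls):
    """Dispatch each requested call to its registered function."""
    return [TOOLS[c["name"]](**json.loads(c["arguments"])) for c in calls]

results = run_tool_calls([
    {"name": "describe_dog",
     "arguments": '{"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet."}'},
])
assert results == ["Nova Jetpaw: A sleek silver-gray whippet."]
```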
This new mechanism for streaming different token types means the CLI tool can now display “thinking” text in a different color from the text in the final response. The thinking text goes to stderr so it won’t affect results that are piped into other tools.
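Keeping piped output clean is simple stream routing: reasoning chunks go to stderr, answer text to stdout. A sketch of the pattern (the event type names here are hypothetical):

```python
import sys

def stream_for(event_type):
    """Reasoning goes to stderr so `llm ... | other-tool` only sees the answer."""
    return sys.stderr if event_type == "thinking" else sys.stdout

def emit(event_type, chunk):
    stream = stream_for(event_type)
    stream.write(chunk)
    stream.flush()

emit("thinking", "Considering three cool dogs...\n")  # goes to stderr
emit("text", "1. Nova Jetpaw\n")                      # goes to stdout
```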

This example uses Claude Sonnet 4.6 (with an updated streaming event version of the [llm-anthropic](https://github.com/simonw/llm-anthropic) plugin) as Anthropic’s models return their reasoning text as part of the response:

```shell
llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' \
  -o thinking_display 1
```

![Image 1: Animated demo. Starts with ~/dev/scratch/llm-anthropic % uv run llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' -o thinking_display 1 - the text then streams in grey: The user wants me to think about 3 cool dogs and then describe them. Let me come up with 3 interesting, cool dogs and describe them. Then switches to regular color text for the output that describes the dogs.](https://static.simonwillison.net/static/2026/claude-thinking-llm.gif)

You can suppress the output of reasoning tokens using the new `-R/--no-reasoning` flag. Surprisingly that ended up being the only CLI-facing change in this release.

#### A mechanism for serializing and deserializing responses

As mentioned earlier, LLM has quite inflexible code at the moment for persisting conversations to SQLite. I’ve added a new mechanism in 0.32a0 that should provide Python API users a way to roll their own alternative:

```python
serializable = response.to_dict()
# serializable is a JSON-style dictionary
# store it anywhere you like, then inflate it:
response = Response.from_dict(serializable)
```
The dictionary this returns is actually a `TypedDict` defined in the new [llm/serialization.py](https://github.com/simonw/llm/blob/main/llm/serialization.py) module.
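The important property is the round trip: everything in that dictionary must survive `json.dumps`/`json.loads`. A stand-in sketch of the pattern using a hypothetical dataclass (the real shape is the `TypedDict` in `llm/serialization.py`):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class FakeResponse:
    """Hypothetical stand-in for a serializable response."""
    model: str
    parts: list = field(default_factory=list)

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

r = FakeResponse(model="gpt-5.5", parts=[{"type": "text", "chunk": "Paris"}])
# Survive a real JSON round trip, e.g. through any storage layer:
restored = FakeResponse.from_dict(json.loads(json.dumps(r.to_dict())))
assert restored == r
```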

#### What’s next?

I’m releasing this as an alpha so I can upgrade various plugins and exercise the new design in real world environments for a few days. I expect the stable 0.32 release will be very similar to this alpha, unless alpha testing reveals some design flaw in the way I’ve put this all together.

There’s one remaining large task: I’d like to redesign the SQLite logging system to better capture the more finely grained details that are returned by this new abstraction.

Ideally I’d like to model this as a graph, to best support situations like an OpenAI-style chat completions API where the same conversations are constantly extended and then repeated with every prompt. I want to be able to store those without duplicating them in the database.
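One way such a graph avoids duplication is to key every message on its content plus its parent, so two conversations that share a prefix share the same stored nodes. A purely illustrative sketch (not the planned schema):

```python
import hashlib
import json

store = {}  # node_id -> (parent_id, message)

def append(parent_id, message):
    """Store a message keyed by its parent and content; shared prefixes dedupe."""
    payload = json.dumps([parent_id, message], sort_keys=True)
    node_id = hashlib.sha256(payload.encode()).hexdigest()
    store[node_id] = (parent_id, message)
    return node_id

a = append(None, {"role": "user", "content": "Capital of France?"})
b = append(a, {"role": "assistant", "content": "Paris"})
append(b, {"role": "user", "content": "Germany?"})
append(b, {"role": "user", "content": "How about Hungary?"})
assert len(store) == 4  # the shared two-message prefix is stored only once
```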

I’m undecided as to whether that should be a feature in 0.32 or I should hold it for 0.33.

This is **LLM 0.32a0 is a major backwards-compatible refactor** by Simon Willison, posted on [29th April 2026](http://simonwillison.net/2026/Apr/29/).

