How to Connect Your AI Coding Agent to a Browser on macOS

freeCodeCamp.org, May 26, 2026

Source URL: https://www.freecodecamp.org/news/how-to-connect-your-ai-coding-agent-to-a-browser-on-macos/

Published: 2026-05-26T12:40:33.270Z

![Image 1: Connect your AI coding agent to a browser on macOS](https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/7e77f1c5-6942-4dbe-a3c6-ca74cc4354e5.png)
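AI coding agents like Claude Code, Cursor, and the rest have become remarkably good at reading and writing code. But the moment they need to _look at something on the web_, they hit a wall. They can't see your preview site. They can't read the errors in your analytics dashboard. They can't check whether the form they just built actually submits.

The usual fix is to give the agent a headless browser: Puppeteer or Playwright driving a fresh Chromium instance. That works, up to a point. Headless Chromium starts every session as a stranger: no logins, no cookies, no sessions. It launches a second browser engine that drives up your CPU usage and spins up your fans. And a growing number of sites block it on sight.

There's another option, and on a Mac it's a good one: let the agent drive the **Safari** you already use, the one that is already logged in to GitHub, your analytics tools, and your preview environments. That's what Safari MCP does. It's an open source MCP server that exposes Safari to any MCP-capable agent through about 80 tools, with no Chromium, no WebDriver, and no separate browser to babysit.

In this tutorial, you'll connect Safari MCP to an AI agent, run your first automation, and then build something a headless browser simply can't do: an automation that works inside a page you're already logged in to. By the end, you'll know not only how to set this up, but also when native browser automation is the right choice and when it isn't.

You'll need the following:

*   A Mac (Safari MCP is macOS-only; more on that trade-off later)

*   Node.js 18 or later

*   An MCP-capable AI agent. This tutorial uses Claude Code and Cursor, but any MCP client will work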
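
A quick way to confirm the first two prerequisites is a small script like the one below. This is only a sketch: `check_prereqs` and `installed_node_version` are hypothetical helper names, and it assumes `node --version` prints a string such as `v18.17.0`.

```python
import platform
import re
import shutil
import subprocess
from typing import List, Optional


def check_prereqs(system: str, node_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if system != "Darwin":  # platform.system() reports "Darwin" on macOS
        problems.append("Safari MCP needs macOS")
    match = re.match(r"v?(\d+)\.", node_version or "")
    if match is None:
        problems.append("Node.js not found")
    elif int(match.group(1)) < 18:
        problems.append("Node.js 18 or later is required")
    return problems


def installed_node_version() -> Optional[str]:
    """Ask the local `node` binary for its version, or return None if absent."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(["node", "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    issues = check_prereqs(platform.system(), installed_node_version())
    print("Ready" if not issues else "\n".join(issues))
```

The checking logic is kept separate from the system calls so it can be exercised with plain strings.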

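## What Is MCP, and Why Does Browser Automation Need It?

Before setting anything up, it helps to know what the "MCP" in Safari MCP stands for.

**MCP** is the Model Context Protocol, an open standard for connecting AI agents to external tools and data. Think of it as a USB port. Before USB, every device needed its own connector. MCP is the agreement to use one connector: an agent that speaks MCP can use any tool that speaks MCP, with no custom integration code on either side.

An **MCP server** exposes a set of tools. An **MCP client** (your AI agent) discovers those tools and calls them. The server describes each tool (its name, what it does, and the parameters it accepts), and the agent decides when to call it. When Claude Code decides it needs to read a web page, it doesn't run browser code itself. It calls a tool that some MCP server provides.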
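
To make that division of labour concrete, here is a toy, in-process sketch of the server/client split. It is not the MCP SDK and involves no real transport; the `ToyServer` class and the `echo_upper` tool are invented for illustration. The `name`, `description`, and `inputSchema` fields mirror how MCP describes a tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    input_schema: Dict[str, Any]
    handler: Callable[..., Any]


@dataclass
class ToyServer:
    """Holds tools, describes them on request, and runs them when called."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def list_tools(self) -> List[Dict[str, Any]]:
        # What the client sees: descriptions only, never the implementation.
        return [
            {"name": t.name, "description": t.description, "inputSchema": t.input_schema}
            for t in self.tools.values()
        ]

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name].handler(**arguments)


server = ToyServer()
server.register(Tool(
    name="echo_upper",  # hypothetical tool, not part of Safari MCP
    description="Return the given text in upper case.",
    input_schema={
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    handler=lambda text: text.upper(),
))

# The "client" side: discover first, then decide which tool to call.
available = [t["name"] for t in server.list_tools()]
result = server.call_tool("echo_upper", {"text": "hello"})
print(available, result)
```

The client never imports the handler. It only sees the description and sends a name plus arguments, which is why the same server can sit behind any agent.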

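Browser automation fits this model well. The agent's job is reasoning: "I need to see what's on the preview site, then check the console for errors." The actual mechanics (opening a tab, waiting for the load, reading the DOM, capturing console output) are well-defined operations that belong behind a stable interface. That interface is exactly what an MCP server provides.

Safari MCP is one such server. It runs as a local process and exposes about 80 browser tools (navigate, click, fill, read, screenshot, extract, and more) that any MCP client can drive. The agent never touches AppleScript or WebKit internals. It just calls `safari_navigate` and gets a result back.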
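
On the wire, MCP messages are JSON-RPC 2.0, and invoking a tool uses the `tools/call` method with the tool's `name` and `arguments`. The sketch below builds such a request for `safari_navigate`. The `build_tool_call` helper is a hypothetical name, and your MCP client normally constructs these messages for you.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(1, "safari_navigate", {"url": "https://example.com"})
print(request)
```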

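The "USB port" framing matters for a practical reason: nothing in this tutorial is specific to Claude. Connect Safari MCP to Cursor, Cline, Windsurf, or your own MCP client, and the tools are the same.

## Why Safari Instead of Chrome or Playwright?

If you've automated a browser before, you almost certainly used Puppeteer, Playwright, or Selenium to drive Chrome. So why choose Safari?

It comes down to three differences that matter once it's an _AI agent_, rather than a test script, driving the browser.

**1. It's the browser you actually use, with your real sessions.** The headless Chromium that Playwright launches is a clean room. It has never logged in to anything. If you want your agent to read your analytics dashboard, you first have to solve authentication: storing credentials, scripting the login, handling two-factor prompts, refreshing tokens. Safari MCP skips all of that. It drives the Safari instance you use every day, which is already logged in to your dashboards, GitHub, and email. The agent inherits those sessions for free.

**2. It doesn't melt your laptop.** Headless Chromium is a second full browser engine running alongside the browser you already have open. On a laptop, that costs real CPU, real memory, and fan noise you can hear. Safari MCP uses the WebKit engine that is already running on every Mac, so no second engine has to start. The project's own measurements report roughly 60% lower CPU usage for browsing work, and automation runs with Safari in the background, so it doesn't take over your screen.

**3. Sites don't treat it as a bot.** Headless browsers leak. They expose `navigator.webdriver`, they carry recognizable automation fingerprints, and bot-detection services (Cloudflare's challenge pages, reCAPTCHA, the WAFs in front of many B2B sites) have become very good at spotting them. Your real Safari, driven through the operating system, looks like what it is: a person's browser. (Note: this is for automating _your own_ accounts and sites, not for evading access controls you don't own.)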

The cost of all this is the obvious one: Safari MCP is macOS-only. It's built on WebKit and AppleScript, so there's no Windows or Linux story. If your agent runs on a Linux CI box, this isn't your tool. If it runs on your Mac — which, for a coding agent, it very often does — the trade is a good one. We'll come back to limitations honestly at the end.

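## Installing Safari MCP

Installation really is a single command, but two Safari settings need to change first. We'll go in order.

### Step 1: Enable Safari's Developer Features

Safari MCP reads and controls pages by running JavaScript inside Safari. Two settings must be enabled:

*   Open Safari → Settings → Advanced (Preferences on older macOS versions) and check "Show features for web developers." This reveals the Develop menu.

*   Open the new Develop menu and check "Allow JavaScript from Apple Events."

The second setting is the important one. It allows an external process (the MCP server) to ask Safari to run JavaScript on a page. Without it, every tool call fails.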

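### Step 2: Run the Server

```bash
npx safari-mcp
```

That's the whole installation. `npx` fetches the package and runs it; there is no build step. The first time the agent calls a tool, macOS shows a permission prompt along the lines of _"Terminal wants to control Safari."_ Click OK. This is the standard Automation permission, and you can review it later under System Settings → Privacy & Security → Automation.

If you'd rather install it permanently:

```bash
npm install -g safari-mcp
```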

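### Step 3: Tell Your Agent

Your AI agent needs to know the server exists. For Claude Code, one command does it:

```bash
claude mcp add safari -- npx safari-mcp
```

For Cursor, create `.cursor/mcp.json` in your project:

```json
{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["safari-mcp"]
    }
  }
}
```

The process is the same for every other client (Claude Desktop, Cline, Windsurf, Continue, VS Code). You are telling the agent: "There is an MCP server named safari; start it by running `npx safari-mcp`."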
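
If a project already has a `.cursor/mcp.json` with other servers in it, overwriting the file would drop them. The sketch below merges the `safari` entry into whatever is already there. It assumes only the `mcpServers` layout from the Cursor example; `add_server` and `update_config_file` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def add_server(config: Dict[str, Any], name: str, command: str, args: List[str]) -> Dict[str, Any]:
    """Return a copy of an MCP client config with one server added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Read the config if it exists, add the safari entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_server(existing, "safari", "npx", ["safari-mcp"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n")


# Example usage from the project root:
# update_config_file(Path(".cursor/mcp.json"))
```

`add_server` returns a new dictionary instead of mutating its input, so existing entries are preserved and the merge is easy to check before anything is written to disk.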

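Restart your agent (or reload its MCP servers) and it will connect. In Claude Code, you can confirm with the `/mcp` command, which lists connected servers and their tools. You should see `safari` with about 80 tools available.

That's it. Your agent now has a browser.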

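## Your First Automation: Reading a Page

Let's prove the wiring works with the simplest possible task: have the agent open a web page and tell you what's on it.

In your agent, just ask in plain language:

"Use the safari tools to open example.com and tell me what the page says."

Behind the scenes, the agent makes two tool calls. First it navigates:

```json
{ "tool": "safari_navigate", "arguments": { "url": "https://example.com" } }
```

Then it reads the content:

```json
{ "tool": "safari_read_page", "arguments": {} }
```

`safari_read_page` returns the page's title, URL, and text content with the HTML tags stripped, which is exactly the form an LLM needs. The agent receives something like this:

```text
Example Domain
https://example.com/
This domain is for use in illustrative examples in documents. You may
use this domain in literature without prior coordination or asking for
permission.
```

It then relays that to you. You've just watched your agent browse.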
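
To see why tag-free text is the useful form, here is a rough illustration of reducing HTML to a title and readable text using only Python's standard library. It is not how Safari MCP works internally (the server reads the live page in Safari); the `PageText` class and the sample markup are made up for this sketch.

```python
from html.parser import HTMLParser
from typing import List


class PageText(HTMLParser):
    """Collect the <title> and the visible text, skipping script and style."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: List[str] = []
        self._stack: List[str] = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or any(t in ("script", "style") for t in self._stack):
            return
        if self._stack and self._stack[-1] == "title":
            self.title = text
        else:
            self.chunks.append(text)


html = """<html><head><title>Example Domain</title><style>p{color:red}</style></head>
<body><h1>Example Domain</h1><p>This domain is for use in illustrative examples.</p></body></html>"""

parser = PageText()
parser.feed(html)
parser.close()
print(parser.title)
print("\n".join(parser.chunks))
```

The markup, styling, and scripts are noise for a language model. What remains is a few lines of text that fit comfortably in a prompt.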

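A quick note on how the agent sees a page, because it affects everything downstream. `safari_read_page` is great for "what does this say." But when the agent needs to _act_ (click a button, fill in a field), text alone isn't enough. It needs to know which elements actually exist and how to target them. For that, a better first step is `safari_snapshot`:

```json
{ "tool": "safari_snapshot", "arguments": {} }
```

This returns an accessibility-tree view of the page, in which every interactive element has a stable ref ID:

```text
[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"
```

Those ref IDs are the agent's reliable handles. CSS selectors break when the page re-renders. Snapshot refs stay valid for the lifetime of the page. Keep this in mind: it's the difference between an automation that works once and one that works every time.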
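
If you ever post-process a snapshot yourself, looking up a ref by role and label is a few lines of parsing. The sketch below assumes the snapshot text looks exactly like the three sample lines above; the real output may carry more fields, and `parse_snapshot` and `find_ref` are hypothetical helper names. Normally the agent reads the snapshot directly and needs none of this.

```python
import re
from typing import Dict, List, Optional

# Matches lines such as: [button ref=0_15] "Submit"
LINE = re.compile(r'\[(?P<role>\w+) ref=(?P<ref>[\w-]+)\]\s+"(?P<label>[^"]*)"')


def parse_snapshot(text: str) -> List[Dict[str, str]]:
    """Turn snapshot lines into dicts with role, ref, and label."""
    return [m.groupdict() for m in map(LINE.search, text.splitlines()) if m]


def find_ref(elements: List[Dict[str, str]], role: str, label: str) -> Optional[str]:
    """Return the ref of the first element with this role and label, if any."""
    for element in elements:
        if element["role"] == role and element["label"] == label:
            return element["ref"]
    return None


snapshot = '''[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"'''

elements = parse_snapshot(snapshot)
print(find_ref(elements, "button", "Submit"))  # 0_15
```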

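## The Payoff: Automating a Logged-In Flow

Reading example.com is a wiring test. Here is the thing you genuinely can't do with a headless browser.

Pick a site you're already logged in to in Safari: your analytics, your project board, your CI dashboard. We'll use GitHub, because every developer has an account and the notifications page is a real, mildly annoying chore. The task: have the agent open your GitHub notifications and summarize which ones need a reply and which are just FYI.

Ask the agent:

"Open my GitHub notifications, read them, and sort them into 'needs reply' and 'FYI only'."

The agent navigates:

```json
{ "tool": "safari_navigate", "arguments": { "url": "https://github.com/notifications" } }
```

Stop and notice what just happened. No login screen. No OAuth dance. No personal access token in an environment variable. Safari is already authenticated as you, so the agent lands directly on your real notifications. Headless Chromium would hit the login wall at this point and stop.

The notification list loads incrementally, so the agent should wait for content before reading. `safari_wait_for` polls the page until a selector or a piece of text appears, or until the timeout expires:

```json
{ "tool": "safari_wait_for", "arguments": { "text": "Inbox", "timeout": 10000 } }
```

Then it reads. Scoped to the notifications area, `safari_read_page` returns clean text:

```json
{ "tool": "safari_read_page", "arguments": { "selector": "main" } }
```

The agent reasons over that text and hands you the grouped summary. The whole loop (navigate, wait, read, summarize) is just a sequence of tool calls.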
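
Seen as data, that loop is just an ordered list of calls. The sketch below replays the three calls from this section against a stand-in client. `run_plan` and `fake_call_tool` are hypothetical names; in practice the agent chooses each call as it goes, and a real MCP client does the sending.

```python
from typing import Any, Callable, Dict, List

ToolCall = Dict[str, Any]

# The same three calls shown above, expressed as data.
PLAN: List[ToolCall] = [
    {"tool": "safari_navigate", "arguments": {"url": "https://github.com/notifications"}},
    {"tool": "safari_wait_for", "arguments": {"text": "Inbox", "timeout": 10000}},
    {"tool": "safari_read_page", "arguments": {"selector": "main"}},
]


def run_plan(plan: List[ToolCall], call_tool: Callable[[str, Dict[str, Any]], Any]) -> Any:
    """Run each call in order and return the result of the last one."""
    result = None
    for step in plan:
        result = call_tool(step["tool"], step["arguments"])
    return result


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Stand-in for a real MCP client; it only reports what was requested."""
    return f"{name} ok"


print(run_plan(PLAN, fake_call_tool))
```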

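When you need the data in an exact shape rather than as prose, for example to feed another step or write to a file, the agent can use `safari_evaluate`, which runs custom JavaScript on the page and returns whatever you build:

```json
{
  "tool": "safari_evaluate",
  "arguments": {
    "expression": "JSON.stringify([...document.querySelectorAll('li')].map(li => li.innerText.trim()))"
  }
}
```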

The agent writes that expression itself, against the structure it just saw in the snapshot — you don't hand-author selectors.
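
Because the expression wraps its result in `JSON.stringify`, what comes back is a JSON string that a later step has to decode. A minimal sketch of that decoding step, assuming the tool hands the string back unchanged (the sample value and the `decode_items` helper are made up for illustration):

```python
import json
from typing import List


def decode_items(raw: str) -> List[str]:
    """Decode a JSON array of strings and drop entries that are empty."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    return [item for item in items if isinstance(item, str) and item.strip()]


# Made-up sample of what the expression above might return.
raw_result = '["Review requested on PR", "", "CI run finished"]'
print(decode_items(raw_result))
```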

You might be thinking: _GitHub has an API, why scrape the page?_ Fair. For GitHub specifically, the API is excellent. But the point generalizes. Most of the dashboards you stare at every day — your billing portal, your error tracker's specific filtered view, a client's analytics, the admin panel of some tool your company pays for — either have no usable API or would cost you an afternoon of OAuth setup to reach. With Safari MCP, "the page I'm already looking at" _is_ the API. The agent reads what you can see, because it's using the browser you're seeing it in.

That's the capability headless automation can't match. Not speed, not features — access.

## Handling the Tricky Parts

A first automation always looks easy. Three things tend to bite on the second one.

### Tab Safety: The Agent Must Not Hijack Your Tabs

This is the scariest failure mode: you're typing in a tab, the agent navigates _that_ tab, and your work is gone. Safari MCP guards against it by stamping each automation tab with an identity marker — it uses window.name, which survives page navigations — and resolving "the agent's tab" through that marker on every call. If it can't positively identify its own tab, it refuses to act and raises a re-anchor error rather than guessing.

The practical rule for you: let the agent open its own tab with safari_new_tab, and it will stay in its lane. Don't point it at "the current tab" and assume.
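
The "refuse rather than guess" rule is easy to model. The sketch below is a toy version of the idea, not Safari MCP's code: the `Tab` class, the marker value, and `ReanchorError` are invented here, with `window_name` standing in for the page's `window.name`.

```python
from dataclasses import dataclass
from typing import List


class ReanchorError(RuntimeError):
    """Raised when the automation tab cannot be identified with certainty."""


@dataclass
class Tab:
    index: int
    window_name: str  # stands in for the page's window.name
    url: str


def resolve_agent_tab(tabs: List[Tab], marker: str) -> Tab:
    """Return the one tab carrying the marker, or refuse to pick a tab."""
    matches = [tab for tab in tabs if tab.window_name == marker]
    if len(matches) != 1:
        raise ReanchorError(
            f"expected exactly one tab marked {marker!r}, found {len(matches)}"
        )
    return matches[0]


tabs = [
    Tab(0, "", "https://mail.example.com/compose"),  # the tab you are typing in
    Tab(1, "agent-tab-1", "https://example.com/"),   # opened by the agent
]
print(resolve_agent_tab(tabs, "agent-tab-1").index)  # 1
```

The important branch is the failure one: with zero or several candidates, the function raises instead of falling back to "the current tab", which is exactly the fallback that would clobber your work.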

### Waiting for Dynamic Content

Modern pages render after load. If the agent reads too early, it reads an empty shell. Don't have it guess with fixed sleeps — use safari_wait_for, which polls for a selector or text until it appears or the timeout elapses:

```json
{ "tool": "safari_wait_for", "arguments": { "selector": ".results-list", "timeout": 8000 } }
```

This is the single most common fix for "the automation works when I step through it slowly but fails when it runs."
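
The polling pattern itself is generic. Here is a minimal sketch of it, with a simulated page whose content "appears" on the third check. The `wait_for` function below is a stand-in written for this example, not the Safari MCP tool, and its parameters are in seconds by assumption.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, interval_s: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated page: the content becomes available on the third check.
checks = {"count": 0}


def results_rendered() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_for(results_rendered, timeout_s=2.0, interval_s=0.01))  # True
```

Compared with a fixed sleep, polling returns as soon as the content is there and fails with a clear answer when it never shows up.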

### Framework Forms

Set a React or Vue input's .value directly and the framework never notices — its internal state stays empty, and your "filled" form submits blank. Safari MCP's safari_fill and safari_fill_form use the native value setters and dispatch the input and change events the framework listens for, so React, Vue, Angular, and Svelte state all stay in sync:

```json
{
  "tool": "safari_fill_form",
  "arguments": {
    "fields": [
      { "selector": "#email", "value": "jane@example.com" },
      { "selector": "#message", "value": "Looks great." }
    ]
  }
}
```

For framework-heavy pages where CSS selectors are fragile, go back to the snapshot refs from the previous section — pass { "ref": "0_9" } instead of { "selector": "#email" }. Refs survive re-renders; selectors don't.
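
Why a bare `.value` assignment fails can be shown with a toy model. This is a simulation in Python, not browser code: `ControlledInput` is an invented class in which the "framework" copies the DOM value into its own state only when an `input` or `change` event fires.

```python
class ControlledInput:
    """Toy model of a framework-managed input: state changes only on events."""

    def __init__(self) -> None:
        self.dom_value = ""    # what the DOM element holds
        self.state_value = ""  # what the framework believes and will submit

    def dispatch(self, event: str) -> None:
        # The framework copies the DOM value into its state on these events.
        if event in ("input", "change"):
            self.state_value = self.dom_value


def set_value_directly(field: ControlledInput, value: str) -> None:
    """Naive approach: write the value and fire nothing."""
    field.dom_value = value


def fill(field: ControlledInput, value: str) -> None:
    """Write the value, then fire the events the framework listens for."""
    field.dom_value = value
    field.dispatch("input")
    field.dispatch("change")


naive, proper = ControlledInput(), ControlledInput()
set_value_directly(naive, "jane@example.com")
fill(proper, "jane@example.com")
print(repr(naive.state_value), repr(proper.state_value))
```

In the naive case the field looks filled on screen, but the state that gets submitted is still empty. Dispatching the events is what closes that gap.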

None of these are exotic. They're just the difference between a demo and an automation you'd actually leave running.

## Limitations: When Not to Use This

A tool tutorial that only lists strengths isn't worth much. Here's where Safari MCP is the wrong choice.

**It's macOS-only, and that's structural.** Safari MCP is built on WebKit and AppleScript. There's no Windows or Linux port coming, because the foundation doesn't exist on those platforms. If your agent runs in Linux CI, use Playwright.

**It drives one Safari, on one Mac.** This is browser automation for _your_ machine: a coding agent working alongside you. It is not a fleet. If you need 50 parallel browsers scraping in a data center, that's a headless-Chromium-in-containers job, and Safari MCP is the wrong shape for it.

**Cross-browser test suites should stay on Playwright.** If you're writing end-to-end tests that must pass on Chrome, Firefox, and Safari, use the tool built for that. Safari MCP drives exactly one engine: WebKit.

**It shares a browser with you.** Because it uses your real Safari, the agent and you are in the same browser. That's the entire point, but it means you should let the agent work in its own tabs and not fight it for the same window.

The honest summary: Safari MCP is built for one specific situation — an AI agent doing real browser work on the Mac you're sitting at, against sites you're already logged into. In that situation it's hard to beat. Outside it, reach for the headless tools. Knowing which situation you're in is the actual skill.
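
Those limitations reduce to a few rules of thumb. As a compact restatement (the function name and parameters are invented for this sketch, and a real decision will weigh more factors than these three):

```python
def pick_browser_tool(os_name: str, parallel_browsers: int, cross_browser: bool) -> str:
    """Encode the rules of thumb from the limitations listed above."""
    if os_name != "macOS":
        return "Playwright"         # Safari MCP is macOS-only
    if parallel_browsers > 1:
        return "headless Chromium"  # one Safari on one Mac is not a fleet
    if cross_browser:
        return "Playwright"         # Safari MCP drives WebKit only
    return "Safari MCP"


print(pick_browser_tool("macOS", parallel_browsers=1, cross_browser=False))
```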

## Wrapping Up

You've gone from an AI agent that could only see code to one that can see the web — the real web, behind your real logins.

To recap what you did: you learned what MCP is and why browser automation belongs behind that interface. You saw why a native Safari engine beats a headless Chromium for an agent working on your Mac, and you installed Safari MCP with one command and two settings. You ran a first read, and then you did the thing that actually matters: an automation inside a logged-in page, with no auth code at all. Finally, you saw the edges: tab safety, waiting for dynamic content, framework forms, and the cases where you should pick a different tool.

The bigger idea is worth holding onto. An AI agent is only as capable as the tools you connect to it. Giving it a browser — a _real_ one — turns "write me code" into "go look at the staging site, find the bug, and tell me what's wrong." That's a different kind of collaborator.

Safari MCP is open source under the MIT license, and its roughly 80 tools go well beyond the handful you used here: screenshots, network inspection, storage, accessibility audits, multi-tab workflows. The repository and full tool reference are at github.com/achiya-automation/safari-mcp. Point your agent at it and see what it does when it can finally look around.
