With 104M of image-text pairs, this is one of the largest, if not the largest, openly-licensed image...

Julien Chaumond (@julien_c) · May 28, 2026

Score: 8.5

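TL;DR · AI Summary

MONET, an openly licensed dataset of 104 million image-text pairs, has been released to support reproducible text-to-image research.

Key Points

MONET contains 104 million image-text pairs, making it one of the largest openly licensed datasets available.
MONET is released under the Apache 2.0 license and supports reproducible research on text-to-image models.
The companion codebase, Nano T2I, is available on GitHub for training your own text-to-image model.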

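Outline

§ MONET dataset overview
MONET is one of the largest openly licensed image-text datasets to date, with 104 million pairs.
· Dataset characteristics
MONET is released under Apache 2.0 and has been deduplicated and recaptioned, making it suitable for reproducible research.
· Related tools and resources
The Nano T2I codebase supports training text-to-image models and is hosted on GitHub.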

Highlights

With 104M of image-text pairs, this is one of the largest, if not the largest, openly-licensed image dataset.
— Paragraph 1
MONET: An Apache2.0 deduped and recaptioned dataset of 105M samples unlocking reproducible text-to-image research.
— Paragraph 2
Nano T2I: A codebase to train your own T2I model.
— Paragraph 2

#MONET #text-to-image #datasets #Hugging Face

Julien Chaumond on X:

"With 104M of image-text pairs, this is one of the largest, if not the largest, openly-licensed image dataset. And it's on @huggingface!! Kudos to @heyjasperai"

Source URL: https://x.com/julien_c/status/2059994944861852053

Quoted post:

Clément Chadebec @CChadebec 16h

📢 New @heyjasper release! 📢
MONET 🌸: An Apache2.0 deduped and recaptioned dataset of 105M samples unlocking reproducible text-to-image research.
Nano T2I 🖌️: A codebase to train your own T2I model.
🤗 @huggingface: huggingface.co/datasets/jaspe
💻: github.com/gojasper/nano-
Very...

1:47 PM · May 28, 2026

14.9K views