traeai

Model

What Is a Transformer?

Also known as: Transformers

A core deep learning model architecture built around the attention mechanism.

Why is it worth watching now?

Recent changes

2026-06-04 · Unified Neural Scaling Laws proposes a unified scaling law for neural networks that applies across multiple neural architectures.

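When Transformer is mentioned repeatedly, it usually means it is influencing product roadmaps, developer workflows, or judgments about the AI industry. This page merges scattered material into a single, continuously updated entry point.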

📰 Latest on Transformer

14 AI news items and analyses related to "Transformer" have been collected.

#568. The Transformer Debate: Understanding the Next Generation of Intelligence

跨国串门儿计划 · 2,874 words (about 12 min)
90

While the Transformer architecture still dominates current AI development, its limitations are driving exploration of post-Transformer paths; future intelligence may come from hybrid architectures and more efficient reasoning mechanisms rather than a single paradigm.

Why it was selected: The Transformer is currently the most capable scalable model, but it is not the final answer for intelligence.

Featured · Podcast · #Transformer #AI Architecture #Large Language Models #AGI #Post-Transformer · Chinese
Apple Presents TIDE: Every Layer Knows the Token Beneath the Context

paper: https://t.co/fVdyf8ySks

AK (@_akhaliq) · 62 words (about 1 min)
90

Apple unveils TIDE, a novel model with hierarchical context-aware design that boosts long-sequence modeling, reducing latency by 37% and memory use to 45% of traditional models.

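Why it was selected: TIDE uses a hierarchical context-aware mechanism in which each layer explicitly models the relationship between a token and its context.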

Featured · Tweet · #AI #Apple #Transformer #LLM #Edge AI · English
AI Paper Review: Improving Language Understanding by Generative Pre-Training (GPT-1)

GPT-1 introduced a two-stage approach combining unsupervised generative pre-training with task-specific fine-tuning, significantly advancing language understanding and laying the foundation for large language models.

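Why it was selected: GPT-1 uses a two-stage paradigm that combines unsupervised pre-training with supervised fine-tuning, improving performance across multiple NLP tasks.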

Featured · Article · #GPT #Transformer #NLP #Pre-trained Models #OpenAI · English
Real-World Test: MiniMax M3 Outperforms M2.7 in Multimodal Long-Range Tasks

夕小瑶科技说 · 73 words (about 1 min)
85

Real-world testing shows that MiniMax M3 outperforms M2.7 in multimodal long-range tasks, with a 30% increase in inference speed and a 15% increase in accuracy.

Why it was selected: On multimodal long-text generation tasks, MiniMax M3's accuracy is 15% higher than M2.7's.

Featured · Article · #MiniMax #M3 #M2.7 #Multimodal #Long-Range Tasks · Chinese
From TF-IDF to Transformers: Implementing Four Generations of Semantic Search


Towards Data Science · 4,634 words (about 19 min)
85

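From TF-IDF to Transformers, the article traces the evolution of semantic search through four stages, showing how modern systems moved from hand-designed features to learning abstract meaning directly from data.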

Why it was selected: TF-IDF combined with hand-crafted features provides a transparent ranking system.

Featured · Article · #TF-IDF #Transformer #Semantic Search #Machine Learning #Sentence Transformers · Chinese
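
To make the first of those stages concrete, here is a minimal sketch of TF-IDF ranking with cosine similarity. It is not taken from the linked article: the function names (`build_tfidf`, `rank`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (one common variant; other formulas exist).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (document indices sorted best-first, raw scores per document)."""
    vectors, idf = build_tfidf(docs)
    # Query terms never seen in the corpus are ignored.
    q_toks = [t for t in tokenize(query) if t in idf]
    q_tf = Counter(q_toks)
    q_vec = {t: (cnt / len(q_toks)) * idf[t] for t, cnt in q_tf.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


docs = [
    "solar flares are rare events",
    "transformers use attention",
    "attention variants in modern models",
]
order, scores = rank("solar flares", docs)
```

Every score here traces back to explicit term counts, which is the transparency the selection note points to. The limitation is also visible: a query with no literal term overlap scores zero, and that is the gap learned embeddings aim to close.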

New Paradigms Won't Save You

Astral Codex Ten · 28,012 words (about 113 min)
85

Even assuming AGI requires a new paradigm, applying Lindy's Law suggests it may emerge within 3 to 5 years, so current AI development risks shouldn't be underestimated.

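Why it was selected: Frontier AI systems will most likely continue to use neural network and deep learning architectures, because the brain itself is a kind of neural network.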

Featured · Article · #AGI #LLM #AI Safety #Deep Learning #Paradigm Shift · English

GenCAD: Image-conditioned Computer-Aided Design Generation

Hacker News Best · 299 words (about 2 min)
85

GenCAD is an image-conditioned CAD generation model that can generate parametric CAD command sequences and 3D solid models.

Why it was selected: GenCAD can generate a complete CAD command history and parametric CAD programs.

Featured · Article · #CAD #AI #Generative Model · English
Using Transformers to Forecast Incredibly Rare Solar Flares


Towards Data Science · 1,842 words (about 8 min)
85

Predicting incredibly rare solar flares is challenging but significant; this article explores how to solve the tail event prediction problem using Transformer models.

Why it was selected: Solar flare forecasting must focus on tail events, combining a tail-distribution model with a Transformer.

Featured · Article · #Transformer #Solar Flares #Machine Learning #Prediction Models · English
Why We Think


Lil'Log · 8,392 words (about 34 min)
85

The article explores the mechanisms of test-time compute and chain-of-thought (CoT) in improving model performance.

Why it was selected: CoT lets a model adjust the amount of computation dynamically according to problem difficulty.

Featured · Article · #Deep Learning #Model Optimization · Chinese
2026.21: The Data Center Veto


Stratechery · 700 words (about 3 min)
82

AI development is constrained by physical infrastructure, giving ordinary people veto power over AI projects through data center approvals, creating new leverage against tech giants.

Why it was selected: AI depends on data center construction, which requires local permits, giving the public veto power.

Featured · Article · #AI #Data Centers #Tech Policy · English
Unified Neural Scaling Laws


AK (@_akhaliq) · 34 words (about 1 min)
75

Unified Neural Scaling Laws proposes a unified neural network scaling law that applies to various neural architectures, including CNN, RNN, and Transformer. The law reveals the relationship between neural network performance and parameter quantity, providing a theoretical basis for model design and optimization.

Why it was selected: Unified Neural Scaling Laws proposes a unified scaling law for neural networks that applies across multiple neural architectures.

Featured · Tweet · #neural network #model design #model optimization · Chinese
Neurosymbolic rising!


Gary Marcus (@GaryMarcus) · 116 words (about 1 min)
75

Neurosymbolic systems are rising, combining deep learning with symbolic reasoning—e.g., an 800k-parameter Transformer mimicking a logic solver achieves 100% accuracy on extreme Sudoku with only 15M training compute, marking a key breakthrough in AI reasoning.

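Why it was selected: An 800k-parameter Transformer that mimics the behavior of a logic solver reaches 100% accuracy on extreme Sudoku with 15M in training compute.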

Featured · Tweet · #Neurosymbolic #AI Reasoning #Transformer #Logic Solver #Axiom Math AI · English
Intelligence is getting cheaper

Charts of the Week: https://t.co/O1SZEaWPFX


a16z (@a16z) · 48 words (about 1 min)
75

The cost of AI computing power continues to decline, driving broader industry adoption.

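Why it was selected: AI compute costs fall by about 30% per year, making intelligent services affordable for small and medium-sized businesses.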

Featured · Tweet · #AI #Computing Power #Cost Optimization #Large Models #Edge Computing · English
A Visual Guide to Attention Variants in Modern LLMs


Ahead of AI · 5,054 words (about 21 min)
75

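This article provides a visual guide to attention variants in modern large language models, including self-attention and multi-head attention, and presents several representative models.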

Why it was selected: This article provides a visual guide to 45 LLM architectures.

Featured · Article · #LLM #Attention #Transformer · English
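
For readers new to the self-attention mentioned in that summary, the following is a minimal sketch of single-head scaled dot-product attention in plain Python. It does not come from the linked guide. The function name, the `causal` flag, and the use of the raw inputs as queries, keys, and values (with no learned projection matrices) are simplifying assumptions. Multi-head attention would run several such heads on separately projected inputs and concatenate the results.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Single-head attention over lists of vectors.

    Returns (outputs, weights). With causal=True, position i may only
    attend to positions j <= i, as in decoder-style language models.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        if causal:
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: three 2-dimensional token vectors used as Q, K, and V alike.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(x, x, x, causal=True)
```

With the causal mask on, the first token can attend only to itself, so its output equals its own value vector. Later tokens mix information from every position up to and including their own.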



AI may generate inaccurate information. Please verify important content.