traeai

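Model

GPT-5.5

Alias: GPT5.5

A frontier language model released by OpenAI.

30 highly relevant items tracked

TraeAI Observations

Related Materials

30 items related to GPT-5.5 have been collected, sorted by score.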

OpenAI's GPT-5.5 and Codex Reach General Availability on Amazon Bedrock

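OpenAI's GPT-5.5 and Codex are now available through Amazon Bedrock, with support for enterprise-grade governance and compliance.

Reason for selection: GPT-5.5 and Codex can now be used through Amazon Bedrock without bringing in a new vendor.

FeaturedArticle · #OpenAI #Amazon Bedrock #AI #Cloud Services · English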
Introducing new capabilities to GPT-Rosalind

OpenAI Blog · 2278 words (about 10 min)
Score: 85

OpenAI introduces a new model update to GPT-Rosalind, designed for life sciences research at enterprise scale. The updated model combines GPT-5.5's agentic coding and tool-use capabilities with stronger model intelligence in core drug-discovery domains such as medicinal chemistry and genomics. GPT-Rosalind shows broad performance gains on research tasks from biology experts, complex medicinal chemistry queries, quantitative biology, and wet lab troubleshooting.

Reason for selection: GPT-Rosalind combines GPT-5.5's agentic coding and tool-use capabilities with stronger model intelligence in core drug-discovery domains.

FeaturedArticle · #GPT-Rosalind #life sciences #research #performance improvement #model update · English
MiniMax M3 hands-on test: 74 logos on Jensen Huang's slide, and I thought that would stump it

MiniMax M3 is China's first open-source model with simultaneous long-context, multimodal, and coding capabilities; it scored 59% on SWE-Bench Pro, outperforming GPT-5.5 and Gemini 3.1 Pro, with efficiency boosted to 1/20 of the previous generation.

Reason for selection: M3 scored 59% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.

FeaturedArticle · #MiniMax #Open Source Model #Multimodal #Coding Capability #AI Evaluation · Chinese
OpenAI models and Codex on Amazon Bedrock are now generally available

AWS Machine Learning Blog · 965 words (about 4 min)
Score: 85

OpenAI’s GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock for production deployment, matching OpenAI’s pricing and inheriting AWS security & governance frameworks.

Reason for selection: On Bedrock, GPT-5.5 has the same per-token pricing as calling OpenAI directly, with no additional fees.

FeaturedArticle · #OpenAI #Amazon Bedrock #GPT-5.5 #Codex #AI Inference · English
Finally a good benchmark (DeepSWE)

Matthew Berman · 3734 words (about 15 min)
Score: 85

Deep Suite is a software engineering benchmark designed to provide more accurate model evaluations than existing public benchmarks. It offers four major advantages: contamination-free tasks, high diversity, real-world complexity, and reliable verification. According to Deep Suite's testing, GPT 5.5 outperforms Opus 4.7.

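Reason for selection: Deep Suite uses hand-written tasks, which avoids the problem of models having seen the solutions during pretraining.

FeaturedVideo · #AI #Machine Learning #Deep Learning #Natural Language Processing #Software Engineering · Chinese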
I think Anthropic and OpenAI have found product-market fit

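Hacker News Best · 1867 words (about 8 min)
Score: 85

The article argues that Anthropic and OpenAI have found product-market fit, locking in enterprise customers by raising API prices.

Reason for selection: Both Anthropic and OpenAI raised API prices, locking in enterprise customers.

FeaturedArticle · #Anthropic #OpenAI #API Pricing #Enterprise Customers #Product-Market Fit · English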
https://t.co/o6CEQEW0V4

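向阳乔木 (@vista8) · 2575 words (about 11 min)
Score: 85

Dan Shipper, CEO of Every, shares how AI tools are applied in real work, points out that the more capable AI becomes the busier people get, and predicts that ways of working will move toward company-level systems and a work operating system.

Reason for selection: AI tools have shortcomings in real work: they cannot proactively spot problems and redefine them.

FeaturedTweet · #AI #Every #Dan Shipper #Changing Ways of Working #SaaS · Chinese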
Underappreciated how capable GPT-5.5 is at cybersecurity:

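Greg Brockman (@gdb) · 94 words (about 1 min)
Score: 85

GPT-5.5's cybersecurity capabilities are underappreciated: it found a 27-year-old remote code execution vulnerability.

Reason for selection: GPT-5.5 found a 27-year-old RCE vulnerability that was introduced in 1999.

FeaturedTweet · #GPT-5.5 #Cybersecurity #RCE Vulnerability #Artificial Intelligence · English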
Warp’s big bet on building open source with GPT-5.5

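OpenAI Blog · 884 words (about 4 min)
Score: 85

Warp uses GPT-5.5 to drive open-source software development through its Open Agentic Development model, in which humans define the goals and AI agents carry out the tasks, improving development efficiency and code quality.

Reason for selection: Warp introduces the Open Agentic Development model, in which AI agents help write code, improving development efficiency.

FeaturedArticle · #Warp #GPT-5.5 #Open Agentic Development #Oz #Open-Source Software Development · English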
I think Anthropic and OpenAI have found product-market fit

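Simon Willison's Weblog · 1867 words (about 8 min)
Score: 85

By adjusting their pricing strategies, Anthropic and OpenAI signal that they have found product-market fit: enterprise customers now pay API prices rather than the earlier discounted rates.

Reason for selection: Anthropic and OpenAI moved enterprise customers' pricing from discounted rates to API prices.

FeaturedArticle · #Anthropic #OpenAI #Product-Market Fit #Pricing Strategy #Enterprise Customers · Chinese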
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

ITBench-AA is a new benchmark series evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering (SRE) tasks, on which frontier models score below 50%. The SRE tasks benchmark model performance on Kubernetes incident response, where models and agents must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure.

Reason for selection: Claude Opus 4.7 performed best on ITBench-AA, scoring 47%.

FeaturedArticle · #ITBench-AA #Site Reliability Engineering #Frontier Models #IBM #Kubernetes · Chinese
The Latest Codex Updates and The Truth about Opus 4.8

Riley Brown · 6488 words (about 26 min)
Score: 78

Anthropic released Claude Opus 4.8, but experts like Greg Eisenberg and Matt Wolf argue it’s nearly indistinguishable from 4.7, signaling a shift to iPhone-style incremental upgrades; Deep Suite data shows GPT 5.5 outperforms Opus 4.8 in coding tasks at lower cost and token usage, while OpenAI’s Codex saw undisclosed but impactful updates.

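Reason for selection: Comparing Opus 4.8 with 4.7, neither the author nor several experts could tell the performance difference, which suggests model evolution has entered an 'iPhone-style' incremental phase.

FeaturedVideo · #AI Models #Claude #GPT-5.5 #Codex #SWEBench · English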
Open source is going to win

We already have an open-weights model competitive with GPT-5.5 and Opus...

Paul Couvert (@itsPaulAi) · 203 words (about 1 min)
Score: 75

The open-weight model MiniMax M3 has reached performance comparable to GPT-5.5 and Opus 4.7, outperforming Gemini 3.1 Pro in coding tasks, and costs 10x less to use, with weights to be released on Hugging Face next week.

Reason for selection: MiniMax M3 performs on par with GPT-5.5 on SWE Bench Pro.

FeaturedTweet · #Open Source #AI Model #MiniMax M3 #GPT-5.5 #Gemini · English
OpenAI + Amazon Bedrock:

Greg Brockman (@gdb) · 74 words (about 1 min)
Score: 75

OpenAI's GPT-5.5, GPT-5.4, and Codex models are now generally available on Amazon Bedrock, with support for auto-scaling and a next-gen inference engine for building multi-step autonomous agents.

Reason for selection: GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock, with support for auto-scaling.

FeaturedTweet · #OpenAI #Amazon Bedrock #GPT-5.5 #AI Models #Cloud Services · English
$10K Cursor Credits Expired, Miss It So Much 😄

Used Cursor freely in May and spent about $2K; here is a rough summary of my Cursor experience:
· 100% of the time was spent in Agent Window...

meng shao (@shao__meng) · 400 words (about 2 min)
Score: 75

After the $10K Cursor credit expired, users reported that Agent Window mode almost completely replaced traditional IDEs; GPT-5.5 and Composer 2.5 performed well in different scenarios, especially Composer 2.5 Fast mode which is fast and good at generating flowcharts, but default output is not Markdown and cannot be copied directly as Markdown, affecting efficiency.

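Reason for selection: The user spent 100% of the time in Cursor's Agent Window and never opened the traditional IDE interface.

FeaturedTweet · #Cursor #AI Editor #Agent Window #GPT-5.5 #Composer 2.5 · Mixed Chinese and English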
Lovable on How GPT-5.5 Unlocks Better Planning for Complex Builds

GPT-5.5 significantly improves planning for complex builds: 31% better intent understanding, 22% fewer memory lapses, enabling non-coders to focus on goals, not code.

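Reason for selection: GPT-5.5 improves intent understanding in the planning phase by 31%, reducing the need for repeated interactions.

FeaturedVideo · #GPT-5.5 #AI Planning #Lovable #No-code Development · English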
Major upgrade to GPT-Rosalind, with much better intelligence for drug discovery, analysis, design, a...

Major GPT-Rosalind Upgrade: Enhanced Agentic Intelligence for Drug Discovery

Greg Brockman (@gdb) · 104 words (about 1 min)
Score: 72

GPT-Rosalind's major upgrade integrates GPT-5.5's agentic coding and tool-use capabilities, significantly boosting enterprise-grade AI efficacy in drug discovery, analysis, and experimental workflows.

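Reason for selection: GPT-Rosalind integrates GPT-5.5's agentic coding capability, supporting automated code generation and debugging for drug R&D.

FeaturedTweet · #GPT-Rosalind #AI Drug Discovery #GPT-5.5 #Agentic Coding · English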
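OpenAI poaches an alumnus of USTC's Special Class for the Gifted Young! Entered university at 12, Harvard's youngest-ever full professor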

Yin Xi joins OpenAI on sabbatical to advance AI-theoretical physics research, claiming AI can replicate human intelligence limits and accelerate science by 100x.

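Reason for selection: Yin Xi entered USTC's Special Class for the Gifted Young at 12, became Harvard's youngest Chinese full professor at 31, and has now joined OpenAI on academic sabbatical.

FeaturedArticle · #OpenAI #AI for Science #Theoretical Physics · Chinese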
SWEbench is done.

Matthew Berman · 212 words (about 1 min)
Score: 55

The article questions the credibility of the SWEbench benchmark, noting that GPT-5.5 significantly outperforms Claude Opus 4.7 in DeepSuite (70% vs 54%), but SWEbench results show the opposite, suggesting the benchmark may be invalid.

Reason for selection: SWEbench results are being questioned; GPT-5.5 scores 70% on DeepSuite, significantly higher than Claude Opus 4.7's 54%.

FeaturedVideo · #SWEbench #DeepSuite #GPT-5.5 #Claude Opus #AI Evaluation · English
[AINews] Founders and Forward Deployed Engineers

Latent Space · 1866 words (about 8 min)
Score: 55

Anthropic released Claude Opus 4.8, showing incremental but not dominant gains across benchmarks—especially regressing on document parsing fidelity. Platform updates like mid-conversation system instructions improve engineering usability, yet API pricing remains a major pain point. Hugging Face also exposed a subtle RL training bug where re-tokenization breaks gradient flow in multi-turn tool-use loops.

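Reason for selection: Claude Opus 4.8 is more efficient on CursorBench, but shows only a small improvement over 4.7 and regresses on content fidelity and chart parsing.

FeaturedArticle · #Anthropic #RL #Agent #API #Benchmark · English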
Recently, DeepSeek-V4 Pro feels really good—especially because it’s cheap!

Viking (@vikingmute) · 174 words (about 1 min)
Score: 52

DeepSeek-V4 Pro is praised for cost-effectiveness in small tasks like code review and writing, replacing expensive Qwen-Max; current primary model ranking: GPT-5.5 > Claude 4.7 > DeepSeek-V4 Pro.

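Reason for selection: DeepSeek-V4 Pro performs well on small tasks (such as review and writing) and is priced significantly lower than Qwen-Max.

FeaturedTweet · #DeepSeek #Qwen #LLM Selection #Cost Optimization · Mixed Chinese and English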
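DeepSWE's scores for Opus 4.8 are in: it beats 4.7 at lower cost and higher efficiency, but still trails GPT5.5 by a lot. I haven't used it in depth yet. I'm even still using 4.6, for no other reason than that it's cheap.

And I now...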

DeepSWE’s evaluation shows Opus 4.8 outperforms 4.7 in performance, cost, and efficiency, yet still lags far behind GPT-5.5; the author continues using cheaper 4.6 without deep testing of 4.8 or 5.5, and expresses skepticism toward benchmarks, preferring real user feedback from social media.

Reason for selection: Opus 4.8 outperforms 4.7 with lower inference cost and higher efficiency, but does not reach GPT-5.5's level.

FeaturedTweet · #Large Language Model #Benchmark #Opus #GPT-5.5 #Cost-Efficiency · Chinese
SWEbench is done.

Matthew Berman · 212 words (about 1 min)
Score: 45

SWEbench benchmark is invalid as GPT 5.5 scores 70% on Deep Suite versus Opus 4.7's 54%, showing opposite trends in SWEbench, indicating unreliability.

Reason for selection: GPT 5.5 achieves 70% accuracy on Deep Suite, significantly outperforming Opus 4.7 at 54%.

FeaturedVideo · #SWEbench #Deep Suite #GPT #Opus #Gemini · English
11 is an even row window according to GPT 5.5 thinking.

Suhail (@Suhail) · 50 words (about 1 min)
Score: 20

GPT-5.5 incorrectly classifies the number 11 as an 'even row window', revealing severe flaws in basic math and terminology understanding.

Reason for selection: GPT-5.5 is alleged to have misjudged 11 as an 'even row window', which is really a semantic confusion between 'even' and terms such as 'row/window'.

FeaturedTweet · #AI Hallucination #LLM #Math Literacy · English
