Company

Together AI

Alias: togetherai

A cloud service provider focused on efficient inference platforms for large language models.

9 highly relevant items tracked

TraeAI Observations

Related materials

9 items related to Together AI have been indexed, sorted by score.

Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets

Together AI optimized the deployment of MiniMax M3, achieving 81–125% throughput improvements through architectural and engineering innovations.

Why it was selected: MiniMax M3 supports 1M-token context and native multimodality, making it suitable for complex real-world tasks.

Featured · Article · #MiniMax #M3 #Sparse Attention #Multimodality #Inference Optimization · English

Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI

Building high-quality, low-latency, scalable voice agents is now an engineering challenge requiring real-time response (<500ms), complex instruction handling, and tool calling — supported by Together AI’s infrastructure.

Why it was selected: Voice agents must respond within 500 ms or callers hang up, so real-time responsiveness is the core metric.

Featured · Video · #Voice AI #Latency Optimization #Together AI #Agent Engineering · English

How Together AI built the world’s fastest speech-to-text stack

Together AI Blog · 1,720 words (about 7 minutes)
Score: 85

Together AI optimized their speech-to-text stack, achieving faster transcription speeds by using profile-aware TensorRT, optimizing the decoder loop, and improving CPU paths. They serve the two lowest-latency models, with the fastest model transcribing 20 hours of speech in under 10 seconds.

Why it was selected: Together AI built the world's fastest speech-to-text stack.

Featured · Article · #Together AI #speech-to-text · English

Benchmarking inference at scale: coding agents

Together AI Blog · 1,358 words (about 6 minutes)
Score: 85

The Together Inference Engine delivers 31% more tokens per second (TPS) than the next-fastest OSS engine on the same hardware and maintains 2× better time to first token (TTFT) at saturation. The performance gains come from full-stack optimization.

Why it was selected: ThunderMLA, custom kernel rewrites, and end-to-end optimization give the Together engine 31% more TPS than other OSS engines.

Featured · Article · #Together AI #Inference Engine #Coding Agent #Performance Optimization #TTFT · English

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Together AI Blog · 979 words (about 4 minutes)
Score: 85

Together AI and Pearl Research Labs have partnered to reduce AI inference costs through technologies like FlashAttention-4 and ATLAS.

Why it was selected: FlashAttention-4 improves inference speed by up to 1.3×.

Featured · Article · #AI #Inference Optimization · English

Violin: An open-source video translation skill that breaks language barriers

Together AI Blog · 1,617 words (about 7 minutes)
Score: 75

Violin is an open-source video translation tool developed by Together AI, using multimodal models to achieve high-quality video content localization.

Why it was selected: Violin supports multilingual video translation, improving the accessibility of content across languages.

Featured · Article · #AI #Video Processing #Natural Language Processing · English

DeepSeek-V4 Pro now available on Together AI

Together AI Blog · 1,895 words (about 8 minutes)
Score: 75

Together AI has launched the DeepSeek-V4 Pro model with high-performance inference and multiple compute options.

Why it was selected: DeepSeek-V4 Pro achieves a 1.3× speedup on NVIDIA Blackwell.

Featured · Article · #AI #Model Deployment #Deep Learning · Chinese

Foundational research powering efficient inference at scale

Together AI Blog · 2,272 words (about 10 minutes)
Score: 75

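The article covers several Together AI technical advances, including FlashAttention-4, the ATLAS accelerator, and Batch Inference API updates, which significantly improve the efficiency of large-scale inference.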

Why it was selected: FlashAttention-4 is 1.3× faster than cuDNN.

Featured · Article · #AI #Inference #Efficiency #Together AI · English