AINews: Weekday Roundups

[AINews] not much happened today

a quiet day.

Apr 29, 2026


When we made the AINews → Substack move, we committed to publishing Matt Levine-style op-eds every day, but some days there just isn't much going on, and we will just say so. We are working on small essays around inference demand and multi-agent systems, but today is not that day.

There were interesting model releases from NVIDIA (Nemotron), Poolside, and Alec Radford, but it's unclear whether any of them will stand the test of time. GPT-6 hype is beginning.

AI News for 4/27/2026-4/28/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
* * *

**AI Twitter Recap**

**Inference Systems, vLLM 0.20, and the Hardware/Kernel Race Around DeepSeek V4**

  • **vLLM’s latest release is heavily about memory and MoE serving efficiency**: vLLM v0.20.0 shipped with **TurboQuant 2-bit KV cache** for **4× KV capacity**, FA4 re-enabled for MLA prefill on **SM90+**, a new **vLLM IR** foundation, fused RMSNorm for a reported **2.1% end-to-end latency improvement**, plus support updates spanning **DeepSeek V4 MegaMoE on Blackwell**, Jetson Thor, ROCm, Intel XPU, and easier GB200/Grace-Blackwell setup. In parallel, SemiAnalysis highlighted early DeepSeek V4 Pro serving results on **B200/B300/H200/GB200 disaggregated setups**, claiming **B300 can be up to 8× faster than H200** for this workload and pointing to upcoming vLLM 0.20 benchmarking with **DeepGEMM MegaMoE**, which fuses **EP dispatch + EP combine + GEMMs + SwiGLU** into a single mega-kernel.
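The "4× KV capacity" claim follows directly from the bit-width arithmetic. A back-of-envelope sketch (not vLLM's actual memory accounting; the model shape below is illustrative, and the 4× figure implicitly compares against an 8-bit KV baseline, ignoring quantization-scale overhead):

```python
# Back-of-envelope KV-cache sizing. The layer/head/dim numbers are
# hypothetical and not taken from any specific DeepSeek or vLLM config.
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bits: int) -> float:
    """Bytes of KV cache one token occupies (K and V, across all layers)."""
    return 2 * n_layers * n_kv_heads * head_dim * bits / 8

# Hypothetical dense model: 32 layers, 8 KV heads, head_dim 128.
fp8 = kv_bytes_per_token(32, 8, 128, bits=8)   # 8-bit baseline
q2  = kv_bytes_per_token(32, 8, 128, bits=2)   # TurboQuant-style 2-bit cache

print(f"8-bit baseline: {fp8 / 1024:.0f} KiB per token")
print(f"2-bit cache   : {q2 / 1024:.0f} KiB per token")
print(f"capacity gain : {fp8 / q2:.0f}x")  # 4x more tokens in the same memory
```

Same GPU memory budget, four times the resident tokens, which is why a 2-bit KV cache translates directly into larger batch sizes and longer contexts for MoE serving.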

**Open Model Releases: Poolside Laguna XS.2, NVIDIA Nemotron 3 Nano Omni, and TRELLIS.2**

  • **Poolside made its first public model release with an unusually deployment-friendly open-weight coder**: @poolsideai announced Laguna XS.2, a **33B total / 3B active MoE** coding model trained fully in-house, released under **Apache 2.0**, and advertised as able to run on a **single GPU**. Poolside’s broader release also included **Laguna M.1** and an agent harness, emphasizing that the company trained from scratch on its own **data, training infra, RL, and inference stack**. Community summaries added more color: Aymeric Roucher described two coder models—**225B/23B active** and **33B/3B active**—with **hybrid attention**, **FP8 KV cache**, and claimed performance near **Qwen-3.5**; Ollama shipped it immediately.
  • **NVIDIA’s Nemotron 3 Nano Omni was the day’s biggest infra-native model launch**: @NVIDIAAI introduced Nemotron 3 Nano Omni, an open **30B / A3B multimodal MoE** with **256K context** built for agentic workloads spanning **text, image, video, audio, and documents**. Distribution was immediate across the stack: OpenRouter, LM Studio, Ollama, Unsloth, fal, Fireworks, DeepInfra, Together, Baseten, Canonical, and others all announced same-day availability. Key specs surfaced in follow-on posts: Piotr Żelasko described it as NVIDIA’s first **omni** release with speech/audio understanding backed by a **Parakeet encoder**, **English-only** for now, and a **5.95% WER** on the Open ASR leaderboard. Several hosts cited **~9× throughput** versus comparable open omni models.
  • **Other notable model/paper releases**: Microsoft’s TRELLIS.2 is an open-source **4B image-to-3D model** producing up to **1536³ PBR textured assets**, built on native 3D VAEs with **16× spatial compression**. On the world-model side, World-R1 claims existing video models already encode **3D structure** and can be “woken up” with **RL**, requiring **no architecture changes, no extra video training data, and no added inference cost**.
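The single-GPU pitch behind these sparse MoE releases is that weight memory scales with *total* parameters while per-token compute scales with *active* parameters. A rough sketch using the headline counts quoted above (FP8 weights assumed; everything else, including the precision choice, is an illustrative assumption rather than a published deployment spec):

```python
# Rough MoE footprint arithmetic; parameter counts come from the release
# posts above, the FP8 (1 byte/param) precision is an assumption.
def weight_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB for a given parameter count."""
    return total_params_billions * 1e9 * bytes_per_param / 1e9

for name, total_b, active_b in [
    ("Laguna XS.2 (33B total / 3B active)", 33, 3),
    ("Nemotron 3 Nano Omni (30B / A3B)", 30, 3),
]:
    mem = weight_gb(total_b, bytes_per_param=1.0)  # FP8: 1 byte per param
    print(f"{name}: ~{mem:.0f} GB of weights at FP8, "
          f"~{total_b / active_b:.0f}x fewer FLOPs/token than dense {total_b}B")
```

Roughly 30-33 GB of weights leaves headroom for the KV cache on a single 80 GB-class GPU, which is consistent with Poolside advertising single-GPU deployment.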
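For context on the 5.95% WER figure: word error rate is word-level edit distance divided by reference length. A minimal sketch of the standard Levenshtein-based computation (not the Open ASR leaderboard's exact scoring pipeline, which also applies text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over a 6-word reference.
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # ≈ 0.167
```

A 5.95% WER means roughly 6 word errors per 100 reference words, aggregated across the leaderboard's test sets.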

**Agents, Local-First Tooling, and Production Orchestration**

**Benchmarks, Evals, and Research Findings Worth Watching**

**Platform Economics, API Pricing, and Closed-Model Reliability Concerns**

Keep reading with a 7-day free trial

Subscribe to Latent.Space to keep reading this post and get 7 days of free access to the full post archives.


© 2026 Latent.Space