traeai

llama.cpp

Alias: llama-server

A C++ inference engine for running large language models locally, with support for CPUs, GPUs, and Apple silicon.

7 highly relevant materials tracked

TraeAI Observations

Related materials

7 items related to llama.cpp have been indexed, sorted by score.

Gemma 4 12B: The Developer Guide

Google Developers Blog · 1171 words (about 5 min)
Score: 92

Gemma 4 12B features an encoder-free multimodal architecture that runs locally on 16GB VRAM devices with native audio support. By eliminating separate vision and audio encoders, it reduces latency and pairs with a dedicated MTP model for faster inference, marking the first mid-sized multimodal model with a macOS desktop app for fully offline interaction.

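Why it was selected: Gemma 4 12B removes the standalone encoders; vision uses only a 35M-parameter embedding layer, and audio is linearly projected directly into the LLM input space.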

Featured · Article · #Gemma 4 #Multimodal LLM #Encoder-Free Architecture #Local AI #Google · English

How to Run LLMs Locally (Great For Learning and Privacy)

ByteByteGo · 1316 words (about 6 min)
Score: 85

Large language models (LLMs) can be run locally with tools such as llama.cpp, Ollama, and LM Studio, which serves both privacy and learning.

Why it was selected: llama.cpp can run large models on consumer-grade hardware and supports 4-bit quantization.

Featured · Video · #LLM #Running locally #AI #Quantization #Ollama · English
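
To make the 4-bit quantization point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The 7-billion-parameter size and the helper name `weight_memory_gib` are hypothetical and chosen only for illustration; real quantized files also store per-block scaling metadata, so actual sizes are somewhat larger than this estimate.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GiB.

    Ignores the KV cache, activations, and quantization metadata.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical 7-billion-parameter model, for illustration only.
n_params = 7e9
fp16_gib = weight_memory_gib(n_params, 16)
q4_gib = weight_memory_gib(n_params, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {q4_gib:.1f} GiB")
```

Under these assumptions, going from 16 bits to 4 bits per weight cuts the weight footprint by a factor of four, which is what lets larger models fit on consumer-grade hardware.
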
Reachy Mini goes fully local

Hugging Face Blog · 1966 words (about 8 min)
Score: 85

Reachy Mini now runs its voice backend locally, eliminating the need for cloud servers.

Why it was selected: It deploys a local voice backend on Reachy Mini.

Featured · Article · #Reachy Mini #Voice Backend #Local Service · Chinese
This is where we are right now. And i’m not gonna lie it feels pretty magical 🧚‍♀️

Qwen3.6 27B run...

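Julien Chaumond shows the Qwen3.6-27B model running the Pi coding agent locally on a MacBook Pro via llama.cpp. On tasks against the Hugging Face codebase, its performance approaches that of Claude Opus, and it runs fully offline.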

Why it was selected: Qwen3.6-27B can now run coding tasks locally and efficiently on a consumer-grade Mac.

Featured · Tweet · #Qwen #Llama.cpp #Pi Agent #Local LLM #Hugging Face · Chinese
llama.cpp with MTP support makes local models fast enough to use as daily drivers 🚀 

Qwen3.6-27B d...

llama.cpp with MTP Support Makes Local Models Fast Enough for Daily Use

clem 🤗 (@ClementDelangue) · 92 words (about 1 min)
Score: 75

With MTP support, llama.cpp improves local model inference speed by 78%, boosting Qwen3.6-27B from 25 to 45 tokens/sec on A10G.

Why it was selected: MTP support makes llama.cpp inference 78% faster.

Featured · Tweet · #llama.cpp #MTP #Qwen #local model #inference speed · English
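
As a quick sanity check on the throughput figures in the summary above, the relative speedup can be computed directly from the two tokens-per-second values. This is a minimal sketch; the helper name `speedup_percent` is hypothetical. With the rounded figures of 25 and 45 tokens/sec the result is 80%, slightly above the quoted 78%, which presumably (an assumption, not something the source states) reflects unrounded measurements.

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain, in percent, from before_tps to after_tps."""
    return (after_tps - before_tps) / before_tps * 100


# Rounded tokens/sec figures quoted in the summary (Qwen3.6-27B on A10G).
baseline_tps = 25.0
with_mtp_tps = 45.0

gain = speedup_percent(baseline_tps, with_mtp_tps)
print(f"Throughput gain with MTP: {gain:.0f}%")
```
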
I've seen some confusion online on how to run llama.cpp with MTP (Multi-token prediction) in the sim...

How to Run llama.cpp with MTP (Multi-token Prediction)

Julien Chaumond (@julien_c) · 255 words (about 2 min)
Score: 75

MTP is a new speculative decoding feature built into llama.cpp that can approximately double token generation speed for most use cases, achieving ~30 tok/sec with the Dense 27B model and ~100 tok/sec with the MoE model.

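Why it was selected: MTP is a new speculative decoding feature built into the model itself, and it can roughly double token generation speed.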

Featured · Tweet · #llama.cpp #MTP #Speculative Decoding #Qwen #LLM Inference Optimization · English
> Ecosystem: Compatible with llama.cpp, MLX, @LMStudio, vLLM, @ollama, @UnslothAI, and SGLang.
> ...

Google AI Developers: Gemma 4 Ecosystem Compatibility and Downloads

Google AI Developers (@googleaidevs) · 78 words (about 1 min)
Score: 65

Google announces that the Gemma 4 model weights are compatible with major open-source ecosystems and can be downloaded directly from Hugging Face and Kaggle, lowering deployment barriers.

Why it was selected: Gemma 4 weights are compatible with ecosystems such as llama.cpp, vLLM, and Ollama, which simplifies local deployment and inference.

Featured · Tweet · #Gemma #Open-source Ecosystem #Model Deployment #Hugging Face #Kaggle · English

Cross-material Q&A · llama.cpp

Answers are based on: 7 materials related to llama.cpp