The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

AI Engineer · Video · May 5, 2026

Score: 7.5

TL;DR · AI Summary

This talk covers the motivation, challenges, and solutions behind Superlinked's development of inference infrastructure for small models.

Current infrastructure lacks sufficient support for small models, leading to per…
Superlinked built its own inference engine to optimize deployment and execution…
The infrastructure supports multiple model formats with low latency and high throughput…
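The summary pairs low latency with high throughput. The following back-of-the-envelope illustration is not taken from the talk. Little's law links the two quantities: throughput equals the number of requests in flight divided by per-request latency. Millisecond latency alone therefore does not yield thousands of inferences per second. Batching or concurrency has to make up the difference. The helper names and the 2 ms and 5,000 requests/s figures are hypothetical and are not Superlinked measurements.

```python
# Hedged sketch: relate per-request latency to achievable throughput with
# Little's law (in-flight requests = throughput x latency).
# All numbers used below are illustrative assumptions, not measured values.


def throughput_per_second(latency_ms: float, concurrency: int) -> float:
    """Steady-state requests/second when `concurrency` requests are in flight,
    each taking `latency_ms` milliseconds (Little's law: X = N / R)."""
    if latency_ms <= 0 or concurrency <= 0:
        raise ValueError("latency_ms and concurrency must be positive")
    return concurrency * 1000.0 / latency_ms


def required_concurrency(target_rps: float, latency_ms: float) -> float:
    """In-flight requests (or batch slots) needed to sustain `target_rps`
    at `latency_ms` per request (Little's law: N = X * R)."""
    if target_rps <= 0 or latency_ms <= 0:
        raise ValueError("target_rps and latency_ms must be positive")
    return target_rps * latency_ms / 1000.0


if __name__ == "__main__":
    # One sequential worker at an assumed 2 ms per request.
    print(throughput_per_second(2.0, 1))    # 500.0 requests/s
    # Requests in flight needed for an assumed 5,000 requests/s at 2 ms each.
    print(required_concurrency(5000, 2.0))  # 10.0
```

A single worker serving one request at a time at 2 ms tops out at 500 requests/s. Reaching thousands per second at that latency requires roughly ten requests to be processed concurrently, for example through batching or multiple workers.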

§ Introduction and Problem Background
Overview of current limitations in small-model inference infrastructure
· Why a New Infrastructure Is Needed
Analysis of limitations in existing systems for small models
· Superlinked's Solution
Design goals and core features of the self-developed inference engine
› Performance Optimization Strategies
Achieving low latency and high throughput
› Model Format Support
Support for ONNX, TorchScript, and other formats
· Future Outlook and Open Plans
Plans to open-source and continuously improve the infrastructure

Key quotes:

We found that existing inference systems fall short in supporting small models, prompting us to build our own engine.
— Mid-presentation
Our goal is to achieve millisecond-level latency and a throughput of thousands of inferences per second.
— Performance section
We support multiple model formats to enhance flexibility and compatibility.
— Model support section

Tags: AI Engineering · Model Deployment · Infrastructure · Small Models