T
traeai
Sign in
返回首页
Together AI Blog

Violin: An open-source video translation skill that breaks language barriers

7.5Score
Violin: An open-source video translation skill that breaks language barriers

TL;DR · AI Summary

Violin is an open-source video translation tool developed by Together AI, using multimodal models to achieve high-quality video content localization.

Key Takeaways

  • Violin supports multilingual video translation, enhancing cross-language content
  • Based on the Transformer architecture, it combines speech recognition and text t
  • Provides open-source code and pre-trained models for developers to further devel

Outline

Jump quickly between sections.

  1. Introduce the background and goal of Violin, solving the problem of cross-language video content dissemination.

  2. Describe the multimodal model architecture and key technologies used by Violin.

  3. List potential applications of Violin in education, entertainment, and business.

  4. Explain the open source resources provided by Violin and support for developers.

Mindmap

See how the topics connect at a glance.

查看大纲文本(无障碍 / 无 JS 友好)
  • Violin 视频翻译工具
    • 核心技术
      • 多模态模型
      • Transformer 架构
      • 语音识别
    • 应用场景
      • 教育
      • 娱乐
      • 商业
    • 开源支持
      • 预训练模型
      • 代码库
      • 开发者社区

Highlights

Key sentences worth saving and sharing.

  • Violin is an open-source video translation tool that supports multiple languages, enhancing cross-language content accessibility.

    Paragraph 1

    ⬇︎ 下载 PNG𝕏 分享到 X
  • It uses a multimodal model combining speech recognition and text translation to achieve high-quality video localization.

    Paragraph 2

    ⬇︎ 下载 PNG𝕏 分享到 X
  • It provides pre-trained models and complete code libraries for developers to perform secondary development and customization.

    Paragraph 3

    ⬇︎ 下载 PNG𝕏 分享到 X
#AI#Video Processing#Natural Language Processing
Open original article

Violin: An open-source video translation skill that breaks language barriers

Image 1⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →

Image 2Introducing Together AI's new look →

Image 3🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →

Image 4⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →

Image 5📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →

Image 6🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →

[](https://www.together.ai/)

  • ![Image 7 Serverless Inference High-performance inference as APIs](https://www.together.ai/serverless-inference)
  • ![Image 8 Batch Inference Inference for batch workloads](https://www.together.ai/batch-inference)
  • ![Image 9 Dedicated Model Inference Inference on custom hardware](https://www.together.ai/dedicated-model-inference)
  • ![Image 10 Dedicated Container Inference Inference for custom models](https://www.together.ai/dedicated-container-inference)

![Image 11 MiniMax M2.5 Image 12 Nano Banana Pro Image 13 Qwen3.5-397B Image 14 GLM-5 Image 15 kimi k2.5 Image 16 gpt-oss-120B Model library Explore the top open-source models](https://www.together.ai/models)

Accelerated Compute

  • ![Image 17 GPU Clusters Reliable GPU clusters at scale](https://www.together.ai/gpu-clusters)
  • ![Image 18 AI Factory Custom infrastructure at frontier scale](https://www.together.ai/ai-factory)

Developer Environments

  • ![Image 19 Sandbox Build development environments for AI](https://www.together.ai/sandbox)

Storage

  • ![Image 20 Managed Storage Store model weights & data securely](https://www.together.ai/managed-storage)
  • ![Image 21 Fine-Tuning Shape models with your data](https://www.together.ai/fine-tuning)
  • ![Image 22 Evaluations Measure model quality](https://www.together.ai/evaluations)

![Image 23 DeepSeek V3.1 Image 24 GLM 5 FP4 Image 25 Qwen3-VL 32B Image 26 gpt-oss-120b Image 27 kimi k2.5 Image 28 Llama 4 Maverick Model library Fine-tune top open-source models](https://www.together.ai/models)

  • ![Image 29 Research Systems research for production AI](https://www.together.ai/research)
  • ![Image 30 Research blog All our research publications](https://www.together.ai/research-blog)

Featured publications

Show all

  • ![Image 31 Documentation Technical docs for Together AI](https://docs.together.ai/)
  • ![Image 32 Demos Our open-source demo apps](https://www.together.ai/demos)
  • ![Image 33 Cookbooks Practical implementation guides](https://www.together.ai/cookbooks)
  • ![Image 34 Voice Agents Build voice agents for production](https://www.together.ai/solutions/voice)

Resources

  • ![Image 35 Customer stories Testimonials from AI Natives](https://www.together.ai/customers)
  • ![Image 36 Startup accelerator Build and scale your startup](https://www.together.ai/startup-accelerator)
  • ![Image 37 Customer support Find answers to your questions](https://www.together.ai/support)
  • ![Image 38 Blog Our latest news & blog posts](https://www.together.ai/blog)
  • ![Image 39 Events Explore our events calendar](https://www.together.ai/events)

Company

  • ![Image 40 About Get to know us](https://www.together.ai/about-us)
  • ![Image 41 Careers Join our mission](https://www.together.ai/careers)

*

  • ![Image 42 Serverless Inference High-performance inference as APIs](https://www.together.ai/serverless-inference)
  • ![Image 43 Batch Inference Inference for batch workloads](https://www.together.ai/batch-inference)
  • ![Image 44 Dedicated Model Inference Inference on custom hardware](https://www.together.ai/dedicated-model-inference)
  • ![Image 45 Dedicated Container Inference Inference for custom models](https://www.together.ai/dedicated-container-inference)

![Image 46 MiniMax M2.5 Image 47 Nano Banana Pro Image 48 Qwen3.5-397B Image 49 GLM-5 Image 50 kimi k2.5 Image 51 gpt-oss-120B Model library Explore the top open-source models](https://www.together.ai/models)

* Accelerated Compute

  • ![Image 52 GPU Clusters Reliable GPU clusters at scale](https://www.together.ai/gpu-clusters)
  • ![Image 53 AI Factory Custom infrastructure at frontier scale](https://www.together.ai/ai-factory)

Developer Environments

  • ![Image 54 Sandbox Build development environments for AI](https://www.together.ai/sandbox)

Storage

  • ![Image 55 Managed Storage Store model weights & data securely](https://www.together.ai/managed-storage)

*

  • ![Image 56 Fine-Tuning Shape models with your data](https://www.together.ai/fine-tuning)
  • ![Image 57 Evaluations Measure model quality](https://www.together.ai/evaluations)

![Image 58 DeepSeek V3.1 Image 59 GLM 5 FP4 Image 60 Qwen3-VL 32B Image 61 gpt-oss-120b Image 62 kimi k2.5 Image 63 Llama 4 Maverick Model library Fine-tune top open-source models](https://www.together.ai/models)

*

  • ![Image 64 Research Systems research for production AI](https://www.together.ai/research)
  • ![Image 65 Research blog All our research publications](https://www.together.ai/research-blog)

Featured publications

Show all

*

  • ![Image 66 Documentation Technical docs for Together AI](https://docs.together.ai/)
  • ![Image 67 Demos Our open-source demo apps](https://www.together.ai/demos)
  • ![Image 68 Cookbooks Practical implementation guides](https://www.together.ai/cookbooks)
  • ![Image 69 Voice Agents Build voice agents for production](https://www.together.ai/solutions/voice)

* Resources

  • ![Image 70 Customer stories Testimonials from AI Natives](https://www.together.ai/customers)
  • ![Image 71 Startup accelerator Build and scale your startup](https://www.together.ai/startup-accelerator)
  • ![Image 72 Customer support Find answers to your questions](https://www.together.ai/support)
  • ![Image 73 Blog Our latest news & blog posts](https://www.together.ai/blog)
  • ![Image 74 Events Explore our events calendar](https://www.together.ai/events)

Company

  • ![Image 75 About Get to know us](https://www.together.ai/about-us)
  • ![Image 76 Careers Join our mission](https://www.together.ai/careers)

Contact sales

Contact sales

Sign in

All blog posts

Research

Published 5/14/2026

Violin: An open-source video translation skill that breaks language barriers

Repo

Video has become one of the most popular mediums for information sharing. Yet, the language distribution of popular video contents on the internet does not necessarily reflect the diversity of global audiences. For example, a prior study found that 66% of videos from the top 250 YouTube channels are in English, while Spanish, the second most common language, accounts for only 15% [1,2], leaving much of this content inaccessible to viewers around the world. This gap highlights the need for scalable video translation solutions.

Can cutting-edge AI help break down language barriers, making video content more accessible to global audiences?

Today, we are excited to introduce Violin — a fully open-source video translation tool, powered by Together API. The violin pipeline uses state-of-the-art speech recognition, large language models, and speech synthesis to achieve high-quality video translation.

Beyond standard translation, we develop interactive and personalized features, such as a video-content–aware chat assistant and natural language voice picker. We hope Violin can empower users across languages to access information more easily and can help high-quality video content travel further across the web.

**Violin: Breaking the language barriers of video sharing**

To illustrate Violin’s capabilities, we took a recent technical talk from Together AI and translated it into a different language.

Video 1

Before translation

Video 2

After translation in Chinese

Watch the introduction of Dr. Percy Liang’s Together Talks series before translation (Left), and after translation in Chinese (Right).

Chat with the video. Violin also includes a built-in multimodal chat assistant that can answer questions based on the video’s content. Users can query details from the video, ask for summaries, or dive deeper into specific topics — all within the same interface.

Image 77

_The Violin Video Assistant: Ask any question about the video, and get answers grounded in the audio and visual contents._

**How Violin works**

Image 78

_How Violin Works:__From input video to fully translated output, Violin orchestrates three core stages: ASR (Automatic Speech Recognition), LLM translation, and TTS (Text-to-Speech) voice synthesis, while supporting Video chat assistant, and voice style personalization. All in Together AI cloud._

Violin works in three straightforward stages:

First, it extracts and transcribes the video's audio into timestamped text. We use Together’s Whisper V3 large endpoint that provides high quality multi-lingual transcription at an optimized speed.

Then a large language model translates that transcript. Here we leverage the latest advances of Deepseek V4 Pro as default translator. We also enable the user's input of a predefined list of translation rules to maintain the faithfulness and accuracy.

Finally, the TTS model generates translated speech, allowing users to specify their desired voice characteristics in plain text. Together-hosted Cartesia’s Sonic 3 supports a wide range of native speaker’s voices such as Korean, Dutch, Italian, and Chinese, making the translated video sound natural. Note that we do not allow voice cloning in our tool, but rather using a distinct voice than the original speaker and by default overlaying the new voice on top of the original voice at a low volume.

Besides, the video chat module lets you ask questions about the video, powered by a vision-language model that understands both what was said in audio and shown on screen. This is implemented by sampling the recent video frame as well as the subtitle context and sent to a vision-language model like Qwen3.5-397B-A17B for free-form question-answering. In this way, the model can return the proper response based on these contexts.

**Designed for everyone: Web app, CLI, and agent skills**

We built Violin with usability at its core. Whether you’re a content creator who prefers a simple web interface, a developer who lives in the command line, or an AI practitioner integrating tools into autonomous agents, Violin has you covered:

  • Web App – A clean, minimal frontend for uploading videos, selecting translation options, previewing results, and interacting with the video assistant. No code required.
  • CLI Tool – A straightforward command-line interface for scripting, batch processing, and integration into existing pipelines.
  • Agent Skills – We packaged Violin’s capabilities as a skill that can be dropped into common agent frameworks.

Everything — from the GUI to the backend models to the agent skills — is fully open source. We’re releasing the codebase under a permissive MIT license, inviting the community to adapt, extend, and improve. We believe open collaboration is the fastest path toward making video content truly language-agnostic.

**Get involved**

We’re just getting started, and we’d love your help. If you find Violin useful, or if you have ideas for how it could be better:

  • Visit our GitHub repository: github.com/shang-zhu/violin
  • Drop us a line at: [heyviolinai@gmail.com](mailto:heyviolinai@gmail.com)
  • Open a GitHub issue or start a discussion — we value every piece of feedback.
  • Try our demo app here(this will be hosted for a short period of time after the release)

Acknowledgments

We are grateful to Martijn Bartelds, Yongchan Kwon, Federico Bianchi, and Kaitlyn Zhou for their thoughtful feedback. We thank the open-source model builders behind Whisper, DeepSeek, Qwen, and Cartesia, whose work forms the foundation of Violin. Special thanks to Hassan El Mghari and Percy Liang for providing videos and feedback during the development.

Disclaimer

Violin provides the translation tool; users are solely responsible for the content they translate, including compliance with copyright and other applicable laws. Uploaded videos are deleted after 24 hours in the demo app.

[1] Wikipedia, "Languages used on the Internet," accessed May 8, 2026. https://en.wikipedia.org/wiki/Languages_used_on_the_Internet

[2] Brian Yang, "6 Common Features Of Top 250 YouTube Channels," Twinword, accessed May 12, 2026. https://www.twinword.com/blog/features-of-top-250-youtube-channels/

Start building on Together AI

From optimized training and model shaping to large-scale production inference

Get Started now

Image 79

* Products

  • Models

See all modelsDeepSeek Meta Qwen Google OpenAI Mistral AI Custom models * Developers

Pricing

* Resources

© 2026 Together AI. All Rights Reserved.

  • [](https://discord.gg/9Rk6sSeWEG)
  • [](https://x.com/togethercompute)
  • [](https://www.linkedin.com/company/togethercomputer/)

AI may generate inaccurate information. Please verify important content.