SkillOS: Learning Skill Curation for Self-Evolving Agents
SkillOS is a skill orchestration system for self-evolving agents, achieving 34% higher accuracy through a dynamic skill library and meta-learning mechanisms.
Why it was selected: SkillOS uses a dynamic skill library that supports adding, removing, and updating skills in real time.
Person
Also known as: _akhaliq
Technical expert focused on research in GPU and AI acceleration.
18 AI news items and analyses related to "AK" have been collected.
This article explores a new approach to GPU kernel runtime optimization using language models as selective surrogates, achieving significant performance improvements by predicting and selecting optimal kernel configurations.
Why it was selected: Language models are used as selective surrogates to predict the best configuration for GPU kernels.
This article explores the limitations of Visual Language Models (VLMs) in handling spatial questions, highlighting their tendency to confidently generate answers even when visual cues are ambiguous, and suggests introducing uncertainty mechanisms to improve model robustness.
Why it was selected: VLMs may still confidently generate answers to spatial questions even when clear visual cues are lacking.
LongMINT is a new benchmark testing framework for evaluating memory capabilities under multi-target interference in long-horizon agent systems, which has gained attention through academic sharing on Twitter. This framework specifically addresses memory interference issues in AI agents during long-term tasks and provides standardized testing methods for measuring continuous learning and memory management capabilities of agent systems.
Why it was selected: LongMINT is a new benchmark framework built specifically to evaluate memory interference in long-horizon agents.
Mix-Quant significantly improves the balance between efficiency and precision in agentic LLMs through a hybrid strategy of quantized prefilling and precise decoding, offering a new optimization direction for large-model deployment.
Why it was selected: Mix-Quant uses a hybrid strategy of quantized prefilling and precise decoding to optimize LLM performance.
MulTaBench is a benchmark for evaluating multimodal tabular learning with text and images.
Why it was selected: MulTaBench includes 12 datasets and 3 task types.
The article introduces the MACE-Dance model for music-driven dance video generation.
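Why it was selected: MACE-Dance is a music-driven dance video generation model.
MiniCPM-o 4.5 proposes a new approach to real-time full-duplex multimodal interaction, but detailed technical implementation specifics are lacking.
Why it was selected: MiniCPM-o 4.5 supports real-time full-duplex multimodal interaction.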
ESI-Bench is a novel benchmark focused on evaluating embodied spatial intelligence models in perception-action loops, offering more challenging scenarios and metrics than existing tests.
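Why it was selected: ESI-Bench uses continuous 3D trajectory prediction tasks, making it more challenging than existing benchmarks.
Do enterprise systems need learned world models? The article explores the importance of context for inferring dynamics and stresses the value of understanding background information in complex environments.
Why it was selected: In enterprise systems, context is crucial for inferring a system's dynamic behavior.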
PhyMotion introduces a structured 3D motion reward mechanism grounded in physics to enhance the realism of human video generation.
Why it was selected: PhyMotion introduces physical constraints to enhance the realism of video generation.
Research shows that a single neuron can bypass the safety alignment of large language models.
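Why it was selected: A single neuron can break a model's safety alignment.
AK shared CausalCine on Twitter, a new method for video narrative generation that uses real-time autoregressive generation.
Why it was selected: It generates multi-shot video stories in real time.
The article recommends a research paper on whether enterprise systems need learned world models, exploring the importance of context for inference.
Why it was selected: The paper "Do Enterprise Systems Need Learned World Models?" examines the need for learned world models in enterprise systems.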
SVGS enhances Gaussian splatting using primitives with spatially varying colors, but the article has limited information.
Why it was selected: SVGS uses spatially varying colors to improve rendering quality.
This tweet provides only a paper link with no specific content, so the actual value of the LongMINT framework for evaluating memory in long-horizon agents cannot be assessed; its information density is low.
Why it was selected: Only the paper title, "LongMINT: Evaluating Memory under Multi-Target Interference", hints at the research direction.
AK shared a paper on TMAS, but the specific content was not displayed in the tweet.
Why it was selected: AK shared a paper.
The tweet only provides a link to a paper without any specific content.
Why it was selected: The tweet provides a link to a paper.
AI terms that frequently appear together with "AK".