What other names does Towards Data Science go by?

Towards Data Science is also known as TDS.

What's new with Towards Data Science?

traeai has indexed 28 items related to Towards Data Science. The most recent is "RAG Is Burning Money — I Built a Cost Control Layer to Fix It", published by Towards Data Science.

Company

What is Towards Data Science?

A platform that publishes data science and AI technical tutorials.

Also known as: TDS

Why is it worth following now?

If you read only three

RAG Is Burning Money — I Built a Cost Control Layer to Fix It

Towards Data Science · score 9.2

From Regex to Vision Models: Which RAG Technique Fits Which Problem

Towards Data Science · score 9

RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

Towards Data Science · score 8.7

📰 Latest from Towards Data Science

28 AI news items and analyses related to "Towards Data Science" have been indexed.

RAG Is Burning Money — I Built a Cost Control Layer to Fix It

Towards Data Science · May 30 · 4,995 words (about 20 min)

RAG systems often incur hidden costs due to context over-fetching, lack of caching, and no model routing; the author built a cost control layer using semantic caching (98.5% hit rate), query routing (81% of requests shifted to low-cost models), and token-budget circuit breaking, achieving an 85.8% cost reduction at 10k requests/day without quality loss.

Why it was selected: Context over-fetching adds about 350 tokens per query on average, which at 10k requests/day wastes $52.5/day (at $0.015 per 1K tokens).

FeaturedArticle · #RAG #Cost Optimization #Semantic Caching #Model Routing #LLM · English
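The waste figure in the selection note can be checked with simple arithmetic. The sketch below is only an illustration: the function name `daily_overfetch_cost` and its parameter names are hypothetical and do not come from the article.

```python
def daily_overfetch_cost(extra_tokens_per_query: float,
                         requests_per_day: int,
                         usd_per_1k_tokens: float) -> float:
    """Return the daily cost (USD) of tokens fetched beyond what a query needs."""
    wasted_tokens = extra_tokens_per_query * requests_per_day
    return wasted_tokens / 1000 * usd_per_1k_tokens


# Figures quoted above: 350 extra tokens/query, 10k requests/day, $0.015 per 1K tokens.
waste = daily_overfetch_cost(350, 10_000, 0.015)
print(f"${waste:.2f} per day")  # $52.50 per day
```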

From Regex to Vision Models: Which RAG Technique Fits Which Problem

Towards Data Science · June 2 · 4,997 words (about 20 min)

RAG techniques are not universal; choose based on document structure and query control: use regex for templated docs, LLMs for sarcasm detection in transcripts, and vision models for schematics.

Why it was selected: Templated documents (such as insurance policies and bank statements) are well suited to regex field extraction, which avoids a costly RAG pipeline.

FeaturedArticle · #RAG #LLM #Document Intelligence #Vision Models #Enterprise AI · English
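As a rough illustration of regex extraction from a templated document, the sketch below pulls three fields out of a made-up policy text. The template layout, field labels, and the helper `extract_fields` are all assumptions for the example, not taken from the article.

```python
import re
from typing import Dict, Optional

# Hypothetical templated document: every policy follows the same layout.
POLICY_TEXT = """\
Policy Number: AB-1234567
Insured: Jane Doe
Premium: $1,250.00
"""

# One pattern per field; the labels are assumptions about the template.
FIELD_PATTERNS = {
    "policy_number": re.compile(r"^Policy Number:\s*(?P<value>[A-Z]{2}-\d{7})\s*$", re.MULTILINE),
    "insured": re.compile(r"^Insured:\s*(?P<value>.+?)\s*$", re.MULTILINE),
    "premium": re.compile(r"^Premium:\s*\$(?P<value>[\d,]+\.\d{2})\s*$", re.MULTILINE),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return each field's value, or None when the template line is missing."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group("value") if match else None
    return fields


fields = extract_fields(POLICY_TEXT)
print(fields)
```

Returning `None` for a missing field makes it easy to spot documents that do not follow the template and may need a heavier pipeline.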

RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

Towards Data Science · June 2 · 6,346 words (about 26 min)

The article argues that, despite its resemblance to ML, RAG is fundamentally a search system, not a model, which makes hyperparameter tuning and embedding fine-tuning ineffective and misleading.

Why it was selected: RAG solves the problem of looking up deterministic answers rather than predicting unknown outcomes, so it cannot be optimized with ML methods.

FeaturedArticle · #RAG #Machine Learning #Enterprise AI #Information Retrieval #LLM · English

Embeddings Aren’t Magic: The Predictable Failure Modes of RAG Retrieval

Towards Data Science · June 1 · 9,526 words (about 39 min)

RAG systems rely on embeddings that fail predictably: when queries use different terms than docs (e.g., ‘overtime’ vs ‘non-employee labor’), contain negations, or depend on exact IDs/codes, retrieval fails. The article argues enterprise reliability comes from upstream filtering (expert keywords, doc structure), not rerankers atop weak retrieval.

Why it was selected: Embedding models handle synonyms and spelling variants well (e.g., ‘cancel’ → ‘termination procedures’) but cannot bridge inconsistent terminology.

FeaturedArticle · #RAG #Embedding #Retrieval #Enterprise AI #Document Intelligence · English

Proxy-Pointer RAG: Solving Entity and Relationship Sprawl in Large Knowledge Graphs

Towards Data Science · May 21 · 3,847 words (about 16 min)

Proxy-Pointer RAG reduces the computational cost of entity and relationship reconciliation in knowledge graphs by over 90% by preserving document structure, enabling millisecond-scale ingestion without full-graph traversal.

Why it was selected: Proxy-Pointer RAG uses Skeleton Tree and Breadcrumb Injection techniques so that vector retrieval can pinpoint complete structural sections of a document rather than fragmented chunks.

FeaturedArticle · #RAG #Knowledge Graph #Proxy-Pointer #Entity Resolution #Vector Retrieval · English

Backpropagation Explained for Beginners (Part 1): Building the Intuition

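Towards Data Science · July 21 · 3,374 words (about 14 min)

Backpropagation is the core mechanism of neural network training, supplying the gradients that gradient descent uses to update parameters. The article breaks down the underlying math intuitively and suits deep learning beginners.

Why it was selected: Backpropagation computes gradients via the chain rule so that model parameters can be optimized.

FeaturedArticle · #Deep Learning #Neural Networks #Backpropagation #Gradient Descent · English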
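To make the chain-rule step concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss. The network, the input values, and the function names are chosen for illustration and are not the article's example. The analytic gradient is checked against a finite difference.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: float, b: float, x: float, target: float) -> float:
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - target) ** 2


def gradients(w: float, b: float, x: float, target: float):
    """Backward pass: multiply the local derivatives along the chain."""
    y = sigmoid(w * x + b)          # forward pass
    dL_dy = 2.0 * (y - target)      # d(loss)/d(output)
    dy_dz = y * (1.0 - y)           # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    return dL_dz * x, dL_dz         # dz/dw = x, dz/db = 1


w, b, x, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, target)

# Sanity check against a central finite difference.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)
print(abs(grad_w - numeric_w) < 1e-6)
```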

Automatically Assign a Category to Uncategorized Rows in Power Query and DAX

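Towards Data Science · July 21 · 2,493 words (about 10 min)

The article shows how to automatically assign a category to uncategorized rows in Power Query and DAX, working through a seat-assignment case with concrete implementation steps and code.

Why it was selected: It uses the Power Query M function [CheckMax_ForSeat] to process the latest data file.

FeaturedArticle · #Power Query #DAX #Data Engineering #Excel #Data Transformation · English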

Water Cooler Small Talk, Ep. 12: Byzantine Fault Tolerance

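Towards Data Science · July 21 · 2,010 words (about 9 min)

Byzantine fault tolerance is the core approach by which distributed systems cope with malicious nodes, and blockchain is an elegant modern application of it.

Why it was selected: Byzantine fault tolerance requires the system to still reach consensus when fewer than one third of its nodes act maliciously.

FeaturedArticle · #Blockchain #Distributed Systems #Consensus Algorithms #Byzantine Fault Tolerance · English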
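The one-third threshold in the selection note corresponds to the classic Byzantine agreement bound n ≥ 3f + 1, where n is the number of nodes and f the number of faulty ones. A small sketch of that arithmetic follows; the function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return (n - 1) // 3


def min_nodes_for(f: int) -> int:
    """Smallest cluster that tolerates f Byzantine nodes under the same bound."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


for n in (3, 4, 7, 10):
    print(n, "nodes tolerate", max_byzantine_faults(n), "faulty")
```

Note that three nodes tolerate no Byzantine node at all, which is why four is the smallest useful cluster under this bound.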

Loop Engineering with Adaptive Parsing in Action: Parsing Flat Tables with Azure and Figures with a Vision LLM

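Towards Data Science · July 21 · 5,583 words (about 23 min)

Adaptive parsing combined with a vision LLM can markedly improve document-processing accuracy for tables and figures, and parsing with Azure and PyMuPDF together balances cost and efficiency.

Why it was selected: PyMuPDF parses at about 5 ms per page, whereas vision LLM parsing costs 10,000 times more and takes 10 seconds.

FeaturedArticle · #Adaptive Parsing #LLM #Azure #Document Processing #Enterprise RAG · English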

How to Run Claude Code Agents for 24+ Hours

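Towards Data Science · July 21 · 2,234 words (about 9 min)

Running coding agents for long stretches reduces human review time, but the environment, permissions, and verification mechanisms must be tuned to improve engineering efficiency.

Why it was selected: Human review is the current bottleneck in programming, and cutting review time can raise productivity by more than 30%.

FeaturedArticle · #Claude #Coding Agents #AI Engineering #Efficiency Optimization · English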

Building Trustworthy Production RAG Systems Through Continuous Evaluation

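Towards Data Science · July 15 · 2,285 words (about 10 min)

Continuous evaluation is key to building reliable RAG systems; golden datasets, automated tools, and human review can effectively detect system defects.

Why it was selected: A golden dataset needs three elements: the question, the correct answer, and the source documents.

FeaturedArticle · #RAG #Continuous Evaluation #System Reliability #Automation Tools · English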
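One way to represent the three elements of a golden dataset entry is sketched below. The class `GoldenExample`, the substring-match scoring, the sample questions, and the `stub_rag` stand-in are all assumptions for illustration; a real evaluation would likely use a more careful correctness check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoldenExample:
    question: str
    expected_answer: str
    source_documents: List[str] = field(default_factory=list)


def evaluate(examples: List[GoldenExample], answer_fn: Callable[[str], str]) -> float:
    """Fraction of examples whose generated answer contains the expected text."""
    if not examples:
        return 0.0
    hits = sum(
        ex.expected_answer.lower() in answer_fn(ex.question).lower()
        for ex in examples
    )
    return hits / len(examples)


golden_set = [
    GoldenExample("What is the refund window?", "30 days", ["refund_policy.md"]),
    GoldenExample("Who approves travel expenses?", "line manager", ["travel_policy.md"]),
]


def stub_rag(question: str) -> str:
    """Stand-in for a real RAG pipeline; it always gives the same answer."""
    return "Refunds are accepted within 30 days of purchase."


score = evaluate(golden_set, stub_rag)
print(score)  # 0.5
```

Keeping `source_documents` on each entry also allows a separate check that retrieval surfaced the right files, independent of answer wording.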

A Gentle Introduction to Autoencoders & Latent Space

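Towards Data Science · July 15 · 1,191 words (about 5 min)

Autoencoders compress data through an encoder-decoder structure, and the latent representation at the bottleneck layer is the key. The article explains how they work and how they are trained.

Why it was selected: An autoencoder consists of an encoder, a bottleneck layer, and a decoder, and is used for data compression and feature extraction.

FeaturedArticle · #Autoencoder #Machine Learning #Neural Networks #Data Compression · English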

How I’m Making Sure My Analytics Career Doesn’t Get Eaten by AI

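Towards Data Science · July 15 · 1,382 words (about 6 min)

Analytics careers need to shift toward business understanding and judgment; AI will blur role boundaries but will not replace the value of analysis.

Why it was selected: AI tools such as Copilot let non-technical people complete data visualization tasks, lowering the basic technical barrier.

FeaturedArticle · #Data Analytics #AI Impact #Career Development #Technological Change · English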

The Three Dimensions of Custom Agentic Alignment: Purpose, Principles and Practices

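Towards Data Science · July 14 · 3,750 words (about 15 min)

Enterprises need a three-dimensional alignment mechanism covering purpose, principles, and practices to ensure that agentic AI behaves in line with organizational intent and to prevent insider threats arising from autonomy.

Why it was selected: Custom alignment must combine the three dimensions of enterprise purpose, principles, and practices to keep AI behavior consistent with organizational intent.

FeaturedArticle · #AI Alignment #Enterprise AI #Agentic AI #Technical Architecture · English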

Inside the Subspace Where Spurious Correlations Are Born

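Towards Data Science · July 8 · 2,683 words (about 11 min)

Small samples and high-dimensional data readily produce spurious correlations; understanding the geometry behind them makes data analysis more reliable.

Why it was selected: In small samples, the sample correlation between independent variables can still reach 0.62.

FeaturedArticle · #Statistics #Data Science #High-Dimensional Data #Correlation Analysis · English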
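The effect can be reproduced with a short simulation. The sketch below draws many pairs of variables that are independent by construction and reports the largest absolute sample correlation for a small and a large sample size. The sample sizes, pair count, and seed are arbitrary choices for illustration, so the numbers it prints will not match the article's 0.62.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def largest_spurious_correlation(sample_size, n_pairs, rng):
    """Max |r| over pairs of variables that are independent by construction."""
    best = 0.0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        best = max(best, abs(pearson(xs, ys)))
    return best


rng = random.Random(0)
small = largest_spurious_correlation(sample_size=10, n_pairs=200, rng=rng)
large = largest_spurious_correlation(sample_size=1000, n_pairs=200, rng=rng)
print(f"n=10: {small:.2f}, n=1000: {large:.2f}")
```

With only 10 observations per variable, the largest correlation found among the unrelated pairs is far higher than with 1,000 observations.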

We Built a Routing Layer to Cut Our AI Costs. It Broke the Product.

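Towards Data Science · June 28 · 4,331 words (about 18 min)

A cost-optimizing routing layer lowered AI costs but degraded product quality and user satisfaction; beware the Pareto trap.

Why it was selected: The classifier model's misclassification rate was as high as 35%, so complex queries were wrongly routed to cheap models.

FeaturedArticle · #AI #Cost Optimization #Routing Layer #Product Failure #Engineering Practice · English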

Neural Networks, Explained for Beginners: Start Here If They’ve Confused You

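Towards Data Science · June 23 · 4,265 words (about 18 min)

Neural networks model complex data through activation functions; the article explains how they work using a simple dataset.

Why it was selected: A simple dataset makes the internal mechanics of a neural network easier to see.

FeaturedArticle · #Neural Networks #Deep Learning #Activation Functions #Machine Learning · English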

Making a PDF’s Images Searchable for RAG, Without Paying to Read Them All

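Towards Data Science · June 20 · 3,465 words (about 14 min)

Image content in a PDF can be made searchable without a model call for every image; tiered filtering and OCR keep costs down.

Why it was selected: Using image size and shape as a first-pass filter rules out images with no retrieval value.

FeaturedArticle · #RAG #PDF Parsing #Image Processing #OCR #Enterprise Document Intelligence · English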
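A first-pass filter on size and shape might look like the sketch below. The thresholds, the `ImageInfo` record, and the sample images are hypothetical. The article's actual cut-offs are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    width: int   # pixels
    height: int  # pixels


# Hypothetical thresholds; tune them on your own documents.
MIN_SIDE_PX = 64          # smaller images are usually icons or bullets
MAX_ASPECT_RATIO = 8.0    # very long, thin images are usually rules or banners


def worth_sending_to_model(img: ImageInfo) -> bool:
    """Cheap first-pass filter that uses only the image's size and shape."""
    if min(img.width, img.height) < MIN_SIDE_PX:
        return False
    aspect = max(img.width, img.height) / min(img.width, img.height)
    return aspect <= MAX_ASPECT_RATIO


images = [
    ImageInfo("logo_icon", 32, 32),
    ImageInfo("divider_line", 1200, 70),
    ImageInfo("wiring_diagram", 900, 600),
]
kept = [img.name for img in images if worth_sending_to_model(img)]
print(kept)  # ['wiring_diagram']
```

Only the images that survive this check would go on to the more expensive OCR or model stages.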

GPU-Resident Top-K for Agentic RAG: I Built a CUDA Kernel So My Retrieval Step Would Stop Bouncing Off the GPU

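Towards Data Science · June 19 · 6,904 words (about 28 min)

Keeping retrieval on the GPU with a custom CUDA kernel yields microsecond-scale latency for multi-hop RAG, 8.6 times faster than the CPU baseline.

Why it was selected: Keeping the retrieval loop on the GPU eliminates the PCIe transfer tax and delivers an 8.6x speedup.

FeaturedArticle · #CUDA #RAG #GPU Optimization #AI Inference · English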

Drilling Into AI’s Financial Sustainability

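Towards Data Science · June 17 · 1,593 words (about 7 min)

The financial sustainability of AI is under pressure: enterprise overuse is driving costs sharply higher, and its business value needs to be reassessed.

Why it was selected: Enterprise overuse of AI tools produces enormous costs, such as Uber's startling spend on AI tokens.

FeaturedArticle · #AI #Finance #Technology Costs #Sustainability · English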

The Exact ML Project I’d Build to Get Hired in 2026

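Towards Data Science · June 10 · 1,642 words (about 7 min)

Building a machine learning project that is personal, innovative, relevant, and actually runs is the key to landing a data science job in 2026.

Why it was selected: A strong machine learning project must be personal, innovative, relevant, and runnable in practice.

FeaturedArticle · #Machine Learning #Data Science #Project Building #Hiring · English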

10 Common RAG Mistakes We Keep Seeing in Production

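Towards Data Science · June 10 · 5,639 words (about 23 min)

Common mistakes in production RAG systems include parsing failures, ignoring document structure, and fixed-window chunking, all of which hurt retrieval precision.

Why it was selected: Parsing should preserve document structure and avoid flattening tables into strings.

FeaturedArticle · #RAG #AI #Document Parsing #Enterprise Applications · English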

Sequential Fitting: A Different Perspective on the Spectral Bias of Neural Networks

Towards Data Science · June 8 · 3,803 words (about 16 min)

Neural networks exhibit a 'spectral bias': they fit low-frequency components first, so high-frequency functions train inefficiently. The article analyzes the phenomenon from several perspectives and offers explanations.

Why it was selected: Neural networks need more training epochs to fit high-frequency functions, which makes training inefficient.

FeaturedArticle · #Neural Networks #Spectral Bias #Machine Learning #Activation Functions · English

FPN Paper Walkthrough: Leveraging the Internal Pyramid

Towards Data Science · June 5 · 4,625 words (about 19 min)

FPN solves small object detection by introducing a Neck structure to fuse multi-scale features. This article details the Backbone-Neck-Head evolution and provides a from-scratch implementation guide connecting FPN with CNN and RPN, essential for understanding modern detection optimization.

Why it was selected: As the Neck component between the Backbone and the Head, FPN markedly improves small-object detection accuracy through its feature-enhancement mechanism.

FeaturedArticle · #FPN #Object Detection #YOLOv3 #Feature Pyramid #Computer Vision · English

Small Data, Big Maps: Training Geospatial ML Models When Samples Are Scarce

Towards Data Science · June 5 · 1,566 words (about 7 min)

The core bottleneck in geospatial ML is expensive field samples, not compute; solving small-sample issues requires increasing per-sample information density via multi-source feature engineering and prioritizing low-variance models like Random Forest to control overfitting.

Why it was selected: A single forest inventory plot in the Amazon rainforest costs as much as an ML training computer, so scarce field labels are the core constraint.

FeaturedArticle · #Geospatial ML #Small Data #Feature Engineering #Random Forest #Remote Sensing · English

Five Ways to Fine-Tune Chronos-2, the Time Series Foundation Model

Towards Data Science · June 5 · 4,139 words (about 17 min)

Chronos-2, a time series foundation model (TSFM), can be fine-tuned via LoRA to close zero-shot gaps; the article details five scenarios, including single-building adaptation, portfolio pooling, and covariate injection, with strict data splitting.

Why it was selected: LoRA freezes the 120M-parameter base model and trains only low-rank adapters to adapt efficiently to private data.

FeaturedArticle · #Chronos-2 #Time Series Foundation Model #LoRA #Fine-tuning #Forecasting · English

Why Gradient Descent Became Stochastic

Towards Data Science · May 30 · 4,695 words (about 19 min)

The core reason gradient descent evolved into stochastic gradient descent (SGD) is computational scalability: as dataset size grows, batch gradient descent (BGD) becomes prohibitively expensive, while SGD updates parameters using only one or a few samples per iteration, which reduces cost and uses noise to escape local minima. The article illustrates this via linear regression, deriving the closed-form solution from MSE and naturally motivating iterative optimization.

Why it was selected: In linear regression, the closed-form solution β₀=27315.74, β₁=9020.66 is derived by taking the partial derivatives of the MSE with respect to β₀ and β₁ and setting them to zero.

FeaturedArticle · #Gradient Descent #Stochastic Gradient Descent #Linear Regression #Optimization #Machine Learning · English
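The closed-form solution mentioned in the selection note follows from setting both partial derivatives of the MSE to zero. The sketch below implements the resulting formulas on toy data generated from y = 1 + 2x. The data and function name are made up for illustration and do not reproduce the article's β₀ and β₁.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x.

    Setting dMSE/db0 = 0 and dMSE/db1 = 0 gives
        b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
        b0 = mean_y - b1 * mean_x
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Toy data generated from y = 1 + 2x, not the article's dataset.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

The closed form needs the whole dataset for every sum, which is the scaling cost that motivates iterative and stochastic updates on large data.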

Can Machine Learning Predict the World Cup?

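Towards Data Science · June 10 · 3,800 words (about 16 min)

Machine learning models have limited success predicting World Cup match results; the 86% share of home-win predictions suggests the model is biased.

Why it was selected: It uses models including multiple regression and LightGBM for prediction.

FeaturedArticle · #Machine Learning #Football Prediction #Data Science #R · English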

AI terms that frequently appear alongside "Towards Data Science".

Claude Opus GPT-4 phi-3 RAG LLM Vision Models Optuna Machine Learning Embedding Model GloVe-avg all-MiniLM-L6-v2 OpenAI
