# Product Experimentation with Propensity Scores: Causal Inference for LLM-Based Features in Python

Canonical URL: https://www.traeai.com/articles/f3a9dd9b-5ef2-4974-8428-80fb346bee09
Original source: https://www.freecodecamp.org/news/product-experimentation-with-propensity-scores-causal-inference-for-llm-based-features-in-python/
Source name: freeCodeCamp.org
Content type: article
Language: Chinese
Score: 9.2
Reading time: 18 minutes
Published: 2026-04-30T23:01:26+00:00
Tags: causal inference, LLM productization, propensity scores, Python, A/B testing

## Summary

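This article walks through how propensity score methods (PSM) correct the causal-inference bias that arises after an LLM feature ships, with an end-to-end Python implementation and diagnostic checks. It targets the "Opt-In Trap", a core pain point in AI product experiments.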

## Key Takeaways

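- When users opt in to an LLM feature themselves, the resulting selection bias is severe enough to distort conventional comparison metrics.
- Propensity score methods use reweighting or matching to construct a quasi-randomized control group and isolate the true causal effect.
- Covariate balance checks and bootstrap confidence intervals are required; without them the method can fail silently.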
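
To make the first two takeaways concrete, here is a minimal sketch of opt-in bias and its correction by inverse probability weighting. It is not the article's notebook code: the data-generating process, the single confounder `engagement`, the coefficient values, the helper `fit_logistic`, and the clipping bounds are all assumptions chosen for illustration. Only the sample size of 50,000 comes from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000           # matches the article's synthetic dataset size; all else is assumed
TRUE_EFFECT = 2.0    # hypothetical causal effect of enabling the feature

# Hypothetical confounder: prior engagement raises both adoption and the outcome.
engagement = rng.normal(size=n)
p_adopt = 1 / (1 + np.exp(-(-0.5 + 1.2 * engagement)))
treated = rng.random(n) < p_adopt
outcome = 5.0 + 3.0 * engagement + TRUE_EFFECT * treated + rng.normal(size=n)


def fit_logistic(x, y, n_iter=25):
    """Unregularized logistic regression via Newton-Raphson; returns fitted probabilities."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X @ beta))


# Naive comparison: adopters vs. non-adopters, ignoring who chose to adopt.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Estimate propensity scores, then clip to keep the weights bounded.
e_hat = np.clip(fit_logistic(engagement, treated.astype(float)), 0.01, 0.99)

# Normalized (Hajek) inverse probability weighting estimate of the average effect.
w1 = treated / e_hat
w0 = (~treated) / (1 - e_hat)
ipw = np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)

print(f"true effect:  {TRUE_EFFECT:.2f}")
print(f"naive diff:   {naive:.2f}")
print(f"IPW estimate: {ipw:.2f}")
```

In this setup the naive difference absorbs the engagement gap between adopters and non-adopters, while the weighted estimate lands close to the simulated effect. That only holds because the one confounder is observed and included in the propensity model, which real product data does not guarantee.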

## Outline

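- Introduction (the Opt-In Trap problem): LLM feature adoption depends on users acting on their own, so the observed groups are inherently imbalanced and conventional metrics cannot reflect the true causal effect.
  - How propensity scores work: PSM models each user's probability of enabling the feature to build a comparable control group, approximating the ideal conditions of a randomized experiment.
  - Five-step workflow: propensity score estimation, inverse probability weighting, nearest-neighbor matching, covariate balance checks, and bootstrap confidence intervals.
    - Validation on synthetic data: a synthetic SaaS dataset of 50,000 users with a known causal effect quantifies how well the methods work and where they break down.
    - Code and reproducibility: a companion Jupyter Notebook contains the full implementation with all outputs pre-executed, readable on GitHub and runnable locally.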
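
The later steps of the workflow in the outline (nearest-neighbor matching, covariate balance checks, and bootstrap confidence intervals) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the notebook's implementation: the simulated data, the helpers `propensity`, `ipw_ate`, `nn_match`, and `smd`, the 10,000-user sample, and the 200 bootstrap replicates are all hypothetical choices. Matching here is one-to-one with replacement on the estimated score, and the bootstrap is applied to the IPW estimate with the propensity model refit in every resample.

```python
import numpy as np

rng = np.random.default_rng(1)
n, TRUE_EFFECT = 10_000, 2.0   # both hypothetical; smaller than the article's 50,000 users

# Hypothetical confounder x drives both adoption (t) and the outcome (y).
x = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))).astype(float)
y = 5.0 + 3.0 * x + TRUE_EFFECT * t + rng.normal(size=n)


def propensity(x, t, n_iter=15):
    """Logistic regression by Newton-Raphson; returns clipped fitted probabilities."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (t - p))
    return np.clip(1 / (1 + np.exp(-X @ beta)), 0.01, 0.99)


def ipw_ate(x, t, y):
    """Normalized inverse probability weighting estimate of the average effect."""
    e = propensity(x, t)
    w1, w0 = t / e, (1 - t) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def nn_match(e, treated):
    """For each treated unit, index of the control with the closest score (with replacement)."""
    ctrl = np.flatnonzero(~treated)
    ctrl = ctrl[np.argsort(e[ctrl])]          # controls sorted by score
    scores, target = e[ctrl], e[treated]
    pos = np.clip(np.searchsorted(scores, target), 1, len(scores) - 1)
    left_closer = (target - scores[pos - 1]) <= (scores[pos] - target)
    return ctrl[np.where(left_closer, pos - 1, pos)]


def smd(a, b):
    """Standardized mean difference between two samples of one covariate."""
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)


treated = t == 1
e_hat = propensity(x, t)

# Nearest-neighbor matching: effect among adopters from matched pairs.
matched = nn_match(e_hat, treated)
att_matched = np.mean(y[treated] - y[matched])

# Balance check: the covariate gap should shrink sharply after matching.
smd_raw = smd(x[treated], x[~treated])
smd_matched = smd(x[treated], x[matched])

# Percentile bootstrap for the IPW estimate, refitting the propensity model each time.
ate_ipw = ipw_ate(x, t, y)
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    boot.append(ipw_ate(x[idx], t[idx], y[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"SMD raw / matched: {smd_raw:.3f} / {smd_matched:.3f}")
print(f"matched ATT:       {att_matched:.2f}")
print(f"IPW ATE:           {ate_ipw:.2f}  (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A common rule of thumb treats an absolute standardized mean difference below about 0.1 as acceptable balance. If a covariate stays above that after matching or weighting, the effect estimate should not be trusted even when its confidence interval looks tight.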

## Highlights

- > Running an AI feature behind a toggle is a product experiment. The hypothesis: the feature improves outcomes for users who adopt it. (paragraph 4)
- > Propensity score methods are statistical tools that data scientists use to separate adoption bias from the feature's actual effect. (paragraph 5)
- > You'll estimate it, quantify uncertainty, and see where the approach silently breaks. (paragraph 6)
- > The notebook (psm_demo.ipynb) has all outputs pre-executed, so you can read along on GitHub before running anything locally. (paragraph 7)

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.