
April 28, 2026

Safety

Our commitment to community safety

Mass shootings, threats against public officials, bombing attempts, and attacks on communities and individuals are an unacceptable and grave reality in today’s world. These incidents are a reminder of how real the threat of violence is—and how quickly violent intent can move from words to action.

People may also bring these moments and feelings into ChatGPT. They may ask questions about the news, try to understand what happened, express fear or anger, or talk about violence in ways that are fictional, historical, political, personal, or potentially dangerous. We work to train ChatGPT to recognize the difference—and to draw lines when a conversation starts to move toward threats, potential harm to others, or real-world planning.

We’re sharing what we do to minimize uses of our services in furtherance of violence or other harm: how our models are trained to respond safely, how our systems detect potential risk of harm, and what actions we take when someone violates our policies. We are constantly improving the steps we take to help protect people and communities, guided by input from psychologists, psychiatrists, civil liberties and law enforcement experts, and others who help us navigate difficult decisions around safety, privacy, and democratized access.

How we mitigate risks of harm in ChatGPT.

Our Model Spec lays out our long-standing principles for how we want our models to behave: maximizing helpfulness and user freedom while minimizing the risk of harm through sensible defaults.

We work to train our models to refuse requests for instructions, tactics, or planning that could meaningfully enable violence. At the same time, people may ask neutral questions about violence for factual, historical, educational, or preventive reasons, and we aim to allow those discussions while maintaining clear safety boundaries—for example, by omitting detailed, operational instructions that could facilitate harm. The line between benign and harmful uses can be subtle, so we continually refine our approach and work with experts to help distinguish between safe, bounded responses and actionable steps for carrying out violence or other real-world harm.

As part of this ongoing work, we’ve continued expanding our safeguards to help ChatGPT better recognize subtle signs of risk of harm across different contexts. Some safety risks only become clear over time: a single message may seem harmless on its own, but a broader pattern within a long conversation—or across conversations—can suggest something more concerning. Building on years of work in model training, evaluations and red teaming, and ongoing expert input, we have strengthened how ChatGPT recognizes subtle warning signs across long, high-stakes conversations and carefully responds. We’ll share more about this work in the coming weeks.

Our safety work also extends to situations where users may be in distress or at risk of self-harm. In these moments, our goal is to avoid facilitating harmful acts, and also to help de-escalate the situation and guide people to real-world support. ChatGPT surfaces localized crisis resources, encourages people to reach out to mental health professionals or trusted loved ones, and in the most serious cases directs people to seek emergency help.

How we monitor and enforce our rules.

We assume the best of our users, but when we detect that someone is attempting to use our tools to potentially plan or carry out violence, we take action, including revoking access to OpenAI’s services. Our Usage Policies set clear expectations for acceptable use and make clear that we may prohibit use for threats, intimidation, harassment, terrorism or violence, weapons development, illicit activity, destruction of property or systems, and attempts to circumvent our safeguards. We take those policies seriously and work hard to enforce them.

We use automated detection systems to identify potentially concerning activity at scale. These systems analyze user content and behavior using a range of tools designed to identify signals that may indicate policy violations or harmful activity, including classifiers, reasoning models, hash-matching technologies, blocklists, and other monitoring systems.
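
To make this concrete, here is a minimal sketch in Python of how signals like these could be combined into flags for human review. Every name, data source, and threshold below is hypothetical and illustrative; this is not a description of our production systems.

```python
# Illustrative sketch only: combining the kinds of signals mentioned above
# (blocklists, hash matching, a classifier score) into flags for review.
# All names, data sources, and thresholds are hypothetical.
import hashlib
from dataclasses import dataclass

BLOCKLIST_TERMS: set[str] = {"example banned phrase"}  # hypothetical term blocklist
KNOWN_HARMFUL_HASHES: set[str] = set()                  # hypothetical hashes of known harmful content


@dataclass
class Flag:
    signal: str   # which detection layer fired
    score: float  # strength of the signal, 0.0 to 1.0


def classifier_score(text: str) -> float:
    """Placeholder for a trained harm classifier; a real system would run a model here."""
    return 0.0


def scan_message(text: str) -> list[Flag]:
    """Return the flags raised by each detection layer for a single message."""
    flags: list[Flag] = []
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in KNOWN_HARMFUL_HASHES:
        flags.append(Flag("hash_match", 1.0))
    if any(term in text.lower() for term in BLOCKLIST_TERMS):
        flags.append(Flag("blocklist_term", 0.9))
    risk = classifier_score(text)
    if risk >= 0.8:  # hypothetical threshold for routing to human review
        flags.append(Flag("classifier_high_risk", risk))
    return flags
```

In a pipeline like this, any message that produces flags would be routed to the contextual human review described below rather than acted on automatically.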

When an account or conversation is flagged, it is assessed in context by trained personnel. These human reviewers are trained on our policies and protocols, and operate within established privacy and security safeguards, meaning their access to user information is limited, conducted within secure systems, and subject to confidentiality and data protection requirements. Their role is to assess the flagged activity in context, including the content of the interaction, surrounding conversation, and any relevant patterns of behavior over time. This contextual review is important because automated systems may identify signals of potential concern without fully capturing intent or nuance.

The goal is to determine whether the flagged activity violates our policies and/or indicates that a user may carry out an act of violence, whether it requires escalation for more detailed human review, or whether it can be dismissed or deprioritized as low risk or non-violative. When we determine that a bannable offense has occurred, we aim to immediately revoke access to OpenAI’s services. That may include disabling the account, banning other accounts of the same user, and taking steps to detect and stop the opening of new accounts. We have a zero-tolerance policy for using our tools to assist in committing violence. People can appeal enforcement decisions, and we review those appeals to confirm the outcome.
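
As a rough illustration of the three outcomes described in the paragraph above, the sketch below expresses them as a simple triage decision. The inputs and their ordering are assumptions made for clarity, not our actual review criteria.

```python
# Illustrative sketch only: the three review outcomes described above,
# expressed as a simple triage decision. Inputs and ordering are hypothetical.
from enum import Enum, auto


class ReviewOutcome(Enum):
    ENFORCE = auto()    # violates policy or indicates a user may carry out violence
    ESCALATE = auto()   # requires more detailed human review
    DISMISS = auto()    # low risk or non-violative; dismiss or deprioritize


def triage(violates_policy: bool, indicates_violence: bool, needs_deeper_review: bool) -> ReviewOutcome:
    if violates_policy or indicates_violence:
        return ReviewOutcome.ENFORCE
    if needs_deeper_review:
        return ReviewOutcome.ESCALATE
    return ReviewOutcome.DISMISS
```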

We surface real-world support and refer to law enforcement when appropriate.

Most enforcement actions, including bans for violence, happen directly between OpenAI and the user, making clear they have crossed a line. But in some sensitive cases, we may contact others who are best positioned to help.

Where we assess that a case presents indicators of potentially serious, real-world harm, it is escalated for a more in-depth investigation, including assessing the overall level of risk using structured criteria. This stage is reserved for a limited subset of cases and is intended to ensure higher-risk scenarios are assessed with additional context and expertise. When conversations indicate an imminent and credible risk of harm to others, we notify law enforcement. Mental health and behavioral experts help us assess difficult cases, and our referral criteria are flexible to account for the fact that a user may not explicitly discuss the target, means, and timing of planned violence in a ChatGPT conversation but that there may still be potential risk of imminent and credible violence.

Last fall, we introduced Parental Controls to help families guide how ChatGPT works in their homes. Parental controls allow parents to link their account with their teen’s account and customize settings for a safe, age-appropriate experience. Parents don’t have access to their teen’s conversations, and in rare cases where our system and trained human reviewers detect possible signs of acute distress, parents may be notified—but only with the information needed to support their teen’s safety. Parents are automatically notified by email, SMS, push notification, or all three.

Working closely with experts from our Council on Well-Being and AI and our Global Physicians Network, we will also soon be introducing a trusted contact feature, which will allow adult users to designate someone to receive notifications when they may need additional support.

We learn, improve and course-correct.

We continue to strengthen our models, detection methods, review processes, and escalation criteria in response to observed usage, emerging risks, and input from internal and external experts. We are especially focused on hard cases: for example, when it is not clear whether a particular input is legitimate or poses a risk of harm, when there are sophisticated attempts to evade safeguards, or when people repeatedly try to misuse our services. We will continue to prioritize safety while balancing privacy and other civil liberties so we can act on serious risks.

You can read more about our safety work and commitments and sign up to receive updates on our policies.

Author

OpenAI
