Process Financial Documents Using Amazon Bedrock Data Automation

Source URL: https://aws.amazon.com/blogs/machine-learning/process-financial-documents-using-amazon-bedrock-data-automation/
Published: 2026-05-27T13:28:53-08:00
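Financial institutions process thousands of documents every day, including tax forms, loan documents, and purchase orders. Each document type has its own format, structure, and field names, which makes it difficult to build automated workflows with optical character recognition (OCR) software. Amazon Bedrock Data Automation (BDA) addresses these problems by automating the extraction, validation, and analysis of data from financial documents. BDA goes beyond simple OCR by using foundation models that can:
- Understand document context
- Recognize relationships between different sections
- Extract structured, actionable data
- Validate information across multiple sources
Although foundation models such as Anthropic Claude can extract content from PDFs, Amazon Bedrock Data Automation provides custom extraction with industry-leading accuracy at lower cost, along with features such as visual grounding with confidence scores for explainability and built-in hallucination mitigation.
In this post, we explore how Amazon Bedrock Data Automation accurately extracts information from four common financial document types: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexity of each document, detail the custom extractions created in Amazon Bedrock Data Automation, and describe the results of the extraction process.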
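Solution overview
Amazon Bedrock Data Automation lets you configure output with blueprints according to your processing needs. In Amazon Bedrock Data Automation, a blueprint is a configuration template that defines how data is extracted from a document. It specifies:
- The type of document being processed
- The data fields to extract
- Validation rules for the extracted data
- The structure and format of the output
Think of it as a map that tells Amazon Bedrock Data Automation exactly what information to look for and how to handle it. When extracting with blueprints, you can use either a catalog blueprint or a custom blueprint. Custom blueprints let organizations create extraction patterns for their specific needs. In this post, we created custom blueprints and generated and validated the output in the BDA console.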
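To make the parts of a blueprint more concrete, the following sketch builds one as a plain Python dictionary and serializes it to JSON. The layout (a JSON Schema style object with `class`, `description`, and per-field `inferenceType` and `instruction` keys) is an assumption about the blueprint format, and the document class and field names are hypothetical. Check the Amazon Bedrock Data Automation documentation for the exact schema before relying on it.

```python
import json


def build_blueprint(doc_class: str, description: str, fields: dict) -> dict:
    """Return a JSON-Schema-style blueprint for the given fields.

    `fields` maps a field name to a (json_type, instruction) pair.
    The key names used here are assumptions, not a verified schema.
    """
    properties = {
        name: {
            "type": json_type,
            "inferenceType": "explicit",  # assumed key: value is read directly from the page
            "instruction": instruction,   # assumed key: natural-language extraction hint
        }
        for name, (json_type, instruction) in fields.items()
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "class": doc_class,
        "description": description,
        "type": "object",
        "properties": properties,
    }


# Hypothetical document type and fields, for illustration only.
blueprint = build_blueprint(
    doc_class="Invoice",
    description="A vendor invoice with a number and a total",
    fields={
        "invoice_number": ("string", "The invoice identifier printed on the document"),
        "total_amount": ("number", "The total amount due"),
    },
)

blueprint_json = json.dumps(blueprint, indent=2)
print(blueprint_json)
```

Keeping the blueprint as data rather than as console settings makes it easier to review and version alongside the downstream code that consumes its output.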

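How to develop blueprints for four financial document types
The following sections walk you through creating custom blueprints for bank statements, W-2 forms, 1099-B forms, and vendor contracts.
Prerequisites
- An active AWS account with appropriate IAM permissions (sample policy from the BDA workshop)
- Model access must be granted (request access through the AWS console)
- Amazon Bedrock Data Automation set up by following the Getting started with Amazon Bedrock guide
- Sample financial documents for testing
If you are new to creating custom blueprints, refer to the instructions in the Amazon Bedrock documentation. For our evaluation, we uploaded the documents in the BDA console, refined the AI-generated prompts, and downloaded the results. Usually, a single custom blueprint is enough for a specific document type when extracting consistent fields. However, if workflow requirements vary or document formats change significantly, you might need to create multiple custom blueprints to accommodate the differences. After a blueprint is created, it can be used as part of consistent downstream processing. For the same blueprint, BDA might return slightly different output when input documents contain different data (for example, some bank statements might include total debits and credits). However, because BDA output is structured JSON, you can readily create appropriate rules for the downstream processing workflow (for example, discarding total amounts when categorizing accounting transactions).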
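As a rough illustration of such a downstream rule, the sketch below drops summary rows (such as totals) from a list of extracted transactions before they are categorized. The JSON shape and the sample values are hypothetical; the real output structure depends on your blueprint.

```python
import json

# Hypothetical BDA-style output; field names and values are illustrative only.
sample_output = json.loads("""
{
  "Transactions": [
    {"Date": "01/03", "Description": "Payroll deposit", "Debit": null, "Credit": 2500.00},
    {"Date": "01/05", "Description": "Grocery store", "Debit": 82.15, "Credit": null},
    {"Date": "", "Description": "Total", "Debit": 82.15, "Credit": 2500.00}
  ]
}
""")


def drop_summary_rows(transactions):
    """Keep only rows that look like individual transactions.

    A row is treated as a summary row when it has no date or when its
    description starts with "total". Both rules are assumptions to adapt.
    """
    kept = []
    for row in transactions:
        description = (row.get("Description") or "").strip().lower()
        has_date = bool((row.get("Date") or "").strip())
        if has_date and not description.startswith("total"):
            kept.append(row)
    return kept


ledger_rows = drop_summary_rows(sample_output["Transactions"])
print([row["Description"] for row in ledger_rows])
```

The rule lives outside the blueprint on purpose: the extraction stays the same across statements, and each workflow decides what to ignore.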
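The following screenshot shows the blueprint prompt configuration for one of the document types.

The next section describes the four documents tried as part of this project and the custom blueprint extractions created for them based on requirements. Output is available in JSON, CSV, and raw data formats, which highlights the solution's adaptability to diverse integration and reporting needs.
Financial document types and custom blueprints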
1. Bank Statements – Documents from banks detailing an account's financial activity, including deposits, withdrawals, and fees, over a specific period, typically a month.
Bank statements present a complex challenge: they contain numerous monthly transactions, often spanning multiple pages, with varying formats and details. In many workflows, the critical task is to precisely capture transaction data, including dates, amounts, descriptions, and reference numbers, which can then feed directly into automated accounting workflows like categorizing transactions in an accounting ledger. This automated extraction minimizes manual data entry errors and streamlines the reconciliation process. As part of our evaluation process, we selected the following bank statement for a trial of the extraction process:

_Account statement generated using the Amazon Nova Pro foundation model_
Tailored blueprint instructions for Amazon Bedrock Data Automation:
Create a transaction log blueprint with the following structure:
Main Field:
- Transactions: [TRANSACTION_DETAILS]
Custom Type:
1. TRANSACTION_DETAILS type containing:
- Date
- Description
- Debit: number
- Credit: number
Extraction results from table.csv:

On review, we confirmed that the system extracted the transactions accurately.
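To show how the extracted table might feed an accounting workflow, the following sketch loads a CSV with the blueprint's four columns into typed records and computes debit and credit totals. The exact layout of table.csv is an assumption, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Transaction:
    date: str
    description: str
    debit: Optional[Decimal]
    credit: Optional[Decimal]


def parse_amount(text: str) -> Optional[Decimal]:
    """Convert '1,234.50' or '$82.15' to Decimal; empty cells become None."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    return Decimal(cleaned) if cleaned else None


def load_transactions(csv_text: str) -> list[Transaction]:
    """Parse CSV text whose header row is Date, Description, Debit, Credit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Transaction(
            date=row["Date"],
            description=row["Description"],
            debit=parse_amount(row["Debit"]),
            credit=parse_amount(row["Credit"]),
        )
        for row in reader
    ]


# Made-up rows standing in for the extracted table.
sample_csv = """Date,Description,Debit,Credit
01/03,Payroll deposit,,"2,500.00"
01/05,Grocery store,82.15,
01/09,Monthly fee,5.00,
"""

transactions = load_transactions(sample_csv)
total_debits = sum((t.debit for t in transactions if t.debit is not None), Decimal("0"))
total_credits = sum((t.credit for t in transactions if t.credit is not None), Decimal("0"))
net_change = total_credits - total_debits
print(total_debits, total_credits, net_change)
```

`Decimal` is used instead of `float` so that currency amounts add up without binary rounding error during reconciliation.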
2. Form W-2 – Reports the wages an employer paid to an employee during the year and the taxes withheld from them.
W-2 tax forms present unique extraction challenges due to their standardized yet complex structure. As part of our evaluation process, we used the following W-2 for a trial of the extraction process:

_W-2 generated using the Amazon Nova Pro foundation model_
Tailored blueprint instructions for Amazon Bedrock Data Automation:
Create a detailed W2 form blueprint with the following structure:
Main Fields:
- employer_info: EmployerInfo
- employee_general_info: EmployeeInfo
- federal_tax_info: FederalTaxInfo
- federal_wage_info: FederalWageInfo
- filing_info: FilingInfo
- state_taxes_table: [StateTaxInfo]
- codes: [CodeAmount]
- nonqualified_plans_income: number
- other
Custom Types:
1. EmployerInfo type containing:
- ein
- employer_name
- employer_address
- employer_zip_code: number
- control_number
2. EmployeeInfo type containing:
- ssn
- first_name
- employee_last_name
- employee_name_suffix
- employee_address
- employee_zip_code: number
3. FederalWageInfo type containing:
- wages_tips_other_compensation: number
- social_security_wages: number
- medicare_wages_tips: number
- social_security_tips: number
4. FederalTaxInfo type containing:
- federal_income_tax: number
- social_security_tax: number
- medicare_tax: number
- allocated_tips: number
5. StateTaxInfo type containing:
- state_name
- employer_state_id_number: number
- state_wages_and_tips: number
- state_income_tax: number
- local_wages_tips: number
- local_income_tax: number
- locality_name
6. CodeAmount type containing:
- code
- amount: number
7. FilingInfo type containing:
- omb_number
- verification_code
Extraction results from result.json:


On review, we confirmed that the system extracted the W-2 fields accurately. Several extraction complexities were specifically verified in the project:
- The form has no specific grouping for federal tax and state tax information, but they need to be processed together, so the extraction results should bring them together.
- A single Box 12 of the W-2 can carry any of up to 26 codes that report certain compensation and benefit amounts. It is important to extract each code and its value as a pair.
- Box 14 is a catch-all: employers can report almost anything there that doesn’t have its own dedicated box on the W-2, so these items should be grouped separately.
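The checks above can be sketched as a small post-processing step. The result dictionary below is a hypothetical excerpt shaped like the blueprint, with made-up values. It pairs each Box 12 code with its amount, brings federal and state withholding together, and restores leading zeros that a ZIP code loses when it is stored as a number.

```python
# Hypothetical excerpt shaped like the W-2 blueprint; all values are made up.
w2_result = {
    "employer_info": {"employer_name": "Example Corp", "employer_zip_code": 2134},
    "federal_wage_info": {"wages_tips_other_compensation": 85000.0},
    "federal_tax_info": {"federal_income_tax": 12000.0},
    "state_taxes_table": [
        {"state_name": "MA", "state_wages_and_tips": 85000.0, "state_income_tax": 4250.0},
    ],
    "codes": [
        {"code": "D", "amount": 6000.0},
        {"code": "DD", "amount": 7200.0},
    ],
}


def box12_by_code(codes):
    """Map each Box 12 code to its amount, rejecting incomplete pairs."""
    paired = {}
    for entry in codes:
        code, amount = entry.get("code"), entry.get("amount")
        if not code or amount is None:
            raise ValueError(f"incomplete Box 12 entry: {entry!r}")
        paired[code] = amount
    return paired


def format_zip(zip_number):
    """Restore leading zeros lost when a 5-digit ZIP code is stored as a number."""
    return f"{int(zip_number):05d}"


def tax_summary(result):
    """Bring federal and state income tax withholding together in one record."""
    state_total = sum(
        row.get("state_income_tax") or 0 for row in result.get("state_taxes_table", [])
    )
    federal = result.get("federal_tax_info", {}).get("federal_income_tax") or 0
    return {"federal_income_tax": federal, "state_income_tax": state_total}


box12 = box12_by_code(w2_result["codes"])
employer_zip = format_zip(w2_result["employer_info"]["employer_zip_code"])
summary = tax_summary(w2_result)
print(box12, employer_zip, summary)
```

Failing loudly on an incomplete code and amount pair is a deliberate choice here: a Box 12 amount without its code cannot be interpreted downstream.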
3. IRS Form 1099-B: Proceeds from Broker and Barter Exchange Transactions – This tax document tracks:
- Securities trading activity
- Broker-facilitated transactions
- Barter exchange participation
As part of our evaluation process, we used the following 1099-B for a trial of the extraction process:

_1099-B statement generated using the Amazon Nova Pro foundation model_
Tailored blueprint instructions for Amazon Bedrock Data Automation:
Create a financial transaction blueprint with the following structure:
TRANSACTION_DETAILS type containing:
- security_description
- quantity_sold: number
- date_acquired
- date_sold_or_disposed
- proceeds: number
- cost_or_other_basis: number
- gainloss_amount: number
- additional_information
Extraction results from table.csv:

A significant validation of BDA's contextual understanding is that the system accurately identified and extracted 'TSLA' as the security description for each stock transaction, even though the ticker appeared as a shared descriptor for the transactions. This consistent extraction demonstrates BDA's ability to maintain contextual accuracy throughout document processing.
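One simple way to sanity-check extracted 1099-B rows downstream is to compare the reported gain or loss with proceeds minus cost basis. The sketch below does that on made-up rows that use the blueprint's field names. Treat flagged rows as candidates for review rather than errors, because reported adjustments (for example, wash sales) can legitimately change the figure.

```python
from decimal import Decimal

# Made-up rows using the blueprint's field names; amounts are kept as strings
# so they convert to Decimal without float rounding.
rows = [
    {"security_description": "TSLA", "proceeds": "2500.00",
     "cost_or_other_basis": "2000.00", "gainloss_amount": "500.00"},
    {"security_description": "TSLA", "proceeds": "1200.00",
     "cost_or_other_basis": "1500.00", "gainloss_amount": "-300.00"},
    {"security_description": "TSLA", "proceeds": "900.00",
     "cost_or_other_basis": "800.00", "gainloss_amount": "90.00"},
]


def find_gain_loss_mismatches(rows, tolerance=Decimal("0.01")):
    """Return (row_index, expected, reported) for rows where the reported
    gain or loss differs from proceeds minus cost basis by more than tolerance."""
    mismatches = []
    for index, row in enumerate(rows):
        expected = Decimal(row["proceeds"]) - Decimal(row["cost_or_other_basis"])
        reported = Decimal(row["gainloss_amount"])
        if abs(expected - reported) > tolerance:
            mismatches.append((index, expected, reported))
    return mismatches


mismatches = find_gain_loss_mismatches(rows)
print(mismatches)
```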
4. Vendor contract – This extraction process is applicable to a wide range of vendor contracts. The specific details to be captured need to be tailored to each company’s unique operational workflows and requirements.
As part of our evaluation process, we selected the following vendor contract for a trial of the extraction process:




Tailored blueprint instructions for Amazon Bedrock Data Automation:
Create an agreement blueprint with the following structure:
Main Fields:
- PARTICIPANT_DETAILS: PARTICIPANT_DETAILS
- effective_date
- time_period
- participant_requirements: PARTICIPANT_REQUIREMENTS
- confidentiality_obligations
- TERM_AND_TERMINATION: TERM_AND_TERMINATION
Custom Types:
1. PARTICIPANT_DETAILS type containing:
- participant_name
- participant_authorized_representative
2. PARTICIPANT_REQUIREMENTS type containing:
- assigned_resources
- participant_obligations
- participant_restrictions
3. TERM_AND_TERMINATION type containing:
- term
- termination_conditions
Extraction results from result.json:

The system successfully identified and extracted the blueprint-specified elements present within the contract.
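Because a given contract may not contain every element the blueprint asks for, a downstream step can report which fields came back empty. The following sketch walks the blueprint's field list against a hypothetical result with made-up values; the actual shape of result.json may differ.

```python
# Field layout copied from the agreement blueprint; an empty list means
# the field is a plain value rather than a custom type.
EXPECTED_FIELDS = {
    "PARTICIPANT_DETAILS": ["participant_name", "participant_authorized_representative"],
    "effective_date": [],
    "time_period": [],
    "participant_requirements": [
        "assigned_resources", "participant_obligations", "participant_restrictions",
    ],
    "confidentiality_obligations": [],
    "TERM_AND_TERMINATION": ["term", "termination_conditions"],
}


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_fields(result):
    """List blueprint fields (dotted for nested ones) with no extracted value."""
    missing = []
    for field, children in EXPECTED_FIELDS.items():
        value = result.get(field)
        if not children:
            if is_blank(value):
                missing.append(field)
            continue
        nested = value if isinstance(value, dict) else {}
        for child in children:
            if is_blank(nested.get(child)):
                missing.append(f"{field}.{child}")
    return missing


# Hypothetical extraction result; all values are made up.
contract_result = {
    "PARTICIPANT_DETAILS": {
        "participant_name": "Example Vendor LLC",
        "participant_authorized_representative": "",
    },
    "effective_date": "2025-01-01",
    "participant_requirements": {
        "assigned_resources": "Two engineers",
        "participant_obligations": "Weekly status reports",
        "participant_restrictions": None,
    },
    "confidentiality_obligations": "Both parties keep shared materials confidential.",
    "TERM_AND_TERMINATION": {"term": "12 months"},
}

gaps = missing_fields(contract_result)
print(gaps)
```

A report like this helps separate two cases that look the same in the output: the contract never stated the term, or the extraction missed it.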
Conclusion
In this post, we demonstrated how you can use Amazon Bedrock Data Automation to accurately extract key information from financial documents including bank statements, W-2 forms, 1099-B forms, and vendor contracts to automate downstream processing. You learned how to:
- Create custom blueprints for different document types
- Extract structured data from complex financial documents
- Validate Amazon Bedrock Data Automation outputs for downstream processing
To learn more about implementing document processing with Amazon Bedrock, review the Amazon Bedrock Data Automation documentation. For production workflows involving sensitive information, follow your organization’s cybersecurity and legal guidelines to verify compliance with all applicable regulations, including but not limited to GDPR in Europe or any other regional or industry-specific requirements.