AWS Machine Learning Blog

Process Financial Documents Using Amazon Bedrock Data Automation

Source URL: https://aws.amazon.com/blogs/machine-learning/process-financial-documents-using-amazon-bedrock-data-automation/

Published: 2026-05-27T13:28:53-08:00

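Financial institutions process thousands of documents every day, including tax forms, loan documents, and purchase orders. Each document has its own format, structure, and field names, which makes it challenging to build automated workflows with optical character recognition (OCR) software. Amazon Bedrock Data Automation (BDA) addresses these problems by automating the extraction, validation, and analysis of data from financial documents. BDA goes beyond simple OCR by using foundation models that can:

  • Understand document context
  • Recognize relationships between different sections
  • Extract structured, actionable data
  • Validate information across multiple sources

Although foundation models such as Anthropic Claude can extract content from PDFs, Amazon Bedrock Data Automation provides custom extraction with industry-leading accuracy at lower cost, along with features such as visual grounding with confidence scores for explainability and built-in hallucination mitigation.

In this post, we explore how Amazon Bedrock Data Automation accurately extracts information from four common financial document types: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexity of each document, detail the custom extraction created in Amazon Bedrock Data Automation, and describe the results of the extraction process.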

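Solution overview

Amazon Bedrock Data Automation lets you use blueprints to configure output based on your processing needs. In Amazon Bedrock Data Automation, a blueprint is a configuration template that defines how data is extracted from documents. It specifies:

  • The type of document being processed
  • The data fields to extract
  • The validation rules for the extracted data
  • The structure and format of the output

Think of it as a map that tells Amazon Bedrock Data Automation exactly what information to look for and how to handle it. When you use blueprints for extraction, you can use either a catalog blueprint or a custom blueprint. Custom blueprints let organizations create extraction patterns for their specific needs. In this post, we created custom blueprints, then generated and validated the output in the BDA console.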
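To make the four parts of a blueprint concrete, here is a minimal Python sketch that models them as plain data. The `Blueprint` and `Field` classes, the `check` helper, and the validation rules are hypothetical illustrations; they are not the schema that Amazon Bedrock Data Automation uses, so refer to the BDA documentation for the actual blueprint format.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Field:
    """One data field to extract (hypothetical model, not the BDA schema)."""
    name: str
    type: str = "string"  # output type, e.g. "string" or "number"
    validate: Optional[Callable[[object], bool]] = None  # optional validation rule


@dataclass
class Blueprint:
    """Illustrative container for the four things a blueprint specifies."""
    document_type: str  # type of document being processed
    fields: list[Field] = field(default_factory=list)  # data fields to extract
    output_format: str = "json"  # structure and format of the output

    def check(self, record: dict) -> list[str]:
        """Return the names of fields that are missing or fail validation."""
        problems = []
        for f in self.fields:
            value = record.get(f.name)
            if value is None or (f.validate and not f.validate(value)):
                problems.append(f.name)
        return problems


# Field names follow the bank statement example later in this post;
# the non-negative rule is an assumed validation, added for illustration.
bank_statement = Blueprint(
    document_type="bank_statement",
    fields=[
        Field("Date"),
        Field("Description"),
        Field("Debit", "number", validate=lambda v: v >= 0),
        Field("Credit", "number", validate=lambda v: v >= 0),
    ],
)
```

Calling `bank_statement.check(record)` on an extracted record returns the fields that need attention, which is one simple way to think about how validation rules attach to extracted fields.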

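Image 1: Solution architecture diagram showing the Amazon Bedrock Data Automation workflow

How to develop blueprints for four financial document types

The following sections guide you through creating custom blueprints for bank statements, W-2 forms, 1099-B forms, and vendor contracts.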

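Prerequisites

If you are not familiar with creating custom blueprints, refer to the instructions in the Amazon Bedrock documentation. For our evaluation, we uploaded files in the BDA console, refined the AI-generated prompts, and downloaded the results. A single custom blueprint usually suffices for a specific document type when extracting consistent fields. However, if workflow requirements vary or document formats change significantly, multiple custom blueprints might be needed to accommodate those differences. After you create a blueprint, you can use it as part of consistent downstream processing. For the same blueprint, BDA might return slightly different output if the input files contain different data (for example, some bank statements might include total debits and credits). However, because BDA output is structured JSON, you can readily create appropriate rules for your downstream processing workflow (for example, discarding the total amounts when categorizing accounting transactions).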
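As an illustration of such a downstream rule, the following Python sketch drops summary rows before transactions are categorized. It assumes the extracted transactions have already been loaded as a list of dictionaries and that summary rows can be recognized by a description starting with "Total"; both the `drop_total_rows` helper and that convention are assumptions for this example, and the sample rows are made up.

```python
def drop_total_rows(transactions):
    """Remove summary rows (e.g. 'Total debits') before categorizing transactions."""
    kept = []
    for row in transactions:
        description = str(row.get("Description", "")).strip().lower()
        if description.startswith("total"):
            continue  # assumed convention: summary rows start with "Total"
        kept.append(row)
    return kept


# Hypothetical extracted rows; the last one is a statement-level total.
rows = [
    {"Date": "01/03/2024", "Description": "Payroll deposit", "Debit": 0, "Credit": 2500},
    {"Date": "01/05/2024", "Description": "Grocery store", "Debit": 82, "Credit": 0},
    {"Date": "", "Description": "Total debits and credits", "Debit": 82, "Credit": 2500},
]
ledger_rows = drop_total_rows(rows)
```

A real workflow would likely need a rule tuned to how its own statements label totals.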

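The following screenshot shows the blueprint prompt configuration for one of the document types.

Image 2: Blueprint prompt configuration in the Amazon Bedrock Data Automation console

The next section describes the four documents we tried as part of this project and the custom blueprint extractions created for them based on requirements. Outputs are provided in JSON, CSV, and raw data formats, highlighting the solution's adaptability to diverse integration and reporting needs.

Financial document types and custom blueprints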

1. Bank Statements – Documents from banks detailing an account's financial activity, including deposits, withdrawals, and fees, over a specific period, typically a month.

Bank statements present a complex challenge: they contain numerous monthly transactions, often spanning multiple pages, with varying formats and details. In many workflows, the critical task is to precisely capture transaction data, including dates, amounts, descriptions, and reference numbers, which can then feed directly into automated accounting workflows like categorizing transactions in an accounting ledger. This automated extraction minimizes manual data entry errors and streamlines the reconciliation process. As part of our evaluation process, we selected the following bank statement for a trial of the extraction process:

Image 3: Sample bank statement used for extraction testing

_Account Statement generated using Amazon Nova Pro Foundational Model_

Tailored blueprint instructions for Amazon Bedrock Data Automation:

Create a transaction log blueprint with the following structure:

Main Field:
- Transactions: [TRANSACTION_DETAILS]

Custom Type:
1. TRANSACTION_DETAILS type containing:
   - Date
   - Description
   - Debit: number
   - Credit: number

Extraction results from table.csv:

Image 4: Extraction results showing transaction data in CSV format

Upon review, we confirmed that the system extracted the transactions accurately.
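To show how this output could feed an accounting workflow, here is a small Python sketch that loads a transaction table and totals the debits and credits. The column names are assumed to mirror the blueprint fields (Date, Description, Debit, Credit), and the sample rows are invented for illustration; the actual table.csv layout may differ.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample in the assumed layout of table.csv.
SAMPLE_CSV = """Date,Description,Debit,Credit
01/03/2024,Payroll deposit,,2500.00
01/05/2024,Grocery store,82.15,
01/09/2024,Monthly service fee,12.00,
"""


def to_amount(cell):
    """Convert a CSV cell to Decimal, treating an empty cell as zero."""
    cleaned = cell.replace(",", "").replace("$", "").strip()
    return Decimal(cleaned) if cleaned else Decimal("0")


def load_transactions(csv_text):
    """Parse CSV text into a list of typed transaction records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "Date": row["Date"],
            "Description": row["Description"],
            "Debit": to_amount(row["Debit"]),
            "Credit": to_amount(row["Credit"]),
        }
        for row in reader
    ]


transactions = load_transactions(SAMPLE_CSV)
total_debit = sum(t["Debit"] for t in transactions)
total_credit = sum(t["Credit"] for t in transactions)
```

Using `Decimal` instead of floating point avoids rounding drift when the totals are later reconciled against the statement.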

2. Form W-2 – Reports an employee's wages and the taxes withheld by the employer.

W-2 tax forms present unique extraction challenges due to their standardized yet complex structure. As part of our evaluation process, we used the following W-2 for a trial of the extraction process:

Image 5: Sample W-2 form used for extraction testing

_W2 generated using Amazon Nova Pro Foundational Model_

Tailored blueprint instructions for Amazon Bedrock Data Automation:

Create a detailed W2 form blueprint with the following structure:

Main Fields:
- employer_info: EmployerInfo
- employee_general_info: EmployeeInfo
- federal_tax_info: FederalTaxInfo
- federal_wage_info: FederalWageInfo
- filing_info: FilingInfo
- state_taxes_table: [StateTaxInfo]
- codes: [CodeAmount]
- nonqualified_plans_income: number
- other

Custom Types:
1. EmployerInfo type containing:
   - ein
   - employer_name
   - employer_address
   - employer_zip_code: number
   - control_number

2. EmployeeInfo type containing:
   - ssn
   - first_name
   - employee_last_name
   - employee_name_suffix
   - employee_address
   - employee_zip_code: number

3. FederalWageInfo type containing:
   - wages_tips_other_compensation: number
   - social_security_wages: number
   - medicare_wages_tips: number
   - social_security_tips: number

4. FederalTaxInfo type containing:
   - federal_income_tax: number
   - social_security_tax: number
   - medicare_tax: number
   - allocated_tips: number

5. StateTaxInfo type containing:
   - state_name
   - employer_state_id_number: number
   - state_wages_and_tips: number
   - state_income_tax: number
   - local_wages_tips: number
   - local_income_tax: number
   - locality_name

6. CodeAmount type containing:
   - code
   - amount: number

7. FilingInfo type containing:
   - omb_number
   - verification_code

Extraction results from result.json:

Image 6: W-2 extraction results showing employer and employee information in JSON format
Image 7: W-2 extraction results showing tax and code information in JSON format

Upon review, we confirmed that the system extracted the W-2 fields accurately. Several extraction complexities were specifically verified in the project:

  • The form has no specific grouping for federal tax and state tax information, but the related fields need to be processed together, so the extraction results should group them.
  • Box 12 of the W-2 can report up to 26 codes for certain compensation and benefit amounts. It is important to extract each code and its value as a pair.
  • Employers can report almost anything in Box 14. It serves as a catch-all for items that don’t have their own dedicated box on the W-2, so these should be grouped separately.
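The pairing and grouping requirements above can be sketched in Python as follows. The sketch assumes the extracted W-2 has already been loaded into a dictionary keyed by the blueprint field names shown earlier (`codes`, `state_taxes_table`, `other`); the `summarize_w2` helper and the sample values are hypothetical and are not taken from the actual result.json.

```python
def summarize_w2(w2):
    """Collect Box 12 code/amount pairs, total state tax, and Box 14 items."""
    # Keep each Box 12 code together with its amount.
    box12 = {entry["code"]: entry["amount"] for entry in w2.get("codes", [])}
    # State rows arrive as a list, so they can be totaled as a group.
    state_tax_total = sum(
        row.get("state_income_tax", 0) for row in w2.get("state_taxes_table", [])
    )
    return {
        "box12": box12,
        "state_income_tax_total": state_tax_total,
        "box14_other": w2.get("other"),  # free-form items stay in their own group
    }


# Hypothetical extraction result using the blueprint's field names.
sample_w2 = {
    "codes": [{"code": "D", "amount": 1500}, {"code": "DD", "amount": 300}],
    "state_taxes_table": [
        {"state_name": "CA", "state_income_tax": 1200},
        {"state_name": "NY", "state_income_tax": 450},
    ],
    "other": "Union dues 120",
}
summary = summarize_w2(sample_w2)
```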

3. IRS Form 1099-B: Proceeds from Broker and Barter Exchange Transactions – This tax document tracks:

  • Securities trading activity
  • Broker-facilitated transactions
  • Barter exchange participation

As part of our evaluation process, we used the following 1099-B for a trial of the extraction process:

Image 8: Sample 1099-B form used for extraction testing

_1099-B statement generated using Amazon Nova Pro Foundational Model_

Tailored blueprint instructions for Amazon Bedrock Data Automation:

Create a financial transaction blueprint with the following structure:

TRANSACTION_DETAILS type containing:
- security_description
- quantity_sold: number
- date_acquired
- date_sold_or_disposed
- proceeds: number
- cost_or_other_basis: number
- gainloss_amount: number
- additional_information

Extraction results from table.csv:

Image 9: 1099-B extraction results showing transaction details in CSV format

A significant validation of BDA's contextual understanding is that the system accurately identified and extracted 'TSLA' as the security descriptor for each stock transaction, even though it appeared as a common descriptor shared by those transactions. This consistent extraction demonstrates BDA's ability to maintain contextual accuracy throughout document processing.
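One way to sanity-check extracted 1099-B rows downstream is to confirm that the reported gain or loss matches proceeds minus cost basis. The following Python sketch does this using the blueprint's field names; the `gainloss_mismatches` helper, the tolerance, and the sample rows are assumptions for illustration. Real forms can include adjustments (for example, wash sale adjustments) that this simple check ignores.

```python
from decimal import Decimal


def gainloss_mismatches(rows, tolerance=Decimal("0.01")):
    """Return rows whose gain/loss differs from proceeds minus cost basis."""
    mismatched = []
    for row in rows:
        expected = Decimal(str(row["proceeds"])) - Decimal(str(row["cost_or_other_basis"]))
        reported = Decimal(str(row["gainloss_amount"]))
        if abs(expected - reported) > tolerance:
            mismatched.append(row)
    return mismatched


# Hypothetical rows; the third one is deliberately inconsistent.
sample_rows = [
    {"security_description": "TSLA", "quantity_sold": 10,
     "proceeds": "1000.00", "cost_or_other_basis": "800.00", "gainloss_amount": "200.00"},
    {"security_description": "TSLA", "quantity_sold": 5,
     "proceeds": "500.00", "cost_or_other_basis": "650.00", "gainloss_amount": "-150.00"},
    {"security_description": "TSLA", "quantity_sold": 2,
     "proceeds": "300.00", "cost_or_other_basis": "100.00", "gainloss_amount": "250.00"},
]
needs_review = gainloss_mismatches(sample_rows)
```

Rows returned by the check could be routed to manual review rather than rejected outright.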

4. Vendor contract – This extraction process is applicable to a wide range of vendor contracts. The specific details to be captured need to be tailored to each company’s unique operational workflows and requirements.

As part of our evaluation process, we selected the following vendor contract for a trial of the extraction process:

Image 10: Sample vendor contract page 1
Image 11: Sample vendor contract page 2
Image 12: Sample vendor contract page 3
Image 13: Sample vendor contract page 4

Tailored blueprint instructions for Amazon Bedrock Data Automation:

Create an agreement blueprint with the following structure:

Main Fields:
- PARTICIPANT_DETAILS: PARTICIPANT_DETAILS
- effective_date
- time_period
- participant_requirements: PARTICIPANT_REQUIREMENTS
- confidentiality_obligations
- TERM_AND_TERMINATION: TERM_AND_TERMINATION

Custom Types:
1. PARTICIPANT_DETAILS type containing:
   - participant_name
   - participant_authorized_representative

2. PARTICIPANT_REQUIREMENTS type containing:
   - assigned_resources
   - participant_obligations
   - participant_restrictions

3. TERM_AND_TERMINATION type containing:
   - term
   - termination_conditions

Extraction results from result.json:

Image 14: Vendor contract extraction results in JSON format

The system successfully identified and extracted the blueprint-specified elements present within the contract.
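A check like this can also be automated. The following Python sketch lists blueprint fields that are absent or empty in an extracted contract. It assumes the result has been loaded into a dictionary keyed by the blueprint field names above; the `missing_fields` helper and the sample contract values are hypothetical, and the actual result.json may nest these fields differently.

```python
# Top-level blueprint fields mapped to their expected sub-fields.
EXPECTED_FIELDS = {
    "PARTICIPANT_DETAILS": ["participant_name", "participant_authorized_representative"],
    "effective_date": [],
    "time_period": [],
    "participant_requirements": [
        "assigned_resources", "participant_obligations", "participant_restrictions",
    ],
    "confidentiality_obligations": [],
    "TERM_AND_TERMINATION": ["term", "termination_conditions"],
}


def missing_fields(result):
    """Return dotted names of expected fields that are absent or empty."""
    missing = []
    for name, subfields in EXPECTED_FIELDS.items():
        value = result.get(name)
        if value in (None, "", {}):
            missing.append(name)
            continue
        for sub in subfields:
            if not isinstance(value, dict) or value.get(sub) in (None, ""):
                missing.append(f"{name}.{sub}")
    return missing


# Hypothetical extraction result with two gaps.
sample_contract = {
    "PARTICIPANT_DETAILS": {
        "participant_name": "Example Corp",
        "participant_authorized_representative": "",
    },
    "effective_date": "2024-01-01",
    "time_period": "12 months",
    "participant_requirements": {
        "assigned_resources": "Two engineers",
        "participant_obligations": "Monthly reporting",
        "participant_restrictions": "No subcontracting",
    },
    "confidentiality_obligations": "Mutual non-disclosure",
    "TERM_AND_TERMINATION": {"term": "1 year"},
}
gaps = missing_fields(sample_contract)
```

An empty list means every blueprint-specified element was found; a field that is legitimately absent from a given contract would still appear here, so the result is a prompt for review, not an error.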

Conclusion

In this post, we demonstrated how you can use Amazon Bedrock Data Automation to accurately extract key information from financial documents including bank statements, W-2 forms, 1099-B forms, and vendor contracts to automate downstream processing. You learned how to:

  • Create custom blueprints for different document types
  • Extract structured data from complex financial documents
  • Validate Amazon Bedrock Data Automation outputs for downstream processing

To learn more about implementing document processing with Amazon Bedrock, review the Amazon Bedrock Data Automation documentation. For production workflows involving sensitive information, follow your organization’s cybersecurity and legal guidelines to verify compliance with all applicable regulations, including but not limited to GDPR in Europe or any other regional or industry-specific requirements.
