Building Highly Available Oracle Databases with Amazon FSx for NetApp ONTAP

Oracle databases power mission-critical enterprise applications, making their continuous availability essential for business operations. Traditional Oracle high availability (HA) solutions require complex clustering software, expensive shared storage arrays, and specialized database administration teams. These conventional approaches often introduce single points of failure while demanding significant operational overhead.
Modern cloud architectures offer a different approach that combines Amazon FSx for NetApp ONTAP (FSxN) with Amazon EC2 Auto Scaling groups, automated AMI creation, AWS Lambda-driven orchestration, and AWS Systems Manager Parameter Store (SSM Parameter Store). This solution removes traditional Oracle HA complexities while delivering enterprise-grade availability and automated recovery, and it ensures that new instances launch with the latest Oracle configuration.
This post shows how to build a highly available Oracle database architecture using FSxN shared storage, Auto Scaling groups with dynamic AMI updates, and serverless orchestration to help reduce recovery times while keeping replacement instances on the current configuration.
Solution Overview
The solution uses multiple AWS services working together to create a comprehensive high availability architecture:
- FSxN Multi-AZ provides persistent shared storage spanning Availability Zones for Oracle database files, software, and configurations, ensuring data remains accessible when EC2 instances are replaced.
- Auto Scaling groups deliver automated instance lifecycle management with the latest AMI configurations, so failed instances are quickly replaced with identical configurations that can immediately access the existing Oracle database files on FSxN.
- AWS Backup creates AMIs that capture the latest Oracle host configurations, including patches and settings, preserving the complete server state for consistent deployments.
- AWS Lambda extracts the AMI ID from backup recovery points and updates the SSM parameter, orchestrating the entire configuration management workflow.
- Systems Manager Parameter Store stores the current AMI ID for Auto Scaling group launch templates, so new instances always launch with the most recent configuration and can immediately connect to the Oracle database on shared storage.
The following diagram shows the complete architecture with all AWS services and their interactions:

Key benefits include:
- Recovery Time Objective (RTO): Can help achieve 2–5 minutes with latest Oracle configuration
- Recovery Point Objective (RPO): Near-zero through synchronous Multi-AZ replication
- Configuration consistency: New instances launch with identical Oracle host setup
- Automated AMI management: Scheduled AMI creation with Parameter Store updates
Walkthrough
This walkthrough demonstrates implementing Oracle HA using Amazon FSx for NetApp ONTAP shared storage, AWS Backup-driven AMI creation, Lambda orchestration, and Auto Scaling groups with Parameter Store integration for configuration consistency and automated failover.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account with appropriate permissions for Amazon FSx, Auto Scaling, EC2, Lambda, and Systems Manager
- A VPC with subnets in at least two Availability Zones
- Oracle database software (keep in mind that customers are responsible for their own Oracle licensing compliance)
- An EC2 instance with Oracle database installed and configured
- AWS Identity and Access Management (IAM) roles for AMI creation and cross-service communication
- Basic knowledge of Oracle database administration and AWS automation
Assumptions
This post is a conceptual illustration of the architecture. Your specific implementation will vary based on your VPC layout, Oracle version, storage requirements, and organizational security policies.
We assume the reader is familiar with:
- Creating and configuring Amazon FSx for NetApp ONTAP file systems through the AWS console
- iSCSI concepts including initiators, targets, and multipath I/O
- Oracle database startup and shutdown procedures
- AWS Backup, Lambda, and Auto Scaling group fundamentals
For detailed step-by-step instructions on specific AWS services, refer to the additional resources section.
Step 1: Create an Amazon FSx for NetApp ONTAP File System
FSxN Multi-AZ provides the persistent shared storage foundation for this architecture. Unlike Amazon Elastic Block Store (Amazon EBS) volumes, which are bound to a single AZ, FSxN Multi-AZ replicates data synchronously across two AZs with automatic failover. This means that when an EC2 instance is replaced (whether in the same AZ or a different one), the new instance can immediately access the existing Oracle database files without restoring from backup.
To create the file system, navigate to the Amazon FSx console and select Amazon FSx for NetApp ONTAP as the file system type.
The critical configuration choice is selecting Multi-AZ deployment, which places an active file server in one AZ and a standby in another.

_FSxN console showing Multi-AZ deployment type selection with preferred and standby subnets in separate availability zones._
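For readers who prefer scripting over the console, the same Multi-AZ choice can be sketched as the request payload for boto3's `fsx.create_file_system` call. This is a minimal sketch, not the post's own procedure: the subnet IDs, security group ID, storage size, and throughput values are placeholders assumed for illustration, and optional settings such as route tables and the administrator password are omitted.

```python
def build_fsx_ontap_request(preferred_subnet, standby_subnet, security_group_id,
                            storage_gib=1024, throughput_mbps=512):
    """Return keyword arguments for fsx.create_file_system (Multi-AZ ONTAP).

    All argument values are placeholders; size them for your own workload.
    """
    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": storage_gib,          # GiB of SSD storage (assumed value)
        # Two subnets in different AZs: active file server and standby.
        "SubnetIds": [preferred_subnet, standby_subnet],
        "SecurityGroupIds": [security_group_id],
        "OntapConfiguration": {
            "DeploymentType": "MULTI_AZ_1",      # the Multi-AZ deployment choice
            "PreferredSubnetId": preferred_subnet,
            "ThroughputCapacity": throughput_mbps,  # MB/s (assumed value)
        },
    }


request = build_fsx_ontap_request("subnet-aaaa1111", "subnet-bbbb2222", "sg-cccc3333")
# To submit it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("fsx").create_file_system(**request)
```

Keeping the payload in a plain function makes it easy to review the Multi-AZ settings before anything is created.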
After the file system is created, you need to set up a Storage Virtual Machine (SVM), which acts as a logical storage container providing data access to your Oracle instances. The SVM creation is done from the FSx console under your file system’s details. With the SVM in place, the next step is configuring iSCSI access. FSxN exposes iSCSI endpoints—these are IP addresses (one per AZ) that your EC2 instances use to connect to the storage over the iSCSI protocol. You can find these endpoint addresses in the FSx console under your SVM’s Endpoints tab.

_SVM Endpoints tab showing iSCSI endpoint IP addresses for each availability zone. These addresses are used in the EC2 instance’s iSCSI discovery configuration._
The iSCSI setup involves creating iGroups (which define which EC2 instances can access the storage) and LUNs (logical storage units mapped to those groups) through the NetApp ONTAP CLI. On the EC2 side, you configure the iSCSI initiator to discover and connect to the FSxN endpoints, then mount the resulting block devices. Using multipath I/O with both endpoints makes sure that Oracle data remains accessible even during an AZ failover. For detailed iSCSI configuration steps, see mounting iSCSI LUNs on Linux clients.
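The EC2 side of that setup can be sketched as a short list of commands. The helper below only builds the `iscsiadm` and `multipath` command lines for both endpoints; it does not run them. The endpoint IP addresses and the target IQN are hypothetical placeholders, and the ONTAP-side iGroup and LUN creation is not shown.

```python
def iscsi_commands(endpoint_ips, target_iqn):
    """Build the discovery and login commands for every FSxN iSCSI endpoint.

    endpoint_ips: the per-AZ iSCSI endpoint addresses from the SVM Endpoints tab.
    target_iqn:   the SVM's iSCSI target name (placeholder in this sketch).
    """
    commands = []
    for ip in endpoint_ips:
        # Discover targets through each endpoint so both paths are known.
        commands.append(["iscsiadm", "--mode", "discovery", "--op", "update",
                         "--type", "sendtargets", "--portal", ip])
    # Log in to the target on all discovered portals.
    commands.append(["iscsiadm", "--mode", "node",
                     "--targetname", target_iqn, "--login"])
    # List multipath devices to confirm that both paths are visible.
    commands.append(["multipath", "-ll"])
    return commands


for cmd in iscsi_commands(["10.0.1.10", "10.0.2.10"], "iqn.example:hypothetical-target"):
    print(" ".join(cmd))
# Run each command as root on the instance, for example with subprocess.run(cmd, check=True).
```

Discovering through both endpoints is what gives multipath I/O a second path to fail over to.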
A dedicated security group is required for FSxN access. At minimum, the security group must allow inbound traffic on ports 111 (NFS portmapper), 635 (NFS mountd), 2049 (NFS), 3260 (iSCSI), 4045–4046 (NFS lock), 443 (HTTPS for management), and 22 (SSH for ONTAP CLI). Restrict the source to only your Oracle EC2 instances’ security group.
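The port list above can be turned into the `IpPermissions` structure that boto3's `ec2.authorize_security_group_ingress` accepts. This is a sketch under stated assumptions: the security group IDs are placeholders, and the rules use TCP only because the text does not name a protocol. Check whether your NFS clients also need UDP on some of these ports.

```python
# Port ranges listed in the text: (from_port, to_port).
FSXN_PORTS = [
    (111, 111),    # NFS portmapper
    (635, 635),    # NFS mountd
    (2049, 2049),  # NFS
    (3260, 3260),  # iSCSI
    (4045, 4046),  # NFS lock
    (443, 443),    # HTTPS for management
    (22, 22),      # SSH for ONTAP CLI
]


def build_ingress_permissions(source_sg_id, protocol="tcp"):
    """Build one ingress rule per port range, sourced from the Oracle instances' group."""
    return [
        {
            "IpProtocol": protocol,  # assumption: TCP; not stated in the text
            "FromPort": low,
            "ToPort": high,
            # Restrict the source to the Oracle EC2 instances' security group.
            "UserIdGroupPairs": [{"GroupId": source_sg_id}],
        }
        for low, high in FSXN_PORTS
    ]


permissions = build_ingress_permissions("sg-oracle-placeholder")
# To apply (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-fsxn-placeholder", IpPermissions=permissions)
```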
Step 2: Set up AWS Backup for EC2 instance protection
AWS Backup captures the complete state of your Oracle EC2 instance. The key design choice here is using tag-based resource selection rather than specifying instance IDs directly. Because Auto Scaling groups replace instances (and generate new instance IDs), tag-based selection makes sure that any new instance with the correct tags is automatically included in the backup plan. Configure a backup plan with a frequency appropriate for your environment and set the resource assignment to select EC2 instances matching your application tag (for example, ‘Application: Oracle’).

_AWS Backup resource assignment configured with tag-based selection. Any EC2 instances tagged with the application tag are automatically included in the backup plan._
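The tag-based resource assignment can also be expressed as the `BackupSelection` structure used by boto3's `backup.create_backup_selection`. This is a minimal sketch: the selection name, IAM role ARN, and backup plan ID are hypothetical placeholders, and the tag mirrors the ‘Application: Oracle’ example from the text.

```python
def build_backup_selection(role_arn, tag_key="Application", tag_value="Oracle"):
    """Select resources by tag so that replacement instances are picked up automatically."""
    return {
        "SelectionName": "oracle-ec2-by-tag",  # hypothetical name
        "IamRoleArn": role_arn,                # role that AWS Backup assumes
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": tag_key,
                "ConditionValue": tag_value,
            }
        ],
    }


selection = build_backup_selection("arn:aws:iam::111122223333:role/placeholder-backup-role")
# To attach it to an existing plan (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("backup").create_backup_selection(
#       BackupPlanId="placeholder-plan-id", BackupSelection=selection)
```

Because the selection never names an instance ID, it keeps working after the Auto Scaling group replaces the instance, provided the launch template applies the same tag.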
Step 3: Configure Lambda for AMI management
When AWS Backup completes an EC2 backup, it creates an AMI as the recovery point. An Amazon EventBridge rule detects this completion event and triggers a Lambda function. The function extracts the AMI ID from the backup recovery point, updates the SSM Parameter Store parameter with the new AMI ID, and cleans up older AMIs to control storage costs.

_Lambda function overview showing the EventBridge trigger, Python 3.11 runtime, and function description indicating its role in processing backup completions and updating AMI references in SSM._
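The function's logic might look like the following sketch. It is not the post's actual function: the parameter name is hypothetical, the handler assumes the event detail carries a `backupJobId` field, and it assumes the EC2 recovery point ARN ends in `image/ami-...`. The cleanup helper only selects older AMI IDs; how they are removed is left out.

```python
PARAMETER_NAME = "/oracle/ha/current-ami-id"  # hypothetical parameter name


def ami_id_from_recovery_point_arn(arn):
    """Extract 'ami-...' from an ARN assumed to look like arn:aws:ec2:<region>::image/ami-..."""
    resource = arn.rsplit("/", 1)[-1]
    if not resource.startswith("ami-"):
        raise ValueError(f"not an EC2 image ARN: {arn}")
    return resource


def older_amis(images, keep=3):
    """Return the IDs of all but the `keep` newest images.

    images: dicts with 'ImageId' and 'CreationDate' (ISO 8601 strings,
    the shape returned by ec2.describe_images), so string order is date order.
    """
    newest_first = sorted(images, key=lambda image: image["CreationDate"], reverse=True)
    return [image["ImageId"] for image in newest_first[keep:]]


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without AWS

    # Assumption: the backup job ID is present in the event detail.
    job_id = event["detail"]["backupJobId"]
    job = boto3.client("backup").describe_backup_job(BackupJobId=job_id)
    ami_id = ami_id_from_recovery_point_arn(job["RecoveryPointArn"])

    # Overwrite the parameter so that new launches resolve the new AMI.
    boto3.client("ssm").put_parameter(
        Name=PARAMETER_NAME, Value=ami_id, Type="String", Overwrite=True
    )
    return {"ami_id": ami_id}
```

Keeping the ARN parsing and the retention choice in pure functions means they can be checked locally before the handler is deployed.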
This event-driven approach means the latest AMI is available without manual intervention. The Lambda function needs IAM permissions for EC2 (to manage AMIs), SSM (to update the parameter), and Backup (to read recovery point metadata).
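Those three permission areas could be captured in an IAM policy document along the following lines. This is an illustrative sketch only: the exact actions depend on what your function calls, the parameter ARN is a placeholder, and the wildcard resources should be narrowed to match your security policies.

```python
import json


def build_lambda_policy(parameter_arn):
    """Build a policy document covering the EC2, SSM, and Backup access described above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # EC2: inspect and manage AMIs
                "Effect": "Allow",
                "Action": ["ec2:DescribeImages", "ec2:DeregisterImage"],
                "Resource": "*",
            },
            {   # SSM: update only the AMI ID parameter
                "Effect": "Allow",
                "Action": ["ssm:PutParameter"],
                "Resource": parameter_arn,
            },
            {   # Backup: read job and recovery point metadata
                "Effect": "Allow",
                "Action": ["backup:DescribeBackupJob", "backup:DescribeRecoveryPoint"],
                "Resource": "*",
            },
        ],
    }


policy = build_lambda_policy(
    "arn:aws:ssm:us-east-1:111122223333:parameter/oracle/ha/current-ami-id"  # placeholder
)
print(json.dumps(policy, indent=2))
```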

_EventBridge rule configured to match AWS Backup job completion events for EC2 resources, with the Lambda function as the target._
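An event pattern for such a rule might look like the sketch below. The `state` and `resourceType` filters are assumptions about which fields to match on, and the small `matches` helper is only a local check for exact-match patterns, not a reimplementation of the EventBridge matcher.

```python
import json

# Pattern for completed AWS Backup jobs on EC2 resources (assumed filter fields).
EVENT_PATTERN = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {"state": ["COMPLETED"], "resourceType": ["EC2"]},
}


def matches(pattern, event):
    """Check an event against an exact-match pattern (lists of allowed values, nested dicts)."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


sample_event = {
    "source": "aws.backup",
    "detail-type": "Backup Job State Change",
    "detail": {"state": "COMPLETED", "resourceType": "EC2", "backupJobId": "placeholder-job-id"},
}
print(matches(EVENT_PATTERN, sample_event))
# To create the rule (requires boto3 and AWS credentials; the rule name is a placeholder):
#   import boto3
#   boto3.client("events").put_rule(
#       Name="oracle-backup-completed", EventPattern=json.dumps(EVENT_PATTERN))
```

Filtering on the resource type in the rule keeps the Lambda function from being invoked for backups of other resources in the same account.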
**Step 4: Configure the Systems
#