AWS Machine Learning Blog

Reducing container cold start times with SOCI indexes on DLAMI and DLC

Deep Learning AMI and AWS Deep Learning Containers are now enabled with support for SOCI snapshotter and index. Seekable OCI (SOCI) is a technology that enables efficient container image management through selective file downloading. It uses a layer-based indexing system to map file locations within container images, allowing containers to start with only the necessary files loaded (lazy loading). This approach reduces network bandwidth usage and improves container startup times, making it particularly valuable for organizations managing large container images in cloud environments.

In this post, we look at how to use SOCI on publicly available Deep Learning AMIs and Containers, when to use the various SOCI modes provided by the tool, and how to quickly and efficiently use this tool in your workloads today.

Background

As organizations deploy artificial intelligence (AI) and machine learning (ML) workloads at scale, container startup time has become a bottleneck in production environments. Whether it’s spinning up training jobs, serving inference endpoints, or scaling GPU clusters automatically, the time spent downloading multi-gigabyte container images directly impacts cost, user experience, and operational efficiency. Traditional container deployment approaches force teams to download entire images before workloads can begin. For the images commonly used in production, this step alone can take several minutes. During development, a few minutes of wait time is barely noticeable. In production, those same minutes add up fast.

Organizations deploying deep learning infrastructure at scale typically encounter several critical challenges:

  • Prolonged cold start times. Standard Docker image pulls of 15–20 GB can take 4–6 minutes per instance, delaying training jobs and inference endpoints during scaling events.
  • Wasted compute resources. GPU instances sit idle during image pulls, burning through expensive compute hours while waiting for container initialization to finish.
  • Scaling bottlenecks. When demand spikes trigger automatic scaling, slow container startup times prevent rapid response, leading to degraded performance or dropped requests.
  • Bandwidth constraints. Large-scale deployments pulling massive images simultaneously can saturate network bandwidth, creating cascading delays across the infrastructure.
  • Developer productivity. Data scientists and ML engineers waste valuable time waiting for containers to start during iterative development and experimentation cycles.

Container pulling mechanisms

When pulling a container for your workloads, AWS Deep Learning AMIs (DLAMI) and Deep Learning Containers offer three options: the standard Docker pull, SOCI parallel pull, and SOCI lazy loading through SOCI index. Think of these as a sliding scale of tradeoffs. Docker pulls are sequential and slow. SOCI parallel pull provides faster startup times by chunking downloads at the cost of compute resources. SOCI lazy loading provides near-instant container loading but requires files to be fetched on demand. You can use the following guide to choose the right mechanism for your workloads:

  • The choice between lazy loading and parallel pull modes depends on the image, instance specifications, and storage configuration. Lazy loading requires images to have a SOCI index. Without one, the system falls back to standard pulling.
  • Lower-spec instances should use lazy loading to conserve resources, while high-spec instances with multiple vCPUs and high network bandwidth benefit from parallel pull mode. Storage performance varies: EBS volumes are bounded by their provisioned IOPS and volume type, potentially creating bottlenecks during unpacking, while NVMe instance store delivers maximum I/O performance at the cost of data persistence across instance stop/start cycles.
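The guidance above can be sketched as a small shell helper. This is only an illustration: the function name, the `yes`/`no` index flag, and both thresholds are hypothetical values chosen for the example, not figures from AWS, so tune them for your own fleet.

```bash
#!/usr/bin/env bash
# Sketch: pick a pull mechanism from image and instance properties.
# The thresholds are hypothetical placeholders for "high-spec".
MIN_VCPUS_FOR_PARALLEL=16
MIN_GBPS_FOR_PARALLEL=10

# Arguments: has_index (yes|no), vCPU count, network bandwidth in whole Gbps.
choose_pull_mode() {
    local has_index="$1" vcpus="$2" gbps="$3"
    if [ "$vcpus" -ge "$MIN_VCPUS_FOR_PARALLEL" ] && [ "$gbps" -ge "$MIN_GBPS_FOR_PARALLEL" ]; then
        # Plenty of CPU and bandwidth: download and unpack the full image concurrently.
        echo "parallel-pull"
    elif [ "$has_index" = "yes" ]; then
        # Smaller instance and an indexed image: fetch files on demand.
        echo "lazy-loading"
    else
        # No SOCI index: lazy loading would fall back to a standard pull anyway.
        echo "standard-pull"
    fi
}

choose_pull_mode yes 8 10
```

Storage configuration is left out of this sketch; on volumes with low provisioned IOPS, unpacking can become the bottleneck regardless of the mode chosen.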

The following example shows the various mechanisms based on the vLLM Deep Learning Container:

Image 1: Comparison of container pull mechanisms showing Docker sequential pull, SOCI parallel pull, and SOCI lazy loading with relative startup times

_Deep Learning Container Pull Mechanisms_

Solution architecture

The following diagram shows the architecture for using SOCI with DLAMI and Deep Learning Containers.

Image 2: Solution architecture showing SOCI snapshotter integration with DLAMI and Deep Learning Containers on Amazon EC2

Container startup time comparison with SOCI snapshotter

The following benchmarks compare standard Docker pulls against SOCI snapshotter in both lazy loading and parallel pull modes.

Lazy loading mode

Lazy loading mode starts containers immediately by fetching only the necessary data on demand, with remaining layers loaded in the background as needed.

#### Prerequisites

SOCI index required

Important: Lazy loading mode requires the container image to have a SOCI index stored in the registry. Without a SOCI index, the snapshotter falls back to standard pull behavior, and you won’t see any performance improvement. AWS Deep Learning Containers (DLCs) with the -soci tag suffix come with SOCI indexes pre-created and pushed to the registry, enabling lazy loading out of the box. For custom images, you must create and push the SOCI indexes yourself.
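For a custom image, one possible workflow is sketched below. It assumes that nerdctl and the soci CLI are installed, that you are already logged in to the registry, and that your SOCI release provides the `soci create` and `soci push` subcommands. Newer releases may recommend a different workflow, so verify against the SOCI documentation for your version. The `run` helper, the function name, and the image name in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: build a SOCI index for a custom image and push it to the registry.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_and_push_soci_index() {
    local image="$1"
    run sudo nerdctl pull "$image"   # the image must be in containerd's content store
    run sudo soci create "$image"    # build the index locally
    run sudo soci push "$image"      # upload the index next to the image
}

# Usage (placeholder image name; preview the commands first):
#   DRY_RUN=1 create_and_push_soci_index 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest
```

Running with `DRY_RUN=1` prints the three commands without touching the registry, which is a cheap way to check the image reference before doing the real push.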

#### Environment

  • Instance Type: g5.2xlarge
  • EBS: 500 GiB, 3000 IOPS, 125 MB/s throughput
  • AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
  • Docker Image: public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci
  • Image Size: 9.72GB (compressed), 32.7GB (disk usage)
  • Network: Corp

#### Start container with Docker (non-SOCI)

We use Docker to start the inference server directly. Since no image exists locally, Docker pulls and extracts the entire image before starting the container.

Total time: 6m59.099s.

#!/bin/bash
time docker run \
    --gpus all \
    -d \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
    -p 8000:8000 \
    --ipc=host \
    public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci \
    --model mistralai/Mistral-7B-v0.1
# output
Unable to find image 'public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci' locally
0.19.0-gpu-py312-ec2-soci: Pulling from deep-learning-containers/vllm
340d44d2921c: Pull complete
....2001a2421bf1: Pull complete
Digest: sha256:a6344c96a33ef98a32a27f89b41b8c0529d4fbbba248eb57f811725d415f68fc
Status: Downloaded newer image for public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci
e12d969eb71517d9a6a23b9b11cfa22ddda26a95f6a0f0d8df00cd5c4fdfe912

real    6m59.099s
user    0m0.391s
sys     0m0.452s

#### Start container with SOCI snapshotter (lazy loading)

We use nerdctl with SOCI snapshotter to start the inference container. Although no image exists locally, the SOCI-indexed image allows nerdctl to pull only the index and necessary layers to start the container, enabling lazy loading of remaining layers. Total time: 21.125s.

#!/bin/bash
time sudo nerdctl run \
    --snapshotter soci \
    --gpus all \
    -d \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
    -p 8000:8000 \
    --ipc=host \
    public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci \
    --model mistralai/Mistral-7B-v0.1
# output
public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci:           resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:a6344c96a33ef98a32a27f89b41b8c0529d4fbbba248eb57f811725d415f68fc:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:d91ad3b46204eace6de2fb27c46d9600337fa9c124b4c82fe0f335d391017daa: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:886ed36d57c44081a74a0ab052f57366d96ab2c0fe39bb3e2f8a46cc20db8ec2:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 10.5s                                                                    total:  48.1 K (4.6 KiB/s)
189307b7899438415f3df4288b3fbb26bcc4cd43678e88ec3b062bc6330e3e3b

real    0m21.125s
user    0m0.004s
sys     0m0.011s

#### Lazy loading summary

Using SOCI snapshotter with lazy loading, the container started in 21.125 seconds, compared to 6 minutes 59.099 seconds with standard Docker. This improvement is achieved because SOCI pulls only the necessary layers to start the container, with remaining layers loaded on demand as needed.
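To put a number on the difference, the two `real` timings can be converted to seconds and divided. The helper names below are illustrative; the inputs are the timings reported above.

```bash
#!/usr/bin/env bash
# Sketch: compute a speedup factor from two `time`-style durations.

# Convert a duration such as "6m59.099s" to seconds.
to_seconds() {
    local t="${1%s}"          # drop the trailing "s"
    local minutes="${t%%m*}"  # text before the "m"
    local seconds="${t#*m}"   # text after the "m"
    LC_ALL=C awk -v m="$minutes" -v s="$seconds" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Ratio of a baseline duration to an improved duration.
speedup() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 6m59.099s 0m21.125s   # standard Docker vs. SOCI lazy loading, prints 19.84
```

That is 419.099 seconds against 21.125 seconds, or roughly a 20x faster container start. The same helper works for any other pair of `time` readings.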

Parallel pull mode

While lazy loading mode starts containers immediately by fetching only the required data on demand, parallel pull mode downloads the entire image before startup, but with higher concurrency than a standard Docker pull. This mode is ideal when you need the full image available at startup or when running I/O-intensive workloads.

#### Environment

  • Instance Type: g5.4xlarge
  • EBS: 500GiB gp3, 16000 IOPS, 1000 MB/s Throughput
  • AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
  • Docker Image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
  • Image Size: 19.32GB (compressed), 60.4GB (disk usage)
  • Network: Corp

Note: We use a private ECR image for this benchmark because public ECR is fronted by Amazon CloudFront, which limits network bandwidth and affects parallel mode performance. Private ECR is served directly from Amazon Simple Storage Service (Amazon S3), providing higher throughput.
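Pulling from a private ECR registry normally requires logging in first. The following sketch derives the registry host and Region from an image reference and pipes an ECR token into `nerdctl login`. The helper names are illustrative, and it assumes the AWS CLI is configured with credentials that are allowed to pull the image.

```bash
#!/usr/bin/env bash
# Sketch: log in to the private ECR registry that hosts an image.

# Registry host: everything before the first "/" in the image reference.
ecr_registry_host() {
    echo "${1%%/*}"
}

# Region: fourth dot-separated field of <account>.dkr.ecr.<region>.amazonaws.com
ecr_region() {
    ecr_registry_host "$1" | cut -d. -f4
}

ecr_login() {
    local image="$1" host region
    host="$(ecr_registry_host "$image")"
    region="$(ecr_region "$image")"
    aws ecr get-login-password --region "$region" \
        | sudo nerdctl login --username AWS --password-stdin "$host"
}

# Usage:
#   ecr_login 763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
```

For the Docker baseline, the same token can be piped into `docker login` with identical flags.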

#### Enabling parallel pull mode

The SOCI snapshotter on Deep Learning AMI defaults to lazy loading mode. To enable parallel pull mode, modify the configuration file at /etc/soci-snapshotter-grpc/config.toml:

# Parallel Pull Mode - significantly improves image pull times for large AI/ML images
# These are conservative defaults recommended by AWS for ECR
[pull_modes.parallel_pull_unpack]
enable = true # false(default): lazy loading/true: parallel mode
max_concurrent_downloads = -1 # unlimited global cap across all images
max_concurrent_downloads_per_image = 20 # per-image download connections
concurrent_download_chunk_size = "16mb"
max_concurrent_unpacks = -1 # unlimited global cap across all images
max_concurrent_unpacks_per_image = 10 # per-image parallel unpack threads
discard_unpacked_layers = true

Apply the configuration by restarting the service:

sudo systemctl restart soci-snapshotter.service

Tip: You can tune max_concurrent_downloads_per_image and max_concurrent_unpacks_per_image based on your instance type and network bandwidth. For detailed tuning guidance, see Introducing Seekable OCI Parallel Pull Mode for Amazon EKS.
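If you provision many instances, the configuration change can be scripted. The sketch below is one way to do it, not an official tool: the function name and the editing strategy are assumptions, while the section name and values are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: turn on SOCI parallel pull mode in a snapshotter config file.

SECTION='[pull_modes.parallel_pull_unpack]'

enable_parallel_pull() {
    local cfg="$1" tmp
    tmp="$(mktemp)" || return 1

    if grep -qxF "$SECTION" "$cfg"; then
        # Section already present: flip only its `enable` key and keep every
        # other line as-is. Assumes the header sits alone on its line and the
        # section already contains an `enable` key.
        awk -v section="$SECTION" '
            /^\[/ { in_section = ($0 == section) }
            in_section && /^[ \t]*enable[ \t]*=/ { print "enable = true"; next }
            { print }
        ' "$cfg" > "$tmp"
    else
        # Section missing: append it with the values from the article.
        cat "$cfg" - > "$tmp" <<'EOF'

[pull_modes.parallel_pull_unpack]
enable = true
max_concurrent_downloads = -1
max_concurrent_downloads_per_image = 20
concurrent_download_chunk_size = "16mb"
max_concurrent_unpacks = -1
max_concurrent_unpacks_per_image = 10
discard_unpacked_layers = true
EOF
    fi

    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Usage (run as root, then restart the service):
#   enable_parallel_pull /etc/soci-snapshotter-grpc/config.toml
```

The function is idempotent for the common cases: running it twice leaves a single section with `enable = true`.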

#### Verifying parallel mode is active

Monitor the SOCI snapshotter logs during image pull to confirm parallel mode is enabled:

journalctl -u soci-snapshotter -f

Look for log entries indicating parallel pull/unpack:

Apr 16 23:59:08 ip-172-31-86-91 soci-snapshotter-grpc[3108]:
  {"layerDigest":"sha256:e87500e698966458d9dfc34df84602985c9821f39666619792fe6282aa6df5d4",
   "level":"info",
   "msg":"preparing snapshot with parallel pull/unpack",
   "time":"2026-04-16T23:59:08.654819383Z"}

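Rather than watching the log by eye, you can count the matching entries after a pull. This sketch filters on the message text shown in the log entry above; the function name and the time window in the usage comment are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: count snapshotter log lines that report the parallel pull/unpack path.
# Reads log text on stdin and prints the number of matching lines (0 if none).
count_parallel_pull_events() {
    grep -c 'preparing snapshot with parallel pull/unpack' || true
}

# Usage:
#   journalctl -u soci-snapshotter --no-pager --since "10 minutes ago" | count_parallel_pull_events
```

A count of 0 after pulling an image suggests the snapshotter is still running in lazy loading mode or was not restarted after the configuration change.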

#### Pull image with Docker (non-SOCI)

Standard Docker pull downloads and extracts layers with limited concurrency.

Total time: 4m 44.163s

time docker pull \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

Digest: sha256:fd0cf60bbb34a5d30f22595215a633e5d4a7260fc0868aabe3f04b1174b7365d
Status: Downloaded newer image for
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

real    4m44.163s
user    0m0.339s
sys     0m0.423s

#### Pull image with SOCI parallel mode

With parallel pull mode enabled, nerdctl with the SOCI snapshotter downloads and unpacks layers with higher concurrency than a standard Docker pull.

Total time: 2m 12.846s

time sudo nerdctl pull --snapshotter soci \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker:
  resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:fd0cf60bbb34a5d30f22595215a633e5d4a7260fc0868aabe3f04b1174b7365d:
  done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:5e6a53b7478b0631dd3c4222ab6619dae3a3dd32a565921f10b0b03fdc316d46:
  done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 132.8s    total:  89.3 K (688.0 B/s)

real    2m12.846s
user    0m0.018s
sys     0m0.075s

#### Parallel pull summary

Using SOCI parallel pull mode reduced image pull time from 4 minutes 44 seconds to 2 minutes 13 seconds (284.163 s versus 132.846 s), roughly a 2.1x improvement in pull performance.

Conclusion

SOCI snapshotter provides improvements for both container startup and image pull operations:

  • Lazy loading mode: roughly a 20x improvement in container startup time (from about 7 minutes to about 21 seconds)
  • Parallel pull mode: roughly a 2.1x improvement in image pull time (from 4 minutes 44 seconds to 2 minutes 13 seconds)

Choose lazy loading mode when you need the fastest possible container startup, or parallel pull mode when you need the full image available before your workload begins.

Clean up

If you launched EC2 instances to test SOCI snapshotter, terminate them to avoid incurring ongoing charges. Delete any container images you pushed to Amazon Elastic Container Registry (Amazon ECR) during testing, and remove any SOCI indexes you no longer need.
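As a starting point, the helper below prints the AWS CLI commands for the first two cleanup steps so you can review them before running anything. The function name and all identifiers in the usage comment are placeholders; removing SOCI indexes is not covered here.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the AWS CLI commands that remove test resources.
# Arguments: EC2 instance ID, ECR repository name, image tag.
cleanup_plan() {
    local instance_id="$1" repo="$2" tag="$3"
    printf 'aws ec2 terminate-instances --instance-ids %s\n' "$instance_id"
    printf 'aws ecr batch-delete-image --repository-name %s --image-ids imageTag=%s\n' "$repo" "$tag"
}

# Usage (placeholder identifiers):
#   cleanup_plan i-0123456789abcdef0 my-test-repo soci-test
```

Printing instead of executing keeps destructive commands behind an explicit review step.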

Getting started with SOCI

DLAMI and Deep Learning Containers with SOCI snapshotter and SOCI index support are publicly available today. To find the AMIs that support SOCI, see SOCI Index DLAMI. For the container images that ship with a SOCI index, see the Deep Learning Container repository.

For detailed configuration guidance and best practices, refer to the SOCI documentation and the Deep Learning Container SOCI documentation.
