Preload Data for High-Performance Computing

Stage data before compute starts

Hydrate datasets to local NVMe or parallel filesystems. Keep Tigris as the durable source of truth.

HPC workloads — large-scale simulations, genomics pipelines, climate modeling — need datasets staged on fast local storage before compute begins. Network latency during computation is unacceptable, but keeping terabytes on high-performance filesystems around the clock is expensive.

Tigris acts as the globally accessible, durable store. Hydrate data out of it into local NVMe or a parallel filesystem for the duration of a job, then write results back. Pay for fast storage only when you're using it.

Get started →

[Diagram: Tigris Object Storage (global bucket) -> s3 sync -> HPC cluster local NVMe, hydrated data shared by compute nodes 0-2, results written back to Tigris. Hydrate once · Compute at NVMe speed · Write results back]

Benefits

Global data, local compute

A single global bucket makes datasets available from any region. Hydration jobs pull from the nearest replica automatically — no per-region copies to manage, no cross-region prefetch delays.

Zero egress costs

Tigris doesn't charge for egress. Hydrating the same 500 GB dataset across 50 compute nodes costs nothing in transfer fees, regardless of which cloud or region those nodes are in.
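As a rough sketch, the fan-out across nodes can be a simple loop over hostnames. The node names and NVMe mount path below are illustrative, and a real cluster would dispatch via `ssh`, `pdsh`, or `srun` rather than `echo` (used here so the generated commands are visible):

```shell
# Hypothetical hostnames and mount point; replace with your cluster's values.
NODES="node0 node1 node2"
BUCKET="s3://my-dataset"
ENDPOINT="https://t3.storage.dev"

for node in $NODES; do
  # Real fan-out: ssh "$node" aws s3 sync ... (or pdsh/srun equivalents)
  echo ssh "$node" aws s3 sync "$BUCKET" /mnt/nvme/data --endpoint-url "$ENDPOINT"
done
```

Because egress is free, scaling this from 3 nodes to 50 changes only the node list, not the transfer bill.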

Incremental sync

For datasets that don't change between runs, aws s3 sync is incremental: only new or modified objects transfer. Add --size-only to compare objects by size alone rather than by size and last-modified time, cutting hydration time on repeat runs.
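As a hedged sketch, the repeat-run sync might be wrapped so the flags are explicit. The function name and destination path are illustrative:

```shell
# Illustrative wrapper for repeat-run hydration. --size-only compares objects
# by size alone, skipping the last-modified-time check on unchanged files.
rehydrate() {
  aws s3 sync s3://my-dataset "$1" \
    --size-only \
    --endpoint-url https://t3.storage.dev
}

# Example invocation on a repeat run (requires AWS CLI + Tigris credentials):
# rehydrate /mnt/weka/data
```

On a first run the flag makes no difference; on subsequent runs it avoids re-transferring objects whose timestamps changed but whose contents did not.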

Hydrate to a parallel filesystem

HPC clusters typically use a high-performance parallel filesystem — Weka, VAST, DDN, or a managed Lustre product — for shared data access across compute nodes. Tigris serves as the durable, globally accessible source of truth. A separate ingestion job copies data from Tigris into the parallel filesystem before compute begins, and results are written back when the job completes.

Because Tigris serves reads from the nearest replica, the hydration job saturates the available link regardless of where the cluster is located.

# Hydrate from Tigris into the parallel filesystem
aws s3 sync s3://my-dataset /mnt/weka/data \
  --endpoint-url https://t3.storage.dev

# Compute nodes mount /mnt/weka/data via CSI at full filesystem speed
# ... run HPC workload ...

# Write results back to Tigris when done
aws s3 cp /mnt/weka/results/ s3://my-results/ --recursive \
  --endpoint-url https://t3.storage.dev

Weka, VAST, DDN, and managed Lustre products all provide S3 data-import features and Kubernetes CSI plugins that make the hydrate → mount → compute → writeback cycle straightforward.

NVIDIA GPUDirect Storage

For GPU-heavy HPC workloads, NVIDIA's GPUDirect Storage enables direct NVMe-to-GPU data paths, bypassing the CPU during data loading. Combine with Tigris hydration to the parallel filesystem for maximum throughput.
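A quick way to confirm a node can use GPUDirect Storage is NVIDIA's gdscheck tool from the GDS tools package. The install path below is typical for CUDA toolkit installs but may differ on your system:

```shell
# Probe GPUDirect Storage support; the path varies by CUDA/GDS version.
GDSCHECK=/usr/local/cuda/gds/tools/gdscheck.py
if [ -x "$GDSCHECK" ]; then
  python3 "$GDSCHECK" -p   # prints the driver and filesystem support matrix
else
  echo "gdscheck not found at $GDSCHECK; install the NVIDIA GDS tools"
fi
```

Running this as part of job setup catches missing drivers before a long simulation starts, rather than partway through data loading.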