Preload Data for High-Performance Computing
Stage data before compute starts
Copy datasets to local NVMe or parallel filesystems. Keep Tigris as the durable source of truth.
HPC workloads — large-scale simulations, genomics pipelines, climate modeling — need datasets staged on fast local storage before compute begins. Network latency during computation is unacceptable, but keeping terabytes on high-performance filesystems around the clock is expensive.
Tigris acts as the globally accessible, durable store. Use standard S3 tools to copy data into local NVMe or a parallel filesystem for the duration of a job, then write results back. Pay for fast storage only when you're using it.
Benefits
Global data, local compute
A single global bucket makes datasets available from any region. Reads pull from the nearest replica automatically — no per-region copies to manage, no cross-region prefetch delays.
Zero egress costs
Tigris doesn't charge for egress. Syncing the same dataset across many compute nodes costs nothing in transfer fees, regardless of which cloud or region those nodes are in.
Incremental sync
For datasets that don't change between runs, aws s3 sync is incremental — only new or modified objects transfer. Add --size-only to compare objects by size alone, ignoring modification times, which avoids needless re-transfers on repeat runs when timestamps differ but content does not.
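As a sketch, a repeat-run staging step might look like this (the bucket name and local path are placeholders):

```shell
# Incremental re-sync before a repeat run: only objects that are new
# or whose size changed transfer from Tigris.
aws s3 sync s3://my-dataset /mnt/nvme/data \
  --endpoint-url https://t3.storage.dev \
  --size-only
```

If an object's contents can change without its size changing, drop --size-only so the default size-and-timestamp comparison catches it.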
Pattern: Weka tiering with Tigris
Weka's built-in tiering connects directly to S3-compatible object stores. Data flows between Weka's local SSD tier and Tigris automatically based on access patterns and retention policies.
1. Register Tigris as an object store
weka fs tier s3 add tigris-store \
--hostname t3.storage.dev \
--port 443 \
--bucket my-dataset \
--auth-method AWSSignature4 \
--access-key-id <TIGRIS_ACCESS_KEY_ID> \
--secret-key <TIGRIS_SECRET_ACCESS_KEY> \
--region auto \
--protocol HTTPS
2. Attach the object store to a filesystem
weka fs tier s3 attach my-fs tigris-store
This enables writable tiering by default — Weka caches hot data on local SSDs and tiers cold data to Tigris. For read-only access to existing Tigris data, use --mode remote.
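For a dataset bucket that compute jobs should only read, the attach step from above would take that flag directly (filesystem and store names are the illustrative ones used in step 1):

```shell
# Attach in read-only ("remote") mode: Weka reads existing Tigris
# objects but never writes to or deletes from the bucket.
weka fs tier s3 attach my-fs tigris-store --mode remote
```

This keeps the Tigris bucket usable by other tools, since Weka is not managing its objects.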
3. Prefetch data before a job starts
Reads from tiered files automatically pull data from Tigris, but you can prefetch explicitly to avoid any latency during compute:
# Fetch a specific directory
weka fs tier fetch /mnt/weka/data/
# Batch fetch for large datasets
find -L /mnt/weka/data -type f | xargs -r -n512 -P64 weka fs tier fetch -v
4. Release local copies when done
After a job completes, release local copies to free SSD space. The data remains durable in Tigris.
weka fs tier release /mnt/weka/results/
When using tiering, do not manually delete or apply lifecycle policies to objects Weka writes to the Tigris bucket. Weka manages those objects internally — manual interference risks data loss.
Pattern: Direct sync with aws s3 sync
If your parallel filesystem doesn't support S3-backed tiering, or you want a simpler workflow, sync data directly from Tigris before compute starts.
# Sync from Tigris into the parallel filesystem
aws s3 sync s3://my-dataset /mnt/weka/data \
--endpoint-url https://t3.storage.dev
# Run your compute workload
# ...
# Write results back to Tigris
aws s3 cp /mnt/weka/results/ s3://my-results/ --recursive \
--endpoint-url https://t3.storage.dev
This works with any parallel filesystem that exposes a POSIX mount — Weka, VAST, managed Lustre — and with local NVMe directly.
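Put together, a minimal job wrapper for this pattern might look like the following sketch — the bucket names, mount paths, and the compute command itself are all placeholders:

```shell
#!/usr/bin/env bash
# Abort on any failed step so a broken sync never wastes node-hours.
set -euo pipefail

ENDPOINT=https://t3.storage.dev

# 1. Stage input data from Tigris onto fast local storage.
aws s3 sync s3://my-dataset /mnt/weka/data --endpoint-url "$ENDPOINT"

# 2. Run the compute workload (placeholder command).
./run-simulation --input /mnt/weka/data --output /mnt/weka/results

# 3. Persist results back to the durable store.
aws s3 sync /mnt/weka/results s3://my-results --endpoint-url "$ENDPOINT"
```

Because the final step is also a sync, re-running the wrapper after a partial failure only uploads results that haven't already landed in Tigris.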