# Bundle API

The Bundle API lets you fetch multiple objects from a bucket as a streaming tar archive in a single HTTP request. Instead of making one request per object, you send a list of keys and receive a tar stream that is assembled on the fly, with no server-side buffering.

This is designed for **ML training workloads** where dataloaders need to fetch thousands of images or samples per batch. The Bundle API eliminates per-object HTTP overhead and removes the need to pre-materialize shard files (tarballs, parquet files, etc.).

## SDK examples

The examples below cover Python, Go, and JavaScript.

For Python, install the Tigris boto3 extension:

```
pip install tigris-boto3-ext
```

### Basic usage

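```python
import tarfile

import boto3
from tigris_boto3_ext import bundle_objects

# Assumes credentials and the Tigris endpoint are already configured
# for boto3 (for example through environment variables).
s3_client = boto3.client("s3")

response = bundle_objects(s3_client, "my-bucket", [
    "dataset/train/img_001.jpg",
    "dataset/train/img_002.jpg",
])

# mode="r|" reads the tar as a forward-only stream.
with tarfile.open(fileobj=response, mode="r|") as tar:
    for member in tar:
        if member.name == "__bundle_errors.json":
            continue  # error report, not an object
        f = tar.extractfile(member)
        if f is not None:
            image_bytes = f.read()
```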

`bundle_objects` returns a `BundleResponse` that works as a context manager for automatic connection cleanup:

```python
with bundle_objects(s3_client, "my-bucket", keys) as response:
    with tarfile.open(fileobj=response, mode="r|") as tar:
        for member in tar:
            if member.name == "__bundle_errors.json":
                continue
            f = tar.extractfile(member)
            if f is not None:
                image_bytes = f.read()
```

### Error handling

By default, missing objects are skipped and listed in a `__bundle_errors.json` entry at the end of the archive. Set `on_error=BUNDLE_ON_ERROR_FAIL` to raise a `BundleError` instead when any key is missing:

```python
from tigris_boto3_ext import bundle_objects, BundleError, BUNDLE_ON_ERROR_FAIL

try:
    response = bundle_objects(
        s3_client, "my-bucket", keys, on_error=BUNDLE_ON_ERROR_FAIL
    )
except BundleError as e:
    print(f"Bundle failed (HTTP {e.status_code}): {e.body}")
```

### Response metadata

After consuming the tar stream, `BundleResponse` exposes metadata about the bundle:

```python
response = bundle_objects(s3_client, "my-bucket", keys)

with tarfile.open(fileobj=response, mode="r|") as tar:
    for member in tar:
        pass  # consume the stream

print(response.object_count)   # number of objects in the bundle
print(response.bundle_bytes)   # total bytes streamed
print(response.skipped_count)  # number of skipped keys (skip mode)
```

For Go, install the SDK:

```
go get github.com/tigrisdata/storage-go
```

```go
import (
    "archive/tar"
    "io"
    "log"

    storage "github.com/tigrisdata/storage-go"
)

// client and ctx are assumed to be initialized elsewhere.
output, err := client.BundleObjects(ctx, &storage.BundleObjectsInput{
    Bucket: "my-bucket",
    Keys: []string{
        "dataset/train/img_001.jpg",
        "dataset/train/img_002.jpg",
        "dataset/train/img_003.jpg",
    },
})
if err != nil {
    log.Fatal(err)
}
defer output.Body.Close()

tr := tar.NewReader(output.Body)
for {
    hdr, err := tr.Next()
    if err == io.EOF {
        break // end of archive
    }
    if err != nil {
        log.Fatal(err)
    }
    if hdr.Name == "__bundle_errors.json" {
        continue // error report, not an object
    }

    // Read the contents of the current entry.
    data, err := io.ReadAll(tr)
    if err != nil {
        log.Fatal(err)
    }
    // process hdr.Name, data
}
```

For JavaScript, install the SDK and a tar parser:

```
npm install @tigrisdata/storage tar-stream
```

```javascript
import { Readable } from "node:stream";

import { bundle } from "@tigrisdata/storage/server";
import tar from "tar-stream";

const result = await bundle("my-bucket", [
  "dataset/train/img_001.jpg",
  "dataset/train/img_002.jpg",
]);

if (result.error) {
  throw result.error;
}

// Parse the streaming response with a tar parser
const extract = tar.extract();

extract.on("entry", (header, stream, next) => {
  if (header.name === "__bundle_errors.json") {
    stream.resume(); // drain the entry so the parser can move on
    next();
    return;
  }

  const chunks = [];
  stream.on("data", (chunk) => chunks.push(chunk));
  stream.on("end", () => {
    const data = Buffer.concat(chunks);
    console.log(`${header.name}: ${data.length} bytes`);
    next();
  });
  stream.resume();
});

// Convert the web ReadableStream to a Node stream and pipe it into the parser
Readable.fromWeb(result.data.body).pipe(extract);
```

## PyTorch DataLoader integration

The Bundle API fits PyTorch's `IterableDataset` well. Instead of fetching one image per `__getitem__` call, each worker requests the keys for several batches in a single bundle and yields samples as they stream in:

```python
import random
import tarfile
from io import BytesIO

import torch
from PIL import Image
from tigris_boto3_ext import bundle_objects


def build_batches(metadata_path, batch_size):
    """Load a list of object keys from a metadata file and split into batches.

    Returns a list of lists, where each inner list is a batch of dicts
    with at least a "key" field pointing to the object key in the bucket.
    """
    ...


class TigrisBundleDataset(torch.utils.data.IterableDataset):
    def __init__(self, s3_client, metadata_path, bucket, batch_size=32, prefetch=20):
        self.s3_client = s3_client
        self.bucket = bucket
        self.batch_size = batch_size
        self.prefetch = prefetch  # number of batches fetched per bundle request
        self.batches = build_batches(metadata_path, batch_size)

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        if worker_info is None:
            # Copy so that shuffling does not reorder self.batches in place.
            my_batches = list(self.batches)
        else:
            # Each worker takes every num_workers-th batch.
            my_batches = self.batches[worker_info.id::worker_info.num_workers]
        random.shuffle(my_batches)

        # One bundle request covers `prefetch` batches of keys.
        for i in range(0, len(my_batches), self.prefetch):
            chunk = my_batches[i : i + self.prefetch]
            keys = [row["key"] for batch in chunk for row in batch]

            with bundle_objects(self.s3_client, self.bucket, keys) as response:
                with tarfile.open(fileobj=response, mode="r|") as tar:
                    for member in tar:
                        if member.name == "__bundle_errors.json":
                            continue
                        f = tar.extractfile(member)
                        if f is None:
                            continue
                        image = Image.open(BytesIO(f.read())).convert("RGB")
                        yield {"image": image}
```
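
The `build_batches` helper above is left as a stub. One possible shape is sketched here, assuming the metadata file is JSON Lines with one object per sample and a `key` field. That file format is an assumption made for illustration; the Bundle API does not prescribe one.

```python
import json


def build_batches(metadata_path, batch_size):
    """Read JSON Lines metadata and group the rows into fixed-size batches.

    Assumes each non-empty line is a JSON object with at least a "key" field
    (a hypothetical metadata format). The last batch may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    with open(metadata_path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]

    missing = [row for row in rows if "key" not in row]
    if missing:
        raise ValueError(f"{len(missing)} metadata rows have no 'key' field")

    # Slice the flat row list into consecutive batches.
    return [rows[i : i + batch_size] for i in range(0, len(rows), batch_size)]
```

Extra fields such as labels travel with each row, so they remain available when samples are matched back to their keys.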

## How it works

The Bundle API is a Tigris extension to the S3 API. You send a `POST` request with a list of object keys and receive a streaming tar archive:

```http
POST /{bucket}?bundle HTTP/1.1
x-tigris-bundle-format: tar
Content-Type: application/json

{"keys": ["train/img_001.jpg", "train/img_002.jpg", "train/img_003.jpg"]}
```

The server streams back a tar archive containing those objects, in the order you requested. Each tar entry's filename is the full object key.
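
Because each entry name is the full object key, samples can be matched to their metadata by name rather than by position, which stays correct even when an entry is absent from the archive. A minimal sketch follows; `iter_labeled_samples`, `labels_by_key`, and `make_fake_bundle` are illustrative names, and the in-memory archive only stands in for a real bundle response.

```python
import io
import tarfile

ERRORS_ENTRY = "__bundle_errors.json"


def iter_labeled_samples(fileobj, labels_by_key):
    """Yield (key, payload bytes, label) for each object in a bundle stream."""
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if member.name == ERRORS_ENTRY:
                continue  # error report, not an object
            f = tar.extractfile(member)
            if f is None:
                continue
            # The entry name is the object key, so it indexes the labels directly.
            yield member.name, f.read(), labels_by_key[member.name]


def make_fake_bundle(objects):
    """Build an in-memory tar that stands in for a bundle response (test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, payload in objects:
            info = tarfile.TarInfo(name=key)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

With a real response, pass the object returned by `bundle_objects` as `fileobj` in place of the fake archive.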

### Request headers

| Header                     | Required | Values           | Default |
| -------------------------- | -------- | ---------------- | ------- |
| `x-tigris-bundle-format`   | Yes      | `tar`            | —       |
| `x-tigris-bundle-on-error` | No       | `fail` \| `skip` | `skip`  |

### Request body

Send a JSON object whose `keys` field is an array of object keys:

```json
{
  "keys": [
    "dataset/train/img_00001.jpg",
    "dataset/train/img_00002.jpg",
    "dataset/train/img_00003.jpg"
  ]
}
```

XML is also supported:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Bundle xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Object><Key>dataset/train/img_00001.jpg</Key></Object>
  <Object><Key>dataset/train/img_00002.jpg</Key></Object>
</Bundle>
```
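
For clients that call the endpoint directly, the request line, headers, and JSON body described above can be assembled as in the following sketch. `build_bundle_request` is a hypothetical helper, not part of any Tigris SDK; it only prepares the pieces and leaves SigV4 signing and sending to your HTTP client.

```python
import json
from urllib.parse import quote

VALID_ON_ERROR = {"skip", "fail"}


def build_bundle_request(bucket, keys, on_error="skip"):
    """Return (method, path, headers, body) for a Bundle API call.

    Hypothetical helper: the caller still has to sign and send the request.
    """
    if on_error not in VALID_ON_ERROR:
        raise ValueError(f"on_error must be one of {sorted(VALID_ON_ERROR)}")

    # JSON form of the request body: an object with a "keys" array.
    body = json.dumps({"keys": list(keys)}).encode("utf-8")
    headers = {
        "x-tigris-bundle-format": "tar",
        "x-tigris-bundle-on-error": on_error,
        "Content-Type": "application/json",
    }
    return "POST", f"/{quote(bucket)}?bundle", headers, body
```

Passing `on_error` explicitly keeps the choice between skip and fail mode visible at the call site, even though the server defaults to `skip`.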

### Error handling

#### Skip mode (default)

Missing or inaccessible objects are silently omitted from the tar. A `__bundle_errors.json` entry is appended at the end of the archive:

```json
{
  "skipped": [{ "key": "dataset/train/img_00002.jpg", "reason": "NoSuchKey" }]
}
```

This is the recommended mode for training pipelines, since a dataloader can typically tolerate a few missing samples.
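
To find out which keys were skipped, read the `__bundle_errors.json` entry instead of discarding it. The sketch below assumes the entry has the `skipped` shape shown above; `read_bundle` and `fake_bundle` are illustrative helpers, and the in-memory tar only stands in for a real response.

```python
import io
import json
import tarfile

ERRORS_ENTRY = "__bundle_errors.json"


def read_bundle(fileobj):
    """Return (objects, skipped) from a bundle tar stream.

    objects maps each key to its bytes; skipped is the list of
    {"key": ..., "reason": ...} dicts from the errors entry, if present.
    """
    objects, skipped = {}, []
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is None:
                continue
            if member.name == ERRORS_ENTRY:
                skipped = json.load(f).get("skipped", [])
            else:
                objects[member.name] = f.read()
    return objects, skipped


def fake_bundle(entries):
    """Build an in-memory tar that stands in for a bundle response (test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

A training loop might log the skipped keys once per bundle and carry on, while a stricter job could stop when the list is non-empty.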

#### Fail mode

Set `x-tigris-bundle-on-error: fail` to pre-validate all keys before streaming. If any key is missing, the server returns a **404** error listing the missing keys, and no partial tar is sent.

```xml
<Error>
  <Code>BundleKeyNotFound</Code>
  <Message>One or more keys could not be resolved</Message>
  <MissingKeys>
    <Key>dataset/train/img_00002.jpg</Key>
  </MissingKeys>
</Error>
```

Use fail mode for inference or serving where every object must be present.
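
A client can pull the missing keys out of a fail-mode error body with a standard XML parser. This sketch assumes the body matches the example above and carries no XML namespace; `missing_keys_from_error` is an illustrative name, not an SDK function.

```python
import xml.etree.ElementTree as ET

EXAMPLE_ERROR = """<Error>
  <Code>BundleKeyNotFound</Code>
  <Message>One or more keys could not be resolved</Message>
  <MissingKeys>
    <Key>dataset/train/img_00002.jpg</Key>
  </MissingKeys>
</Error>"""


def missing_keys_from_error(body):
    """Extract the missing keys from a fail-mode error body.

    Returns an empty list when the body describes some other error.
    Assumes un-namespaced elements, as in the documented example.
    """
    root = ET.fromstring(body)
    if root.findtext("Code") != "BundleKeyNotFound":
        return []
    return [el.text for el in root.findall("./MissingKeys/Key") if el.text]
```

In the Python SDK, the `body` attribute of a caught `BundleError` would be the natural input, assuming it holds this XML document.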

### Response trailers

After the stream completes, the response includes HTTP trailers:

| Trailer                   | Description                        |
| ------------------------- | ---------------------------------- |
| `x-tigris-bundle-count`   | Number of objects in the tar       |
| `x-tigris-bundle-bytes`   | Total bytes streamed               |
| `x-tigris-bundle-skipped` | Number of skipped keys (skip mode) |
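
Trailer values arrive as strings, so a client that reads them itself will usually convert them before use. The sketch below is one way to do that. `BundleStats`, `parse_bundle_trailers`, and `all_keys_accounted_for` are hypothetical names, defaulting absent trailers to 0 is an assumption, and so is the idea that the object count excludes the `__bundle_errors.json` entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleStats:
    object_count: int
    bundle_bytes: int
    skipped_count: int


def parse_bundle_trailers(trailers):
    """Convert trailer strings to integers, matching names case-insensitively.

    Absent trailers default to 0 (an assumption of this sketch).
    """
    lowered = {name.lower(): value for name, value in trailers.items()}

    def as_int(name):
        return int(lowered.get(name, "0"))

    return BundleStats(
        object_count=as_int("x-tigris-bundle-count"),
        bundle_bytes=as_int("x-tigris-bundle-bytes"),
        skipped_count=as_int("x-tigris-bundle-skipped"),
    )


def all_keys_accounted_for(stats, requested):
    """True when delivered plus skipped objects equals the number requested.

    Assumes object_count does not include the errors entry.
    """
    return stats.object_count + stats.skipped_count == requested
```

Many HTTP clients do not expose trailers directly, which is one reason to prefer the metadata attributes on the SDK response where they are available.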

## Limits

| Parameter            | Limit  |
| -------------------- | ------ |
| Max keys per request | 5,000  |
| Max assembled size   | 50 GB  |
| Max request body     | 5 MB   |
| Request timeout      | 15 min |
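
A key list that exceeds the per-request limits has to be split across several bundle requests. The following sketch splits on the key-count and request-body limits only; `split_keys` is an illustrative helper, reading "5 MB" as 5 MiB is an assumption, and the assembled-size limit is not checked because object sizes are not known from the keys alone.

```python
import json

MAX_KEYS_PER_REQUEST = 5_000
MAX_BODY_BYTES = 5 * 1024 * 1024  # assumes "5 MB" means 5 MiB

# Size of the JSON body with an empty key list: {"keys": []}
_BASE_BODY_LEN = len(json.dumps({"keys": []}))


def _encoded_len(key):
    # json.dumps escapes non-ASCII by default, so characters equal bytes.
    return len(json.dumps(key))


def split_keys(keys, max_keys=MAX_KEYS_PER_REQUEST, max_body=MAX_BODY_BYTES):
    """Yield lists of keys that each fit within the key-count and body-size limits.

    A single key too large for max_body is still yielded on its own.
    """
    chunk, size = [], _BASE_BODY_LEN
    for key in keys:
        extra = _encoded_len(key) + (2 if chunk else 0)  # ", " between items
        if chunk and (len(chunk) >= max_keys or size + extra > max_body):
            yield chunk
            chunk, size = [], _BASE_BODY_LEN
            extra = _encoded_len(key)
        chunk.append(key)
        size += extra
    if chunk:
        yield chunk
```

The body size is tracked incrementally so that splitting a long key list does not re-serialize the whole chunk for every key.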

## Authentication

Requests use standard S3 SigV4 authentication. The caller must have `s3:GetObject` permission on the bucket. No new IAM actions are required.
