
mount -t tigrisfs

12 min read
Xe Iaso
Katie Schilling
Ovais Tariq
Yevgeniy Firsov

At Tigris we put your big data close to your compute so you don't have to do it yourself. However, there's been a small problem with that: most of the programs built to process that data, such as AI training, document indexing, and other kinds of workloads, expect to read data from a filesystem.

Not to mention, big data means big data. Bigger than RAM. Bigger than your disk. Bigger than any one machine can have on any number of disks. Sometimes even bigger than human minds can imagine. What if that data was as easy to access as your code folder, but had unlimited storage?

We're proud to announce the immediate availability of tigrisfs, the native filesystem interface for Tigris. This lets you mount Tigris buckets to your laptops, desktops, and servers so you can use data in your buckets as if it was local. This bridges the gap between the cloud and your machine.


Internally, tigrisfs is a fork of GeeseFS, another project that converts object storage buckets into mountable storage. GeeseFS has good performance and makes it easy to access the same bucket from both the S3 API and the filesystem without obfuscating object names the way JuiceFS does. We have extended GeeseFS to leverage Tigris-specific features that improve throughput and latency. With tigrisfs you can use the S3 API or the filesystem interchangeably without having to worry about name mangling. tigrisfs is the canonical filesystem implementation for Tigris.

Your data: everywhere

Let's imagine that you have big data in your stack. Not just big, but unimaginably big: we're talking about data the size of Wikipedia, the entire Linux Kernel Mailing List archives, and the entire git history for all the big open source projects. Not to mention small datasets like every scientific paper from arXiv. tigrisfs lets you mount the same dataset in the same place on every machine in your cluster. Imagine just reading from /mnt/tigris/datasets/raw/lkml, massaging the data a bit, and then writing it to /mnt/tigris/datasets/massaged/lkml for the downstream analysis to run. We'll go into more detail about this in the near future, so keep an eye out for that!

The really cool part about this is that it lets you have a global filesystem on your local machine. All your data is just there and waiting to be used. If you write that massaged dataset to /mnt/tigris/datasets/massaged/lkml on one machine, it's instantly available to any other machine in the cluster. Any time it's used, it'll be seamlessly cached on the device so that it's hot'n'ready for action! It's like having a ReadWriteMany Kubernetes volume, but without having to set up Ceph.
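
For example, using the dataset paths from above (the parquet filename here is made up for illustration), the handoff between machines is just normal file commands:

# on the machine that massaged the data
cp lkml-cleaned.parquet /mnt/tigris/datasets/massaged/lkml/

# on any other machine in the cluster, moments later
ls /mnt/tigris/datasets/massaged/lkml/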

| Dataset Type | Examples |
| --- | --- |
| Bigger than any one machine can hold with any number of disks | Wikipedia, Linux Kernel Mailing List archives, the entire git history for all big open source projects, and every book published in the last 100 years |
| Bigger than any one disk | The entire YouTube upload history of your favorite creator |
| Smaller than RAM | Every scientific paper from arXiv |

If you're dealing with anything bigger than RAM, tigrisfs is a great fit.

One of the neat parts about tigrisfs is that you can deal with your files using either the S3 API or the filesystem API. This is in contrast to other tools like JuiceFS, which break files into blocks and obfuscate the filenames, meaning you need to spend time and energy reverse-engineering how the block → data mapping works. With tigrisfs you can PUT an object into your bucket with the S3 API, and then open the file in your favorite text editor. This unlocks any number of fun integrations, including:

  • Using inotifywait to process data as it's created in a bucket by your analytics pipeline (sketched after this list)
  • Backing up your home folder with rsync in a cronjob
  • Using tools like gzcat to read compressed data without having to decompress it
  • Storing TLS certificates across the cluster so that one machine can renew them and they'll roll out to the rest of the machines instantly
  • Reading your training datasets directly from disk instead of having to set up object storage with the datasets library
  • Reading a raw video out of one bucket and compressing it for global distribution into another bucket using ffmpeg
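
For example, here's a rough sketch of that first integration, watching the mounted bucket and handling files as they finish being written. The incoming/ prefix and the processing step are made up for illustration, so swap in your own:

#!/usr/bin/env bash
# Watch a prefix in the mounted bucket and handle each file once it's fully written.
inotifywait --monitor --event close_write --format '%w%f' /mnt/tigris/pitohui/incoming |
while read -r path; do
  echo "processing $path"
  # your real processing step goes here
done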

Let's say you want to edit your secret plans in your Linux VM on your MacBook. First, upload it to Tigris with aws s3 cp:

$ aws s3 cp secretplans.txt s3://pitohui
upload: ./secretplans.txt to s3://pitohui/secretplans.txt

Then you can view it like normal with the shell:

xe@pitohui:~ $ cat /mnt/tigris/pitohui/secretplans.txt
- world domination via the use of hypnodrones
- make there be such a thing as a free lunch
- create more paperclips

And now you can do whatever you want! You can even do backups of your home folder with a single command:

xe@pitohui:~ $ rsync -av ~ /mnt/tigris/pitohui
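
To turn that into the nightly cronjob backup mentioned earlier, a crontab entry along these lines would do it (the schedule and paths here are just an example):

# back up the home folder to Tigris every night at 03:00
0 3 * * * rsync -a /home/xe /mnt/tigris/pitohui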

The cloud's the limit!

Getting started with tigrisfs

If you want to get started, all you need is an aarch64/x86_64 Linux system, a Tigris bucket, and a keypair.

Installing tigrisfs

One-liner install

This will install the latest version of tigrisfs and its dependencies.

curl -sSL https://raw.githubusercontent.com/tigrisdata/tigrisfs/refs/heads/main/install.sh | bash

Or, install using a package manager

If you're using a package manager, you can install tigrisfs by following the instructions below.

  • Download the package from the most recent release
  • Install the package using your package manager
    • Debian/Ubuntu: sudo apt install ./tigrisfs-version.deb
    • Alma Linux/Fedora/Red Hat/Rocky Linux: sudo dnf install ./tigrisfs-version.rpm

Mounting the filesystem

We are going to assume that you have a bucket called pitohui and you want to mount it to /mnt/tigris/pitohui.

First, open /etc/default/tigrisfs in your favorite text editor as root, uncomment the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY variables, and paste in the access keypair you got from the Tigris dashboard.
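
Once that's done, the relevant lines of the file will look something like this (the values here are placeholders, not real credentials):

AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key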

Next, create the directory you want to mount the bucket to:

mkdir -p /mnt/tigris/pitohui

Then mount the bucket:

tigrisfs pitohui /mnt/tigris/pitohui

This gives me permission to do whatever I want, such as touching grass:

$ touch /mnt/tigris/pitohui/grass
$ stat /mnt/tigris/pitohui/grass
File: /mnt/tigris/pitohui/grass
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 80h/128d Inode: 1631 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ xe) Gid: ( 1000/ xe)
Context: system_u:object_r:fusefs_t:s0
Access: 2025-04-07 20:15:07.549222957 +0000
Modify: 2025-04-07 20:15:07.549222957 +0000
Change: 2025-04-07 20:15:07.549222957 +0000
Birth: -

And make sure it exists in Tigris:

$ aws s3 ls s3://pitohui | grep grass
2025-04-07 16:15:07 0 grass

Perfect!
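
When you're done, you can unmount the bucket the same way you would any other FUSE filesystem (this is standard FUSE tooling, nothing tigrisfs-specific):

fusermount -u /mnt/tigris/pitohui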

If you are looking to crank up performance, there are a few configuration options that you can tweak. Take a look at the documentation for more details.

Under the hood

tigrisfs is a high performance FUSE filesystem adaptor for object storage. It's a fork of GeeseFS, which is itself a fork of goofys, and we have extended it to leverage Tigris-specific features that improve throughput and latency.

GeeseFS solves the performance problems that S3-backed FUSE filesystems typically have, especially with small files and metadata operations, by using aggressive parallelism and asynchrony.

Improvements over GeeseFS

Our initial release zeroed in on hardening the codebase for production, focusing on two areas:

  • Security hardening
    • Replaced the bundled, legacy AWS SDK that contained known CVEs
    • Upgraded every dependency to its latest secure version
  • Reliability upgrades
    • Eliminated all race conditions flagged by the Go race detector (now mandatory in tests)
    • Fixed every linter warning and added lint checks to CI
    • Dramatically expanded the test-suite and made the extended tests a default part of CI

Tigris-specific improvements

We also shipped a few features that lean on Tigris internals:

  • POSIX semantics - permissions, special files, and symlinks now behave just like they do on a local disk (see the sketch after this list).
  • Turbo-charged small files - listing a directory automatically batch-fetches and caches tiny objects in a single round-trip.
  • Smart prefetch - directory listings kick off background fetches so the next cat or grep feels instant.
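
As a quick way to kick the tires on those POSIX semantics, the usual permission and symlink commands behave just like they would on a local disk. Here's a sketch reusing the example bucket from earlier:

$ chmod 600 /mnt/tigris/pitohui/secretplans.txt
$ ln -s secretplans.txt /mnt/tigris/pitohui/plans-latest
$ ls -l /mnt/tigris/pitohui/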

In essence, tigrisfs bridges the gap between the Linux kernel and Tigris. It translates filesystem calls into S3 API calls so that you can explore your bucket with the shell, connecting the old world of servers and shells to the new world of dynamic infinity in the cloud.

Benchmarks

Benchmarking filesystems is kind of annoying, and networked filesystems can be even more annoying to benchmark. Most of the time, you end up making a lot of assumptions about the system state and network configuration. Here are the specs of our benchmarking machine:

| Component | Value |
| --- | --- |
| Instance type | VM.Standard.E5.Flex (Oracle Cloud) |
| CPU cores | 24 |
| Memory | 24 gigabytes (24Gi) |
| Network bandwidth | 24 gigabits per second |

Our benchmarks are done with the flexible I/O tester, fio. Note that we use direct I/O so that the page cache doesn't skew the results.

Read performance

Here is the command we used to test read performance on a bucket:

fio --name=read_throughput \
--directory=/mnt/test-tigrisfs-bucket \
--numjobs=4 \
--size=4G \
--time_based \
--runtime=120s \
--ramp_time=2s \
--ioengine=libaio \
--direct=1 \
--verify=0 \
--bs=1M \
--iodepth=1 \
--rw=read \
--group_reporting=1

This has fio run for 2 minutes with 2 seconds of ramp-up time (during which the results are not counted in the statistics), with each of the 4 threads (jobs) reading a 4-gigabyte file in one-megabyte blocks, a 16-gigabyte working set in total. The test was run in permutations of thread count and block size to see whether the bottleneck is tigrisfs, the Tigris service, or the network card of the machine.

And we got these results for each permutation of the test:

| Threads | Block Size | Throughput (MiB/sec) |
| --- | --- | --- |
| 4 | 1M | 1630 |
| 4 | 4M | 2446 |
| 8 | 1M | 2802 * |
| 8 | 4M | 2732 * |

Note: The throughput numbers with an asterisk next to them could theoretically be higher, but at this point we had saturated the network card on the test machine.

Write performance

Here is the command we used to test write performance:

fio --name=write_throughput \
--directory=/mnt/test-tigrisfs-bucket \
--numjobs=8 \
--size=4G \
--time_based \
--runtime=120s \
--ramp_time=2s \
--ioengine=libaio \
--direct=1 \
--verify=0 \
--bs=4M \
--iodepth=1 \
--rw=write \
--group_reporting=1

This has fio run for 2 minutes with 2 seconds of ramp-up time (during which the results are not counted in the statistics), with each of the 8 threads (jobs) writing a 4-gigabyte file in four-megabyte blocks, a 32-gigabyte working set in total. The test was run in permutations of thread count and block size to see whether the bottleneck is tigrisfs, the Tigris service, or the network card of the machine.

And we got these results for each permutation of the test:

| Threads | Block Size | Throughput (MiB/sec) |
| --- | --- | --- |
| 4 | 1M | 1118 |
| 4 | 4M | 1119 |
| 8 | 1M | 1269 |
| 8 | 4M | 1279 |

Needless to say, this means a single machine running tigrisfs can read nearly 3 gigabytes and write more than 1.3 gigabytes of data per second. Combined with the caching tigrisfs does, that should be more than sufficient for anything you can throw at it.

When should I use tigrisfs?

| Feature | TigrisFS | S3 API |
| --- | --- | --- |
| Legacy Tool Integration | ✅ | ❌ |
| Direct Filesystem Access | ✅ | ❌ |
| On-Demand File Fetching | ✅ | ❌ |
| AI Model Training | ✅ | ✅ |
| Global Performance | ✅ | ✅ |

Personally, I use tigrisfs all the time on my own machines. One of the main things I use it for is running analytics across honeypot logs so that I can fight off evil scrapers and save the internet.

In general, tigrisfs can be slower than a local disk for files that aren't cached yet, but it more than makes up for that by making the location of your files irrelevant. All you need to do is run tigrisfs, and you have a single global namespace for your data across all your machines.

tigrisfs is written in Go and is open source on GitHub. We welcome any and all contributions to make it even better!