Metadata Querying for Object Storage feat. Elixir
Introduction
I admit it. My first Tigris blog post, about Eager and Lazy caching, was kind of basic. It was important to cover the groundwork. The CDN aspect is important, and I do like the summon-your-data pre-fetch header a lot. Now we get to the significantly more disruptive stuff: while Tigris offers an S3-compatible API, it also provides features that enable entirely new use-cases and push the boundaries of what you can do with object storage. Let's see if we can't set your internal constraint-solver aflame with possibilities.
First, a bit of setup. Like the previous post, this one is a Livebook, which means you can run all of it either locally with Livebook Desktop or on Fly.
Mix.install(
  [
    {:ex_aws, "~> 2.5"},
    {:ex_aws_s3, "~> 2.5"},
    {:hackney, "~> 1.20"},
    {:poison, "~> 3.0"},
    {:sweet_xml, "~> 0.6"},
    :jason,
    :req
  ],
  config: [
    ex_aws: [
      access_key_id: [{:system, "LB_AWS_ACCESS_KEY_ID"}],
      secret_access_key: [{:system, "LB_AWS_SECRET_ACCESS_KEY"}],
      endpoint_url_s3: [{:system, "LB_AWS_ENDPOINT_URL_S3"}],
      region: [{:system, "LB_AWS_REGION"}],
      s3: [scheme: "https://", host: "fly.storage.tigris.dev", port: 443]
    ]
  ]
)
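Everything in the config above, plus the bucket name, comes from LB_-prefixed environment variables, which is how Livebook exposes its secrets. If one is missing, the problem can surface later as a less obvious error. Below is a minimal preflight sketch; the `Preflight` module is hypothetical and not part of ExAws or Livebook.

```elixir
defmodule Preflight do
  @moduledoc """
  Hypothetical helper: reports which of the environment variables this
  notebook reads have no value. Livebook exposes secrets as LB_-prefixed
  environment variables.
  """

  @required ~w(
    LB_AWS_ACCESS_KEY_ID
    LB_AWS_SECRET_ACCESS_KEY
    LB_AWS_ENDPOINT_URL_S3
    LB_AWS_REGION
    LB_BUCKET_NAME
  )

  # Returns the names from `names` that are not set in the environment
  def missing(names \\ @required) do
    Enum.reject(names, &System.get_env/1)
  end

  # Returns :ok, or raises with a readable list of the missing names
  def check!(names \\ @required) do
    case missing(names) do
      [] -> :ok
      vars -> raise "Missing Livebook secrets: " <> Enum.join(vars, ", ")
    end
  end
end

IO.inspect(Preflight.missing(), label: "missing secrets")
```

Calling `Preflight.check!()` at the top of the upload cell stops it early, naming the missing secrets.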
alias ExAws.S3
bucket = System.fetch_env!("LB_BUCKET_NAME")
# Download a few sample files and upload them to the bucket.
# If you run this many times, GitHub's API may start rate-limiting you.
%{
  "manifesto.txt" =>
    "https://ia800408.us.archive.org/26/items/HackersManifesto/Hackers-manafesto.txt",
  "sample.jpg" => "https://underjord.io/assets/images/lawik-square.jpg",
  "lawik.json" => "https://api.github.com/users/lawik",
  "underjord.svg" => "https://underjord.io/img/logo2.svg",
  "globe.webp" =>
    "https://cdn.prod.website-files.com/657988158c7fb30f4d9ef37b/6582a4f8d777a7f9c79bee68_Globally%20Distributed%20S3-compatible.webp"
}
|> Enum.map(fn {name, url} ->
  # Fetch the raw bytes; don't let Req decode JSON and the like
  %{body: body, headers: headers} = Req.get!(url, decode_body: false)
  # Keep only the media type, dropping parameters such as "; charset=utf-8"
  type = headers["content-type"] |> hd() |> String.split(";") |> hd()
  # Space the uploads one second apart, for demo purposes
  :timer.sleep(1000)
  S3.put_object(bucket, name, body, content_type: type) |> ExAws.request!()
end)
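The upload cell stores only the part of the response's content-type header before any ";", so a value like "text/plain; charset=utf-8" is saved as "text/plain" and the stored Content-Type values stay plain media types such as "image/jpeg". The sketch below pulls that step into a hypothetical `MimeType.base/1` so it can be checked on its own; the module name and the extra `String.trim/1` are additions of this sketch, not part of the original cell.

```elixir
defmodule MimeType do
  @moduledoc """
  Hypothetical helper mirroring the upload cell: keep only the bare media
  type from a Content-Type header value.
  """

  # "text/plain; charset=utf-8" -> "text/plain"
  def base(content_type) do
    content_type
    |> String.split(";")
    |> hd()
    |> String.trim()
  end
end

IO.puts(MimeType.base("text/plain; charset=utf-8"))
```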
What is Metadata Querying?
If you read back through the Tigris blog, you'll see that the engineering team is very excited about FoundationDB. It underpins the entire metadata system, and this feature in particular. A fast and scalable metadata system lets Tigris find and fetch data with much lower latency than is typical of object storage, and a highly capable one lets Tigris do more with the metadata itself.
Let's talk about metadata querying. It lets us run SQL-style queries against object metadata and, importantly, sort the results by that metadata. Currently, three fields are supported:
Content-Type: the MIME type, such as "image/jpeg", "text/html" or "application/json".
Content-Length: the size of the object in bytes.
Last-Modified: a timestamp for when the object last changed.
It is all done through a custom header, X-Tigris-Query, so that it fits within the bounds of the S3 API. You can do a lot with this, and some of it is straight-up practical.
Fetching a range of MIME types
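One practical use is fetching a set of content types: not just specific ones, but everything matching a prefix. The comparisons here are a little unusual because they form a range query. Anything between "image/" and "image0" is, in effect, everything that starts with "image/". This works because "0" is the character immediately after "/" in ASCII, so every type of the form "image/<subtype>" sorts after "image/" and before "image0".
This is incredibly awkward to do with many object storage providers.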
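Elixir compares binaries byte by byte, so the range argument can be checked locally on a few MIME types. This sketch assumes Tigris orders Content-Type values as plain byte strings, which is what the range trick relies on; the list of types is illustrative only.

```elixir
# Assumption: Tigris compares Content-Type values as plain byte strings,
# the same way Elixir's < and > compare binaries.
types = [
  "application/json",
  "image/jpeg",
  "image/svg+xml",
  "image/webp",
  # Made-up type that looks similar but should not match
  "imagex/example",
  "text/plain"
]

# The range from the query: strictly between "image/" and "image0"
in_range = Enum.filter(types, fn t -> t > "image/" and t < "image0" end)

# What we actually want: everything starting with "image/"
by_prefix = Enum.filter(types, &String.starts_with?(&1, "image/"))

# "/" is byte 0x2F and "0" is byte 0x30, so the two filters agree
IO.inspect(in_range == by_prefix, label: "range matches prefix")
```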
# Range query: every object whose Content-Type starts with "image/"
bucket
|> S3.list_objects_v2(
  headers: %{
    "X-Tigris-Query" => ~s(`Content-Type` > "image/" and `Content-Type` < "image0")
  }
)
|> ExAws.request!()
|> Map.fetch!(:body)
|> Map.fetch!(:contents)
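Writing the upper bound by hand gets tedious for other prefixes. The hypothetical `PrefixQuery` module below (not part of ExAws or Tigris) derives it by incrementing the last byte of the prefix, so "image/" becomes "image0" and "text/" becomes "text0", and then formats an X-Tigris-Query expression in the same shape as the hand-written one.

```elixir
defmodule PrefixQuery do
  @moduledoc """
  Hypothetical helper: builds an X-Tigris-Query range expression matching
  every value of `field` that starts with `prefix`.
  """

  # Exclusive upper bound: the same bytes with the last one incremented,
  # e.g. "image/" -> "image0". Assumes the last byte is below 255, which
  # holds for ASCII MIME type prefixes.
  def upper_bound(prefix) when is_binary(prefix) and byte_size(prefix) > 0 do
    n = byte_size(prefix) - 1
    <<head::binary-size(n), last>> = prefix
    head <> <<last + 1>>
  end

  # Assumes field and prefix contain no backticks or double quotes
  def query(field, prefix) do
    upper = upper_bound(prefix)
    ~s(`#{field}` > "#{prefix}" and `#{field}` < "#{upper}")
  end
end

IO.puts(PrefixQuery.query("Content-Type", "image/"))
```

The resulting string can be passed as the "X-Tigris-Query" value in the `headers` option of `S3.list_objects_v2`, just like the hand-written query.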
Other uses are very flexible and firmly in "we can't wait to see what you do with it" territory, such as:
