Document Search with LanceDB
We've created an example app at tigrisdata-community/docs-search-example that lets you search the Tigris docs using LanceDB and an OpenAI embedding model. To set it up, you need the following:
- An API key for OpenAI
- A Tigris account and bucket
- A Tigris access keypair
- Node.js installed
Clone the repository to your local machine:
git clone https://github.com/tigrisdata-community/docs-search-example
cd docs-search-example
Install all of the NPM dependencies:
npm ci
Then clone the blog and documentation repositories:
cd var
git clone https://github.com/tigrisdata/tigris-blog
git clone https://github.com/tigrisdata/tigris-os-docs
cd ..
Set your OpenAI API key and Tigris credentials in your environment:
export OPENAI_API_KEY=sk-hunter2-hunter2hunter2hunter2
export AWS_ACCESS_KEY_ID=tid_AzureDiamond
export AWS_SECRET_ACCESS_KEY=tid_hunter2hunter2hunter2
export AWS_ENDPOINT_URL_S3=https://t3.storage.dev
export AWS_REGION=auto
export BUCKET_NAME=your-bucket-name-here
Make sure to replace the placeholder secrets and bucket name with your own values!
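Before moving on, it can help to confirm that every variable is actually set in the shell you will run the commands from. The following is a minimal sketch and is not part of the example repository: `missingVars` is a hypothetical helper, and the only names it knows about are the ones exported above.

```typescript
// Variable names copied from the export lines above.
const REQUIRED_VARS = [
  "OPENAI_API_KEY",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_ENDPOINT_URL_S3",
  "AWS_REGION",
  "BUCKET_NAME",
];

// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object, in the order listed above.
function missingVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Usage (in Node.js): pass process.env and report anything that is missing.
//   const missing = missingVars(process.env);
//   if (missing.length > 0) console.error(`Missing: ${missing.join(", ")}`);
```

Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the check easy to try out with a plain object.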
Ingest the docs with ingest.ts:
npx tsx ingest.ts
Then run the server with node:
node index.js
Then open http://localhost:3000 and search for whatever you want to know!
Next steps
The following are left as exercises for the reader:
- The markdown chunkify function doesn't properly handle Markdown front matter. Try adding support for it using the gray-matter package.
- Try integrating this with an AI model: pass the user query through LanceDB to get a list of candidate documents, insert the document details into the model's prompt, and then see how it changes the results of your model.
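As a starting point, both exercises can be sketched without any dependencies. Everything below is an assumption for illustration rather than code from the example app: `splitFrontMatter` is a hypothetical stand-in that only separates a leading `---` block from the body (gray-matter goes further and parses that block, returning the parsed `data` alongside the remaining `content`), and the `Candidate` fields are made up, so the rows you get back from LanceDB will likely have different names.

```typescript
// Hypothetical stand-in for gray-matter: splits a leading `---` block from
// the Markdown body. It returns the raw front matter text and does NOT parse YAML.
function splitFrontMatter(source: string): { frontMatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) {
    // No front matter block at the top of the file: the whole file is body.
    return { frontMatter: "", body: source };
  }
  return { frontMatter: match[1], body: source.slice(match[0].length) };
}

// Hypothetical shape of a search hit; adjust to the fields your table stores.
interface Candidate {
  title: string;
  text: string;
}

// Builds a prompt that places the retrieved documents ahead of the user's
// question so the model can ground its answer in them.
function buildPrompt(question: string, candidates: Candidate[]): string {
  const context = candidates
    .map((doc, i) => `[${i + 1}] ${doc.title}\n${doc.text}`)
    .join("\n\n");
  return [
    "Answer the question using only the context below.",
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Stripping the front matter before chunking keeps metadata out of the embedded text, and numbering the candidates in the prompt makes it easier to see which document the model drew on when you compare results with and without retrieval.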
What else can you do with this database? The cloud's the limit!