pgvector: a powerful vector database for LLMs
How can you leverage your personal raw data (such as markdown files and notes) to enable LLMs to answer questions about them?
Answer: Implement Semantic Search using Vector Databases for efficient information retrieval and contextual understanding.
Let me show the easiest way you can achieve this by utilizing Supabase and OpenAI:
- Enable pgvector extension in Supabase:
- Go to your Supabase project dashboard
- Navigate to the SQL Editor
- Run the following SQL command:
CREATE EXTENSION vector;
- Create a table to store your documents:
CREATE TABLE personal_notes ( id BIGSERIAL PRIMARY KEY, content TEXT, embedding VECTOR(1536) );
- Use OpenAI's API to generate embeddings for your documents:
import openai import supabase openai.api_key = "your-openai-api-key" supabase_client = supabase.create_client("your-supabase-url", "your-supabase-key") def insert_personal_note(content): response = openai.Embedding.create(input=content, model="text-embedding-ada-002") embedding = response['data'][0]['embedding'] supabase_client.table("personal_notes").insert({ "content": content, "embedding": embedding }).execute() # Example usage insert_personal_note("Remember to buy groceries: milk, eggs, and bread")
- Perform semantic search:
def search_personal_notes(query, limit=5): query_embedding = openai.Embedding.create(input=query, model="text-embedding-ada-002")['data'][0]['embedding'] result = supabase_client.rpc( "match_personal_notes", { "query_embedding": query_embedding, "match_threshold": 0.5, "match_count": limit } ).execute() return result.data # Example usage results = search_personal_notes("What do I need to buy at the grocery store?")
- Create a stored procedure in Supabase for efficient similarity search:
CREATE FUNCTION match_personal_notes(query_embedding VECTOR(1536), match_threshold FLOAT, match_count INT) RETURNS TABLE ( id BIGINT, content TEXT, similarity FLOAT ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY SELECT id, content, 1 - (personal_notes.embedding <=> query_embedding) AS similarity FROM personal_notes WHERE 1 - (personal_notes.embedding <=> query_embedding) > match_threshold ORDER BY personal_notes.embedding <=> query_embedding LIMIT match_count; END; $$;
By following these steps, you can create a powerful semantic search system for your personal data using Supabase and OpenAI, enabling LLMs to efficiently answer questions about your personal notes, reminders, and other markdown files.