pgvector: Scale Semantic Search in PostgreSQL Without New Infrastructure
How can you leverage your personal raw data (such as markdown files and notes) to enable LLMs to answer questions about them?
Answer: Implement Semantic Search using Vector Databases for efficient information retrieval and contextual understanding.
Let me show the easiest way you can achieve this by utilizing Supabase and OpenAI:
Enable pgvector extension in Supabase:
- Go to your Supabase project dashboard
- Navigate to the SQL Editor
- Run the following SQL command:
CREATE EXTENSION vector;
Create a table to store your documents:
CREATE TABLE personal_notes ( id BIGSERIAL PRIMARY KEY, content TEXT, embedding VECTOR(1536) );
Use OpenAI's API to generate embeddings for your documents:
import openai import supabase openai.api_key = "your-openai-api-key" supabase_client = supabase.create_client("your-supabase-url", "your-supabase-key") def insert_personal_note(content): response = openai.Embedding.create(input=content, model="text-embedding-ada-002") embedding = response['data'][0]['embedding'] supabase_client.table("personal_notes").insert({ "content": content, "embedding": embedding }).execute() # Example usage insert_personal_note("Remember to buy groceries: milk, eggs, and bread")
Perform semantic search:
def search_personal_notes(query, limit=5): query_embedding = openai.Embedding.create(input=query, model="text-embedding-ada-002")['data'][0]['embedding'] result = supabase_client.rpc( "match_personal_notes", { "query_embedding": query_embedding, "match_threshold": 0.5, "match_count": limit } ).execute() return result.data # Example usage results = search_personal_notes("What do I need to buy at the grocery store?")
Create a stored procedure in Supabase for efficient similarity search:
CREATE FUNCTION match_personal_notes(query_embedding VECTOR(1536), match_threshold FLOAT, match_count INT) RETURNS TABLE ( id BIGINT, content TEXT, similarity FLOAT ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY SELECT id, content, 1 - (personal_notes.embedding <=> query_embedding) AS similarity FROM personal_notes WHERE 1 - (personal_notes.embedding <=> query_embedding) > match_threshold ORDER BY personal_notes.embedding <=> query_embedding LIMIT match_count; END; $$;
By following these steps, you can create a powerful semantic search system for your personal data using Supabase and OpenAI, enabling LLMs to efficiently answer questions about your personal notes, reminders, and other markdown files.