Sarim

Search Blogs...

pgvector: a powerful vector database for LLMs

How can you leverage your personal raw data (such as markdown files and notes) to enable LLMs to answer questions about them?

Answer: Implement Semantic Search using Vector Databases for efficient information retrieval and contextual understanding.

Let me show the easiest way you can achieve this by utilizing Supabase and OpenAI:

  1. Enable pgvector extension in Supabase:
    • Go to your Supabase project dashboard
    • Navigate to the SQL Editor
    • Run the following SQL command: CREATE EXTENSION vector;
  2. Create a table to store your documents:
    CREATE TABLE personal_notes (
      id BIGSERIAL PRIMARY KEY,
      content TEXT,
      embedding VECTOR(1536)
    );
    
  3. Use OpenAI's API to generate embeddings for your documents:
    import openai
    import supabase
    
    openai.api_key = "your-openai-api-key"
    supabase_client = supabase.create_client("your-supabase-url", "your-supabase-key")
    
    def insert_personal_note(content):
        response = openai.Embedding.create(input=content, model="text-embedding-ada-002")
        embedding = response['data'][0]['embedding']
        
        supabase_client.table("personal_notes").insert({
            "content": content,
            "embedding": embedding
        }).execute()
    
    # Example usage
    insert_personal_note("Remember to buy groceries: milk, eggs, and bread")
    
  4. Perform semantic search:
    def search_personal_notes(query, limit=5):
        query_embedding = openai.Embedding.create(input=query, model="text-embedding-ada-002")['data'][0]['embedding']
        
        result = supabase_client.rpc(
            "match_personal_notes",
            {
                "query_embedding": query_embedding,
                "match_threshold": 0.5,
                "match_count": limit
            }
        ).execute()
        
        return result.data
    
    # Example usage
    results = search_personal_notes("What do I need to buy at the grocery store?")
    
  5. Create a stored procedure in Supabase for efficient similarity search:
    CREATE FUNCTION match_personal_notes(query_embedding VECTOR(1536), match_threshold FLOAT, match_count INT)
    RETURNS TABLE (
        id BIGINT,
        content TEXT,
        similarity FLOAT
    )
    LANGUAGE plpgsql
    AS $$
    BEGIN
      RETURN QUERY
      SELECT
        id,
        content,
        1 - (personal_notes.embedding <=> query_embedding) AS similarity
      FROM personal_notes
      WHERE 1 - (personal_notes.embedding <=> query_embedding) > match_threshold
      ORDER BY personal_notes.embedding <=> query_embedding
      LIMIT match_count;
    END;
    $$;
    

By following these steps, you can create a powerful semantic search system for your personal data using Supabase and OpenAI, enabling LLMs to efficiently answer questions about your personal notes, reminders, and other markdown files.

© 2024 Sarim Ahmed