← Back to blog

pgvector: Scale Semantic Search in PostgreSQL Without New Infrastructure

How can you leverage your personal raw data (such as markdown files and notes) to enable LLMs to answer questions about them?

Answer: Implement Semantic Search using Vector Databases for efficient information retrieval and contextual understanding.

Let me show the easiest way you can achieve this by utilizing Supabase and OpenAI:

  1. Enable pgvector extension in Supabase:

    • Go to your Supabase project dashboard
    • Navigate to the SQL Editor
    • Run the following SQL command: CREATE EXTENSION vector;
  2. Create a table to store your documents:

    CREATE TABLE personal_notes (
      id BIGSERIAL PRIMARY KEY,
      content TEXT,
      embedding VECTOR(1536)
    );
  3. Use OpenAI's API to generate embeddings for your documents:

    import openai
    import supabase
    
    openai.api_key = "your-openai-api-key"
    supabase_client = supabase.create_client("your-supabase-url", "your-supabase-key")
    
    def insert_personal_note(content):
        response = openai.Embedding.create(input=content, model="text-embedding-ada-002")
        embedding = response['data'][0]['embedding']
        
        supabase_client.table("personal_notes").insert({
            "content": content,
            "embedding": embedding
        }).execute()
    
    # Example usage
    insert_personal_note("Remember to buy groceries: milk, eggs, and bread")
  4. Perform semantic search:

    def search_personal_notes(query, limit=5):
        query_embedding = openai.Embedding.create(input=query, model="text-embedding-ada-002")['data'][0]['embedding']
        
        result = supabase_client.rpc(
            "match_personal_notes",
            {
                "query_embedding": query_embedding,
                "match_threshold": 0.5,
                "match_count": limit
            }
        ).execute()
        
        return result.data
    
    # Example usage
    results = search_personal_notes("What do I need to buy at the grocery store?")
  5. Create a stored procedure in Supabase for efficient similarity search:

    CREATE FUNCTION match_personal_notes(query_embedding VECTOR(1536), match_threshold FLOAT, match_count INT)
    RETURNS TABLE (
        id BIGINT,
        content TEXT,
        similarity FLOAT
    )
    LANGUAGE plpgsql
    AS $$
    BEGIN
      RETURN QUERY
      SELECT
        id,
        content,
        1 - (personal_notes.embedding <=> query_embedding) AS similarity
      FROM personal_notes
      WHERE 1 - (personal_notes.embedding <=> query_embedding) > match_threshold
      ORDER BY personal_notes.embedding <=> query_embedding
      LIMIT match_count;
    END;
    $$;

By following these steps, you can create a powerful semantic search system for your personal data using Supabase and OpenAI, enabling LLMs to efficiently answer questions about your personal notes, reminders, and other markdown files.