Mastering FastAPI Lifespan: Optimize Your App's Lifecycle

As developers, we're always looking for ways to make our applications faster and more efficient. If you're working with FastAPI, you're in luck: the Lifespan feature gives you a clean way to run setup and teardown logic around your application's lifecycle, optimizing resource management and improving overall performance. In this post, we'll dive deep into FastAPI Lifespan and explore a practical example using vector embeddings.

What is FastAPI Lifespan?

FastAPI Lifespan is an async context manager that lets you define logic to run once before your application starts accepting requests and again when it shuts down. This feature is particularly useful for:

  1. Setting up shared resources
  2. Initializing database connections (see the sketch just after this list)
  3. Loading machine learning models
  4. Cleaning up resources when the application shuts down
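
To make the second item concrete, here's a minimal sketch of a database connection pool managed through Lifespan. It assumes asyncpg and uses a placeholder DSN, so treat it as an illustration of the pattern rather than a drop-in setup:

import asyncpg
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: create one shared connection pool for the whole app (DSN is a placeholder)
    app.state.pool = await asyncpg.create_pool("postgresql://user:password@localhost/mydb")
    yield
    # Shutdown: close every connection in the pool
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

Storing the pool on app.state keeps it reachable from any request handler via request.app.state.pool, without resorting to a global variable.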

How Does It Work?

The Lifespan feature uses an async context manager to control the application's lifecycle. Here's a basic structure:

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup logic
    print("Application is starting up")
    yield
    # Shutdown logic
    print("Application is shutting down")

app = FastAPI(lifespan=lifespan)

The code before the yield statement is executed during startup, while the code after is executed during shutdown.
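
If you save this snippet as main.py and start it with uvicorn main:app, the startup message prints before the server begins accepting requests, and the shutdown message prints when you stop the process (for example with Ctrl+C).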

Practical Example: Vector Embeddings with FastAPI Lifespan

Let's explore a more complex example using vector embeddings. We'll create a FastAPI application that loads a pre-trained word embedding model during startup and uses it to compute similarities between words.

from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException
from gensim.models import KeyedVectors

# Global variable to store our word embeddings
word_vectors = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: Load word embeddings
    global word_vectors
    print("Loading word embeddings...")
    word_vectors = KeyedVectors.load_word2vec_format('path/to/your/embeddings.bin', binary=True)
    print("Word embeddings loaded successfully!")
    yield
    # Shutdown: Clear the model from memory
    print("Clearing word embeddings from memory...")
    word_vectors = None

app = FastAPI(lifespan=lifespan)

@app.get("/similarity/{word1}/{word2}")
async def get_word_similarity(word1: str, word2: str):
    if word_vectors is None:
        return {"error": "Word embeddings not loaded"}
    
    try:
        similarity = word_vectors.similarity(word1, word2)
        return {"similarity": float(similarity)}
    except KeyError:
        return {"error": "One or both words not found in the vocabulary"}

@app.get("/most_similar/{word}")
async def get_most_similar(word: str, n: int = 5):
    if word_vectors is None:
        return {"error": "Word embeddings not loaded"}
    
    try:
        similar_words = word_vectors.most_similar(word, topn=n)
        return {"similar_words": [{"word": w, "similarity": float(s)} for w, s in similar_words]}
    except KeyError:
        return {"error": "Word not found in the vocabulary"}

In this example:

  1. We use the lifespan context manager to load a pre-trained word embedding model when the application starts up.
  2. The model is stored in a global variable word_vectors for use across different endpoints.
  3. We provide two endpoints:
    • /similarity/{word1}/{word2} to compute the similarity between two words
    • /most_similar/{word} to find the most similar words to a given word
  4. When the application shuts down, we clear the word_vectors from memory.
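
You can exercise this whole lifecycle without deploying anything by using FastAPI's TestClient: entering it as a context manager runs the startup logic, and leaving it runs the shutdown logic. The words below are only illustrative, and the similarity value depends entirely on the embeddings file you load:

from fastapi.testclient import TestClient

# Entering the context manager triggers the lifespan startup (model load);
# exiting it triggers the shutdown cleanup.
with TestClient(app) as client:
    response = client.get("/similarity/king/queen")
    print(response.json())  # e.g. {"similarity": 0.65} with a typical word2vec model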

Benefits of Using FastAPI Lifespan

  1. Efficient Resource Management: Load heavy resources like ML models only once, not on every request.
  2. Improved Performance: By preloading resources, you reduce the latency of individual requests.
  3. Clean Shutdowns: Properly release resources when your application stops, preventing memory leaks.
  4. Separation of Concerns: Keep your startup and shutdown logic separate from your request handling code.

Conclusion

FastAPI Lifespan is a powerful feature that allows you to optimize your application's lifecycle. By leveraging this tool, you can efficiently manage resources, improve performance, and create more robust FastAPI applications.

In our vector embeddings example, we've seen how Lifespan can be used to load a large model once at startup, use it across multiple endpoints, and properly clean up when the application shuts down. This pattern can be applied to various scenarios, from database connections to complex machine learning pipelines.

So, the next time you're building a FastAPI application, remember to harness the power of Lifespan to take your app's performance and resource management to the next level!
