Building and Securing an API for LLM Access with FastAPI and Ollama



This lab demonstrates how to build a secure API with Python, FastAPI, and Ollama that controls access to a Large Language Model (LLM) running locally.

Objective

In this lab, you will:

  • Build a secure API using FastAPI that interfaces with a locally run LLM via Ollama.
  • Implement API key-based authentication to control access.
  • Understand the importance of securing access to AI models.
  • Test the API using tools like Postman or curl.
  • Optionally, deploy the API to a cloud platform using Docker and set up a CI/CD pipeline.

Prerequisites

Before starting, ensure you have:

  • Basic Python programming knowledge: Familiarity with Python syntax and libraries.
  • Command-line proficiency: Ability to navigate and execute commands in a terminal.
  • Understanding of HTTP methods and APIs: Knowledge of GET, POST requests, and API concepts.
  • System requirements: A computer with sufficient hardware to run an LLM locally (e.g., 8GB+ RAM, depending on the model).

Lab Setup

Step 1: Install Python and pip

  • Ensure Python 3.x and pip are installed on your system.
  • Verify installation:
    python3 --version
    pip3 --version
  • If not installed, download from python.org.

Step 2: Install Required Libraries

  • Create a file named requirements.txt with the following content:
    fastapi
    uvicorn
    ollama
    python-dotenv
    requests
  • Install the dependencies:
    pip3 install -r requirements.txt
    • fastapi: Framework for building the API.
    • uvicorn: ASGI server to run the FastAPI app.
    • ollama: Library to interface with the Ollama LLM.
    • python-dotenv: For loading environment variables.
    • requests: For testing the API from Python.

Step 3: Set Up Ollama

  • Download Ollama: Visit ollama.com/download and install it for your OS.
  • Pull an LLM model: Open a terminal and run:
    ollama pull mistral
    This downloads the Mistral model (a lightweight, open-source LLM).
  • Verify Ollama: Test it by running:
    ollama run mistral
    Type a prompt (e.g., “Hello World”) and confirm you get a response. Exit with /bye.
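
As an optional extra check, you can call the same model from Python using the ollama library installed in Step 2; a minimal sketch, assuming the Ollama server is running locally with the mistral model pulled:

    import ollama

    # Send a single-turn chat request to the locally pulled Mistral model
    response = ollama.chat(
        model="mistral",
        messages=[{"role": "user", "content": "Hello World"}],
    )
    print(response["message"]["content"])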

Lab Steps

Step 1: Create a Simple FastAPI Application

  1. Create a Project Directory:

    mkdir llm-api-lab
    cd llm-api-lab
  2. Initialize Version Control (DevOps Practice):

    git init
  3. Write the API Code:

    • Create a file named main.py:

      from fastapi import FastAPI
      import ollama
      
      app = FastAPI()
      
      @app.post("/generate")
      def generate(prompt: str):
          # Forward the user's prompt to the local Mistral model via Ollama
          response = ollama.chat(model="mistral", messages=[{"role": "user", "content": prompt}])
          return {"response": response["message"]["content"]}
    • This defines a POST endpoint /generate that accepts a prompt query parameter and returns the LLM's response (a JSON-body variant is sketched after this list).

  4. Run the Application:

    uvicorn main:app --reload
    • main:app refers to the app object in main.py.
    • --reload enables auto-reloading for development.
  5. Commit Your Work:

    git add main.py requirements.txt
    git commit -m "Initial FastAPI app with /generate endpoint"

Step 2: Test the Unsecured API

  1. Using curl:

    curl -X POST "http://localhost:8000/generate?prompt=Hello%20World"

    You should see a JSON response of the form {"response": "..."} containing the model's reply (a scripted variant using the requests library is shown after this list).

  2. Using Postman (Alternative):

    • Download Postman from postman.com.
    • Create a new request:
      • Method: POST
      • URL: http://localhost:8000/generate?prompt=Hello World
      • Click “Send” and verify the response.
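
You can also script the same request with the requests library from requirements.txt; a minimal sketch, assuming the server is still running on localhost:8000:

    import requests

    # The prompt travels as a query parameter, matching the curl example above
    resp = requests.post(
        "http://localhost:8000/generate",
        params={"prompt": "Hello World"},
    )
    print(resp.status_code)  # expect 200
    print(resp.json())       # expect {"response": "..."}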

Step 3: Secure the API with an API Key

  1. Update main.py for Security:

    from fastapi import FastAPI, Depends, HTTPException, Header
    import ollama
    import os
    from dotenv import load_dotenv
    
    app = FastAPI()
    
    # Load environment variables from .env
    load_dotenv()
    # Build the set of valid keys, dropping None so an unset API_KEY
    # env var can never match a request that omits the header
    API_KEYS = {key for key in (os.getenv("API_KEY"),) if key}
    
    # Dependency that checks the x-api-key request header
    def verify_api_key(x_api_key: str = Header(None)):
        if not x_api_key or x_api_key not in API_KEYS:
            raise HTTPException(status_code=401, detail="Invalid API Key")
        return x_api_key
    
    @app.post("/generate")
    def generate(prompt: str, api_key: str = Depends(verify_api_key)):
        response = ollama.chat(model="mistral", messages=[{"role": "user", "content": prompt}])
        return {"response": response["message"]["content"]}
  2. Create a .env File:

    • In the project directory, create .env:
      API_KEY=your_secret_key
    • Replace your_secret_key with a key of your own. Avoid guessable values like mysecret123; a quick way to generate a strong key is shown after this list.
  3. Add .env to .gitignore (DevOps Practice):

    • Create .gitignore:
      .env
    • Commit changes:
      git add main.py .gitignore
      git commit -m "Added API key authentication"
  4. Restart the Server:

    uvicorn main:app --reload
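
As promised in step 2 above, a quick way to generate a strong random key using only Python's standard library:

    python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Paste the printed value into .env as the API_KEY.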

Step 4: Test the Secured API

  1. Test with Correct API Key:

    • Using curl:
      curl -X POST "http://localhost:8000/generate?prompt=Hello%20World" -H "x-api-key: your_secret_key"
      Replace your_secret_key with the value from .env. You should get a valid response.
  2. Test with Incorrect/No API Key:

    • Without header:
      curl -X POST "http://localhost:8000/generate?prompt=Hello%20World"
      Expect a 401 Unauthorized error: {"detail": "Invalid API Key"}.
    • With wrong key:
      curl -X POST "http://localhost:8000/generate?prompt=Hello%20World" -H "x-api-key: wrongkey"
      Same error should appear.
  3. Using Postman:

    • Add a header: Key = x-api-key, Value = your_secret_key.
    • Send the request and verify success.
    • Remove the header or use an incorrect key and confirm the 401 error.
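
To automate the checks above, a minimal sketch using requests; replace your_secret_key with the value from your .env:

    import requests

    URL = "http://localhost:8000/generate"
    PARAMS = {"prompt": "Hello World"}

    # With the correct key (replace with the value from .env): expect 200
    ok = requests.post(URL, params=PARAMS, headers={"x-api-key": "your_secret_key"})
    assert ok.status_code == 200, ok.text

    # Without the header: the dependency should reject the request with 401
    denied = requests.post(URL, params=PARAMS)
    assert denied.status_code == 401, denied.text
    print("Both checks passed.")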

Step 5: Implement a Credit System (Optional)

  1. Modify main.py for Credits:

    from fastapi import FastAPI, Depends, HTTPException, Header
    import ollama
    import os
    from dotenv import load_dotenv
    
    app = FastAPI()
    load_dotenv()
    
    # Dictionary to track credits per API key
    API_KEY_CREDITS = {os.getenv("API_KEY"): 5}  # 5 credits initially
    
    def verify_api_key(x_api_key: str = Header(None)):
        # Treat a missing header as zero credits so an unset API_KEY
        # env var (a None dictionary key) can never be matched
        credits = API_KEY_CREDITS.get(x_api_key, 0) if x_api_key else 0
        if credits <= 0:
            raise HTTPException(status_code=401, detail="Invalid API Key or No Credits")
        return x_api_key
    
    @app.post("/generate")
    def generate(prompt: str, api_key: str = Depends(verify_api_key)):
        # Deduct a credit
        API_KEY_CREDITS[api_key] -= 1
        response = ollama.chat(model="mistral", messages=[{"role": "user", "content": prompt}])
        return {"response": response["message"]["content"]}
  2. Test the Credit System:

    • Send the POST request with the correct API key 5 times (e.g., using curl or Postman).
    • On the 6th attempt, you should receive a 401 error due to no remaining credits.
    • Restart the server to reset credits; the credit dictionary lives in memory only (a persistent alternative is sketched after this list).
  3. Commit Changes:

    git add main.py
    git commit -m "Added credit system to API"

Step 6: Deploy the API to the Cloud (Advanced Optional)

  1. Containerize with Docker:

    • Create a Dockerfile:
      FROM python:3.9-slim
      WORKDIR /app
      COPY requirements.txt .
      RUN pip install --no-cache-dir -r requirements.txt
      COPY . .
      CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
    • Build and test locally (see the networking note after this list):
      docker build -t llm-api .
      docker run -p 8000:8000 --env-file .env llm-api
  2. Deploy to AWS ECS (Example):

    • Push the Docker image to a registry (e.g., AWS ECR):
      aws ecr create-repository --repository-name llm-api
      docker tag llm-api:latest <your-ecr-uri>:latest
      docker push <your-ecr-uri>:latest
    • Deploy using AWS ECS (Fargate) via the AWS console or CLI.
  3. Set Up CI/CD with GitHub Actions:

    • Create .github/workflows/deploy.yml:
      name: Deploy to AWS ECS
      on:
        push:
          branches: [main]
      jobs:
        build-and-deploy:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v3
            - name: Configure AWS credentials
              uses: aws-actions/configure-aws-credentials@v4
              with:
                aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
                aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
                aws-region: us-east-1  # example region; use your own
            - name: Login to Amazon ECR
              uses: aws-actions/amazon-ecr-login@v2
            - name: Build and Push Docker Image
              run: |
                docker build -t llm-api .
                docker tag llm-api:latest <your-ecr-uri>:latest
                docker push <your-ecr-uri>:latest
    • Store AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in GitHub Secrets; without the credential and ECR login steps above, the push to ECR would fail unauthenticated.
  4. Commit Docker and CI/CD Files:

    git add Dockerfile .github/workflows/deploy.yml
    git commit -m "Added Docker and CI/CD configuration"

Deliverables

Submit the following:

  1. Working FastAPI Application: The main.py file with all implemented features.
  2. API Documentation: A brief document (e.g., Markdown) describing:
    • Endpoint: POST /generate
    • Parameters: prompt (string), x-api-key (header)
    • Example request/response
  3. Security Report: A 1-page explanation of:
    • Why securing API access to AI models is critical.
    • How API key authentication and credits achieve this in your implementation.

Assessment Criteria

  • Functionality: API works correctly with unsecured and secured endpoints.
  • Security: API key authentication is properly implemented and enforced.
  • Understanding: Security report demonstrates clear comprehension of concepts.
  • Code Quality: Code is well-structured, commented, and version-controlled.
  • Optional: Successful cloud deployment and CI/CD setup (if attempted).

Conclusion

This lab provides practical experience in building and securing an API for LLM access, integrating DevOps practices like version control and optional cloud deployment. You’ve learned to use FastAPI, Ollama, and security mechanisms, preparing you for real-world DevOps and Cloud challenges in AI-driven applications. For further exploration, consider enhancing the credit system with a persistent database or integrating more advanced authentication methods like OAuth.


By Wahid Hamdi