Hey there! Ready to put the OpenAI LLM Completions Endpoint through its paces? Let’s dive into how you can load test this bad boy using Locust, the OpenAI SDK, and a custom router. Don’t worry, I’ve got your back every step of the way. Let’s do this!
Step 1: Install the Necessary Goodies
First things first, we need to grab some tools. Open up your terminal and run:
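```bash
pip install locust openai
```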
Boom! Now you’ve got Locust and the OpenAI SDK ready to roll.
Step 2: Set Up OpenAI SDK
Now, let’s tell the OpenAI SDK who’s boss by giving it your API key:
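Here's the minimal setup, assuming the v1-style `openai` package (on older versions you'd set `openai.api_key` directly instead):

```python
from openai import OpenAI

# Assumes the openai>=1.0 SDK, which uses a client object.
client = OpenAI(api_key='YOUR_API_KEY')
```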
Don’t forget to replace 'YOUR_API_KEY' with, well, your actual API key (or, better yet, read it from an environment variable so the key never lands in your code).
Step 3: Create a Custom Router
Time to get fancy with a custom router. This little buddy will handle all the requests to the OpenAI completions endpoint, including retries and collecting detailed metrics.
Custom Router Class:
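The exact implementation is up to you, but here's a minimal sketch, assuming the v1 openai SDK; the class name, default model, and retry settings are all placeholders you can tweak:

```python
import time

from openai import OpenAI, APIError, RateLimitError


class OpenAIRouter:
    """Routes completion requests to OpenAI with retries and basic metrics."""

    def __init__(self, api_key, model='gpt-3.5-turbo-instruct', max_retries=3):
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.max_retries = max_retries

    def complete(self, prompt, max_tokens=50):
        """Send a completion request, retrying on transient errors.

        Returns a dict with the response text, latency, and retry count,
        so the Locust task can report detailed metrics.
        """
        last_error = None
        for attempt in range(self.max_retries):
            start = time.perf_counter()
            try:
                response = self.client.completions.create(
                    model=self.model,
                    prompt=prompt,
                    max_tokens=max_tokens,
                )
                return {
                    'text': response.choices[0].text,
                    'latency': time.perf_counter() - start,
                    'retries': attempt,
                }
            except (RateLimitError, APIError) as exc:
                last_error = exc
                # Back off exponentially before the next attempt.
                time.sleep(2 ** attempt)
        # All retries exhausted; surface the last error to the caller.
        raise last_error
```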
This code ensures we’re handling errors gracefully, like a pro.
Step 4: Integrate the Custom Router with Locust
Let’s plug this router into Locust. Time to unleash the power!
Locust Test Script Using Custom Router:
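Here's a sketch of the locustfile. It assumes Locust 2.x, that the router class above lives in `router.py` (rename to match your project), and that your API key is in the `OPENAI_API_KEY` environment variable:

```python
import os
import random
import time

from locust import User, task, between

# Module name is a placeholder; import the router class from wherever you saved it.
from router import OpenAIRouter

# A few varied prompts to simulate real-world usage (see the Data Variability tip below).
PROMPTS = [
    'Write a haiku about load testing.',
    'Explain HTTP caching in one sentence.',
    'List three creative uses for a paperclip.',
]


class OpenAIUser(User):
    wait_time = between(1, 3)  # each simulated user pauses 1-3 seconds between tasks

    def on_start(self):
        # One router per simulated user; the key comes from the environment.
        self.router = OpenAIRouter(api_key=os.environ['OPENAI_API_KEY'])

    @task
    def completion(self):
        prompt = random.choice(PROMPTS)
        start = time.perf_counter()
        exception = None
        response_length = 0
        try:
            result = self.router.complete(prompt)
            response_length = len(result['text'])
        except Exception as exc:
            exception = exc
        # We're not using Locust's built-in HTTP client, so fire the request
        # event ourselves to get these calls into Locust's stats and web UI.
        self.environment.events.request.fire(
            request_type='OPENAI',
            name='completions',
            response_time=(time.perf_counter() - start) * 1000,  # milliseconds
            response_length=response_length,
            context={},
            exception=exception,
        )
```

Since the OpenAI SDK bypasses Locust's HTTP client, firing the request event manually is what makes every call show up in the stats.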
Step 5: Run the Test
Alright, let’s light this candle. Run the Locust test with:
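```bash
# Assumes you saved the test script above as locustfile.py
locust -f locustfile.py
```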
Fire up your browser and head to http://localhost:8089. Configure the number of users and the spawn rate, then sit back and watch the magic happen.
Step 6: Monitor and Analyze Results
While Locust does its thing, keep an eye on:
- Response Time: How quickly are we getting answers?
- Success Rate: How often are we hitting the mark versus crashing and burning?
- Throughput: How many requests are we churning through per second?
Locust’s web interface will show you all this in real time. It’s like watching a thrilling data-driven movie!
Step 7: Optimize and Iterate
Found some bottlenecks? Time to tinker:
- Scale up your resources.
- Tweak your prompt handling.
- Improve network configs.
Run the tests again to see if you’ve made things better. Rinse and repeat until you’re happy with the results.
Bonus Tips for Smooth Sailing
- API Rate Limits: Respect the rate limits, or face the wrath of throttling. Implement client-side rate limiting and handle those “slow down” messages gracefully (there’s a quick sketch right after this list).
- Resource Management: Don’t hog all the resources! Run tests in an isolated environment or on dedicated hardware.
- Scalability: For massive loads, go distributed. Use a Locust master node with multiple worker nodes to really push the limits.
- Data Variability: Mix up your prompts to simulate real-world usage. Don’t be that person who only tests with “Hello, world.”
- Logging and Monitoring: Log everything! Monitor everything! Use tools like Grafana and Prometheus to keep tabs on performance in real time.
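As promised in the rate-limits tip, here's one way to do client-side throttling. This is a minimal sketch; the RateLimiter name and the requests-per-second rate are illustrative, and it pairs with the retry/backoff logic already in the Step 3 router:

```python
import threading
import time


class RateLimiter:
    """Client-side throttle: allow at most `rate` requests per second,
    shared across all simulated users in the process."""

    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self):
        # Reserve the next send slot, then sleep until it arrives.
        with self.lock:
            now = time.monotonic()
            sleep_for = max(0.0, self.next_allowed - now)
            self.next_allowed = max(self.next_allowed, now) + self.min_interval
        if sleep_for > 0:
            time.sleep(sleep_for)


# For example: stay under roughly 5 requests per second across the whole test.
limiter = RateLimiter(rate=5)
```

Create one shared limiter and call limiter.wait() at the top of the router's complete() method, and every simulated user will draw from the same budget.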
Conclusion
Using Locust with the OpenAI SDK and a custom router is like having a supercharged toolkit for load testing the OpenAI LLM Completions Endpoint. Follow these steps, keep tweaking, and you’ll ensure your endpoint can handle whatever you throw at it. Happy testing, and may the load be ever in your favor!