Performance Testing
Previously: Monorepo Patterns. We structured our packages. Now how do we prove they perform under load?
Your code works. Tests pass. The traces look clean. You ship to production.
Then Black Friday hits. Traffic spikes 10x. Response times climb from 200ms to 8 seconds. The database connection pool is exhausted. Users see timeouts. Revenue drops by the minute.
You open your traces. They show the same thing they always showed: single requests completing successfully. But under load, everything’s different. Where’s the bottleneck?
You don’t know because you never tested under load.
Why Load Testing Matters
Unit tests verify correctness. Integration tests verify the stack works together. But neither answers:
- How many concurrent users can we handle?
- What breaks first under load?
- Where do we spend the most time when the system is stressed?
Your OpenTelemetry traces from post 5 are essential here. But traces from single requests show “happy path” performance. Under load, different problems emerge:
```mermaid
graph TD
subgraph Single["Single Request<br/>getOrder: 15ms<br/>getInventory: 20ms<br/>processPayment: 100ms<br/>Total: 135ms OK"]
end
subgraph Load["Under 1000 concurrent requests<br/>getOrder: 450ms connection pool waiting<br/>getInventory: 2000ms external API rate limited<br/>processPayment: 100ms still fast<br/>Total: 2550ms FAILED"]
end
Single --> Load
style Single fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style Load fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
linkStyle 0 stroke:#0f172a,stroke-width:3px
```
Load testing reveals:
- Connection pool exhaustion: 10 connections serving 1000 requests
- N+1 queries that explode: 1 query becomes 1000 queries
- External API rate limits: Fine at 10 RPS, blocked at 100 RPS
- Memory leaks: Invisible in short tests, catastrophic over hours
- Race conditions: Never trigger with single requests
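Connection pool exhaustion in particular is plain arithmetic once you write it down. A minimal sketch, assuming every request holds one connection for a fixed time and all requests arrive at once (a deliberate simplification; the 15ms hold time is hypothetical, the 10-connection pool matches the list above):

```typescript
// Hypothetical model: all `requests` arrive at once, each holds one pooled
// connection for `holdMs`, and the pool serves them in FIFO waves of `poolSize`.
function worstCaseQueueWaitMs(requests: number, poolSize: number, holdMs: number): number {
  // How many "waves" the pool needs to drain the burst
  const waves = Math.ceil(requests / poolSize);
  // The last request waits for every earlier wave to finish
  return (waves - 1) * holdMs;
}

// Upper bound on sustained throughput: each connection completes 1000/holdMs queries per second
function poolThroughputPerSec(poolSize: number, holdMs: number): number {
  return (poolSize * 1000) / holdMs;
}

console.log(worstCaseQueueWaitMs(1000, 10, 15)); // 1485
console.log(poolThroughputPerSec(10, 15).toFixed(0)); // 667
```

With a 15ms query, the unluckiest of 1000 simultaneous requests waits almost 1.5s before its query even starts, and none of that wait shows up in a single-request trace.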
Load Testing with k6
k6 is a modern load testing tool. It’s JavaScript-based, outputs metrics, and integrates with CI/CD. Install it:

```bash
# macOS
brew install k6
# Or download from https://k6.io/docs/get-started/installation/
```

Your First Load Test
```javascript
// load-tests/orders-api.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

// Test configuration
export const options = {
  vus: 10, // 10 virtual users
  duration: '30s', // Run for 30 seconds
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
    http_req_failed: ['rate<0.01'], // Less than 1% failure rate
  },
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
const API_KEY = __ENV.API_KEY || 'test-key';

export default function () {
  // Create an order
  const payload = JSON.stringify({
    customerId: 'cust-123',
    items: [
      { productId: 'prod-1', quantity: 2, unitPrice: 1500 },
      { productId: 'prod-2', quantity: 1, unitPrice: 3000 },
    ],
  });

  const response = http.post(`${BASE_URL}/api/orders`, payload, {
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': API_KEY,
      'x-request-id': uuidv4(), // For trace correlation
    },
  });

  // Verify the response (only parse the body when the request succeeded)
  check(response, {
    'status is 201': (r) => r.status === 201,
    'has order id': (r) => r.status === 201 && JSON.parse(r.body).id !== undefined,
  });

  // Simulate user think time
  sleep(1);
}
```

Why `sleep()`? In k6, each Virtual User (VU) runs the default function in a loop. Without `sleep`, a VU starts its next iteration the moment the previous response arrives, so a single VU could generate hundreds of requests per second and accidentally DDoS your local machine. `sleep` simulates realistic human behavior (“think time”) between actions, like a user reading a page before clicking. Without it, you’re not testing “10 users”. You’re testing “10 infinite loops hammering your API.”
Run it:

```bash
k6 run load-tests/orders-api.js
```

Output:

```
scenarios: (100.00%) 1 scenario, 10 max VUs, 1m0s max duration
exec: default
✓ status is 201
✓ has order id
checks.........................: 100.00% ✓ 284 ✗ 0
http_req_duration..............: avg=127ms min=45ms max=892ms p(95)=312ms
http_req_failed................: 0.00% ✓ 0 ✗ 142
http_reqs......................: 142 4.7/s
vus............................: 10 min=10 max=10
✓ http_req_duration: p(95)<500ms
✓ http_req_failed: rate<0.01
```

Your thresholds passed. But this is just 10 users. What happens at 100? 1000?
Progressive Load Profiles
Don’t jump straight to stress testing. Use progressive profiles to understand your system:
1. Smoke Test: Does It Work?
```javascript
// load-tests/smoke.js (imports and default function as in orders-api.js)
export const options = {
  vus: 1,
  duration: '1m',
  thresholds: {
    http_req_failed: ['rate<0.01'],
  },
};
```

One user, one minute. If this fails, you have a functional bug, not a performance problem.
2. Load Test: Expected Traffic
```javascript
// load-tests/load.js (imports and default function as in orders-api.js)
export const options = {
  stages: [
    { duration: '2m', target: 50 }, // Ramp up to 50 users
    { duration: '5m', target: 50 }, // Stay at 50 users
    { duration: '2m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
  },
};
```

This simulates your expected production traffic. If this fails, you can’t handle normal load.
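To pick `target` for “expected traffic”, it helps to remember that each VU in this closed loop completes roughly one iteration per (response time + think time). A minimal sketch of that Little’s-law estimate, assuming one request per iteration and the 1s `sleep` from orders-api.js; the traffic numbers and helper name are hypothetical:

```typescript
// Little's law for a closed-loop k6 script:
//   concurrency (VUs) = throughput (iterations/s) x time per iteration (s)
// Assumes one HTTP request per iteration.
function vusForTargetRps(targetRps: number, avgResponseMs: number, thinkTimeMs: number): number {
  // Multiply before dividing to keep integer inputs exact, then round up:
  // a fractional VU can't exist, and undershooting misses the target rate.
  return Math.ceil((targetRps * (avgResponseMs + thinkTimeMs)) / 1000);
}

// Hypothetical peak of 40 req/s with 250ms responses and 1s think time
console.log(vusForTargetRps(40, 250, 1000)); // 50
```

The catch is that the estimate moves with latency: if responses slow down under load, the same 50 VUs send fewer requests per second. When you need a fixed request rate regardless of latency, k6’s `constant-arrival-rate` executor models that instead.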
3. Stress Test: Find the Breaking Point
```javascript
// load-tests/stress.js (imports and default function as in orders-api.js)
export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp to 100
    { duration: '5m', target: 100 }, // Hold
    { duration: '2m', target: 200 }, // Push to 200
    { duration: '5m', target: 200 }, // Hold
    { duration: '2m', target: 300 }, // Push to 300
    { duration: '5m', target: 300 }, // Hold - where does it break?
    { duration: '2m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'], // Relaxed threshold
  },
};
```

Keep pushing until something breaks. Note what failed first: database? memory? external API?
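“Note what failed first” is easier when each stage is compared against your SLO mechanically rather than by eyeballing graphs. A minimal sketch, assuming you have already reduced the run to per-stage p95 latency and error rate (for example by post-processing the JSON output); the `StageResult` shape, helper name, and sample numbers are hypothetical:

```typescript
// Hypothetical per-stage summary, e.g. aggregated from `k6 run --out json=...`
interface StageResult {
  vus: number;
  p95Ms: number;
  errorRate: number; // 0.0 to 1.0
}

interface Slo {
  p95Ms: number;
  maxErrorRate: number;
}

// Returns the first stage that violates the SLO and which limit it broke,
// or undefined if every stage stayed within bounds.
function findBreakingPoint(
  stages: StageResult[],
  slo: Slo,
): { vus: number; reason: 'latency' | 'errors' } | undefined {
  for (const stage of stages) {
    // Check errors first: failed requests can return fast and make p95 look healthier
    if (stage.errorRate > slo.maxErrorRate) return { vus: stage.vus, reason: 'errors' };
    if (stage.p95Ms > slo.p95Ms) return { vus: stage.vus, reason: 'latency' };
  }
  return undefined;
}

const result = findBreakingPoint(
  [
    { vus: 100, p95Ms: 420, errorRate: 0.001 },
    { vus: 200, p95Ms: 1350, errorRate: 0.004 },
    { vus: 300, p95Ms: 2600, errorRate: 0.08 },
  ],
  { p95Ms: 2000, maxErrorRate: 0.01 }, // the stress threshold plus a 1% error budget
);
console.log(result); // { vus: 300, reason: 'errors' }
```

The VU count tells you where the system broke; the traces from that stage tell you why.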
4. Soak Test: Memory Leaks and Degradation
```javascript
// load-tests/soak.js (imports and default function as in orders-api.js)
export const options = {
  stages: [
    { duration: '5m', target: 50 }, // Ramp up
    { duration: '4h', target: 50 }, // Hold for 4 hours
    { duration: '5m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
  },
};
```

Run for hours at moderate load. Watch for:
- Memory usage creeping up
- Response times gradually increasing
- Connection leaks
- File handle exhaustion
Some bugs only appear after hours of operation.
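“Memory creeping up” is easier to act on as a number than as a slowly rising dashboard line. A minimal sketch, assuming you sample the service’s memory (for example resident memory from your metrics backend) at regular intervals during the soak; it fits a least-squares line and reports growth per hour. The sample data and helper name are hypothetical:

```typescript
interface MemorySample {
  minutes: number; // minutes since the soak started
  mb: number; // resident memory in MB
}

// Least-squares slope of memory over time, expressed in MB per hour.
function memoryGrowthMbPerHour(samples: MemorySample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error('need at least two samples');
  const meanX = samples.reduce((sum, s) => sum + s.minutes, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.mb, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.minutes - meanX) * (s.mb - meanY);
    den += (s.minutes - meanX) ** 2;
  }
  if (den === 0) throw new Error('samples must span more than one point in time');
  // num / den is MB per minute; scale to MB per hour
  return (num * 60) / den;
}

// Steady growth over a 4-hour soak: 300 MB rising by 5 MB every hour
const leaky = [0, 60, 120, 180, 240].map((m) => ({ minutes: m, mb: 300 + m / 12 }));
console.log(memoryGrowthMbPerHour(leaky)); // 5
```

After warm-up, a healthy service’s line should be roughly flat; a steady positive slope that persists for hours is the cue to take heap snapshots.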
5. Spike Test: Sudden Bursts
What if 500 users arrive in 10 seconds? A flash sale, a viral tweet, a marketing email blast.

```javascript
// load-tests/spike.js (imports and default function as in orders-api.js)
export const options = {
  stages: [
    { duration: '10s', target: 10 }, // Warm up
    { duration: '1m', target: 10 }, // Baseline
    { duration: '10s', target: 500 }, // SPIKE!
    { duration: '3m', target: 500 }, // Hold the spike
    { duration: '10s', target: 10 }, // Scale back down
    { duration: '3m', target: 10 }, // Recovery period
    { duration: '5s', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3000'], // Allow slower responses during spike
    http_req_failed: ['rate<0.05'], // Allow up to 5% errors during spike
  },
};
```

Spike tests answer questions like:
- Does your autoscaler react fast enough?
- Does your load balancer drop connections during the surge?
- Do database connection pools handle the sudden demand?
- Does the system recover after the spike ends?
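The last question, recovery, can be turned into a number: how long after the spike ends does p95 drop back under your normal SLO and stay there? A minimal sketch, assuming you have a p95 time series (for example one point per 10-second window); the window size, data, and helper name are hypothetical:

```typescript
interface P95Window {
  atSec: number; // window start, seconds since the test began
  p95Ms: number;
}

// Seconds from `spikeEndSec` until p95 falls under `sloMs` and stays there
// for the rest of the series; undefined if it never recovers.
function recoverySeconds(series: P95Window[], spikeEndSec: number, sloMs: number): number | undefined {
  const after = series.filter((w) => w.atSec >= spikeEndSec);
  let firstHealthy: P95Window | undefined;
  // Walk backwards: the answer is the start of the final run of healthy windows
  for (let i = after.length - 1; i >= 0; i--) {
    if (after[i].p95Ms >= sloMs) break;
    firstHealthy = after[i];
  }
  return firstHealthy === undefined ? undefined : firstHealthy.atSec - spikeEndSec;
}

// spike.js finishes scaling down at 270s (10s + 1m + 10s + 3m + 10s)
const afterSpike: P95Window[] = [
  { atSec: 270, p95Ms: 2800 },
  { atSec: 280, p95Ms: 450 },
  { atSec: 290, p95Ms: 900 }, // relapse, so 280s doesn't count as recovered
  { atSec: 300, p95Ms: 450 },
  { atSec: 310, p95Ms: 480 },
  { atSec: 320, p95Ms: 300 },
];
console.log(recoverySeconds(afterSpike, 270, 500)); // 30
```

Requiring p95 to *stay* under the SLO matters: a system that dips back to normal and then relapses hasn’t recovered.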
Load Profiles at a Glance
| Profile | Goal | VU Pattern | Success Metric |
|---|---|---|---|
| Smoke | Correctness | Constant (1 VU) | 0% errors |
| Load | Normal capacity | Ramp to target | http_req_duration p95 < 500ms |
| Stress | Find breaking point | Stepped ramp | Identify first failures |
| Soak | Endurance | Steady for hours | Constant memory, no degradation |
| Spike | Burst handling | Sudden jump | Recovery within SLO |
Connecting Load Tests to Traces
Here’s where OpenTelemetry becomes invaluable. Your traces show where time is spent under load.
Pass Trace Context from k6
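```javascript
// load-tests/orders-with-tracing.js
import http from 'k6/http';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
const API_KEY = __ENV.API_KEY || 'test-key';
const payload = JSON.stringify({
  customerId: 'cust-123',
  items: [{ productId: 'prod-1', quantity: 2, unitPrice: 1500 }],
});

export default function () {
  // A UUID without dashes is 32 hex characters: a valid W3C trace ID
  const traceId = uuidv4().replace(/-/g, '');
  // Parent span IDs are 16 hex characters
  const spanId = uuidv4().replace(/-/g, '').slice(0, 16);

  http.post(`${BASE_URL}/api/orders`, payload, {
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': API_KEY,
      // W3C Trace Context header: version-traceId-parentId-flags (01 = sampled)
      traceparent: `00-${traceId}-${spanId}-01`,
      // Custom header for correlation
      'x-load-test-id': __ENV.TEST_RUN_ID || 'local',
    },
  });
}
```

Now you can find your load test requests in Jaeger, Honeycomb, or whichever tracing backend you use. Two caveats: the service’s OpenTelemetry setup must accept incoming W3C `traceparent` headers (the default propagator does), and `x-load-test-id` only becomes searchable if the service records it as a span attribute, because HTTP instrumentation doesn’t copy arbitrary request headers onto spans by default.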
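The `traceparent` value has a fixed shape: version `00`, a 32-hex-character trace ID, a 16-hex-character parent span ID, and a flags byte where `01` means sampled; all-zero IDs are invalid. A minimal Node/TypeScript sketch of generating and validating such a header, for example to check what your service actually receives; the helper names are hypothetical, and the parser only handles version `00`:

```typescript
import { randomBytes } from 'node:crypto';

// Random lowercase hex ID of `bytes` bytes, retried if it comes out all zeros
// (W3C Trace Context treats all-zero trace and span IDs as invalid).
function randomHexId(bytes: number): string {
  for (;;) {
    const id = randomBytes(bytes).toString('hex');
    if (!/^0+$/.test(id)) return id;
  }
}

function makeTraceparent(sampled = true): string {
  const traceId = randomHexId(16); // 32 hex chars
  const spanId = randomHexId(8); // 16 hex chars
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Parses a version-00 header; returns undefined for malformed input.
function parseTraceparent(header: string): { traceId: string; spanId: string; sampled: boolean } | undefined {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m || /^0+$/.test(m[1]) || /^0+$/.test(m[2])) return undefined;
  // The lowest bit of the flags byte is the "sampled" flag
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 0x01) === 1 };
}

console.log(parseTraceparent(makeTraceparent())?.sampled); // true
```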
Analyzing Traces Under Load
Run your load test, then query your tracing backend for slow requests (exact query syntax varies by backend):

```
service.name = "orders-api"
duration > 1s
attributes.x-load-test-id = "stress-test-2024-01-15"
```

You’ll see traces like:
```mermaid
graph TD
A[createOrder<br/>2.3s total] --> B[validateInput<br/>45ms]
B --> C[db.customer.findUnique<br/>1,800ms 🔥 BOTTLENECK<br/>Waiting for connection pool 1,750ms]
C --> D[db.order.create<br/>350ms]
D --> E[paymentProvider.charge<br/>100ms]
E --> F[Root cause: Connection pool size 10 < concurrent requests 100]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#1e293b,stroke:#0f172a,stroke-width:2px,color:#fff
style D fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style E fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style F fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 0 stroke:#0f172a,stroke-width:3px
linkStyle 1 stroke:#0f172a,stroke-width:3px
linkStyle 2 stroke:#0f172a,stroke-width:3px
linkStyle 3 stroke:#0f172a,stroke-width:3px
linkStyle 4 stroke:#0f172a,stroke-width:3px
```
Without load testing, you’d never see that 1,750ms connection pool wait. The trace looked fine under single requests.
Common Bottlenecks Revealed by Load + Traces
| Symptom in Traces | Root Cause | Fix |
|---|---|---|
| Long waits before DB query starts | Connection pool exhausted | Increase pool size or reduce query time |
| External API calls taking 10x longer | Rate limiting kicked in | Add caching, request batching |
| Same DB query repeated N times per request | N+1 query pattern | Use eager loading / joins |
| Span durations creeping up over a long run | Memory leak / GC pressure | Profile memory, fix leaks |
| Timeouts only under load | Resource contention | Add connection limits, queuing |
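The N+1 row is the easiest to detect automatically, because the pattern is literally the same statement repeated inside one trace. A minimal sketch, assuming you have exported database spans with their trace ID and normalized query text (OpenTelemetry database instrumentation usually records this as `db.statement`, or `db.query.text` in newer semantic conventions); the `DbSpan` shape, helper name, and threshold are hypothetical:

```typescript
// Hypothetical flattened export: one entry per database span
interface DbSpan {
  traceId: string;
  statement: string; // normalized query, parameters replaced by placeholders
}

// For each trace, returns the statements executed at least `threshold` times.
function findRepeatedQueries(
  spans: DbSpan[],
  threshold = 10,
): Map<string, Array<{ statement: string; count: number }>> {
  // traceId -> (statement -> count)
  const counts = new Map<string, Map<string, number>>();
  for (const { traceId, statement } of spans) {
    const perTrace = counts.get(traceId) ?? new Map<string, number>();
    perTrace.set(statement, (perTrace.get(statement) ?? 0) + 1);
    counts.set(traceId, perTrace);
  }

  const suspects = new Map<string, Array<{ statement: string; count: number }>>();
  for (const [traceId, perTrace] of counts) {
    const repeated = [...perTrace]
      .filter(([, count]) => count >= threshold)
      .map(([statement, count]) => ({ statement, count }));
    if (repeated.length > 0) suspects.set(traceId, repeated);
  }
  return suspects;
}

const spans: DbSpan[] = [
  { traceId: 't1', statement: 'SELECT * FROM orders WHERE customer_id = $1' },
  // One items query per order: the classic N+1
  ...Array.from({ length: 12 }, () => ({ traceId: 't1', statement: 'SELECT * FROM items WHERE order_id = $1' })),
  { traceId: 't2', statement: 'SELECT * FROM orders WHERE id = $1' },
];
console.log(findRepeatedQueries(spans).get('t1')?.[0].count); // 12
```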
Setting SLOs and Thresholds
Don’t just measure. Set expectations. k6 thresholds fail your test (with a non-zero exit code) if SLOs aren’t met:

```javascript
import { Counter } from 'k6/metrics';

// Custom metric: call orderCreated.add(1) after each successful create
const orderCreated = new Counter('order_created');

export const options = {
  thresholds: {
    // Response time SLOs
    http_req_duration: [
      'p(50)<200', // Median under 200ms
      'p(95)<500', // 95th percentile under 500ms
      'p(99)<1000', // 99th percentile under 1s
    ],
    // Availability SLO
    http_req_failed: ['rate<0.001'], // 99.9% success rate
    // Custom metrics
    order_created: ['count>100'], // At least 100 orders created
    // Per-endpoint thresholds: requests must carry the tag, e.g.
    // http.post(url, body, { tags: { endpoint: 'create_order' } })
    'http_req_duration{endpoint:create_order}': ['p(95)<800'],
    'http_req_duration{endpoint:get_order}': ['p(95)<200'],
  },
};
```

Now your CI pipeline can fail if performance regresses:
```yaml
# .github/workflows/performance.yml
name: Performance Tests

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *' # Nightly

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start services
        # Compose v2 syntax; --wait blocks until containers are running/healthy
        run: docker compose up -d --wait

      - name: Run load tests
        uses: grafana/k6-action@v0.3.1
        with:
          filename: load-tests/load.js
          flags: --out json=results.json
        env:
          BASE_URL: http://localhost:3000

      - name: Upload results
        # Upload even when thresholds fail: that's when you need the data most
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results.json
```

From Load Testing to Chaos Engineering
Once you can measure performance, you can break things intentionally. This is chaos engineering: testing resilience by injecting failures.
The Idea
Your resilience patterns (retries, timeouts, circuit breakers) are only valuable if they work. Chaos engineering proves they do.
```mermaid
graph TD
A[1. Hypothesis<br/>If payment provider is slow, orders still complete within 5s due to timeout + retry logic] --> B[2. Inject Failure<br/>Add 2s latency to payment provider calls]
B --> C[3. Observe<br/>Run load test, check traces, verify SLOs]
C --> D[4. Learn<br/>Did the system behave as expected? If not, fix and repeat]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 0 stroke:#0f172a,stroke-width:3px
linkStyle 1 stroke:#0f172a,stroke-width:3px
linkStyle 2 stroke:#0f172a,stroke-width:3px
```
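Before injecting anything, it is worth checking that the hypothesis is even arithmetically possible. A minimal sketch, assuming a per-attempt timeout, a fixed number of attempts, and exponential backoff between them; the policy shape and numbers are hypothetical, not taken from the earlier resilience patterns:

```typescript
interface RetryPolicy {
  timeoutMs: number; // per-attempt timeout
  maxAttempts: number; // including the first attempt
  baseBackoffMs: number; // delay before the 2nd attempt; doubles after each retry
}

// Worst case: every attempt runs until its timeout, plus every backoff delay.
function worstCaseLatencyMs(p: RetryPolicy): number {
  let total = p.maxAttempts * p.timeoutMs;
  for (let retry = 0; retry < p.maxAttempts - 1; retry++) {
    total += p.baseBackoffMs * 2 ** retry;
  }
  return total;
}

const policy: RetryPolicy = { timeoutMs: 1000, maxAttempts: 3, baseBackoffMs: 200 };
console.log(worstCaseLatencyMs(policy)); // 3600: 3 x 1000 + 200 + 400, inside a 5s SLO
```

The numbers also expose a subtlety in the hypothesis: with 2s of injected latency and a 1s timeout, every attempt times out, so orders fail within budget rather than complete. And a more generous policy of 5 attempts would take up to 8000ms, so the retries alone would break the 5s SLO.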
Simple Chaos: Latency Injection
Start simple. Add latency to dependencies and see what happens:
```typescript
// src/test-utils/chaos.ts

// Wraps an async function so each call first waits a random delay in [minMs, maxMs).
export function withLatency<T>(
  fn: () => Promise<T>,
  options: { minMs: number; maxMs: number }
): () => Promise<T> {
  return async () => {
    const delay = Math.random() * (options.maxMs - options.minMs) + options.minMs;
    await new Promise((resolve) => setTimeout(resolve, delay));
    return fn();
  };
}

// Wraps an async function so each call throws `error` with probability `failureRate`.
export function withFailureRate<T>(
  fn: () => Promise<T>,
  failureRate: number, // 0.0 to 1.0
  error: Error = new Error('Injected failure')
): () => Promise<T> {
  return async () => {
    if (Math.random() < failureRate) {
      throw error;
    }
    return fn();
  };
}
```

Use in integration tests:
```typescript
// src/orders/create-order.chaos.test.ts
import { describe, it, expect, vi, afterEach } from 'vitest';
import { withLatency, withFailureRate } from '../test-utils/chaos';
import { createOrder } from './create-order';

// mockDb: an in-memory database test double, assumed to be defined elsewhere in this file

describe('createOrder under chaos', () => {
  afterEach(() => {
    vi.restoreAllMocks();
  });

  it('completes within SLO when payment provider is slow', async () => {
    const slowPaymentProvider = {
      charge: withLatency(
        () => Promise.resolve({ transactionId: 'tx-123' }),
        { minMs: 1500, maxMs: 2000 } // 1.5-2s latency
      ),
    };

    const start = Date.now();
    const result = await createOrder(
      { customerId: 'cust-1', items: [{ productId: 'p1', quantity: 1, unitPrice: 1000 }] },
      { db: mockDb, paymentProvider: slowPaymentProvider }
    );
    const duration = Date.now() - start;

    expect(result.ok).toBe(true);
    expect(duration).toBeLessThan(5000); // Still under 5s SLO
  }, 10_000); // Above Vitest's default 5s timeout, so the SLO assertion decides, not the runner

  it('handles payment provider failures gracefully', async () => {
    // Make the 50% failure rate deterministic: the first Math.random() call fails
    // (0.1 < 0.5), later calls succeed (0.9 >= 0.5). Left random, a test with N
    // attempts would still fail 0.5^N of the time. Assumes createOrder makes no
    // other Math.random() call before the first charge.
    vi.spyOn(Math, 'random').mockReturnValueOnce(0.1).mockReturnValue(0.9);

    const flakyPaymentProvider = {
      charge: withFailureRate(
        () => Promise.resolve({ transactionId: 'tx-123' }),
        0.5 // 50% failure rate
      ),
    };

    const result = await createOrder(
      { customerId: 'cust-1', items: [{ productId: 'p1', quantity: 1, unitPrice: 1000 }] },
      { db: mockDb, paymentProvider: flakyPaymentProvider }
    );

    // The retry logic from our workflow should absorb the first failure
    expect(result.ok).toBe(true);
  });
});
```

Network-Level Chaos with Toxiproxy
For more realistic chaos, use Toxiproxy to inject network-level failures:
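```yaml
# docker-compose.chaos.yml
services:
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy
    ports:
      - "8474:8474" # API
      - "5433:5433" # Proxied postgres
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres # required by the official image
  # Toxiproxy sits between app and postgres:
  # point the app's DATABASE_URL at toxiproxy:5433 instead of postgres:5432
```

```typescript
// Configure the proxy and toxic before the load test (ES module, top-level await)
import { Toxiproxy } from 'toxiproxy-node-client';

const toxiproxy = new Toxiproxy('http://localhost:8474');

// Route traffic: app -> toxiproxy:5433 -> postgres:5432
const proxy = await toxiproxy.createProxy({
  name: 'postgres',
  listen: '0.0.0.0:5433',
  upstream: 'postgres:5432',
});

// Add 500ms (±100ms jitter) latency to database responses
await proxy.addToxic({
  name: 'latency',
  type: 'latency',
  stream: 'downstream',
  toxicity: 1,
  attributes: { latency: 500, jitter: 100 },
});

// Run load test
// Then check: Did the connection pool handle the latency?
// Did timeouts fire correctly?
// Did the circuit breaker trip?
```

Chaos Scenarios to Test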
| Scenario | What You’re Testing | Inject |
|---|---|---|
| Slow database | Connection pool, timeouts | 500ms+ latency |
| Database down | Circuit breaker, error handling | 100% failure rate |
| Slow external API | Timeout configuration | 2-5s latency |
| External API rate limiting | Retry with backoff | 429 responses |
| Network partition | Graceful degradation | Drop packets |
| High memory pressure | GC behavior, OOM handling | Memory limits |
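For the rate-limiting row, the behavior you want to see under injected 429s is usually “back off, and honor `Retry-After` when the server sends it.” A minimal sketch of that delay calculation, assuming exponential backoff with full jitter as the fallback; the function name, defaults, and the injectable clock and random source (which make it testable) are hypothetical:

```typescript
// Delay before retry number `attempt` (0-based) after a 429 response.
// Prefers the server's Retry-After (delay-seconds or HTTP-date); otherwise
// uses exponential backoff with full jitter: random in [0, capped exponential).
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  opts = { baseMs: 200, capMs: 10_000 },
  now: number = Date.now(),
  random: () => number = Math.random,
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds form
    const date = Date.parse(trimmed); // HTTP-date form
    if (!Number.isNaN(date)) return Math.max(0, date - now);
  }
  const ceiling = Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

console.log(retryDelayMs(0, '3')); // 3000
console.log(retryDelayMs(4, null, undefined, 0, () => 0.5)); // 1600
```

Full jitter spreads retries out, so a burst of clients that were all rate-limited at the same moment doesn’t come back at the same moment too.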
The Testing Pyramid Complete
With load testing and chaos, your testing pyramid is complete:
```mermaid
graph TD
A[Chaos Tests<br/>Does it survive failures?] --> B[Load Tests<br/>Does it scale?]
B --> C[Integration Tests<br/>Does the stack work?]
C --> D[Unit Tests<br/>Does the logic work?]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 0 stroke:#0f172a,stroke-width:3px
linkStyle 1 stroke:#0f172a,stroke-width:3px
linkStyle 2 stroke:#0f172a,stroke-width:3px
```
Quick Reference
k6 Commands
```bash
# Run load test
k6 run load-tests/load.js
# Run with custom config
k6 run --vus 50 --duration 5m load-tests/orders-api.js
# Run with environment variables
k6 run -e BASE_URL=https://staging.example.com load-tests/orders-api.js
# Output to JSON for analysis
k6 run --out json=results.json load-tests/load.js
# Output to InfluxDB for dashboards
k6 run --out influxdb=http://localhost:8086/k6 load-tests/load.js
```

Test Types at a Glance
| Test Type | VUs | Duration | Purpose |
|---|---|---|---|
| Smoke | 1 | 1m | Verify functionality |
| Load | 50-100 | 10m | Expected traffic |
| Stress | 100-500+ | 20m | Find breaking point |
| Soak | 50 | 4h+ | Memory leaks, degradation |
| Spike | 10→500→10 | 10m | Sudden traffic bursts |
The Rules
- Test progressively. Smoke → Load → Stress → Soak.
- Set thresholds. Tests should fail if SLOs aren’t met.
- Connect to traces. Load + OpenTelemetry = finding real bottlenecks.
- Run in CI. Catch performance regressions before production.
- Inject chaos. Prove your resilience patterns actually work.
What’s Next
We’ve built a complete architecture:
- Functions with explicit dependencies (testable)
- Validation at the boundary (safe)
- Result types (explicit errors)
- OpenTelemetry (observable)
- Resilience patterns (reliable)
- Configuration at startup (fail-fast)
- TypeScript + ESLint (enforced)
- Load testing + chaos (proven under fire)
Next: What We’ve Built. The complete architecture.