Performance Benchmarks
pypaginate tracks performance automatically on every commit and pull request.
Live Dashboard
View the interactive benchmark charts:
The dashboard shows historical performance trends across all benchmark categories,
updated automatically when changes merge to main.
What We Measure
| Category | File | Description |
|---|---|---|
| Pagination | | Core offset/cursor pagination throughput |
| Filtering | | Filter engine across all operators |
| Sorting | | Sort engine with various dataset sizes |
| Search | | Text search and fuzzy matching |
| Pipeline | | End-to-end pipeline composition overhead |
| Scaling | | 1K to 1M items scaling behavior |
| FastAPI | | HTTP endpoint response overhead |
| Serialization | | Page model serialization speed |
| Overhead | | Full ops to paginate to serialize path |
| Boundaries | | Edge case performance |
| Comparison | | pypaginate vs raw Python |
| Competitors | | vs other pagination libraries |
Running Locally
# Run all benchmarks
uv run pytest tests/perf --benchmark-enable -v
# Run specific category
uv run pytest tests/perf/test_pagination.py --benchmark-enable -v
# Save results for comparison
uv run pytest tests/perf --benchmark-enable --benchmark-autosave
# Compare against a saved baseline
uv run pytest tests/perf --benchmark-enable --benchmark-compare=0001
# Generate JSON output
uv run pytest tests/perf --benchmark-enable --benchmark-json=results.json
CI Pipeline
The full CI pipeline runs 40+ concurrent jobs across 4 Python versions and 3 operating systems:
graph TD
S[Setup] --> Q[Quality<br>ruff + mypy]
S --> SEC[Security<br>bandit + pip-audit]
S --> CQL[CodeQL]
Q --> ARCH[Architecture<br>72 subtests]
Q --> U1[Unit 3.11<br>Linux / macOS / Win]
Q --> U2[Unit 3.12<br>Linux / macOS / Win]
Q --> U3[Unit 3.13<br>Linux / macOS / Win]
Q --> U4[Unit 3.14<br>Linux / macOS / Win]
U1 & U2 & U3 & U4 --> I[Integration<br>4 Py × 3 OS = 12 jobs]
U1 & U2 & U3 & U4 --> E2E[E2E Tests<br>6 FastAPI flows]
U1 & U2 & U3 & U4 --> PG[PostgreSQL<br>real Postgres 16]
U1 & U2 & U3 & U4 --> PROP[Property<br>Hypothesis]
U1 & U2 & U3 & U4 --> BENCH[Benchmarks<br>293 data points]
U1 & U2 & U3 & U4 --> BUILD[Build<br>hatchling + twine]
style S fill:#1f6feb,color:#fff
style Q fill:#238636,color:#fff
style ARCH fill:#238636,color:#fff
style BENCH fill:#d29922,color:#fff
style PG fill:#8957e5,color:#fff
| Test Suite | Jobs | Coverage |
|---|---|---|
| Unit | 12 (4 Python × 3 OS) | All modules, parallel execution |
| Integration | 12 (4 Python × 3 OS) | Cross-module with real SQLite |
| E2E | 1 | Full FastAPI user journeys |
| PostgreSQL | 1 | Real Postgres 16 via service container |
| Property | 1 | Hypothesis invariant checking |
| Architecture | 1 | File limits, imports, protocols |
| Benchmarks | 1 | 293 perf benchmarks, PR regression alerts |
| Total | 29+ | 872+ tests, 85% coverage gate |
Benchmarks run automatically:
- On main: full benchmark suite, results stored for historical tracking
- On pull requests: full suite with comparison against the main baseline
When a PR introduces a performance regression exceeding 20%, the CI flags it with a comment on the pull request showing the before/after comparison.
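The 20% check can be approximated locally from two `--benchmark-json` result files. The following is a minimal sketch, not the CI's actual implementation: it assumes the standard pytest-benchmark JSON layout (a top-level `benchmarks` list whose entries carry `name` and `stats.median`), and the function names are hypothetical.

```python
import json

THRESHOLD = 0.20  # 20% regression threshold described above


def load_medians(path):
    """Map benchmark name -> median seconds from a pytest-benchmark JSON file."""
    with open(path) as fh:
        data = json.load(fh)
    return {b["name"]: b["stats"]["median"] for b in data["benchmarks"]}


def find_regressions(baseline, current, threshold=THRESHOLD):
    """Return {name: fractional slowdown} for benchmarks slower than the threshold."""
    regressions = {}
    for name, base_median in baseline.items():
        new_median = current.get(name)
        # Skip benchmarks missing from the new run or with an unusable baseline.
        if new_median is None or base_median <= 0:
            continue
        change = (new_median - base_median) / base_median
        if change > threshold:
            regressions[name] = change
    return regressions
```

For example, `find_regressions(load_medians("baseline.json"), load_medians("results.json"))` would list every benchmark whose median slowed by more than 20%; both file names here are placeholders.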
Benchmark Datasets
Tests use pre-generated datasets at various scales:
| Dataset | Size | Purpose |
|---|---|---|
| | 1,000 items | Fast iteration, basic correctness |
| | 10,000 items | Standard workload |
| | 100,000 items | Medium scale |
| | 500,000 items | Large scale |
| | 1,000,000 items | Stress testing |
Each dataset contains user-like dictionaries with name, email, age, status, and timestamp fields.
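A dataset of this shape could be produced along the following lines. This is an illustrative sketch only: the generator name, the status values, the age range, and the timestamp window are all assumptions, and the project's real fixtures may differ.

```python
import random
from datetime import datetime, timedelta, timezone

STATUSES = ("active", "inactive", "pending")  # assumed values, not from the project


def make_users(count, seed=0):
    """Build `count` user-like dicts; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary reference date
    return [
        {
            "name": f"User {i}",
            "email": f"user{i}@example.com",
            "age": rng.randint(18, 90),
            "status": rng.choice(STATUSES),
            # Random moment within one year of the reference date.
            "timestamp": (start + timedelta(seconds=rng.randrange(365 * 86_400))).isoformat(),
        }
        for i in range(count)
    ]
```

Seeding the generator matters for benchmarks: identical input across runs keeps timing differences attributable to the code rather than to the data.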
Interpreting Results
- Median is the primary metric (more stable than mean).
- IQR (interquartile range) shows result stability.
- Rounds indicates how many iterations were run.
- Compare results on the same machine/environment for accuracy.
- CI comparisons account for runner variability with a 20% threshold.
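To see why the median is preferred, consider a run in which one round is disturbed by a noisy neighbour on the machine. The sketch below uses Python's standard `statistics` module with made-up timings; its quartile method may differ slightly from the one pytest-benchmark uses, so the IQR is indicative only.

```python
import statistics


def summarize(timings):
    """Summarize raw round timings (seconds) with the metrics discussed above."""
    q1, _, q3 = statistics.quantiles(timings, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "iqr": q3 - q1,
        "rounds": len(timings),
    }


# Made-up timings: seven rounds near 1.0 s and one outlier at 9.0 s.
timings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 9.0]
summary = summarize(timings)
# The single outlier roughly doubles the mean, while the median stays at 1.0 s.
print(summary)
```

A wide IQR relative to the median is a hint that the run was noisy and is worth repeating before drawing conclusions.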