Performance Benchmarks

pypaginate tracks performance automatically on every commit and pull request.

Live Dashboard

View the interactive benchmark charts:

The dashboard shows historical performance trends across all benchmark categories, updated automatically when changes merge to main.

What We Measure

Category	File	Description
Pagination	`test_pagination.py`	Core offset/cursor pagination throughput
Filtering	`test_filtering.py`	Filter engine across all operators
Sorting	`test_sorting.py`	Sort engine with various dataset sizes
Search	`test_search.py`	Text search and fuzzy matching
Pipeline	`test_pipeline.py`	End-to-end pipeline composition overhead
Scaling	`test_scaling.py`	1K to 1M items scaling behavior
FastAPI	`test_fastapi_perf.py`	HTTP endpoint response overhead
Serialization	`test_serialization.py`	Page model serialization speed
Overhead	`test_overhead.py`	Full path from ops through pagination to serialization
Boundaries	`test_boundary.py`	Edge case performance
Comparison	`test_comparison.py`	pypaginate vs raw Python
Competitors	`test_competitors.py`	pypaginate vs other pagination libraries

Running Locally

# Run all benchmarks
uv run pytest tests/perf --benchmark-enable -v

# Run specific category
uv run pytest tests/perf/test_pagination.py --benchmark-enable -v

# Save results for comparison
uv run pytest tests/perf --benchmark-enable --benchmark-autosave

# Compare against a saved baseline
uv run pytest tests/perf --benchmark-enable --benchmark-compare=0001

# Generate JSON output
uv run pytest tests/perf --benchmark-enable --benchmark-json=results.json
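
The JSON file written by `--benchmark-json` can be post-processed with a few lines of Python. The sketch below assumes the usual pytest-benchmark layout, a top-level `benchmarks` list whose entries carry a `name` and a `stats` object with `median` and `rounds`; the function names are made up for this example.

```python
import json
from pathlib import Path


def load_report(path):
    """Read a pytest-benchmark JSON report from disk."""
    return json.loads(Path(path).read_text())


def summarize(report):
    """Return (name, median_seconds, rounds) tuples, slowest first."""
    rows = [
        (bench["name"], bench["stats"]["median"], bench["stats"]["rounds"])
        for bench in report["benchmarks"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_rows(rows):
    """Render the summary as aligned text lines, medians in microseconds."""
    return [
        f"{name:<50} {median * 1e6:12.1f} us  ({rounds} rounds)"
        for name, median, rounds in rows
    ]
```

Typical use would be `print("\n".join(format_rows(summarize(load_report("results.json")))))` after running the command above.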

CI Pipeline

The full CI pipeline runs 40+ concurrent jobs across 4 Python versions and 3 operating systems:

graph TD
    S[Setup] --> Q[Quality<br>ruff + mypy]
    S --> SEC[Security<br>bandit + pip-audit]
    S --> CQL[CodeQL]

    Q --> ARCH[Architecture<br>72 subtests]
    Q --> U1[Unit 3.11<br>Linux / macOS / Win]
    Q --> U2[Unit 3.12<br>Linux / macOS / Win]
    Q --> U3[Unit 3.13<br>Linux / macOS / Win]
    Q --> U4[Unit 3.14<br>Linux / macOS / Win]

    U1 & U2 & U3 & U4 --> I[Integration<br>4 Py × 3 OS = 12 jobs]
    U1 & U2 & U3 & U4 --> E2E[E2E Tests<br>6 FastAPI flows]
    U1 & U2 & U3 & U4 --> PG[PostgreSQL<br>real Postgres 16]
    U1 & U2 & U3 & U4 --> PROP[Property<br>Hypothesis]
    U1 & U2 & U3 & U4 --> BENCH[Benchmarks<br>293 data points]
    U1 & U2 & U3 & U4 --> BUILD[Build<br>hatchling + twine]

    style S fill:#1f6feb,color:#fff
    style Q fill:#238636,color:#fff
    style ARCH fill:#238636,color:#fff
    style BENCH fill:#d29922,color:#fff
    style PG fill:#8957e5,color:#fff

Test Suite	Jobs	Coverage
Unit	12 (4 Python × 3 OS)	All modules, parallel execution
Integration	12 (4 Python × 3 OS)	Cross-module with real SQLite
E2E	1	Full FastAPI user journeys
PostgreSQL	1	Real Postgres 16 via service container
Property	1	Hypothesis invariant checking
Architecture	1	File limits, imports, protocols
Benchmarks	1	293 perf benchmarks, PR regression alerts
Total	29+	872+ tests, 85% coverage gate

Benchmarks run automatically:

On main: full benchmark suite, results stored for historical tracking
On pull requests: full suite with comparison against main baseline

When a PR introduces a performance regression exceeding 20%, CI flags it with a comment on the pull request that shows the before/after comparison.
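
The threshold check can be illustrated with a small sketch. It assumes the comparison is made on median timings keyed by benchmark name, which this document does not confirm; the actual CI job may use a different metric or tool. The function name and data shapes are hypothetical.

```python
THRESHOLD = 0.20  # 20% regression threshold, as stated above


def find_regressions(baseline, current, threshold=THRESHOLD):
    """Return benchmarks whose timing grew by more than `threshold`.

    `baseline` and `current` map benchmark name -> median seconds
    (an assumption made for this sketch).
    """
    regressions = []
    for name, before in baseline.items():
        after = current.get(name)
        if after is None or before <= 0:
            continue  # benchmark removed or baseline unusable; skip it
        change = (after - before) / before  # relative slowdown
        if change > threshold:
            regressions.append((name, before, after, change))
    return regressions
```

For example, a benchmark that goes from 1.0 s to 1.25 s is a 25% slowdown and would be reported, while one that goes from 1.0 s to 1.1 s stays under the threshold.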

Benchmark Datasets

Tests use pre-generated datasets at various scales:

Dataset	Size	Purpose
`dataset_1k`	1,000 items	Fast iteration, basic correctness
`dataset_10k`	10,000 items	Standard workload
`dataset_100k`	100,000 items	Medium scale
`dataset_500k`	500,000 items	Large scale
`dataset_1m`	1,000,000 items	Stress testing

Each dataset contains user-like dictionaries with name, email, age, status, and timestamp fields.
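
A dataset of this shape could be generated along the following lines. This is only an illustrative sketch: the field names come from the description above, but the value ranges, the status values, and the `make_dataset` name are assumptions, not the project's actual fixtures.

```python
import random
from datetime import datetime, timedelta, timezone

STATUSES = ("active", "inactive", "pending")  # assumed status values


def make_dataset(size, seed=0):
    """Build `size` user-like dicts; a fixed seed keeps runs comparable."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary epoch
    seconds_per_year = 365 * 24 * 3600
    return [
        {
            "name": f"User {i}",
            "email": f"user{i}@example.com",
            "age": rng.randint(18, 90),
            "status": rng.choice(STATUSES),
            "timestamp": start + timedelta(seconds=rng.randrange(seconds_per_year)),
        }
        for i in range(size)
    ]
```

Seeding the generator matters for benchmarks: identical input data across runs removes one source of variation when comparing results.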

Interpreting Results

Median is the primary metric (more stable than mean)
IQR (interquartile range) shows result stability
Rounds indicates how many times the measurement was repeated (each round may itself run several iterations)
Compare results on the same machine/environment for accuracy
CI comparisons account for runner variability with a 20% threshold
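
To see why the median is preferred over the mean, consider the sketch below. It computes the metrics with Python's `statistics` module; pytest-benchmark computes its own statistics and may use a different quartile method, so treat the IQR here as illustrative only. The sample timings are invented.

```python
import statistics


def summarize_timings(timings):
    """Compute mean, median, IQR, and round count for per-round timings."""
    q1, _, q3 = statistics.quantiles(timings, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "iqr": q3 - q1,
        "rounds": len(timings),
    }


# Four typical rounds and one outlier (e.g. a noisy CI runner).
timings = [1.0, 1.1, 0.9, 1.0, 5.0]
summary = summarize_timings(timings)
# The single outlier pulls the mean up to about 1.8,
# while the median stays at 1.0.
```

A wide IQR relative to the median is a hint that the run was noisy and the comparison should be repeated before drawing conclusions.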