Fuzzy Matching
Fuzzy matching finds approximate string matches, handling typos, misspellings, and word-order variations.
It is built into the native engine and always available, with no extra dependency to install. The algorithms are a Rust reimplementation of rapidfuzz’s partial_ratio and token_sort_ratio.
FuzzyMode
FuzzyMode controls the fuzzy matching algorithm:
from pypaginate import FuzzyMode
FuzzyMode.EXACT # no fuzzy matching (default)
FuzzyMode.FUZZY # partial_ratio -- good for substring typos
FuzzyMode.TOKEN_SORT # token_sort_ratio -- word-order agnostic
Basic Usage
FuzzyMode.FUZZY
Uses partial_ratio for substring-aware fuzzy matching:
from pypaginate import SearchSpec, FuzzyMode
from pypaginate.search.engine import SearchEngine
engine = SearchEngine()
users = [
    {"name": "Alice Smith"},
    {"name": "Alicia Jones"},
    {"name": "Bob Wilson"},
]
spec = SearchSpec(
    query="alice",
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=75,
)
results = engine.apply(users, spec)
# [Alice Smith, Alicia Jones] -- "Alicia" fuzzy-matches "alice"
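To make the sliding-window idea behind partial_ratio concrete, here is a minimal pure-Python sketch. It is not pypaginate's Rust implementation: `partial_ratio_sketch` is a hypothetical helper, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, so its scores can differ from the engine's.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(query: str, text: str) -> float:
    """Score the shorter string against each same-length window of the longer one (0-100)."""
    # Lowercasing is an assumed normalization step, not confirmed engine behavior.
    shorter, longer = sorted((query.lower(), text.lower()), key=len)
    if not shorter:
        return 0.0
    width = len(shorter)
    return max(
        100.0 * SequenceMatcher(None, shorter, longer[i:i + width], autojunk=False).ratio()
        for i in range(len(longer) - width + 1)
    )


names = ["Alice Smith", "Alicia Jones", "Bob Wilson"]
matches = [n for n in names if partial_ratio_sketch("alice", n) >= 75]
print(matches)  # ['Alice Smith', 'Alicia Jones']
```

With this stand-in scorer, "Alicia Jones" passes because the window "alici" differs from "alice" by a single character, while no window of "Bob Wilson" shares more than two characters with the query.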
FuzzyMode.TOKEN_SORT
Uses token_sort_ratio for word-order-agnostic matching:
spec = SearchSpec(
    query="smith alice",
    fields=("name",),
    fuzzy=FuzzyMode.TOKEN_SORT,
    threshold=75,
)
results = engine.apply(users, spec)
# [Alice Smith] -- word order doesn't matter
TOKEN_SORT does not split the query into separate search terms. It treats the entire query as a single unit, normalizes it, and compares it against each field value using token_sort_ratio.
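The token-sorting step can be sketched in plain Python as follows. This is an illustration under assumptions, not the engine's code: `token_sort_ratio_sketch` is a hypothetical helper, normalization is assumed to be lowercasing plus whitespace splitting, and `difflib.SequenceMatcher` stands in for the real scorer.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Lowercase, sort whitespace-separated tokens, then compare the rejoined strings (0-100)."""

    def normalize(s: str) -> str:
        # "smith alice" and "Alice Smith" both become "alice smith".
        return " ".join(sorted(s.lower().split()))

    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


print(token_sort_ratio_sketch("smith alice", "Alice Smith"))  # 100.0
```

Because both strings are sorted before comparison, swapping the word order in either argument leaves the score unchanged.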
Threshold
The threshold parameter (0-100) controls how strict fuzzy matching is:
from pypaginate import SearchSpec, FuzzyMode
# Strict: only very close matches
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=90)
# Moderate (default): catches common typos
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=75)
# Lenient: more false positives, fewer misses
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=60)
| Threshold | Behavior | Use Case |
|---|---|---|
| 90-100 | Very strict | Exact-ish matching |
| 75-89 | Moderate | General search (recommended) |
| 60-74 | Lenient | Autocomplete, “did you mean” |
| < 60 | Very lenient | Broad discovery |
Scoring and Ranking
Results are ranked by fuzzy score (highest first):
from pypaginate import SearchSpec, FuzzyMode
from pypaginate.search.engine import SearchEngine
engine = SearchEngine()
users = [
    {"name": "Alice"},      # score ~100 (exact)
    {"name": "Alicia"},     # score ~83 (close)
    {"name": "Alexandra"},  # score ~60 (distant)
    {"name": "Bob"},        # score 0 (no match, filtered out)
]
spec = SearchSpec(
    query="alice",
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=55,
)
results = engine.apply(users, spec)
# [Alice, Alicia, Alexandra] -- ordered by descending score
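The filter-then-rank behavior can be pictured with a few lines of plain Python. This is only a sketch: `rank_by_score` is a hypothetical helper, the scores are the approximate values from the comments above, and treating the threshold as inclusive (`score >= threshold`) is an assumption.

```python
def rank_by_score(scored, threshold):
    """Drop items below the threshold, then order the rest by descending score."""
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Approximate scores copied from the example comments; input order is deliberately shuffled.
scored = [("Alexandra", 60), ("Bob", 0), ("Alice", 100), ("Alicia", 83)]
print(rank_by_score(scored, threshold=55))  # ['Alice', 'Alicia', 'Alexandra']
print(rank_by_score(scored, threshold=75))  # ['Alice', 'Alicia']
```

Raising the threshold from 55 to 75 removes "Alexandra" without changing the relative order of the remaining results.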
Weighted Fuzzy Search
Combine fuzzy matching with field weights for relevance-tuned results:
from pypaginate import SearchSpec, FuzzyMode
spec = SearchSpec(
    query="jhon",  # typo for "john"
    fields=("name", "email", "bio"),
    weights={"name": 3.0, "email": 2.0, "bio": 1.0},
    fuzzy=FuzzyMode.FUZZY,
    threshold=70,
)
# Name matches rank 3x higher than bio matches
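One way to picture how weights could interact with fuzzy scores is shown below. The exact formula the engine uses is not documented here, so this is a sketch under assumptions: the raw score is compared against the threshold first, then multiplied by the field weight, and `weighted_best` is a hypothetical helper.

```python
def weighted_best(field_scores, weights, threshold):
    """Return the best weighted score among fields whose raw score meets the threshold."""
    candidates = [
        score * weights.get(field, 1.0)  # assumed: weight scales the raw 0-100 score
        for field, score in field_scores.items()
        if score >= threshold            # assumed: threshold applies before weighting
    ]
    return max(candidates, default=None)


weights = {"name": 3.0, "email": 2.0, "bio": 1.0}

# Hypothetical raw scores: the same match quality in two different fields.
name_hit = weighted_best({"name": 80.0}, weights, threshold=70)
bio_hit = weighted_best({"bio": 80.0}, weights, threshold=70)
print(name_hit, bio_hit)  # 240.0 80.0
```

Under these assumptions an equally good match in name outranks one in bio by exactly the ratio of their weights.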
How the Algorithms Work
partial_ratio (FuzzyMode.FUZZY)
Compares the shorter string as a sliding window against the longer string. Good for matching substrings with typos:
"alice" vs "Alice Smith" -> high score (substring match)
"alce" vs "Alice Smith" -> moderate score (typo)
"alice" vs "Bob" -> low score (no similarity)
token_sort_ratio (FuzzyMode.TOKEN_SORT)
Sorts the tokens alphabetically before comparing, making word order irrelevant:
"smith alice" vs "Alice Smith" -> high score (same words)
"alice s" vs "Alice Smith" -> moderate score (partial)
Multi-Field Fuzzy Search
When searching multiple fields with fuzzy mode, the engine picks the best matching field per token:
from pypaginate import SearchSpec, FuzzyMode
spec = SearchSpec(
    query="jhon",
    fields=("name", "email"),
    fuzzy=FuzzyMode.FUZZY,
    threshold=70,
)
# For each item, checks both name and email
# Uses the highest-scoring field match for ranking
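The best-field selection described above might look roughly like this in plain Python. It is a simplified sketch, not the engine's logic: `best_field` is a hypothetical helper, the sample record is invented for illustration, and a whole-string `difflib` ratio stands in for the real fuzzy scorer.

```python
from difflib import SequenceMatcher


def best_field(item, query, fields):
    """Score the query against every field and return (field_name, score) for the best one."""
    q = query.lower()
    scores = {
        field: 100.0 * SequenceMatcher(
            None, q, str(item.get(field, "")).lower(), autojunk=False
        ).ratio()
        for field in fields
    }
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


user = {"name": "John Doe", "email": "jd@example.com"}  # illustrative record
field, score = best_field(user, "jhon", ("name", "email"))
print(field)  # name
```

The item's rank would then be based on the winning field's score, which is why a typo in the query can still surface a record whose name is close even when its email is unrelated.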
Pipeline Integration
from pypaginate import SearchSpec, FuzzyMode, OffsetParams
from pypaginate.adapters.memory import MemoryBackend, MemorySearchBackend
from pypaginate.engine.paginator import Paginator
from pypaginate.engine.pipeline import SyncPipeline
pipeline = SyncPipeline(
    Paginator(MemoryBackend()),
    search_backend=MemorySearchBackend(),
)
page = pipeline.execute(
    users,
    OffsetParams(page=1, limit=20),
    search=SearchSpec(
        query="jhon",
        fields=("name", "email"),
        fuzzy=FuzzyMode.FUZZY,
        threshold=70,
    ),
)
Real-World Examples
User Search with Typo Tolerance
spec = SearchSpec(
    query="jhon smth",  # typos in both words
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=70,
)
# Finds "John Smith"
Product Search
spec = SearchSpec(
    query="samung galxy",  # misspelled brand and product
    fields=("title", "brand"),
    weights={"title": 1.0, "brand": 2.0},
    fuzzy=FuzzyMode.FUZZY,
    threshold=65,
)
# Finds "Samsung Galaxy"
Name Search (Order-Agnostic)
spec = SearchSpec(
    query="doe jane",
    fields=("full_name",),
    fuzzy=FuzzyMode.TOKEN_SORT,
    threshold=80,
)
# Finds "Jane Doe" -- word order doesn't matter
Performance Tips
Filter first – reduce the dataset before fuzzy search
Set max_results – stop ranking after enough matches
Raise threshold – a higher threshold means fewer comparisons pass
Limit fields – only search relevant fields
Use SQLAlchemy for large datasets – fuzzy search is CPU-intensive in memory
# Efficient: filter, then fuzzy search a smaller set
# (filter_backend, engine, users, and fuzzy_spec are assumed to be defined already)
from pypaginate import FilterSpec
filtered = filter_backend.apply_filters(users, [
    FilterSpec(field="status", value="active"),
])
results = engine.apply(filtered, fuzzy_spec)
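The first two tips (filter first, cap the result count) can be illustrated without the library at all. The sketch below is an assumption-laden stand-in: `fuzzy_top_n` and its `keep` predicate are hypothetical, and `difflib` replaces the real scorer. It shows that a cheap predicate runs before any string comparison and that `heapq.nlargest` avoids fully sorting every match.

```python
import heapq
from difflib import SequenceMatcher


def fuzzy_top_n(items, query, field, threshold, max_results, keep=lambda item: True):
    """Pre-filter cheaply, score only the survivors, and return the top max_results items."""
    q = query.lower()
    scored = []
    for item in items:
        if not keep(item):  # cheap check first, so skipped items are never scored
            continue
        text = str(item[field]).lower()
        score = 100.0 * SequenceMatcher(None, q, text, autojunk=False).ratio()
        if score >= threshold:
            scored.append((score, item))
    # nlargest keeps only the best max_results pairs instead of sorting everything.
    top = heapq.nlargest(max_results, scored, key=lambda pair: pair[0])
    return [item for _, item in top]


users = [
    {"name": "alice", "status": "active"},
    {"name": "alicia", "status": "inactive"},
    {"name": "alise", "status": "active"},
    {"name": "bob", "status": "active"},
]
active = lambda u: u["status"] == "active"
top = fuzzy_top_n(users, "alice", "name", threshold=60, max_results=2, keep=active)
print([u["name"] for u in top])  # ['alice', 'alise']
```

Here "alicia" is never scored because the status check rejects it first, which is the saving the filter-first tip is after.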
Next Steps
Text Search – Exact text search modes
Filtering – Combine with declarative filters