Fuzzy Matching

Fuzzy matching finds approximate string matches, handling typos, misspellings, and word-order variations.

Fuzzy matching is built into the native engine and always available – no extra dependency to install. The algorithms are a Rust reimplementation of rapidfuzz’s partial_ratio and token_sort_ratio.
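Both scorers build on rapidfuzz's ratio, which is a normalized Indel similarity. If M is the length of the longest common subsequence of two strings a and b, then ratio = 2 * M / (len(a) + len(b)) * 100. The pure-Python sketch below shows only that core formula. It is not pypaginate's Rust code, and it leaves out the preprocessing and speedups a real implementation would use.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # this sketch's choice: two empty strings count as identical
    return 200.0 * lcs_length(a, b) / total


print(round(ratio("alice", "alicia"), 1))  # 72.7
```

Scored against the whole string, "Alicia" only gets about 73. partial_ratio scores it higher because it compares "alice" against the best-matching window of the longer value.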

FuzzyMode

FuzzyMode controls the fuzzy matching algorithm:

from pypaginate import FuzzyMode

FuzzyMode.EXACT       # no fuzzy matching (default)
FuzzyMode.FUZZY       # partial_ratio -- good for substring typos
FuzzyMode.TOKEN_SORT  # token_sort_ratio -- word-order agnostic

Basic Usage

FuzzyMode.FUZZY

Uses partial_ratio for substring-aware fuzzy matching:

from pypaginate import SearchSpec, FuzzyMode
from pypaginate.search.engine import SearchEngine

engine = SearchEngine()

users = [
    {"name": "Alice Smith"},
    {"name": "Alicia Jones"},
    {"name": "Bob Wilson"},
]

spec = SearchSpec(
    query="alice",
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=75,
)
results = engine.apply(users, spec)
# [Alice Smith, Alicia Jones] -- "Alicia" fuzzy-matches "alice"

FuzzyMode.TOKEN_SORT

Uses token_sort_ratio for word-order-agnostic matching:

spec = SearchSpec(
    query="smith alice",
    fields=("name",),
    fuzzy=FuzzyMode.TOKEN_SORT,
    threshold=75,
)
results = engine.apply(users, spec)
# [Alice Smith] -- word order doesn't matter

TOKEN_SORT does not split the query into separate search terms. Instead, it normalizes the whole query string and compares it against each field value with token_sort_ratio, which sorts the words of both strings before scoring.
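This page does not spell out the normalization step. The sketch below assumes one plausible version, loosely modeled on rapidfuzz's utils.default_process: lowercase the text, turn punctuation into spaces, and trim and collapse whitespace. The normalize helper is hypothetical. It shows that the query remains a single string rather than being split into separate search terms.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, turn punctuation into spaces,
    then collapse runs of whitespace and trim the ends."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


# The whole query is normalized as one string; it is not split into terms.
print(normalize("  Smith,   ALICE "))  # smith alice
```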

Threshold

The threshold parameter (0-100) controls how strict fuzzy matching is:

from pypaginate import SearchSpec, FuzzyMode

# Strict: only very close matches
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=90)

# Moderate (default): catches common typos
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=75)

# Lenient: more false positives, fewer misses
SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=60)

Threshold   Behavior       Use case
90-100      Very strict    Exact-ish matching
75-89       Moderate       General search (recommended)
60-74       Lenient        Autocomplete, “did you mean”
< 60        Very lenient   Broad discovery

Scoring and Ranking

Results are ranked by fuzzy score (highest first):

from pypaginate import SearchSpec, FuzzyMode
from pypaginate.search.engine import SearchEngine

engine = SearchEngine()

users = [
    {"name": "Alice"},       # score ~100 (exact)
    {"name": "Alicia"},      # score ~83 (close)
    {"name": "Alexandra"},   # score ~60 (distant)
    {"name": "Bob"},         # score 0 (no match, filtered out)
]

spec = SearchSpec(
    query="alice",
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=55,
)
results = engine.apply(users, spec)
# [Alice, Alicia, Alexandra] -- ordered by descending score
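The filter-then-sort behavior can be sketched in pure Python. The sketch makes two assumptions: a candidate passes when its score is greater than or equal to threshold, and candidates with equal scores keep their input order. The hypothetical doc_score scorer looks up the approximate scores quoted in the example above, which keeps the sketch independent of any particular algorithm.

```python
from typing import Callable, Sequence

# Approximate scores quoted in the example above (not computed here).
DOC_SCORES = {"Alice": 100, "Alicia": 83, "Alexandra": 60, "Bob": 0}


def doc_score(query: str, name: str) -> float:
    """Hypothetical scorer: look up the approximate score for a name."""
    return DOC_SCORES.get(name, 0)


def rank_matches(
    names: Sequence[str],
    query: str,
    score: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Drop candidates below threshold, then order the rest by descending score."""
    scored = [(score(query, name), name) for name in names]
    kept = [pair for pair in scored if pair[0] >= threshold]  # assumed inclusive cutoff
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [name for _, name in kept]


print(rank_matches(["Alexandra", "Bob", "Alice", "Alicia"], "alice", doc_score, threshold=55))
# ['Alice', 'Alicia', 'Alexandra']
```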

How the Algorithms Work

partial_ratio (FuzzyMode.FUZZY)

Slides the shorter string across the longer one as a window and keeps the best-scoring alignment, so a short query can match part of a longer value. This makes it well suited to substrings with typos:

"alice" vs "Alice Smith"  -> high score (substring match)
"alce"  vs "Alice Smith"  -> moderate score (typo)
"alice" vs "Bob"          -> low score (no similarity)

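Here is a minimal sketch of the sliding-window idea. It assumes case-insensitive comparison and uses only full-length windows. rapidfuzz's partial_ratio also scores shorter windows at the start and end of the longer string and uses a much faster search, so its scores can differ. Treat this as an illustration, not as pypaginate's implementation.

```python
def _ratio(a: str, b: str) -> float:
    """Normalized Indel similarity (0-100): 2 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 100.0
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch == other else max(prev[j], curr[j - 1]))
        prev = curr
    return 200.0 * prev[-1] / (len(a) + len(b))


def partial_ratio(a: str, b: str) -> float:
    """Best _ratio of the shorter string against each same-length window of the longer."""
    a, b = a.lower(), b.lower()  # simple case folding; real preprocessing may differ
    short, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 100.0 if not longer else 0.0
    width = len(short)
    return max(
        _ratio(short, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


print(partial_ratio("alice", "Alice Smith"))  # 100.0
print(partial_ratio("alce", "Alice Smith"))   # 75.0
print(partial_ratio("alice", "Bob"))          # 0.0
```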
token_sort_ratio (FuzzyMode.TOKEN_SORT)

Sorts the tokens alphabetically before comparing, making word order irrelevant:

"smith alice"  vs "Alice Smith"  -> high score (same words)
"alice s"      vs "Alice Smith"  -> moderate score (partial)

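The sketch below models token_sort_ratio under the same assumptions. It lowercases each string, splits it on whitespace, sorts the words alphabetically, and rejoins them with single spaces. The two sorted strings are then scored with the plain ratio. Real implementations may preprocess the strings differently.

```python
def _ratio(a: str, b: str) -> float:
    """Normalized Indel similarity (0-100): 2 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 100.0
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch == other else max(prev[j], curr[j - 1]))
        prev = curr
    return 200.0 * prev[-1] / (len(a) + len(b))


def token_sort_ratio(a: str, b: str) -> float:
    """Sort each string's words alphabetically, then compare the sorted strings."""

    def sorted_tokens(text: str) -> str:
        # Lowercase and split on whitespace; real preprocessing may differ.
        return " ".join(sorted(text.lower().split()))

    return _ratio(sorted_tokens(a), sorted_tokens(b))


print(token_sort_ratio("smith alice", "Alice Smith"))         # 100.0
print(round(token_sort_ratio("alice s", "Alice Smith"), 1))  # 77.8
```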
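Pipeline Integration

Pass a SearchSpec as the search argument of a pipeline's execute() call to fuzzy-search the data and paginate the results in one step: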

from pypaginate import SearchSpec, FuzzyMode, OffsetParams
from pypaginate.adapters.memory import MemoryBackend, MemorySearchBackend
from pypaginate.engine.paginator import Paginator
from pypaginate.engine.pipeline import SyncPipeline

pipeline = SyncPipeline(
    Paginator(MemoryBackend()),
    search_backend=MemorySearchBackend(),
)

page = pipeline.execute(
    users,
    OffsetParams(page=1, limit=20),
    search=SearchSpec(
        query="jhon",
        fields=("name", "email"),
        fuzzy=FuzzyMode.FUZZY,
        threshold=70,
    ),
)
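One rough mental model for such a pipeline is assumed here, not taken from pypaginate's source. The pipeline scores every row across the listed fields and keeps each row's best field score. It drops rows below threshold, ranks the rest, and only then cuts out the requested page, so total counts matches rather than all rows. The hypothetical search_then_paginate function sketches that order with a pluggable scorer. To keep the demo short, it uses a trivial substring scorer where a real search would use a fuzzy scorer such as partial_ratio.

```python
from typing import Callable, Sequence

Row = dict[str, object]
Scorer = Callable[[str, str], float]


def search_then_paginate(
    rows: Sequence[Row],
    query: str,
    fields: Sequence[str],
    threshold: float,
    page: int,
    limit: int,
    score: Scorer,
) -> dict:
    """Hypothetical order of operations: score, filter, rank, then paginate."""
    scored = []
    for row in rows:
        # A row's score is its best score across the searched fields.
        best = max(score(query, str(row.get(field, ""))) for field in fields)
        if best >= threshold:
            scored.append((best, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    matches = [row for _, row in scored]
    start = (page - 1) * limit  # 1-based page number, as in OffsetParams(page=1, ...)
    return {"items": matches[start:start + limit], "total": len(matches)}


def contains_score(query: str, text: str) -> float:
    """Stand-in scorer: 100 for a case-insensitive substring hit, else 0."""
    return 100.0 if query.lower() in text.lower() else 0.0


rows = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "Johnny Cash", "email": "cash@example.com"},
]
page_1 = search_then_paginate(rows, "john", ("name", "email"), 70, page=1, limit=1, score=contains_score)
print(page_1["total"], [r["name"] for r in page_1["items"]])  # 2 ['John Smith']
```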

Real-World Examples

User Search with Typo Tolerance

spec = SearchSpec(
    query="jhon smth",  # typos in both words
    fields=("name",),
    fuzzy=FuzzyMode.FUZZY,
    threshold=70,
)
# Finds "John Smith"

Name Search (Order-Agnostic)

spec = SearchSpec(
    query="doe jane",
    fields=("full_name",),
    fuzzy=FuzzyMode.TOKEN_SORT,
    threshold=80,
)
# Finds "Jane Doe" -- word order doesn't matter

Performance Tips

  1. Filter first – reduce the dataset before fuzzy search

  2. Set max_results – stop ranking after enough matches

  3. Raise threshold – a higher threshold lets fewer candidates through, leaving fewer matches to rank

  4. Limit fields – only search relevant fields

  5. Use SQLAlchemy for large datasets – fuzzy search is CPU-intensive in memory

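# Efficient: filter, then fuzzy search a smaller set
from pypaginate import FilterSpec, FuzzyMode, SearchSpec

# Assumes `engine` and `users` from the examples above, plus an already
# configured `filter_backend` (its construction is not shown here)
fuzzy_spec = SearchSpec(query="alice", fields=("name",), fuzzy=FuzzyMode.FUZZY, threshold=75)
filtered = filter_backend.apply_filters(users, [
    FilterSpec(field="status", value="active"),
])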
results = engine.apply(filtered, fuzzy_spec)
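The first two tips can be sketched together in pure Python. A plain list comprehension stands in for the filter backend. A max_results cap is modeled with heapq.nlargest, which keeps only the top n candidates instead of sorting every match. The top_matches helper and the doc_score lookup are both hypothetical; doc_score returns the approximate scores quoted in the ranking example earlier on this page. pypaginate's own max_results handling may work differently.

```python
import heapq
from typing import Callable, Sequence

# Approximate scores quoted in the Scoring and Ranking example (not computed here).
DOC_SCORES = {"Alice": 100, "Alicia": 83, "Alexandra": 60, "Bob": 0}


def doc_score(query: str, text: str) -> float:
    """Hypothetical scorer: look up the approximate score for a name."""
    return DOC_SCORES.get(text, 0)


def top_matches(
    rows: Sequence[dict],
    query: str,
    field: str,
    score: Callable[[str, str], float],
    threshold: float,
    max_results: int,
) -> list[dict]:
    """Return at most max_results rows scoring >= threshold, best first."""
    scored = ((score(query, str(row.get(field, ""))), row) for row in rows)
    passing = (pair for pair in scored if pair[0] >= threshold)
    # nlargest keeps only the best max_results items instead of sorting all of
    # them; like sorted(..., reverse=True), it keeps input order for equal scores.
    best = heapq.nlargest(max_results, passing, key=lambda pair: pair[0])
    return [row for _, row in best]


users = [
    {"name": "Alice", "status": "active"},
    {"name": "Alicia", "status": "inactive"},
    {"name": "Alexandra", "status": "active"},
    {"name": "Bob", "status": "active"},
]
active = [u for u in users if u["status"] == "active"]  # filter first
print([u["name"] for u in top_matches(active, "alice", "name", doc_score, 55, max_results=1)])
# ['Alice']
```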

Next Steps