How it works
DYF uses a two-stage PCA-based LSH to classify every item in your corpus. Bucket sizes reveal density, centroid similarity reveals centrality, and stability under perturbation reveals bridges.
Turn any collection of embeddings into a map — find what's common, what connects, and what's unique.
Dense: core items in well-populated semantic regions. The backbone of your corpus.
Bridge: transitional items connecting different clusters. Critical for semantic navigation.
Orphan: unique items with no semantic neighbors. Anomalies worth investigating.
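The two-stage scheme above can be sketched in miniature. Everything here (toy data, component count, noise scale) is an illustrative assumption, not DYF's actual pipeline:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
# Toy embeddings: two tight blobs plus a few scattered outliers.
X = np.vstack([
    rng.normal(1.0, 0.1, (50, 8)),
    rng.normal(-1.0, 0.1, (50, 8)),
    rng.normal(0.0, 2.0, (5, 8)),
])

# Stage 1: PCA — center, then project onto the top principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:2].T  # keep 2 components, giving 2-bit bucket keys

# Stage 2: LSH — sign-quantize each component into a hash key.
keys = [tuple(int(v > 0) for v in row) for row in proj]

# Bucket sizes reveal density: items in large buckets are "dense".
sizes = Counter(keys)
density = np.array([sizes[k] for k in keys])

# Stability under perturbation: an item whose key flips under small
# noise sits near a bucket boundary — a candidate "bridge".
jitter = rng.normal(0.0, 0.05, proj.shape)
flips = [tuple(int(v > 0) for v in row) != k
         for row, k in zip(proj + jitter, keys)]
```

The blob members crowd into a few large buckets while outliers land in sparse ones, which is the density signal the report summarizes.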
Fit a classifier, inspect the report, retrieve items by category.
from dyf import DensityClassifierFull
classifier = DensityClassifierFull.from_texts(
    texts=documents,        # any list of strings
    categories=categories,  # optional grouping
)
# What did we find?
print(classifier.report())
# Dense: 8,420 (84.2%) — core topics
# Bridge: 980 (9.8%) — cross-topic items
# Orphan: 600 (6.0%) — unique outliers
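DYF's own accessor for retrieving items by category isn't shown above, so here is a library-independent sketch of the pattern: group each item under its assigned label (documents and labels below are made up for illustration):

```python
from collections import defaultdict

# Hypothetical per-item labels, as a density classifier might assign.
docs = ["intro to PCA", "PCA in practice", "PCA meets LSH", "a recipe for flan"]
labels = ["dense", "dense", "bridge", "orphan"]

# Group items by category so each class can be inspected on its own.
by_category = defaultdict(list)
for doc, label in zip(docs, labels):
    by_category[label].append(doc)

print(by_category["bridge"])  # → ['PCA meets LSH']
```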
Trace paths between concepts through bridge items that connect distant clusters.
See how your data organizes itself — no k to pick, no taxonomy to impose.
Surface orphans that don't fit anywhere and bridges that span multiple domains.
Build .dyf files for instant-open, memory-mapped approximate search.
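The .dyf layout itself isn't specified here, but the memory-mapped idea behind "instant-open" search can be sketched with numpy.memmap. File name, shape, and scoring below are illustrative assumptions, not the real format:

```python
import os
import tempfile
import numpy as np

# Illustrative only — not the actual .dyf layout. Write unit-normalized
# vectors to disk so a dot product doubles as cosine similarity.
rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 16)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

path = os.path.join(tempfile.mkdtemp(), "index.vec")
vecs.tofile(path)

# mode="r" maps the file read-only; opening reads no data up front,
# and pages are loaded lazily as the search touches them.
index = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 16))
query = np.asarray(index[0])   # use the first stored vector as the query
scores = index @ query         # cosine similarity against every row
best = int(np.argmax(scores))  # the query matches itself first
```

The design point is that open time is independent of index size: only the pages a query actually touches are read from disk.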
# Core
pip install dyf
# With serialization
pip install "dyf[io]"
# Full features (embeddings, LLM labeling)
pip install "dyf[full]"