Cluster Explorer: Your Algorithmic Communities

What Are Clusters?

Twitter's algorithm doesn't see you as "interested in AI" or "interested in cooking." Instead, it assigns you to invisible communities called clusters—discovered by analyzing the follow graph of 400+ million users.

The scale: There are approximately 145,000 clusters discovered from the top 20 million most-followed accounts. Each cluster represents a community with similar interests, discovered organically from who follows whom.

Why clusters matter: Your cluster membership determines what appears in your For You feed. If you're 70% assigned to the "AI/ML Research" cluster and 30% to "Cooking," your feed will reflect that split. The algorithm shows you content from producers (accounts) in your clusters.

How Clusters Are Discovered

Clusters aren't manually defined—they emerge from the data using Sparse Binary Matrix Factorization (SBF) with Metropolis-Hastings optimization:

Build similarity graph: Calculate how similar each producer is to every other producer based on who follows them (cosine similarity of follower patterns)
Find communities: Use the SBF algorithm with Metropolis-Hastings optimization to group producers who share many followers into clusters, under the constraint that each producer belongs to exactly one cluster (maximally sparse)
Assign producers: Each producer is assigned to its strongest cluster (called "KnownFor"); this sparsity constraint is what creates clear community separation
Derive user interests: Your cluster membership comes from the clusters of producers you follow and engage with (called "InterestedIn")

The Critical Insight: Your clusters come from ENGAGEMENT (likes, replies, retweets), NOT just follows. Engagement has a 100-day half-life, which means:

Your likes from today have 100% weight
Your likes from 100 days ago still have 50% weight
Your likes from 200 days ago still have 25% weight

This makes your clusters "sticky"—they change slowly over months, not days. Following diverse accounts isn't enough; you must engage with diverse content.

The Shape of Cluster Assignment

For new accounts: Your follows determine your initial clusters. If you follow 5 AI researchers and 2 chefs, you'll start at roughly 70% AI and 30% cooking (5/7 ≈ 71%, 2/7 ≈ 29%).

Over time (weeks to months): Your engagement history dominates. If you like 50 AI tweets and 10 cooking tweets in 100 days, your clusters will drift toward AI regardless of who you follow.

Long-term steady state: Your clusters reflect the last 100-200 days of engagement with exponential decay. Past behavior has momentum—changing clusters requires sustained engagement pattern changes over 3-6 months.
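
To get a feel for that momentum, the sketch below assumes a hypothetical user who liked one AI tweet per day for a year and then redirects some or all of their daily like to cooking. The function names, the one-like-per-day rate, and the `cooking_fraction` parameter are illustrative assumptions; only the 100-day half-life comes from the text above.

```python
HALF_LIFE_DAYS = 100.0  # half-life stated in the article


def decay(days_ago):
    """Weight of a like that happened `days_ago` days in the past."""
    return 0.5 ** (days_ago / HALF_LIFE_DAYS)


def days_until_cooking_majority(history_days=365, cooking_fraction=1.0, max_days=2000):
    """Days after the switch until decayed cooking likes outweigh decayed AI likes.

    Before the switch: one AI like per day for `history_days` days.
    After the switch: one like per day, `cooking_fraction` of it on cooking.
    Returns None if cooking never takes the majority within `max_days`.
    """
    for day in range(1, max_days + 1):
        # Likes from before the switch are now day .. day+history_days-1 days old.
        old_ai = sum(decay(day + k) for k in range(history_days))
        # Likes since the switch are 0 .. day-1 days old.
        recent = sum(decay(k) for k in range(day))
        cooking = cooking_fraction * recent
        ai = old_ai + (1.0 - cooking_fraction) * recent
        if cooking > ai:
            return day
    return None


full_switch = days_until_cooking_majority(cooking_fraction=1.0)
partial_switch = days_until_cooking_majority(cooking_fraction=0.7)
print(full_switch, partial_switch)
```

Under these assumptions a complete switch needs close to one full half-life before cooking takes the majority, and a 70/30 switch needs noticeably longer, which is in line with the 3-6 month estimate.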

The Technical Details

How Cluster Assignment Actually Works

Your cluster membership (InterestedIn) is calculated through matrix multiplication:

InterestedIn[you] = EngagementGraph[you, producers] × KnownFor[producers, clusters]

Where:
- EngagementGraph: Your follows + engagement history (100-day half-life)
- KnownFor: Each producer's primary cluster assignment
- Result: Your score for each of the ~145,000 clusters
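- Final step: normalization. The code reference cites L2 normalization, which scales the score vector to unit length (the squared scores sum to 1.0); the example below divides by the plain sum instead so that the scores read as percentages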
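
The product above can be sketched with plain dictionaries standing in for sparse matrices. This is a minimal illustration rather than the production Scala job: the function name `interested_in`, the dictionary layout, and the toy weights are assumptions made for the example.

```python
from collections import defaultdict


def interested_in(engagement_row, known_for):
    """Multiply one user's engagement row by the KnownFor matrix.

    engagement_row: {producer: weight} for a single user (hypothetical layout)
    known_for:      {producer: {cluster: strength}}, one cluster per producer
    Returns unnormalized {cluster: score}.
    """
    scores = defaultdict(float)
    for producer, weight in engagement_row.items():
        # Producers missing from KnownFor contribute nothing.
        for cluster, strength in known_for.get(producer, {}).items():
            scores[cluster] += weight * strength
    return dict(scores)


# Toy data: the weights are made up for illustration.
known_for = {
    "@ylecun": {"AI": 1.0},
    "@karpathy": {"AI": 1.0},
    "@gordonramsay": {"Cooking": 1.0},
}
engagement_row = {"@ylecun": 3.0, "@karpathy": 2.0, "@gordonramsay": 1.0}

scores = interested_in(engagement_row, known_for)
print(scores)  # {'AI': 5.0, 'Cooking': 1.0}
```

Because each producer maps to a single cluster, every engagement weight lands in exactly one cluster's score.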

Concrete Example

Let's say you follow and engage with these producers over 100 days:

Follows:
- @ylecun (KnownFor: AI cluster)
- @karpathy (KnownFor: AI cluster)
- @gordonramsay (KnownFor: Cooking cluster)

Engagement (likes, weighted):
- AI tweets: 50 engagements
- Cooking tweets: 30 engagements

Matrix multiplication:
AI cluster score = (2 follows × follow_weight) + (50 engagements × engagement_weight)
Cooking cluster score = (1 follow × follow_weight) + (30 engagements × engagement_weight)

If follow_weight = 1.0 and engagement_weight = 5.0 (engagement dominates):

AI: (2 × 1.0) + (50 × 5.0) = 2 + 250 = 252
Cooking: (1 × 1.0) + (30 × 5.0) = 1 + 150 = 151

Normalization (divide by sum to get percentages):
Sum = 252 + 151 = 403

AI: 252 / 403 = 0.625 (62.5%)
Cooking: 151 / 403 = 0.375 (37.5%)

Result: You're assigned 62.5% AI, 37.5% Cooking

Notice: Engagement dominated! Follows alone (2:1 AI:Cooking) would give a 66.7:33.3 split,
but your 50:30 engagement ratio (1.67:1) pulled the final ratio to 62.5:37.5 (also about 1.67:1).
The engagement terms (250 and 150) dwarf the follow terms (2 and 1).
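
The arithmetic in this example can be reproduced with a few lines of Python. The two weights are the illustrative values used above, not confirmed production constants. The sketch also prints an L2-normalized version of the same scores, since the article names L2 normalization while the example divides by the sum; the two give different numbers.

```python
import math

FOLLOW_WEIGHT = 1.0      # illustrative value from the example
ENGAGEMENT_WEIGHT = 5.0  # illustrative value from the example


def cluster_score(follows, engagements):
    """Raw cluster score: weighted follows plus weighted engagements."""
    return follows * FOLLOW_WEIGHT + engagements * ENGAGEMENT_WEIGHT


raw = {"AI": cluster_score(2, 50), "Cooking": cluster_score(1, 30)}

# Sum normalization: scores add up to 1 and read as percentages.
total = sum(raw.values())
shares = {cluster: score / total for cluster, score in raw.items()}

# L2 normalization: the squared scores add up to 1 (unit-length vector).
norm = math.sqrt(sum(score * score for score in raw.values()))
unit = {cluster: score / norm for cluster, score in raw.items()}

print(raw)                                              # {'AI': 252.0, 'Cooking': 151.0}
print({c: round(v, 3) for c, v in shares.items()})      # {'AI': 0.625, 'Cooking': 0.375}
print({c: round(v, 3) for c, v in unit.items()})
```

Either normalization preserves the ratio between clusters; only the sum-normalized scores can be read directly as a percentage split.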

The 100-Day Half-Life Formula

Engagement decay follows exponential decay with 100-day half-life:

weight(t) = initial_weight × (0.5)^(days_ago / 100)

Examples:
- Today (t=0):       weight = 1.0 × (0.5)^(0/100)   = 1.0   (100%)
- 50 days ago:       weight = 1.0 × (0.5)^(50/100)  = 0.707 (70.7%)
- 100 days ago:      weight = 1.0 × (0.5)^(100/100) = 0.5   (50% - HALF-LIFE)
- 200 days ago:      weight = 1.0 × (0.5)^(200/100) = 0.25  (25%)
- 300 days ago:      weight = 1.0 × (0.5)^(300/100) = 0.125 (12.5%)

Why this matters: Your engagement from 6 months ago (180 days) still has about 29% weight (0.5^1.8 ≈ 0.287). Your clusters have momentum, so they resist change. Diversifying your feed requires sustained engagement pattern changes over 3-6 months, not just following different accounts.
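
The decay formula translates directly into code. The function name `decayed_weight` is an assumption for this sketch; the 100-day half-life is the value given above.

```python
HALF_LIFE_DAYS = 100.0  # half-life stated in the article


def decayed_weight(days_ago, initial_weight=1.0, half_life=HALF_LIFE_DAYS):
    """Exponential decay: the weight halves every `half_life` days."""
    return initial_weight * 0.5 ** (days_ago / half_life)


for days in (0, 50, 100, 180, 200, 300):
    print(f"{days:>3} days ago: {decayed_weight(days):.3f}")
```

The loop prints the same values as the table above, plus the 180-day case.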

Update Frequencies

Cluster data updates at different cadences:

KnownFor (producer → cluster mapping): Updated weekly (7 days)
- Computationally expensive (runs SBF with Metropolis-Hastings on 20M producers)
- Changes slowly (follow graph is stable)

InterestedIn (your cluster interests): Updated weekly (7 days)
- Much cheaper (matrix multiplication using existing KnownFor)
- Picks up your recent engagement at each weekly run

Implication: Your clusters lag behind your behavior. It takes up to 1 week for new engagement to affect InterestedIn, and up to 1 week for follow graph changes to affect which clusters exist (KnownFor). Both update on the same weekly schedule.

Cluster Discovery Process

The ~145,000 clusters are discovered using Sparse Binary Matrix Factorization (SBF) with Metropolis-Hastings optimization:

Build follow graph: 400M+ users, filter to top 20M most-followed (producers)

Calculate similarity: Cosine similarity between producers based on shared followers

similarity(Producer_A, Producer_B) = (shared_followers) / √(followers_A × followers_B)

Filter weak edges: Remove producer pairs with low similarity (threshold ~0.1-0.2)

Run SBF with Metropolis-Hastings: Iteratively optimize cluster assignments over 4 epochs with sparsity constraint

Constraint: Each producer → exactly ONE cluster (maximally sparse)
Optimization: Metropolis-Hastings sampling to find best assignments
Initialization: Start from previous week's assignments (incremental stability)

Result: ~145,000 clusters, each representing a community with shared interests, with clear separation (no producer overlap)
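
The similarity and edge-filtering steps above can be sketched on follower sets. This is a toy version under stated assumptions: the function names and the three producers are hypothetical, and the example uses a threshold of 0.3 only so that the tiny data set shows an edge being dropped (the text gives roughly 0.1-0.2).

```python
import math
from itertools import combinations


def producer_similarity(followers_a, followers_b):
    """Cosine similarity of two producers' follower sets (binary vectors)."""
    if not followers_a or not followers_b:
        return 0.0
    shared = len(followers_a & followers_b)
    return shared / math.sqrt(len(followers_a) * len(followers_b))


def similarity_edges(followers_by_producer, threshold=0.1):
    """Keep only producer pairs whose similarity reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(followers_by_producer), 2):
        sim = producer_similarity(followers_by_producer[a], followers_by_producer[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges


# Hypothetical producers and followers.
followers_by_producer = {
    "ai_lab": {"u1", "u2", "u3", "u4"},
    "ml_prof": {"u3", "u4", "u5", "u6"},
    "chef": {"u6", "u7", "u8", "u9"},
}

edges = similarity_edges(followers_by_producer, threshold=0.3)
print(edges)  # {('ai_lab', 'ml_prof'): 0.5}
```

The chef shares one follower with `ml_prof` (similarity 0.25), so that edge falls below the example threshold and is removed before clustering.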

Why sparsity matters: The "one cluster per producer" constraint is what creates echo chambers by design. Producers can't belong to multiple clusters, which enforces clear community boundaries and limits cross-cluster discovery.
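
To make the one-cluster-per-producer constraint concrete, here is a heavily simplified Metropolis-Hastings sketch on a toy similarity graph. It is not the SBF implementation: the objective (total similarity inside clusters), the `temperature` parameter, the uniform proposal, and all names are assumptions chosen for illustration. What it does share with the description above is that each producer holds exactly one cluster label and that a run can start from a previous assignment.

```python
import math
import random


def within_cluster_weight(edges, assignment):
    """Total similarity of edges whose two endpoints share a cluster."""
    return sum(w for (a, b), w in edges.items() if assignment[a] == assignment[b])


def metropolis_hastings_clusters(edges, nodes, k, epochs=4, temperature=0.05,
                                 seed=0, init=None):
    """Assign every node to exactly one of k clusters.

    Each step proposes moving one node to a uniformly random cluster
    (a symmetric proposal) and accepts with probability
    min(1, exp(delta / temperature)), where delta is the change in
    within-cluster similarity.
    """
    rng = random.Random(seed)
    # Optionally warm-start from an earlier assignment.
    assignment = dict(init) if init else {n: rng.randrange(k) for n in nodes}

    neighbors = {n: [] for n in nodes}
    for (a, b), w in edges.items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    for _ in range(epochs):
        for node in nodes:
            current = assignment[node]
            proposal = rng.randrange(k)
            if proposal == current:
                continue
            gain = sum(w for other, w in neighbors[node] if assignment[other] == proposal)
            loss = sum(w for other, w in neighbors[node] if assignment[other] == current)
            delta = gain - loss
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                assignment[node] = proposal
    return assignment


# Hypothetical producers: two tight groups joined by one weak edge.
nodes = ["ai_1", "ai_2", "ai_3", "chef_1", "chef_2", "chef_3"]
edges = {
    ("ai_1", "ai_2"): 0.8, ("ai_1", "ai_3"): 0.7, ("ai_2", "ai_3"): 0.9,
    ("chef_1", "chef_2"): 0.8, ("chef_1", "chef_3"): 0.6, ("chef_2", "chef_3"): 0.7,
    ("ai_3", "chef_1"): 0.15,
}

assignment = metropolis_hastings_clusters(edges, nodes, k=2, epochs=50)
print(assignment, within_cluster_weight(edges, assignment))
```

Because `assignment` is a plain mapping from producer to a single cluster ID, a producer with followers in both groups still has to land in one of them, which is the separation effect described above. The toy run uses 50 epochs because a graph this small gives the sampler few moves per epoch.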

Why 145,000 Clusters?

This number is emergent from the follow graph structure, not directly tuned:

Self-perpetuating: The algorithm reads the max cluster ID from the previous week's data and uses that as the starting point
Not directly configurable: Would require completely rebuilding the clustering from scratch to change
~145,000 reflects natural community structure: When running SBF on 20M producers with current similarity thresholds
Trade-offs: Too few = overly broad ("Tech" too diverse), too many = too granular (sparse data)
Empirically stable: Week-over-week incremental updates maintain approximately this count

Code References

Cluster formation (SBF/Metropolis-Hastings):
UpdateKnownForSBFRunner.scala

KnownFor generation (production job):
UpdateKnownFor20M145K2020.scala

Note on Louvain:
Louvain clustering exists in LouvainClusteringMethod.scala but is used for TWICE (alternative multi-embeddings), NOT for main KnownFor cluster formation

InterestedIn calculation:
InterestedInFromKnownFor.scala:292

100-day half-life decay:
favScoreHalfLife100Days (used throughout SimClusters codebase)

L2 normalization:
SimClustersEmbedding.scala:59-72

Update frequencies (verified from Twitter Engineering Blog):
Twitter's Recommendation Algorithm Blog Post (March 2023)

Key Implications

For users trying to diversify:

Following diverse accounts gives you initial diverse clusters (helpful for new accounts)
But engagement is what matters long-term—you must like/reply to diverse content
Expect 3-6 months to shift cluster balance due to 100-day half-life
You're fighting the gravitational pull effect (multiplicative scoring amplifies dominant clusters)

For creators trying to reach audiences:

Your KnownFor cluster assignment determines who sees your tweets
Clear niche = strong cluster assignment = multiplicative advantage
Multi-topic accounts get weak/diffuse cluster assignment = penalty
Building reach requires building a strong following within ONE cluster first

Why echo chambers emerge:

SBF algorithm enforces sparsity constraint (one cluster per producer = forced separation)
100-day half-life creates momentum (clusters resist change)
Multiplicative scoring amplifies dominant clusters (gravitational pull)
No built-in exploration or cross-cluster recommendation mechanisms
Echo chamber architecture is the optimization objective, not a side effect