Cluster Explorer: Your Algorithmic Communities

What Are Clusters?

Twitter's algorithm doesn't see you as "interested in AI" or "interested in cooking." Instead, it assigns you to invisible communities called clusters—discovered by analyzing the follow graph of 400+ million users.

The scale: There are approximately 145,000 clusters discovered from the top 20 million most-followed accounts. Each cluster represents a community with similar interests, discovered organically from who follows whom.

Why clusters matter: Your cluster membership determines what appears in your For You feed. If you're 70% assigned to the "AI/ML Research" cluster and 30% to "Cooking," your feed will reflect that split. The algorithm shows you content from producers (accounts) in your clusters.

How Clusters Are Discovered

Clusters aren't manually defined—they emerge from the data using Sparse Binary Matrix Factorization (SBF) with Metropolis-Hastings optimization:

  1. Build similarity graph: Calculate how similar each producer is to every other producer based on who follows them (cosine similarity of follower patterns)
  2. Find communities: Use SBF algorithm with Metropolis-Hastings optimization to group producers who share many followers into clusters, with the constraint that each producer belongs to exactly one cluster (maximally sparse)
  3. Assign producers: Each producer gets assigned to their strongest cluster (called "KnownFor")—this sparsity constraint is what creates clear community separation
  4. Derive user interests: Your cluster membership comes from the clusters of producers you follow and engage with (called "InterestedIn")

The Critical Insight: Your clusters come from ENGAGEMENT (likes, replies, retweets), NOT just follows. Engagement has a 100-day half-life, which means a like from 100 days ago still counts half as much as one from today.

This makes your clusters "sticky"—they change slowly over months, not days. Following diverse accounts isn't enough; you must engage with diverse content.

The Shape of Cluster Assignment

For new accounts: Your follows determine your initial clusters. If you follow 5 AI researchers and 2 chefs, you'll start ~70% AI cluster, ~30% cooking cluster.

Over time (weeks to months): Your engagement history dominates. If you like 50 AI tweets and 10 cooking tweets in 100 days, your clusters will drift toward AI regardless of who you follow.

Long-term steady state: Your clusters reflect the last 100-200 days of engagement with exponential decay. Past behavior has momentum—changing clusters requires sustained engagement pattern changes over 3-6 months.
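
To get a feel for that momentum, here is a minimal Python sketch. It assumes one engagement per day, a complete switch from an old topic to a new one, and no contribution from follows. The function name `share_of_new_topic` and these simplifications are illustrative assumptions, not taken from the SimClusters code.

```python
HALF_LIFE_DAYS = 100  # half-life stated in this article


def decay(days_ago: float) -> float:
    """Weight of one engagement that happened `days_ago` days in the past."""
    return 0.5 ** (days_ago / HALF_LIFE_DAYS)


def share_of_new_topic(days_since_switch: int, history_days: int = 3000) -> float:
    """Fraction of decayed engagement weight held by the new topic.

    Assumption: one engagement per day. The most recent `days_since_switch`
    days went to the new topic; every earlier day went to the old topic.
    """
    new = sum(decay(d) for d in range(days_since_switch))
    old = sum(decay(d) for d in range(days_since_switch, history_days))
    return new / (new + old)


for days in (30, 100, 200):
    print(days, round(share_of_new_topic(days), 2))
```

Under these assumptions the new topic holds roughly 19% of the weight after 30 days, 50% after 100 days, and 75% after 200 days, which is consistent with the 3-6 month timescale described above.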


Experience Your Cluster Assignment

Use this calculator to see how the algorithm would categorize you based on your follows and engagement patterns. Notice how engagement weights dominate over follows.

Step 1: Choose a Profile

Start with a preset or build your own custom profile.

Step 2: Select Accounts You Follow

These provide your initial cluster assignment. Engagement will shift this over time.

Step 3: Add Your Engagement History

This is where cluster assignment actually happens. Engagement, decayed with a 100-day half-life, outweighs follows.

Likes, replies, retweets (weighted by engagement type)


The Technical Details

How Cluster Assignment Actually Works

Your cluster membership (InterestedIn) is calculated through matrix multiplication:

InterestedIn[you] = EngagementGraph[you, producers] × KnownFor[producers, clusters]

Where:
- EngagementGraph: Your follows + engagement history (100-day half-life)
- KnownFor: Each producer's primary cluster assignment
- Result: Your score for each of the ~145,000 clusters
- Final step: L2 normalization (the score vector is scaled to unit Euclidean length, so the squared scores sum to 1.0)
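
The multiplication can be sketched in a few lines of Python. Because each producer has exactly one KnownFor cluster, multiplying by KnownFor reduces to summing a user's weights per cluster. The producer names, weights, and the function name `interested_in` below are hypothetical toy values for illustration, not the production implementation.

```python
import math

# Hypothetical toy data: producer -> KnownFor cluster (one cluster per producer).
known_for = {
    "producer_a": "ai",
    "producer_b": "ai",
    "producer_c": "cooking",
}

# Hypothetical combined follow + decayed-engagement weight for one user.
engagement = {"producer_a": 3.0, "producer_b": 1.0, "producer_c": 3.0}


def interested_in(engagement, known_for):
    """Sum engagement weights per cluster, then scale to unit L2 norm."""
    scores = {}
    for producer, weight in engagement.items():
        cluster = known_for.get(producer)
        if cluster is None:  # producer is outside the clustered set
            continue
        scores[cluster] = scores.get(cluster, 0.0) + weight
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        return scores
    return {c: s / norm for c, s in scores.items()}


print(interested_in(engagement, known_for))  # {'ai': 0.8, 'cooking': 0.6}
```

In this toy case the raw scores are 4 and 3, the L2 norm is 5, and the normalized scores are 0.8 and 0.6. Their squares sum to 1.0, while the scores themselves sum to 1.4.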

Concrete Example

Let's say you follow and engage with these producers over 100 days:

Follows:
- @ylecun (KnownFor: AI cluster)
- @karpathy (KnownFor: AI cluster)
- @gordonramsay (KnownFor: Cooking cluster)

Engagement (likes, weighted):
- AI tweets: 50 engagements
- Cooking tweets: 30 engagements

Matrix multiplication:
AI cluster score = (2 follows × follow_weight) + (50 engagements × engagement_weight)
Cooking cluster score = (1 follow × follow_weight) + (30 engagements × engagement_weight)

If follow_weight = 1.0 and engagement_weight = 5.0 (engagement dominates):

AI: (2 × 1.0) + (50 × 5.0) = 2 + 250 = 252
Cooking: (1 × 1.0) + (30 × 5.0) = 1 + 150 = 151

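Normalization (for readability, this example divides by the sum to get percentages; L2 normalization would divide by the Euclidean length of the score vector instead):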
Sum = 252 + 151 = 403

AI: 252 / 403 = 0.625 (62.5%)
Cooking: 151 / 403 = 0.375 (37.5%)

Result: You're assigned 62.5% AI, 37.5% Cooking

Notice: Engagement dominated! You followed 2:1 AI:Cooking, but the final
62.5:37.5 split (1.67:1) matches your 50:30 engagement ratio (1.67:1), not your follow ratio.
With an engagement weight of 5.0 against a follow weight of 1.0, follows contributed only 3 of the 403 total points.
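
The arithmetic above can be reproduced with a short Python sketch. The weights 1.0 and 5.0 are the illustrative values from this example, not values confirmed from the production code, and the helper names `cluster_scores` and `to_shares` are hypothetical.

```python
FOLLOW_WEIGHT = 1.0      # illustrative value from the example
ENGAGEMENT_WEIGHT = 5.0  # illustrative value from the example


def cluster_scores(follows, engagements):
    """Weighted sum of follow counts and engagement counts per cluster."""
    clusters = sorted(set(follows) | set(engagements))
    return {
        c: follows.get(c, 0) * FOLLOW_WEIGHT
        + engagements.get(c, 0) * ENGAGEMENT_WEIGHT
        for c in clusters
    }


def to_shares(scores):
    """Divide each score by the total to get fractions that sum to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}


follows = {"AI": 2, "Cooking": 1}
engagements = {"AI": 50, "Cooking": 30}

scores = cluster_scores(follows, engagements)
shares = to_shares(scores)
print(scores)                                        # {'AI': 252.0, 'Cooking': 151.0}
print({c: round(s, 3) for c, s in shares.items()})   # {'AI': 0.625, 'Cooking': 0.375}
```

Changing `follows` while holding `engagements` fixed barely moves the shares, which is a quick way to see how little the follow term contributes at these weights.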

The 100-Day Half-Life Formula

Engagement weight decays exponentially with a 100-day half-life:

weight(t) = initial_weight × (0.5)^(days_ago / 100)

Examples:
- Today (t=0):       weight = 1.0 × (0.5)^(0/100)   = 1.0   (100%)
- 50 days ago:       weight = 1.0 × (0.5)^(50/100)  = 0.707 (70.7%)
- 100 days ago:      weight = 1.0 × (0.5)^(100/100) = 0.5   (50% - HALF-LIFE)
- 200 days ago:      weight = 1.0 × (0.5)^(200/100) = 0.25  (25%)
- 300 days ago:      weight = 1.0 × (0.5)^(300/100) = 0.125 (12.5%)

Why this matters: Your engagement from 6 months ago (180 days) still has 29% weight. Your clusters have momentum—they resist change. Diversifying your feed requires sustained engagement pattern changes over 3-6 months, not just following different accounts.
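
As a rough illustration, the formula can be applied to a small engagement log to get decayed totals per cluster. The event list and the function names `decayed_weight` and `decayed_totals` are hypothetical; only the half-life formula comes from the text.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 100.0  # half-life stated in this article


def decayed_weight(days_ago, initial_weight=1.0):
    """weight(t) = initial_weight * 0.5 ** (days_ago / 100)."""
    return initial_weight * 0.5 ** (days_ago / HALF_LIFE_DAYS)


def decayed_totals(events):
    """Sum decayed weights per cluster.

    `events` is an iterable of (cluster, days_ago) pairs.
    """
    totals = defaultdict(float)
    for cluster, days_ago in events:
        totals[cluster] += decayed_weight(days_ago)
    return dict(totals)


# Hypothetical log: two recent AI engagements, two older cooking engagements.
events = [("ai", 0), ("ai", 100), ("cooking", 200), ("cooking", 300)]
print(decayed_totals(events))  # {'ai': 1.5, 'cooking': 0.375}
```

Both clusters have two engagements, yet the AI total is four times the cooking total purely because of recency.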

Update Frequencies

Cluster data is recomputed on a batch schedule rather than in real time.

Implication: Your clusters lag behind your behavior. It takes up to 1 week for new engagement to affect InterestedIn, and up to 1 week for follow graph changes to affect which clusters exist (KnownFor). Both update on the same weekly schedule.

Cluster Discovery Process

The ~145,000 clusters are discovered using Sparse Binary Matrix Factorization (SBF) with Metropolis-Hastings optimization:

  1. Build follow graph: 400M+ users, filter to top 20M most-followed (producers)
  2. Calculate similarity: Cosine similarity between producers based on shared followers
    similarity(Producer_A, Producer_B) = (shared_followers) / √(followers_A × followers_B)
  3. Filter weak edges: Remove producer pairs with low similarity (threshold ~0.1-0.2)
  4. Run SBF with Metropolis-Hastings: Iteratively optimize cluster assignments over 4 epochs with sparsity constraint
    Constraint: Each producer → exactly ONE cluster (maximally sparse)
    Optimization: Metropolis-Hastings sampling to find best assignments
    Initialization: Start from previous week's assignments (incremental stability)
  5. Result: ~145,000 clusters, each representing a community with shared interests, with clear separation (no producer overlap)
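
Steps 2 and 3 can be sketched on toy data in Python. The follower sets below are invented, and the all-pairs loop is only practical for a handful of producers; it is a sketch of the similarity formula and edge filter, not of how the production job computes them at 20M-producer scale. The SBF/Metropolis-Hastings step itself is not shown.

```python
import math
from itertools import combinations

# Hypothetical toy follower sets (producer -> set of follower ids).
followers = {
    "producer_a": {1, 2, 3, 4},
    "producer_b": {3, 4, 5, 6},
    "producer_c": {7, 8, 9, 10},
}

SIMILARITY_THRESHOLD = 0.1  # lower end of the ~0.1-0.2 range quoted above


def cosine_similarity(a, b):
    """shared_followers / sqrt(followers_A * followers_B) for follower sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_edges(followers, threshold):
    """Keep only producer pairs whose similarity meets the threshold."""
    edges = {}
    for p, q in combinations(sorted(followers), 2):
        sim = cosine_similarity(followers[p], followers[q])
        if sim >= threshold:
            edges[(p, q)] = sim
    return edges


print(similarity_edges(followers, SIMILARITY_THRESHOLD))
# {('producer_a', 'producer_b'): 0.5}
```

Here producer_a and producer_b share 2 of their 4 followers each, giving 2 / √(4 × 4) = 0.5, while producer_c shares none and is left unconnected.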

Why sparsity matters: The "one cluster per producer" constraint is what creates echo chambers by design. Producers can't belong to multiple clusters, which enforces clear community boundaries and limits cross-cluster discovery.

Why 145,000 Clusters?

This number is emergent from the follow graph structure, not directly tuned.

Code References

Cluster formation (SBF/Metropolis-Hastings):
UpdateKnownForSBFRunner.scala

KnownFor generation (production job):
UpdateKnownFor20M145K2020.scala

Note on Louvain:
Louvain clustering exists in LouvainClusteringMethod.scala but is used for TWICE (alternative multi-embeddings), NOT for main KnownFor cluster formation

InterestedIn calculation:
InterestedInFromKnownFor.scala:292

100-day half-life decay:
favScoreHalfLife100Days (used throughout SimClusters codebase)

L2 normalization:
SimClustersEmbedding.scala:59-72

Update frequencies (verified from Twitter Engineering Blog):
Twitter's Recommendation Algorithm Blog Post (March 2023)

Key Implications

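For users trying to diversify: Following new accounts is not enough. Because engagement carries a 100-day half-life, shifting your clusters takes sustained engagement with different content over 3-6 months.

For creators trying to reach audiences: Each producer has exactly one KnownFor cluster, so your content mainly reaches users whose InterestedIn includes that cluster.

Why echo chambers emerge: The one-cluster-per-producer constraint enforces hard community boundaries, and slowly decaying engagement keeps users anchored to the clusters they already engage with.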