x-algo-ml

Explain the Phoenix ML model architecture for X recommendations. Use when users ask about embeddings, transformers, how predictions work, or ML model details.

Updated 1/22/2026

X Algorithm ML Architecture

The X recommendation system uses Phoenix, a transformer-based ML system for predicting user engagement. It operates in two stages: retrieval and ranking.

Two-Stage Pipeline

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           RECOMMENDATION PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│   ┌──────────┐     ┌─────────────────────┐     ┌─────────────────────┐          │
│   │          │     │                     │     │                     │          │
│   │   User   │────▶│   STAGE 1:          │────▶│   STAGE 2:          │────▶ Feed│
│   │ Request  │     │   RETRIEVAL         │     │   RANKING           │          │
│   │          │     │   (Two-Tower)       │     │   (Transformer)     │          │
│   └──────────┘     │                     │     │                     │          │
│                    │   Millions → 1000s  │     │   1000s → Ranked    │          │
│                    └─────────────────────┘     └─────────────────────┘          │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

Stage 1: Retrieval (Two-Tower Model)

Efficiently narrows millions of candidates to thousands using approximate nearest neighbor search.

Architecture

User Tower: Encodes user features + engagement history → normalized embedding [B, D]
Candidate Tower: Pre-computed embeddings for all posts in corpus → [N, D]
Similarity: Dot product between user embedding and candidate embeddings

User Tower          Candidate Tower
    │                     │
    ▼                     ▼
[B, D] user emb    [N, D] all posts
    │                     │
    └───── dot product ───┘
              │
              ▼
         Top-K by similarity
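The retrieval step can be sketched in plain Python. This is a minimal sketch under stated assumptions: the candidate embeddings are already computed, the function names (`l2_normalize`, `retrieve_top_k`) are hypothetical, and the search is an exact linear scan, whereas a real deployment would query an approximate nearest neighbor index.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_emb, candidate_embs, k):
    """Return indices of the k candidates with the highest dot-product score.

    Brute force over all N candidates (exact search); an ANN index would
    replace this scan at corpus scale.
    """
    user = l2_normalize(user_emb)
    scored = ((dot(user, cand), idx) for idx, cand in enumerate(candidate_embs))
    return [idx for _, idx in heapq.nlargest(k, scored)]


# Toy corpus: N = 3 posts with D = 2
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(retrieve_top_k([1.0, 0.1], corpus, k=2))  # [0, 2]
```

Normalizing only the user vector does not change the ranking, since it rescales every score by the same positive factor; it matters for the score values, and when the candidate embeddings are normalized too the dot product equals cosine similarity.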

Stage 2: Ranking (Transformer with Candidate Isolation)

Scores the retrieved candidates using a transformer that predicts multiple engagement actions.

Model Configuration

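# phoenix/recsys_model.py
from dataclasses import dataclass

# TransformerConfig (Grok-1 settings) and HashConfig (below) must be defined or imported first.

@dataclass
class PhoenixModelConfig:
    model: TransformerConfig              # Grok-1 based transformer
    emb_size: int                         # Embedding dimension D
    num_actions: int                      # 18 action types
    hash_config: HashConfig               # Hash embedding config (no default, so it precedes defaulted fields)
    history_seq_len: int = 128            # User history length S
    candidate_seq_len: int = 32           # Candidates per example C
    product_surface_vocab_size: int = 16  # Number of surfaces where a post can be seen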

Input Structure

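from typing import NamedTuple

from jax.typing import ArrayLike

# B = batch size, S = history_seq_len, C = candidate_seq_len
class RecsysBatch(NamedTuple):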
    # User identification
    user_hashes: ArrayLike               # [B, num_user_hashes]

    # User engagement history
    history_post_hashes: ArrayLike       # [B, S, num_item_hashes]
    history_author_hashes: ArrayLike     # [B, S, num_author_hashes]
    history_actions: ArrayLike           # [B, S, num_actions]
    history_product_surface: ArrayLike   # [B, S]

    # Candidates to score
    candidate_post_hashes: ArrayLike     # [B, C, num_item_hashes]
    candidate_author_hashes: ArrayLike   # [B, C, num_author_hashes]
    candidate_product_surface: ArrayLike # [B, C]

Hash-Based Embeddings

Each ID is hashed by several hash functions, and each hash selects a row in an embedding table:

from dataclasses import dataclass

@dataclass
class HashConfig:
    num_user_hashes: int = 2      # Hash user ID 2 ways
    num_item_hashes: int = 2      # Hash post ID 2 ways
    num_author_hashes: int = 2    # Hash author ID 2 ways

Why hashes?

Fixed memory: No need for individual embeddings per user/post
Handles new entities: Any ID maps to some embedding
Collision resilience: two IDs that collide under one hash function are unlikely to collide under all of them, so their combined embeddings still differ
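Multi-hash lookup can be sketched as follows. This assumes SHA-256 with a seed prefix as the hash family and a small table size chosen for illustration; the source does not say which hash functions Phoenix uses, and in `RecsysBatch` the hashes arrive precomputed. All function names are hypothetical.

```python
import hashlib


def hash_id(entity_id, seed, table_size):
    """Map an ID to a row index using one seeded hash function."""
    digest = hashlib.sha256(f"{seed}:{entity_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % table_size


def multi_hash(entity_id, num_hashes, table_size):
    """One row index per hash function (e.g. num_user_hashes = 2)."""
    return [hash_id(entity_id, seed, table_size) for seed in range(num_hashes)]


def first_collision(table_size, seed=0):
    """First pair of integer IDs that share a row under one hash function.

    With table_size rows, a collision is guaranteed within table_size + 1 IDs.
    """
    seen = {}
    for entity_id in range(table_size + 1):
        row = hash_id(entity_id, seed, table_size)
        if row in seen:
            return seen[row], entity_id
        seen[row] = entity_id
    raise AssertionError("unreachable by the pigeonhole principle")


a, b = first_collision(table_size=64)
print(multi_hash(a, 2, 64), multi_hash(b, 2, 64))  # first entries match; second ones usually differ
```

Any ID, including one never seen in training, maps to valid rows, and memory is fixed by the table size rather than by the number of users or posts.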

Embedding Combination

Each entity type has a "reduce" function that combines hash embeddings:

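import jax.numpy as jnp

# Argument lists are elided (...); proj_mat_* are learned projection matrices.
# User: Concatenate hash embeddings → project to D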
def block_user_reduce(...):
    # [B, num_user_hashes, D] → [B, 1, num_user_hashes * D] → [B, 1, D]
    user_embedding = user_embeddings.reshape((B, 1, num_user_hashes * D))
    user_embedding = jnp.dot(user_embedding, proj_mat_1)  # Project down
    return user_embedding, user_padding_mask

# History: Combine post + author + actions + product_surface
def block_history_reduce(...):
    # Concatenate all features along the last axis ([B, S, *] each), then project to [B, S, D]
    post_author_embedding = jnp.concatenate([
        history_post_embeddings_reshaped,
        history_author_embeddings_reshaped,
        history_actions_embeddings,
        history_product_surface_embeddings,
    ], axis=-1)
    history_embedding = jnp.dot(post_author_embedding, proj_mat_3)
    return history_embedding, history_padding_mask
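The concatenate-then-project step shared by both reduce functions can be sketched for a single position without a batch axis. This is an illustration only: the helper names are hypothetical and the projection values are hand-picked, whereas in the model they are learned. The history reduce applies the same step at each of the S positions, with four feature groups (post hashes, author hashes, action embedding, product-surface embedding) in the concatenation.

```python
def concat(vectors):
    """Flatten a list of vectors into one vector (the concatenate step)."""
    return [x for vec in vectors for x in vec]


def matvec(vec, matrix):
    """[K] row vector times [K, D] matrix -> [D], like jnp.dot for 1-D x 2-D inputs."""
    if len(vec) != len(matrix):
        raise ValueError("concatenated width must equal the projection's row count")
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(len(matrix[0]))]


def concat_and_project(feature_groups, proj):
    """Concatenate the feature groups for ONE position, then project down to D."""
    return matvec(concat(feature_groups), proj)


# User reduce with num_user_hashes = 2 and D = 2, so proj is [2 * 2, 2]
user_hash_embs = [[1.0, 2.0], [3.0, 4.0]]
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # hand-picked: sums the two hashes
print(concat_and_project(user_hash_embs, proj))  # [4.0, 6.0]
```

The hand-picked matrix shows that summing or averaging the hash embeddings is a special case of concatenate-then-project; a learned projection can also weight each hash differently.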

Transformer Input

The final transformer input is the concatenation, along the sequence axis, of:

[User (1)] + [History (S)] + [Candidates (C)]
     │              │               │
     ▼              ▼               ▼
 [B, 1, D]    [B, S, D]       [B, C, D]
         ╲       │       ╱
          ╲      │      ╱
            [B, 1+S+C, D]
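The sequence layout and the `candidate_start` offset can be sketched for one example, using string labels in place of D-dimensional embeddings. The function name is hypothetical; S and C use the `PhoenixModelConfig` defaults.

```python
def build_sequence(user, history, candidates):
    """Concatenate [user] + history + candidates along the sequence axis.

    Returns the combined sequence and candidate_start, the index of the
    first candidate position (1 + S).
    """
    sequence = [user] + list(history) + list(candidates)
    candidate_start = 1 + len(history)
    return sequence, candidate_start


S, C = 128, 32  # history_seq_len and candidate_seq_len defaults
seq, start = build_sequence("u", [f"h{i}" for i in range(S)], [f"c{i}" for i in range(C)])
print(len(seq), start)  # 161 129
```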

Attention Masking: Candidate Isolation

Critical design: candidates cannot attend to each other; each candidate attends only to the user, the history, and itself.

                    ATTENTION MASK
         Keys (what we attend TO)
         ─────────────────────────────────────────────▶

         │ User │    History (S)    │   Candidates (C)    │
    ┌────┼──────┼───────────────────┼─────────────────────┤
 Q  │ U  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✗      │
 u  ├────┼──────┼───────────────────┼─────────────────────┤
 e  │ H  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✗      │
 r  │ i  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✗      │
 i  │ s  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✗      │
 e  │ t  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✗      │
 s  ├────┼──────┼───────────────────┼─────────────────────┤
    │ C  │  ✓   │  ✓   ✓   ✓   ✓   │  ✓   ✗   ✗   ✗      │
 │  │ a  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✓   ✗   ✗      │
 │  │ n  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✓   ✗      │
 ▼  │ d  │  ✓   │  ✓   ✓   ✓   ✓   │  ✗   ✗   ✗   ✓      │
    └────┴──────┴───────────────────┴─────────────────────┘

    ✓ = Can attend          ✗ = Cannot attend (within the candidate block, only the diagonal is allowed)

Why candidate isolation?

The score for post A does not depend on which other posts are in the same candidate set
Scores stay consistent regardless of how candidates are grouped into batches
All C candidates can be scored in parallel in a single forward pass
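The isolation property can be checked with a toy model. This sketch builds the mask exactly as drawn in the diagram (full attention within user + history, candidates limited to user + history + themselves); it takes no position on whether the production mask is causal inside the history. The attention is a hypothetical single head with identity Q/K/V projections and no learned weights, and the input vectors are arbitrary.

```python
import math


def isolation_mask(num_history, num_candidates):
    """mask[q][k] is True when query position q may attend to key position k.

    Layout: [user] + history + candidates. Every position attends to user +
    history; a candidate additionally attends to itself, never to another candidate.
    """
    context = 1 + num_history  # user + history positions
    size = context + num_candidates
    mask = [[k < context for k in range(size)] for _ in range(size)]
    for q in range(context, size):
        mask[q][q] = True  # candidate self-attention (the diagonal)
    return mask


def masked_attention(xs, mask):
    """Toy single-head attention: identity Q/K/V projections, softmax over allowed keys."""
    outputs = []
    for q, xq in enumerate(xs):
        keys = [k for k in range(len(xs)) if mask[q][k]]
        scores = [sum(a * b for a, b in zip(xq, xs[k])) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable softmax numerators
        total = sum(weights)
        outputs.append([sum(w * xs[k][d] for w, k in zip(weights, keys)) / total
                        for d in range(len(xq))])
    return outputs


def candidate_outputs(user, history, candidates):
    xs = [user] + history + candidates
    out = masked_attention(xs, isolation_mask(len(history), len(candidates)))
    return out[1 + len(history):]  # candidate positions only


user, history = [1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]]
post_a, post_b, post_c = [0.2, 0.9], [0.9, 0.1], [-0.3, 0.4]
with_b = candidate_outputs(user, history, [post_a, post_b])
with_c = candidate_outputs(user, history, [post_a, post_c])
print(with_b[0] == with_c[0])  # True: post A's output ignores the other candidate
```

Because post A's row never sees post B or post C, its output is bit-for-bit identical in both runs; with an unrestricted mask it would generally change.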

Transformer Forward Pass

def __call__(self, batch, recsys_embeddings) -> RecsysModelOutput:
    # 1. Build combined embeddings
    embeddings, padding_mask, candidate_start = self.build_inputs(batch, recsys_embeddings)

    # 2. Pass through transformer (with candidate isolation mask)
    model_output = self.model(
        embeddings,
        padding_mask,
        candidate_start_offset=candidate_start,  # For attention masking
    )

    # 3. Extract candidate outputs
    out_embeddings = layer_norm(model_output.embeddings)
    candidate_embeddings = out_embeddings[:, candidate_start:, :]

    # 4. Project to action logits; unembeddings is a learned [D, num_actions] matrix
    logits = jnp.dot(candidate_embeddings, unembeddings)
    # Shape: [B, num_candidates, num_actions]

    return RecsysModelOutput(logits=logits)
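Steps 3 and 4 of the forward pass can be sketched for a single example in plain Python. Assumptions: this layer norm has no learned scale or bias, and the unembedding matrix `W` is a hypothetical hand-picked [D, num_actions] matrix. Normalizing before or after slicing gives the same result, because layer norm acts on each position independently.

```python
import math


def layer_norm(vec, eps=1e-6):
    """Normalize to zero mean and unit variance (no learned scale or bias here)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def candidate_logits(out_embeddings, candidate_start, unembeddings):
    """Keep positions >= candidate_start, normalize, project [D] -> [num_actions]."""
    num_actions = len(unembeddings[0])
    logits = []
    for emb in out_embeddings[candidate_start:]:
        h = layer_norm(emb)
        logits.append([sum(h[d] * unembeddings[d][a] for d in range(len(h)))
                       for a in range(num_actions)])
    return logits  # [num_candidates, num_actions]


# 1 user + 2 history + 2 candidates, D = 2, num_actions = 3
out = [[0.1, 0.3], [0.5, -0.2], [1.0, 2.0], [3.0, 1.0], [-1.0, 1.0]]
W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # hypothetical [D, num_actions] unembedding
logits = candidate_logits(out, candidate_start=3, unembeddings=W)
print(len(logits), len(logits[0]))  # 2 3
```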

Output: Multi-Action Prediction

Output Shape: [B, num_candidates, num_actions]
                          │
                          ▼
    ┌─────────────────────────────────────────────┐
    │ Like │ Reply │ Retweet │ Quote │ ... (18)   │
    └─────────────────────────────────────────────┘

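Each output is a raw logit, one per action. The actions are not mutually exclusive (a user can both like and reply to a post), so each logit is best read as an independent yes/no prediction with probability = sigmoid(logit). When scores are passed downstream as log-probabilities, log_prob = log(sigmoid(logit)), and the conversion back is:

probability = exp(log_prob)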
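A small sketch of how a logit, its log-probability, and its probability relate, assuming each action is an independent sigmoid prediction (the section's forward pass names the outputs `logits`; the helper names here are hypothetical). Both functions are written to avoid overflow for large-magnitude logits.

```python
import math


def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def log_sigmoid(logit):
    """log(sigmoid(logit)) computed without overflow."""
    if logit >= 0:
        return -math.log1p(math.exp(-logit))
    return logit - math.log1p(math.exp(logit))


logit = 1.5
print(math.isclose(math.exp(log_sigmoid(logit)), sigmoid(logit)))  # True

# Independent per-action probabilities need not sum to 1
probs = [sigmoid(x) for x in [2.0, -1.0, 0.5]]
```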

Action Embeddings

History actions are encoded as signed vectors:

def _get_action_embeddings(self, actions):
    # actions: [B, S, num_actions] multi-hot vector
    actions_signed = (2 * actions - 1)  # 0→-1, 1→+1
    action_emb = jnp.dot(actions_signed, action_projection)
    return action_emb

This encodes "did action" (+1) vs "didn't do action" (-1) for each action type.
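The signed encoding can be sketched for one history position. The projection values are hypothetical hand-picked weights standing in for the learned `action_projection`, assumed here to have shape [num_actions, D].

```python
def signed_action_embedding(actions, action_projection):
    """Map a multi-hot action vector to an embedding.

    actions: [num_actions] of 0/1; action_projection: [num_actions, D].
    0 -> -1 and 1 -> +1, so "did not act" also contributes to the embedding.
    """
    signed = [2 * a - 1 for a in actions]
    dim = len(action_projection[0])
    return [sum(s * row[d] for s, row in zip(signed, action_projection)) for d in range(dim)]


action_projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical learned weights
print(signed_action_embedding([1, 0, 1], action_projection))  # [2.0, 0.0]
```

With plain 0/1 inputs, an action that was not taken would contribute nothing; the signed form gives every action a contribution. One consequence in this sketch: an all-zero row maps to the negated sum of the projection rows, not to zero, so padded positions have to be masked separately.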

Product Surface Embeddings

Where the user engaged (home feed, search, notifications, etc.):

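def _single_hot_to_embeddings(self, input, vocab_size, emb_size, name):
    # Standard embedding lookup table of shape [vocab_size, emb_size]
    # (assumes haiku as hk, jax and jax.numpy as jnp are imported; the initializer is illustrative)
    embedding_table = hk.get_parameter(
        name, [vocab_size, emb_size], init=hk.initializers.TruncatedNormal(stddev=0.02)
    )
    input_one_hot = jax.nn.one_hot(input, vocab_size)  # [..., vocab_size]
    return jnp.dot(input_one_hot, embedding_table)     # equals embedding_table[input] for in-range ids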
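The one-hot formulation is equivalent to selecting one row of the table, as this plain-Python sketch shows. The helper names and table values are hypothetical, and the vocabulary is shrunk to 4 for brevity (the config default is 16).

```python
def one_hot(index, vocab_size):
    """Vector with 1.0 at index and 0.0 elsewhere (all zeros if index is out of range)."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def single_hot_to_embedding(index, embedding_table):
    """one_hot(index) @ table, mirroring the jnp.dot formulation."""
    oh = one_hot(index, len(embedding_table))
    dim = len(embedding_table[0])
    return [sum(o * row[d] for o, row in zip(oh, embedding_table)) for d in range(dim)]


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # [vocab_size=4, emb_size=2]
print(single_hot_to_embedding(2, table) == table[2])  # True
```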

Model Heritage

The sample transformer implementation is ported from xAI's open-source Grok-1 release. The core transformer architecture is unchanged in spirit, but it is adapted for recommendation with custom input embeddings and a candidate-isolation attention mask.

Related Skills

/x-algo-engagement - The 18 action types the model predicts
/x-algo-scoring - How predictions become weighted scores
/x-algo-pipeline - Where ML fits in the full system
