OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation

Computing · Materials & Engineering

Key takeaway

A new method called OCP improves how recommendation systems for industrial commodities represent items, making the systems more scalable and able to generalize better.

Quick Explainer

The paper introduces Orthogonal Constrained Projection (OCP), a technique to address "embedding collapse" in large-scale recommendation systems. OCP works by constraining the projection matrix that maps item embeddings to the interaction space, preserving the singular value spectrum. This ensures gradients for infrequent items are updated along diverse directions, effectively "stretching" the representation space and mitigating the degradation of embedding quality as vocabulary scales to billions of items. The key innovation is the use of an orthogonality constraint to maintain the breadth of the learned embedding manifold, complementing increases in model density to enable high-performance recommendations at massive scale.

Deep Dive

Technical Deep Dive: OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation

Overview

The paper introduces Orthogonal Constrained Projection (OCP), a novel method for optimizing item embeddings in industrial recommendation systems. OCP aims to address the challenge of "embedding collapse" that arises when scaling vocabulary size from millions to billions of unique items.

Problem & Context

  • Modern industrial recommendation systems rely on high-quality representation of massive and heterogeneous item features, including categorical IDs, Item-IDs, and semantic IDs (SIDs).
  • To capture subtle preference signals, models require fine-grained Item-ID features. However, scaling Item-ID vocabularies to billions of entries leads to a fundamental optimization challenge:
    • User-item interactions are highly skewed, so frequent items receive dense updates while most long-tail Item-IDs are rarely updated.
    • Over time, this causes the effective rank of the learned embedding space to degrade, a phenomenon known as "embedding collapse".

Methodology

Embedding Collapse Analysis

  • The authors characterize embedding collapse using Singular Entropy (SE) as a metric to quantify representation isotropy.
  • They show that the absence of directional constraints on gradient flow triggers dimensional redundancy in the learned embedding space.
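To make the metric concrete, here is a minimal sketch of one common way to define Singular Entropy: the Shannon entropy of the normalized singular value spectrum of the embedding table. The paper's exact normalization may differ; the function name and the synthetic matrices below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def singular_entropy(E: np.ndarray) -> float:
    """Shannon entropy of the normalized singular value spectrum of E.

    Values near log(min(n, d)) indicate a near-isotropic embedding table;
    low values signal collapse onto a few dominant directions.
    (Illustrative definition; the paper may normalize differently.)
    """
    s = np.linalg.svd(E, compute_uv=False)
    p = s / s.sum()            # turn the spectrum into a distribution
    p = p[p > 0]               # guard against log(0)
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(1000, 64))                            # well-spread table
collapsed = rng.normal(size=(1000, 1)) @ rng.normal(size=(1, 64))  # rank-1 "collapsed" table

print(singular_entropy(isotropic))  # close to log(64) ≈ 4.16
print(singular_entropy(collapsed))  # close to 0
```

Under this definition, embedding collapse shows up directly as a drop in entropy as the spectrum concentrates on a few directions.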

Orthogonal Constrained Projection (OCP)

  • OCP constrains the projection matrix $P$ that maps Item-ID embeddings $E$ to the interaction space $H = EP$, by optimizing $P$ on the Stiefel manifold.
  • This preserves the singular value spectrum of the upstream gradients, forcing the model to explore a broader representation space for infrequent items.
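A minimal sketch of the core mechanism, using the QR retraction the paper names in its limitations section: after each unconstrained gradient step, the projection matrix $P$ is retracted back onto the Stiefel manifold (orthonormal columns), so the map $H = EP$ cannot squeeze the spectrum. The step size, shapes, and the stand-in gradient below are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def qr_retract(P: np.ndarray) -> np.ndarray:
    """Retract P onto the Stiefel manifold (orthonormal columns) via QR.

    Multiplying each column of Q by the sign of the matching diagonal
    entry of R makes the factorization unique, so the retraction is
    continuous in P.
    """
    Q, R = np.linalg.qr(P)
    Q = Q * np.sign(np.diag(R))   # fix per-column signs
    return Q

rng = np.random.default_rng(0)
d, k = 64, 32
P = qr_retract(rng.normal(size=(d, k)))   # initialize on the manifold

# One hypothetical training step: Euclidean gradient update, then retraction.
grad = rng.normal(size=(d, k))            # stand-in for an upstream gradient
P = qr_retract(P - 0.01 * grad)

# Columns of P stay orthonormal: P^T P = I_k, so P has full singular
# spectrum (all singular values equal to 1) and cannot collapse H = E @ P.
print(np.allclose(P.T @ P, np.eye(k), atol=1e-8))
```

Because an orthonormal $P$ has all singular values equal to one, it preserves the spectrum of whatever flows through it, which is exactly the property the method relies on.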

Data & Experimental Setup

  • The authors evaluate OCP on both a generative retrieval model (OxygenREC) and a ranking model from JD.com's production environment.
  • Offline experiments vary vocabulary size from 100 million to 1 billion, while online A/B tests scale the Item-ID vocabulary from 178 million to 1 billion.
  • Evaluation metrics include Top1 hit@k for retrieval and AUC/GAUC for ranking.
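For reference, the retrieval metric can be sketched as standard hit@k: the fraction of queries whose ground-truth item appears among the top-k scored candidates. The toy score matrix and targets below are made up for illustration.

```python
import numpy as np

def hit_at_k(scores: np.ndarray, targets: np.ndarray, k: int) -> float:
    """Fraction of queries whose target item index is in the top-k by score."""
    topk = np.argsort(-scores, axis=1)[:, :k]      # indices of the k highest scores
    hits = (topk == targets[:, None]).any(axis=1)  # did the target make the cut?
    return float(hits.mean())

scores = np.array([[0.1, 0.9, 0.3],    # query 0: item 1 scored highest
                   [0.8, 0.2, 0.1]])   # query 1: item 0 scored highest
targets = np.array([1, 2])             # true item index per query

print(hit_at_k(scores, targets, k=1))  # query 0 hits, query 1 misses -> 0.5
```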

Results

  • OCP consistently improves performance compared to baselines, yielding:
    • +0.52 percentage point gain in Top1 hit@3 for the 3.2B OxygenREC model.
    • +0.71% and +0.85% improvements in Singular Entropy for low-frequency and high-frequency items, respectively.
  • Online A/B tests show that OCP enables a 12.97% increase in UCXR and an 8.95% uplift in GMV.

Interpretation

  • OCP's orthogonal constraint preserves the singular value spectrum of the learned embeddings, preventing the "squeezing" of the gradient manifold.
  • This ensures that gradients for infrequent items are updated along diverse orthogonal directions, effectively "stretching" the representation space and mitigating embedding collapse.
  • The gains from OCP are complementary to increasing model density, as it maintains representation quality when scaling sparse vocabularies.

Limitations & Uncertainties

  • The current study focuses on a single orthogonality mechanism (QR retraction) and fixed Item-ID vocabulary construction pipelines.
  • Future work includes exploring alternative manifold optimizers, integrating dynamic vocabulary growth and pruning, and analyzing OCP under stronger distribution shifts.

What Comes Next

The authors plan to investigate:

  • Comparing alternative manifold optimizers with lower overhead
  • Integrating dynamic vocabulary growth and pruning
  • Analyzing OCP under stronger distribution shifts (e.g., seasonal and cold-start bursts) with longer online evaluation windows