Dcard · Data Team

Dice Recommendation System

End-to-end architecture of the TW recommendation system: from candidate generation through ranking and diversity to the final feed.

FastAPI | Go · gRPC | Redis · BigTable | ONNX · Triton · Multi-task NN | Market: TW

End-to-End Flow

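A single TW recommendation request passes, in order, through Candidate Generation (a Go service that gathers candidates from multiple sources), the Ranking Engine (a Python service that scores them with ML), and Diversity / Rules (diversity and rule-based post-processing), and finally yields the feed the user sees.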

flowchart LR
    U["👤 Client Request<br/>member_id"]
    CG["🎯 Candidate Generation<br/>(Go · data-dice-candidate)<br/>~25 sources"]
    DICE["⚙️ Dice API<br/>(Python · FastAPI)<br/>DiceRankingEngineTW"]
    FEAT["🧊 Feature Store<br/>Redis · BigTable · PG"]
    MODEL["🧠 Models<br/>Multi-task ONNX NN<br/>Triton Server"]
    DIV["🔀 Diversity & Rules<br/>SlidingWindow · MMR<br/>Pinned · Guides"]
    OUT["📰 Personalized Feed"]
    U --> CG --> DICE
    DICE --> FEAT
    DICE --> MODEL
    DICE --> DIV --> OUT
    classDef go fill:#00ADD8,stroke:#007d9c,color:#fff
    classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
    classDef data fill:#F59E0B,stroke:#B45309,color:#fff
    classDef ml fill:#10B981,stroke:#047857,color:#fff
    classDef out fill:#374151,stroke:#1f2937,color:#fff
    class CG go
    class DICE,DIV py
    class FEAT data
    class MODEL ml
    class U,OUT out
Go
Candidate Generation · Dcard-Backend/modules/data-dice-candidate
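Each request recalls from all enabled candidate sources in parallel, applies the pre-filter (NSFW, spam, quality, blocks), and merges the results into a candidate pool.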
Py
Ranking Engine · Dice-API/dice_api/engine/dice
DiceRankingEngineTW reads the features, sends them to the ranker for scoring, and hands the scored posts to the diversity ranker for final ordering.
ML
Models · dice_api/model
A multi-task ONNX NN predicts click / react / click_duration; the V2 version runs on a Triton inference server.
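
The three stages above can be pictured as a simple function chain. The sketch below is illustrative only: `generate_candidates`, `rank`, `diversify`, `recommend`, and the `Post` fields are hypothetical names rather than the actual data-dice-candidate or Dice-API interfaces, and the score is a made-up stand-in for the multi-task model.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    forum: str
    score: float = 0.0


def generate_candidates(member_id: int) -> list[Post]:
    # Stand-in for the Go candidate service: returns a merged candidate pool.
    return [Post(1, "funny"), Post(2, "funny"), Post(3, "job"), Post(4, "mood")]


def rank(member_id: int, posts: list[Post]) -> list[Post]:
    # Stand-in for model scoring: a made-up score, higher is better.
    for p in posts:
        p.score = 1.0 / p.post_id
    return sorted(posts, key=lambda p: p.score, reverse=True)


def diversify(posts: list[Post]) -> list[Post]:
    # Toy rule: avoid two consecutive posts from the same forum when possible.
    remaining, feed = list(posts), []
    while remaining:
        pick = next(
            (p for p in remaining if not feed or p.forum != feed[-1].forum),
            remaining[0],  # nothing qualifies: take the best remaining post
        )
        remaining.remove(pick)
        feed.append(pick)
    return feed


def recommend(member_id: int) -> list[int]:
    # Candidate Generation -> Ranking -> Diversity, as in the flow above.
    pool = generate_candidates(member_id)
    return [p.post_id for p in diversify(rank(member_id, pool))]
```

In the real system the first stage lives in a separate Go service, so the call between stages is a network hop rather than a function call.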

Candidate Generation

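In the Go service data-dice-candidate, each candidate source is an independent recall strategy, and the sources are grouped by intent into eight categories. On each request the sources run in parallel, and their results are aggregated before entering the ranker.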

Hot / Popular

popular

Popular posts from the past 1-3 days, sorted by engagement and split into recency buckets.

tw_default

Default ranking for Taiwan region, posts sorted by like count descending.

New / Fresh

new

Newly posted content within a configurable age threshold, sorted by impression count.

Content Similarity

item_cf_last_n

Posts similar to recently-viewed content via item-based collaborative filtering.

last_n_tags

Posts sharing tags with last-viewed posts, scored by tag overlap similarity.

similar_content_tw

Posts similar to user's viewed posts via pre-computed post similarity graph.
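
As one possible reading of "tag overlap similarity" for last_n_tags, the sketch below scores each candidate by the Jaccard similarity between its tags and the tags of the user's last-viewed posts. Jaccard is an assumption here, since the source does not say which overlap measure the service uses, and the function names and input shapes are hypothetical.

```python
def tag_overlap_score(viewed_tags: set[str], post_tags: set[str]) -> float:
    """Jaccard similarity: |intersection| / |union| (0.0 when both sets are empty)."""
    union = viewed_tags | post_tags
    if not union:
        return 0.0
    return len(viewed_tags & post_tags) / len(union)


def rank_by_tag_overlap(viewed_tags: set[str], candidates: dict[int, set[str]]) -> list[int]:
    """Return post ids with non-zero overlap, highest overlap first (ties by id)."""
    scored = [(tag_overlap_score(viewed_tags, tags), pid) for pid, tags in candidates.items()]
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [pid for score, pid in ordered if score > 0]
```

For a user whose recent posts carried the tags `{"cat", "dog"}`, a post tagged with both ranks above a post tagged only `"cat"`, and a post with no shared tag is dropped.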

User Profile

interest_category

Posts from interest category L1 tags matched to user's interests, ranked by like rate.

job_title

Posts pre-computed as relevant to user's job title classification.

user_profile_popular

Popular posts per user's profile tags with linearly-decreasing allocation per tag.

similar_user_viewed

Posts viewed by similar users, merged from realtime and historical post-view data.
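
The "linearly-decreasing allocation per tag" used by user_profile_popular can be illustrated as follows. This is a minimal sketch under assumptions: the tags are taken to be already ordered by importance, the i-th tag gets a share proportional to n - i, and the rounding rule is invented for the example. The function name is hypothetical.

```python
def linear_allocation(tags: list[str], total_slots: int) -> dict[str, int]:
    """Give the i-th tag a share proportional to (n - i), so shares fall linearly."""
    n = len(tags)
    if n == 0 or total_slots <= 0:
        return {}
    weights = [n - i for i in range(n)]  # n, n-1, ..., 1
    weight_sum = sum(weights)            # equals n * (n + 1) / 2
    quotas = [total_slots * w // weight_sum for w in weights]
    # Hand out the slots lost to integer division, starting from the top tag.
    leftover = total_slots - sum(quotas)
    for i in range(leftover):
        quotas[i % n] += 1
    return dict(zip(tags, quotas))
```

With three tags and 12 slots the split is 6 / 4 / 2, so the strongest profile tag contributes the most popular posts while weaker tags still get some room.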

Social / Subscriber

forum

Posts from user's subscribed forums, with weighted scores based on viewing history and DTR.

persona_subscribed

Posts from user's subscribed personas, randomly shuffled.

similar_subscriber

Posts from forums similar to user's subscriptions via static forum-similarity mapping.

Search / Topic

search

Posts matching user's search query history, ranked by Elasticsearch relevance score.

topic

Posts from subscribed topics with DTR-based sorting, or top posts per topic.

Offline ML

two_tower_offline

Two-tower neural network offline predictions, pre-generated candidate list per user.

user_tag_evergreen

High-quality evergreen posts matched to user's interest tags, pre-generated list.
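
To make "pre-generated candidate list per user" concrete for two_tower_offline, the sketch below shows the usual shape of an offline two-tower job: user and post embeddings are compared by dot product and the top k posts per user are stored. The embeddings, the dot-product scoring, and the function names are assumptions for illustration; the source does not describe the actual job.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def precompute_candidates(user_vecs: dict, post_vecs: dict, k: int) -> dict:
    """For each user, keep the k posts with the highest user-post dot product."""
    result = {}
    for user_id, u in user_vecs.items():
        scored = ((dot(u, p), post_id) for post_id, p in post_vecs.items())
        top = heapq.nlargest(k, scored)  # sorted from highest to lowest score
        result[user_id] = [post_id for _, post_id in top]
    return result
```

At request time the candidate source would then only need a lookup by member id, which keeps the heavy model out of the serving path.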

Operations

boosted

Boosted posts with high promotion scores, ranked by strength metric.

op_hawkeye

Posts selected by operations team via Hawkeye tasks, with boosted visibility.

op_recsys

Posts from operations RecSys curated for personas, scored by forum CTR prediction.

Before merging, all candidates pass through a pre-filter (NSFW / spam / quality / personalization block) and a matcher (per-user filtering, such as block lists and already-viewed posts).
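
The recall, filter, and merge steps can be sketched as below. The real service is written in Go; Python with asyncio is used here only to keep one language across the examples. The two toy sources, the post fields, and de-duplication by post id during the merge are all assumptions made for illustration.

```python
import asyncio


async def popular(member_id):
    # Toy source: one clean post and one NSFW post.
    return [{"id": 1, "nsfw": False}, {"id": 2, "nsfw": True}]


async def forum(member_id):
    # Toy source: overlaps with `popular` on post 1.
    return [{"id": 1, "nsfw": False}, {"id": 3, "nsfw": False}]


def pre_filter(post):
    # Content-level check, independent of the user (here: NSFW only).
    return not post["nsfw"]


def matcher(post, blocked_ids, seen_ids):
    # User-level check: drop blocked or already-viewed posts.
    return post["id"] not in blocked_ids and post["id"] not in seen_ids


async def candidate_pool(member_id, sources, blocked_ids, seen_ids):
    # Run every enabled source concurrently for this request.
    results = await asyncio.gather(*(src(member_id) for src in sources))
    pool = {}
    for posts in results:
        for post in posts:
            if pre_filter(post) and matcher(post, blocked_ids, seen_ids):
                pool.setdefault(post["id"], post)  # assumed: de-duplicate across sources
    return list(pool.values())
```

Calling `asyncio.run(candidate_pool(7, [popular, forum], blocked_ids=set(), seen_ids={3}))` leaves only post 1: post 2 fails the pre-filter and post 3 is rejected by the matcher.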

Ranking Pipeline

Once candidates reach Dice-API, DiceRankingEngineTW coordinates the whole flow, which runs through the Reader → Ranker → Model → Diversity pipeline.

flowchart TB
    subgraph Engine["TW Engine (dice_api/engine/dice)"]
      ETW["DiceRankingEngineTW<br/>pinned · follow guide<br/>job title boost · new post guarantee"]
    end
    subgraph Readers["Readers (dice_api/reader)"]
      R1[PostsReader]
      R2[ForumWhiteListReader]
      R3[ForumGroupReader]
      R4[CreatorBadgePostReader]
      R5[PostPoliticsReader]
    end
    subgraph Rankers["Rankers (dice_api/ranker)"]
      RK1["BaseTWRanker<br/>multi-task NN"]
      RK2["DailyRetrainRanker<br/>click · react · duration ensemble"]
      RK4["SlidingWindowRanker<br/>diversity"]
      RK5["MMRSlidingWindowRanker<br/>MMR diversity"]
    end
    subgraph Models["Models (dice_api/model)"]
      M1["DailyRetrainModel · ONNX"]
      M2["DailyRetrainModelV2 · Triton"]
    end
    ETW --> Readers --> Rankers
    Rankers --> Models
    Rankers --> RK4
    RK4 --> Feed["Final Feed"]
    RK5 --> Feed
    classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
    classDef data fill:#F59E0B,stroke:#B45309,color:#fff
    classDef ml fill:#10B981,stroke:#047857,color:#fff
    class ETW py
    class R1,R2,R3,R4,R5 data
    class M1,M2 ml

Engine · TW orchestration

  • BaseDiceRankingEngine — Abstract base engine orchestrating member prep, ranking, and feed generation with metrics tracking.
  • DiceRankingEngineTW — Taiwan engine with pinned post handling, forum follow guides, job title promotion, and new post guarantees.

Readers · features from Redis / PG

  • PostsReader — Reads post metadata from Redis cache including engagement metrics and content features.
  • PostPoliticsReader — Reads post politics classification predictions from Redis for content filtering.
  • CreatorBadgePostReader — Reads creator badge post IDs to identify verified creator posts.
  • ForumWhiteListReader — Reads forum whitelist metadata including region and school status.
  • ForumGroupReader — Reads forum group classifications mapping aliases to categories.
  • ForumRegionReader — Reads forum region assignments for geographic classification.

Rankers · ML scoring

  • BaseTWRanker — Taiwan multi-task neural network combining member, post, and interaction features with tag-aware cross features.
  • BaseTWRankerV2 — TW ranker variant preserving null values for improved model handling.
  • DailyRetrainRanker — Daily-retrained ensemble combining click and reaction predictions with duration-weighted scoring.
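
One way to picture "combining click and reaction predictions with duration-weighted scoring" is a blended score in which the predicted click duration scales the click term. The formula, the weights, and the 60-second cap below are assumptions chosen for illustration and are not taken from DailyRetrainRanker.

```python
def ensemble_score(
    p_click: float,
    p_react: float,
    click_duration: float,
    w_react: float = 0.5,        # assumed weight of the react head
    duration_cap: float = 60.0,  # assumed cap, in seconds
) -> float:
    """Blend click and react probabilities, up-weighting clicks expected to last longer."""
    # Clamp the predicted duration into [0, duration_cap] and scale it to [0, 1].
    duration_weight = min(max(click_duration, 0.0), duration_cap) / duration_cap
    return p_click * (1.0 + duration_weight) + w_react * p_react
```

Under these assumptions, two posts with the same click and react probabilities are separated by their expected reading time, so a post likely to be read for 30 seconds outranks one likely to be abandoned after 5.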

Diversity Rankers

  • SlidingWindowRanker — Diversity ranker reordering feed with sliding-window rules for forum/media distribution and evergreen spreading.
  • MMRSlidingWindowRanker — MMR diversity using post embeddings to maximize diversity.
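
The two diversity strategies above can be sketched as follows. Both functions are simplified illustrations, not the SlidingWindowRanker or MMRSlidingWindowRanker code: the window rule (at most `max_per_forum` posts per forum in any `window` consecutive slots), the cosine similarity, the `lam` trade-off, and the post fields `forum`, `score`, and `emb` are all assumptions.

```python
import math


def sliding_window(posts, window=3, max_per_forum=1):
    """Greedy reorder of a ranked list under a per-forum limit within each window."""
    remaining, feed = list(posts), []
    while remaining:
        # Forums already placed in the slots that share a window with the next slot.
        recent = [p["forum"] for p in feed[-(window - 1):]] if window > 1 else []
        pick = next(
            (p for p in remaining if recent.count(p["forum"]) < max_per_forum),
            remaining[0],  # no post satisfies the rule: fall back to the best remaining
        )
        remaining.remove(pick)
        feed.append(pick)
    return feed


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def mmr(posts, k, lam=0.7):
    """Pick k posts, each maximizing lam * score - (1 - lam) * max similarity to earlier picks."""
    remaining, selected = list(posts), []
    while remaining and len(selected) < k:
        def value(p):
            redundancy = max((cosine(p["emb"], s["emb"]) for s in selected), default=0.0)
            return lam * p["score"] - (1 - lam) * redundancy
        best = max(remaining, key=value)
        remaining.remove(best)
        selected.append(best)
    return selected
```

The sliding-window version only needs categorical attributes such as the forum, while MMR needs post embeddings but can separate near-duplicate posts even when they sit in different forums.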

Models

  • DailyRetrainModel — Multi-task ONNX NN predicting click, react, and click_duration for TW feed ranking.
  • DailyRetrainModelV2 — Pipelined multi-task NN delegating inference to Triton server with batched prediction.

Recipe & Experiment System

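Every ranker, diversity, and feature toggle and parameter is driven by a recipe. Multiple groups are stacked in alphabetical order, which supports A/B experiments and per-author overrides.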

flowchart LR
    D["default<br/>configs/recipe/base.py"]
    E1["experiment groups<br/>dice_359_b1, kytu, ..."]
    A["author override<br/>configs/recipe/dice/tw/authors/{name}.py"]
    F["⚙️ Final Recipe<br/>(applied in sorted order)"]
    D --> F
    E1 --> F
    A --> F
    classDef base fill:#6366F1,stroke:#4338CA,color:#fff
    classDef exp fill:#EC4899,stroke:#BE185D,color:#fff
    classDef auth fill:#F59E0B,stroke:#B45309,color:#fff
    classDef final fill:#10B981,stroke:#047857,color:#fff
    class D base
    class E1 exp
    class A auth
    class F final
Example · A/B experiment groups
# configs/recipe/dice/tw/dice_359_a1.py  (control)
def apply(config):
    return config  # keep the base settings unchanged

# configs/recipe/dice/tw/dice_359_b1.py  (treatment)
def apply(config):
    config.use_forum_subscriptions_streaming = True
    return config
A1
Control group: usually empty, so the base recipe stays in effect; it serves as the control that collects baseline metrics.
B1
Treatment group: enables experimental features, switches rankers, adjusts parameters, and so on.
Stacking
["dice_359_b1", "kytu"] → applied one after another in alphabetical order, so the resulting recipe is deterministic.
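
The stacking rule can be sketched as below. This is a minimal illustration, not the actual loader: `REGISTRY`, `build_recipe`, the `SimpleNamespace` base config, and the `diversity_window` setting applied by the `kytu` group are hypothetical, whereas the real system loads `apply` functions from modules under configs/recipe/.

```python
from types import SimpleNamespace


def dice_359_a1(config):
    return config  # control: leave the base recipe untouched


def dice_359_b1(config):
    config.use_forum_subscriptions_streaming = True  # treatment from the example above
    return config


def kytu(config):
    config.diversity_window = 5  # hypothetical setting, for illustration only
    return config


# Hypothetical registry standing in for the recipe modules on disk.
REGISTRY = {"dice_359_a1": dice_359_a1, "dice_359_b1": dice_359_b1, "kytu": kytu}


def build_recipe(groups):
    # Stand-in for the base recipe.
    config = SimpleNamespace(use_forum_subscriptions_streaming=False)
    # Sorting the group names makes the result independent of the input order.
    for name in sorted(groups):
        config = REGISTRY[name](config)
    return config
```

Because the names are sorted before application, `build_recipe(["kytu", "dice_359_b1"])` and `build_recipe(["dice_359_b1", "kytu"])` produce the same recipe; if two groups set the same field, the alphabetically later group wins.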

Tech Stack

🐍 Python · Dice-API

  • FastAPI · Poetry · uvicorn
  • OpenTelemetry tracing · Prometheus metrics
  • ONNX Runtime · scikit-learn
  • Triton inference client

🐹 Go · Candidate Service

  • Go 1.24+ · gRPC · Protocol Buffers
  • Wire DI framework
  • ~25 parallel candidate sources
  • pre-filter + matcher personalization filtering

🧊 Storage

  • Redis · post / forum / feature cache
  • PostgreSQL · relational data
  • BigTable · high-volume feature serving
  • GCS · feed snapshot

📊 Data / ML

  • BigQuery · feature engineering
  • ONNX / TensorFlow · model export
  • Triton · GPU inference
  • A/B via recipe system

🌐 TW Market

  • DiceRankingEngineTW
  • Multi-task NN (click / react / duration)
  • Sliding-window & MMR diversity
  • Recipe-driven A/B experimentation

🔁 Deployment

  • Docker · GKE
  • gRPC between Py ↔ Go
  • Multi-region serving
  • pre-commit · pytest · dockerbuild