Dice · Dcard Recommendation Architecture

① End-to-End Flow

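A TW recommendation request passes through three stages in order: Candidate Generation (a Go service that gathers candidates from multiple sources), the Ranking Engine (a Python service that scores candidates with ML models), and Diversity / Rules (post-processing for diversity and business rules). The output is the feed the user sees.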

flowchart LR
    U["👤 Client Request
member_id"]
    CG["🎯 Candidate Generation
(Go · data-dice-candidate)
~25 sources"]
    DICE["⚙️ Dice API
(Python · FastAPI)
DiceRankingEngineTW"]
    FEAT["🧊 Feature Store
Redis · BigTable · PG"]
    MODEL["🧠 Models
Multi-task ONNX NN
Triton Server"]
    DIV["🔀 Diversity & Rules
SlidingWindow · MMR
Pinned · Guides"]
    OUT["📰 Personalized Feed"]

    U --> CG --> DICE
    DICE --> FEAT
    DICE --> MODEL
    DICE --> DIV --> OUT

    classDef go fill:#00ADD8,stroke:#007d9c,color:#fff
    classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
    classDef data fill:#F59E0B,stroke:#B45309,color:#fff
    classDef ml fill:#10B981,stroke:#047857,color:#fff
    classDef out fill:#374151,stroke:#1f2937,color:#fff
    class CG go
    class DICE,DIV py
    class FEAT data
    class MODEL ml
    class U,OUT out

Candidate Generation · Dcard-Backend/modules/data-dice-candidate
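Every request recalls from all enabled candidate sources in parallel, applies the pre-filter (NSFW, spam, quality, blocks), and merges the results into a single candidate pool.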

Ranking Engine · Dice-API/dice_api/engine/dice
DiceRankingEngineTW reads the features, sends them to the ranker for scoring, and hands the scored list to the diversity ranker for final ordering.

Models · dice_api/model
A multi-task ONNX NN predicts click / react / click_duration; the V2 model runs inference on a Triton inference server.

② Candidate Generation

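Each candidate source in the Go service data-dice-candidate is an independent recall strategy. Sources are grouped by intent into the eight categories below. All enabled sources run in parallel on every request, and their merged output goes to the ranker.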
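
The parallel fan-out described above can be sketched as follows. The real service is written in Go; this Python version only illustrates the pattern. The source functions, their return values, and the choice to skip a failed source instead of failing the request are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical source signature: member_id -> list of post IDs.
Source = Callable[[int], Awaitable[list[int]]]

async def popular(member_id: int) -> list[int]:
    return [101, 102, 103]

async def forum(member_id: int) -> list[int]:
    return [103, 104]

async def failing(member_id: int) -> list[int]:
    raise RuntimeError("source timed out")

async def recall(member_id: int, sources: dict[str, Source]) -> dict[int, list[str]]:
    """Run every enabled source concurrently and merge results by post ID."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name](member_id) for name in names),
        return_exceptions=True,  # collect errors instead of cancelling the batch
    )
    pool: dict[int, list[str]] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            continue  # assumption: a failed source is skipped, not fatal
        for post_id in result:
            # Remember which sources recalled each post; duplicates collapse.
            pool.setdefault(post_id, []).append(name)
    return pool

pool = asyncio.run(recall(42, {"popular": popular, "forum": forum, "failing": failing}))
print(pool)
```

Keeping the list of originating sources per post is one way to let later stages use the recall source as a feature; the source document does not say whether the real pool does this.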

Hot / Popular

popular

Popular posts from past 1-3 days sorted by engagement, split by recency buckets.

tw_default

Default ranking for Taiwan region, posts sorted by like count descending.

New / Fresh

new

Newly posted content within configurable age threshold, sorted by impression count.

Content Similarity

item_cf_last_n

Posts similar to recently-viewed content via item-based collaborative filtering.

last_n_tags

Posts sharing tags with last-viewed posts, scored by tag overlap similarity.

similar_content_tw

Posts similar to user's viewed posts via pre-computed post similarity graph.
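
As an illustration of the tag-overlap scoring mentioned for last_n_tags, here is a minimal sketch. The source does not name the similarity measure, so Jaccard similarity is an assumption, and the function names and sample tags are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two tag sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_tag_overlap(
    viewed_tags: set[str], candidates: dict[int, set[str]]
) -> list[tuple[int, float]]:
    """Score each candidate post against the tags of recently viewed posts."""
    scored = [(post_id, jaccard(viewed_tags, tags)) for post_id, tags in candidates.items()]
    # Drop posts that share no tag, then put the highest overlap first.
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)

viewed = {"travel", "japan", "food"}
candidates = {
    11: {"japan", "food"},     # 2 shared tags out of 3 distinct
    12: {"travel", "camera"},  # 1 shared tag out of 4 distinct
    13: {"stocks"},            # no overlap, dropped
}
ranked = rank_by_tag_overlap(viewed, candidates)
print(ranked)
```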

User Profile

interest_category

Posts from interest category L1 tags matched to user's interests, ranked by like rate.

job_title

Posts pre-computed as relevant to user's job title classification.

user_profile_popular

Popular posts per user's profile tags with linearly-decreasing allocation per tag.

similar_user_viewed

Posts viewed by similar users, merged from realtime and historical post-view data.
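
The "linearly-decreasing allocation per tag" used by user_profile_popular can be read as giving the top-ranked tag the largest share of slots and each following tag a proportionally smaller one. The sketch below assumes weights n, n-1, ..., 1 over n tags and a fixed total quota; the exact weights, rounding rule, and function name are not given in the source and are hypothetical.

```python
def linear_allocation(tags: list[str], total: int) -> dict[str, int]:
    """Split `total` slots across tags with weights n, n-1, ..., 1."""
    n = len(tags)
    if n == 0:
        return {}
    weights = [n - i for i in range(n)]
    weight_sum = sum(weights)  # equals n * (n + 1) / 2
    quotas = [total * w // weight_sum for w in weights]
    # Hand out slots lost to integer division, highest-ranked tags first.
    leftover = total - sum(quotas)
    for i in range(leftover):
        quotas[i % n] += 1
    return dict(zip(tags, quotas))

allocation = linear_allocation(["makeup", "travel", "pets", "food"], total=20)
print(allocation)
```

With four tags and 20 slots the weights are 4, 3, 2, 1 (sum 10), so the quotas come out as 8, 6, 4, and 2.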

Social / Subscriber

forum

Posts from user's subscribed forums, with weighted scores based on viewing history and DTR.

persona_subscribed

Posts from user's subscribed personas, randomly shuffled.

similar_subscriber

Posts from forums similar to user's subscriptions via static forum-similarity mapping.

Search / Topic

search

Posts matching user's search query history, ranked by Elasticsearch relevance score.

topic

Posts from subscribed topics with DTR-based sorting, or top posts per topic.

Offline ML

two_tower_offline

Two-tower neural network offline predictions, pre-generated candidate list per user.

user_tag_evergreen

High-quality evergreen posts matched to user's interest tags, pre-generated list.

Operations / Strategy

boosted

Boosted posts with high promotion scores, ranked by strength metric.

op_hawkeye

Posts selected by operations team via Hawkeye tasks, with boosted visibility.

op_recsys

Posts from operations RecSys curated for personas, scored by forum CTR prediction.

Before merging, all candidates pass through the pre-filter (NSFW / spam / quality / personalization block) and the matcher (per-user filtering, such as block lists and posts the user has already viewed).
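
The two filtering steps can be sketched as a pair of predicates: one that depends only on the post, and one that also depends on the member. The real filters live in the Go service; the field names, the quality threshold, and the reading of "personalization block" as a per-post flag are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    id: int
    author_id: int
    nsfw: bool = False
    spam: bool = False
    quality: float = 1.0
    personalization_blocked: bool = False  # assumed meaning of "personalization block"

@dataclass
class Member:
    id: int
    blocked_authors: set[int] = field(default_factory=set)
    seen_posts: set[int] = field(default_factory=set)

MIN_QUALITY = 0.3  # hypothetical threshold

def pre_filter(post: Post) -> bool:
    """Member-independent checks: NSFW, spam, quality, personalization block."""
    return (
        not post.nsfw
        and not post.spam
        and not post.personalization_blocked
        and post.quality >= MIN_QUALITY
    )

def matcher(post: Post, member: Member) -> bool:
    """Member-specific checks: block list and already-viewed posts."""
    return post.author_id not in member.blocked_authors and post.id not in member.seen_posts

def build_pool(posts: list[Post], member: Member) -> list[int]:
    return [p.id for p in posts if pre_filter(p) and matcher(p, member)]

posts = [
    Post(1, 10),
    Post(2, 11, nsfw=True),
    Post(3, 12, quality=0.1),
    Post(4, 13),               # author is blocked by the member
    Post(5, 14),               # already viewed
    Post(6, 15, spam=True),
    Post(7, 16),
]
member = Member(7, blocked_authors={13}, seen_posts={5})
pool_ids = build_pool(posts, member)
print(pool_ids)
```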

③ Ranking Pipeline

Once candidates reach Dice-API, DiceRankingEngineTW coordinates the whole flow, running them through the Reader → Ranker → Model → Diversity pipeline.

flowchart TB
    subgraph Engine["TW Engine (dice_api/engine/dice)"]
      ETW["DiceRankingEngineTW
pinned · follow guide
job title boost · new post guarantee"]
    end

    subgraph Readers["Readers (dice_api/reader)"]
      R1[PostsReader]
      R2[ForumWhiteListReader]
      R3[ForumGroupReader]
      R4[CreatorBadgePostReader]
      R5[PostPoliticsReader]
    end

    subgraph Rankers["Rankers (dice_api/ranker)"]
      RK1["BaseTWRanker
multi-task NN"]
      RK2["DailyRetrainRanker
click · react · duration ensemble"]
      RK4["SlidingWindowRanker
diversity"]
      RK5["MMRSlidingWindowRanker
MMR diversity"]
    end

    subgraph Models["Models (dice_api/model)"]
      M1[DailyRetrainModel · ONNX]
      M2[DailyRetrainModelV2 · Triton]
    end

    ETW --> Readers --> Rankers
    Rankers --> Models
    Rankers --> RK4
    RK4 --> Feed["Final Feed"]
    RK5 --> Feed

    classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
    classDef data fill:#F59E0B,stroke:#B45309,color:#fff
    classDef ml fill:#10B981,stroke:#047857,color:#fff
    class ETW py
    class R1,R2,R3,R4,R5 data
    class M1,M2 ml

Engine · TW orchestration

BaseDiceRankingEngine — Abstract base engine orchestrating member prep, ranking, and feed generation with metrics tracking.
DiceRankingEngineTW — Taiwan engine with pinned post handling, forum follow guides, job title promotion, and new post guarantees.
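
To make the Reader → Ranker → Diversity hand-off and the pinned-post handling concrete, here is a minimal sketch of an orchestration function. It does not reproduce the real BaseDiceRankingEngine interface, which the source does not show. Every function name, the feature layout, and the decision to drop candidates that have no features are hypothetical.

```python
from typing import Callable, Iterable

Features = dict[int, dict[str, float]]

def run_pipeline(
    candidate_ids: list[int],
    read_features: Callable[[list[int]], Features],
    score: Callable[[dict[str, float]], float],
    diversify: Callable[[list[int]], list[int]],
    pinned_ids: Iterable[int] = (),
) -> list[int]:
    """Read features, score, reorder for diversity, then put pinned posts first."""
    features = read_features(candidate_ids)
    # Assumption: candidates with no features are dropped rather than scored.
    scores = {pid: score(features[pid]) for pid in candidate_ids if pid in features}
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    feed = diversify(ranked)
    pinned = list(pinned_ids)
    return pinned + [pid for pid in feed if pid not in pinned]

# Stub reader, ranker, and diversity step standing in for the real components.
STORE: Features = {1: {"p_click": 0.2}, 2: {"p_click": 0.9}, 3: {"p_click": 0.5}}

feed = run_pipeline(
    candidate_ids=[1, 2, 3, 4],  # post 4 has no features in the stub store
    read_features=lambda ids: {pid: STORE[pid] for pid in ids if pid in STORE},
    score=lambda f: f["p_click"],
    diversify=lambda ranked: ranked,  # identity: no reordering in this example
    pinned_ids=[99],
)
print(feed)
```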

Readers · features from Redis / PG

PostsReader — Reads post metadata from Redis cache including engagement metrics and content features.
PostPoliticsReader — Reads post politics classification predictions from Redis for content filtering.
CreatorBadgePostReader — Reads creator badge post IDs to identify verified creator posts.
ForumWhiteListReader — Reads forum whitelist metadata including region and school status.
ForumGroupReader — Reads forum group classifications mapping aliases to categories.
ForumRegionReader — Reads forum region assignments for geographic classification.

Rankers · ML scoring

BaseTWRanker — Taiwan multi-task neural network combining member, post, and interaction features with tag-aware cross features.
BaseTWRankerV2 — TW ranker variant preserving null values for improved model handling.
DailyRetrainRanker — Daily-retrained ensemble combining click and reaction predictions with duration-weighted scoring.
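
The source does not give the formula behind "duration-weighted scoring". One plausible shape, shown purely as an assumption, is a weighted sum of the click and react probabilities multiplied by a factor that grows slowly with predicted click duration. The weights, the logarithmic damping, and the function name below are all hypothetical.

```python
import math

def ensemble_score(
    p_click: float,
    p_react: float,
    duration_s: float,
    w_click: float = 1.0,       # hypothetical weight
    w_react: float = 0.5,       # hypothetical weight
    duration_scale: float = 60.0,
) -> float:
    """Combine click and react predictions, then scale by predicted duration."""
    engagement = w_click * p_click + w_react * p_react
    # log1p keeps very long predicted durations from dominating the score;
    # a duration of 0 leaves the engagement term unchanged.
    duration_weight = 1.0 + math.log1p(max(duration_s, 0.0) / duration_scale)
    return engagement * duration_weight

short_read = ensemble_score(0.4, 0.2, duration_s=30)
long_read = ensemble_score(0.4, 0.2, duration_s=120)
print(short_read < long_read)
```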

Diversity Rankers

SlidingWindowRanker — Diversity ranker reordering feed with sliding-window rules for forum/media distribution and evergreen spreading.
MMRSlidingWindowRanker — MMR diversity using post embeddings to maximize diversity.
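
The two diversity strategies can be sketched as greedy re-rankers. The first enforces a sliding-window rule (at most max_per_forum posts from the same forum in any window of consecutive slots). The second is standard maximal marginal relevance over post embeddings. Parameter names, defaults, the fallback when no post satisfies the rule, and the use of cosine similarity are assumptions; the real rankers also handle media distribution and evergreen spreading, which are omitted here.

```python
import math

def sliding_window_rerank(ranked, forum_of, window=3, max_per_forum=1):
    """Greedy pass: each new slot must keep its trailing window within the forum cap."""
    feed, pending = [], list(ranked)
    while pending:
        recent = [forum_of[p] for p in feed[-(window - 1):]] if window > 1 else []
        # Take the best-ranked post that fits; if none fits, relax the rule.
        choice = next(
            (p for p in pending if recent.count(forum_of[p]) < max_per_forum),
            pending[0],
        )
        feed.append(choice)
        pending.remove(choice)
    return feed

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_rerank(scores, embeddings, lam=0.7):
    """Greedy MMR: maximize lam * relevance - (1 - lam) * max similarity to picked."""
    remaining, picked = list(scores), []
    while remaining:
        def mmr(pid):
            redundancy = max(
                (cosine(embeddings[pid], embeddings[q]) for q in picked), default=0.0
            )
            return lam * scores[pid] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        picked.append(best)
        remaining.remove(best)
    return picked

forum_of = {1: "a", 2: "a", 3: "b", 4: "a", 5: "c"}
windowed = sliding_window_rerank([1, 2, 3, 4, 5], forum_of)

scores = {1: 0.9, 2: 0.85, 3: 0.5}
embeddings = {1: [1.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}  # post 2 duplicates post 1
diverse = mmr_rerank(scores, embeddings, lam=0.5)
print(windowed, diverse)
```

In the MMR example, post 2 has the second-highest score but an embedding identical to post 1, so it is pushed behind post 3. Setting lam=1.0 turns the redundancy penalty off and restores pure score order.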

Models

DailyRetrainModel — Multi-task ONNX NN predicting click, react, and click_duration for TW feed ranking.
DailyRetrainModelV2 — Pipelined multi-task NN delegating inference to Triton server with batched prediction.

④ Recipe & Experiment System

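Recipes control which rankers, diversity rules, and features are enabled, and with what parameters. Multiple groups are stacked in alphabetical order, which supports A/B experiments and per-author overrides.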

flowchart LR
    D["default
configs/recipe/base.py"]
    E1["experiment groups
dice_359_b1, kytu, ..."]
    A["author override
configs/recipe/dice/tw/authors/{name}.py"]
    F["⚙️ Final Recipe
(applied in sorted order)"]

    D --> F
    E1 --> F
    A --> F

    classDef base fill:#6366F1,stroke:#4338CA,color:#fff
    classDef exp fill:#EC4899,stroke:#BE185D,color:#fff
    classDef auth fill:#F59E0B,stroke:#B45309,color:#fff
    classDef final fill:#10B981,stroke:#047857,color:#fff
    class D base
    class E1 exp
    class A auth
    class F final

Example · A/B experiment groups

# configs/recipe/dice/tw/dice_359_a1.py  (control)
def apply(config):
    return config  # keep the base settings unchanged

# configs/recipe/dice/tw/dice_359_b1.py  (treatment)
def apply(config):
    config.use_forum_subscriptions_streaming = True
    return config

Control group: usually empty, so it keeps the base recipe and serves as the baseline for metrics.

Treatment group: enables experimental features, switches rankers, adjusts parameters, and so on.

Stacking: ["dice_359_b1", "kytu"] → applied one after another in alphabetical order, so the resulting recipe is deterministic.
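
A minimal sketch of the stacking rule, assuming each recipe exposes an apply-style function like the ones in the example above. The in-memory registry, the diversity_ranker field, and the body of the kytu recipe are hypothetical; the real system loads recipe modules from configs/recipe/.

```python
from types import SimpleNamespace

def base(config):
    config.use_forum_subscriptions_streaming = False
    config.diversity_ranker = "SlidingWindowRanker"  # hypothetical field
    return config

def dice_359_b1(config):
    config.use_forum_subscriptions_streaming = True
    return config

def kytu(config):
    # Hypothetical override, for illustration only.
    config.diversity_ranker = "MMRSlidingWindowRanker"
    return config

# Hypothetical registry standing in for module loading.
RECIPES = {"dice_359_b1": dice_359_b1, "kytu": kytu}

def build_recipe(groups):
    """Start from the base recipe, then apply each group in sorted order."""
    config = base(SimpleNamespace())
    for name in sorted(groups):
        config = RECIPES[name](config)
    return config

# The order in which groups are listed does not matter: sorting makes it deterministic.
first = build_recipe(["kytu", "dice_359_b1"])
second = build_recipe(["dice_359_b1", "kytu"])
print(first == second, first)
```

Because the groups are sorted before they are applied, two requests with the same set of groups always end up with the same recipe, whatever order the groups arrive in.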

⑤ Tech Stack

🐍 Python · Dice-API

FastAPI · Poetry · uvicorn
OpenTelemetry tracing · Prometheus metrics
ONNX Runtime · scikit-learn
Triton inference client

🐹 Go · Candidate Service

Go 1.24+ · gRPC · Protocol Buffers
Wire DI framework
~25 parallel candidate sources
pre-filter + matcher for per-user filtering

🧊 Storage

Redis · post / forum / feature cache
PostgreSQL · relational data
BigTable · high-volume feature serving
GCS · feed snapshot

📊 Data / ML

BigQuery · feature engineering
ONNX / TensorFlow · model export
Triton · GPU inference
A/B via recipe system

🌐 TW Market

DiceRankingEngineTW
Multi-task NN (click / react / duration)
Sliding-window & MMR diversity
Recipe-driven A/B experimentation

🔁 Deployment

Docker · GKE
gRPC between Py ↔ Go
Multi-region serving
pre-commit · pytest · dockerbuild