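① End-to-End Flow
A single TW recommendation request passes, in order, through Candidate Generation (Go service; candidates from multiple sources), the Ranking Engine (Python service; ML scoring), and Diversity / Rules (diversity and rule-based post-processing), and ends as the feed the user sees.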
flowchart LR
U["👤 Client Request
member_id"]
CG["🎯 Candidate Generation
(Go · data-dice-candidate)
~25 sources"]
DICE["⚙️ Dice API
(Python · FastAPI)
DiceRankingEngineTW"]
FEAT["🧊 Feature Store
Redis · BigTable · PG"]
MODEL["🧠 Models
Multi-task ONNX NN
Triton Server"]
DIV["🔀 Diversity & Rules
SlidingWindow · MMR
Pinned · Guides"]
OUT["📰 Personalized Feed"]
U --> CG --> DICE
DICE --> FEAT
DICE --> MODEL
DICE --> DIV --> OUT
classDef go fill:#00ADD8,stroke:#007d9c,color:#fff
classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
classDef data fill:#F59E0B,stroke:#B45309,color:#fff
classDef ml fill:#10B981,stroke:#047857,color:#fff
classDef out fill:#374151,stroke:#1f2937,color:#fff
class CG go
class DICE,DIV py
class FEAT data
class MODEL ml
class U,OUT out
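Each request recalls all enabled candidate sources in parallel, pre-filters the results (NSFW, spam, quality, blocks), and merges them into a candidate pool.
DiceRankingEngineTW reads features in one place, sends the candidates to the ranker for scoring, and finally hands them to the diversity ranker for layout.
A multi-task ONNX NN predicts click / react / click_duration; the V2 version runs inference on the Triton inference server.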
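The per-request parallel recall can be illustrated with a small Python sketch. The real candidate service is written in Go, so this `asyncio` version only shows the shape of the pattern. It rests on assumptions of its own. Each enabled source is modeled as an async callable taking a `member_id`. Each source gets a hypothetical per-source timeout. A source that fails or times out contributes an empty list instead of failing the whole request.

```python
import asyncio


async def recall_all(sources, member_id, timeout_s=0.1):
    """Run every enabled candidate source concurrently.

    `sources` maps a source name to an async callable(member_id) -> list.
    A source that raises or exceeds `timeout_s` yields [] (hypothetical
    degrade-to-empty policy), so one slow backend cannot block the feed.
    """

    async def run_one(name, fetch):
        try:
            return name, await asyncio.wait_for(fetch(member_id), timeout_s)
        except Exception:
            return name, []

    pairs = await asyncio.gather(*(run_one(n, f) for n, f in sources.items()))
    return dict(pairs)  # gather preserves input order, so dict order is stable


# Usage with three toy sources (names are illustrative only).
async def hot(member_id):
    return [101, 102]


async def fresh(member_id):
    await asyncio.sleep(1)  # slower than the timeout, so it is dropped
    return [201]


async def broken(member_id):
    raise RuntimeError("backend unavailable")


results = asyncio.run(recall_all({"hot": hot, "fresh": fresh, "broken": broken}, member_id=42))
print(results)  # {'hot': [101, 102], 'fresh': [], 'broken': []}
```

Isolating failures per source keeps the feed available when a single backend misbehaves. The cost is that a degraded request silently has fewer candidates, which is worth tracking as a metric.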
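② Candidate Generation
In the Go service data-dice-candidate, each candidate source is an independent retrieval strategy, and the sources fall into eight categories by intent.
All sources run in parallel on every request, and their merged output goes to the ranker.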
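After the per-source lists come back, the pre-filter (NSFW, spam, quality, blocks) and merge steps could look like the sketch below. Several details are assumptions for illustration only: the `Candidate` fields, the `0.3` quality cut-off, reading "blocks" as blocked authors, and first-source-wins deduplication.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: int
    source: str
    is_nsfw: bool = False
    is_spam: bool = False
    quality: float = 1.0  # hypothetical 0..1 quality score
    author_id: int = 0


def build_candidate_pool(results_by_source, blocked_authors, min_quality=0.3):
    """Pre-filter every source's candidates, then merge them into one pool.

    Duplicates are resolved first-source-wins; dicts keep insertion order,
    so the pool order is deterministic.
    """
    pool = {}
    for cands in results_by_source.values():
        for c in cands:
            if c.is_nsfw or c.is_spam or c.quality < min_quality:
                continue
            if c.author_id in blocked_authors:
                continue
            pool.setdefault(c.post_id, c)
    return list(pool.values())


results = {
    "hot": [Candidate(1, "hot"), Candidate(2, "hot", is_spam=True)],
    "fresh": [
        Candidate(1, "fresh"),                # duplicate of a "hot" post
        Candidate(3, "fresh", quality=0.1),   # below the quality cut-off
        Candidate(4, "fresh", author_id=99),  # blocked author
        Candidate(5, "fresh"),
    ],
}
pool = build_candidate_pool(results, blocked_authors={99})
print([(c.post_id, c.source) for c in pool])  # [(1, 'hot'), (5, 'fresh')]
```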
Hot / Popular (high engagement)
Popular posts from past 1-3 days sorted by engagement, split by recency buckets.
Default ranking for Taiwan region, posts sorted by like count descending.
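Here is a minimal sketch of "sorted by engagement, split by recency buckets". It assumes three one-day age buckets, a single numeric `engagement` field per post, and a fixed number of posts taken from each bucket. All of these parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def hot_by_recency(posts, now, bucket_days=(1, 2, 3), per_bucket=20):
    """Split posts into age buckets (<1d, 1-2d, 2-3d), rank each bucket by
    engagement (descending), and take the top `per_bucket` from each.
    Posts older than the last bucket are dropped."""
    buckets = {d: [] for d in bucket_days}
    for p in posts:
        age = now - p["created_at"]
        for d in bucket_days:  # first bucket whose upper bound fits
            if age < timedelta(days=d):
                buckets[d].append(p)
                break
    out = []
    for d in bucket_days:
        ranked = sorted(buckets[d], key=lambda x: x["engagement"], reverse=True)
        out.extend(ranked[:per_bucket])
    return out


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
posts = [
    {"id": 1, "created_at": now - timedelta(hours=5), "engagement": 10},
    {"id": 2, "created_at": now - timedelta(hours=20), "engagement": 50},
    {"id": 3, "created_at": now - timedelta(hours=30), "engagement": 40},
    {"id": 4, "created_at": now - timedelta(hours=60), "engagement": 5},
    {"id": 5, "created_at": now - timedelta(days=4), "engagement": 999},
]
print([p["id"] for p in hot_by_recency(posts, now, per_bucket=1)])  # [2, 3, 4]
```

Taking a quota per bucket, instead of one global sort, keeps a viral two-day-old post from crowding out everything posted today.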
New / Fresh
Newly posted content within configurable age threshold, sorted by impression count.
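The description does not say whether impression count is sorted ascending or descending. This sketch assumes ascending order (fewest impressions first, which gives under-exposed new posts a chance) and exposes the direction as a flag. `max_age` stands in for the configurable age threshold, and its 6-hour default is made up.

```python
from datetime import datetime, timedelta, timezone


def fresh_posts(posts, now, max_age=timedelta(hours=6), fewest_first=True, limit=100):
    """Keep posts younger than `max_age`, then order them by impression count.
    fewest_first=True is an assumption: it surfaces under-exposed new posts."""
    recent = [p for p in posts if now - p["created_at"] <= max_age]
    recent.sort(key=lambda p: p["impressions"], reverse=not fewest_first)
    return recent[:limit]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
posts = [
    {"id": 1, "created_at": now - timedelta(hours=1), "impressions": 300},
    {"id": 2, "created_at": now - timedelta(hours=2), "impressions": 10},
    {"id": 3, "created_at": now - timedelta(hours=10), "impressions": 0},  # too old
]
print([p["id"] for p in fresh_posts(posts, now)])  # [2, 1]
```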
Content Similarity
Posts similar to recently-viewed content via item-based collaborative filtering.
Posts sharing tags with last-viewed posts, scored by tag overlap similarity.
Posts similar to user's viewed posts via pre-computed post similarity graph.
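"Tag overlap similarity" is not defined further here. Jaccard similarity (the size of the intersection of two tag sets over the size of their union) is one common choice, and this sketch assumes it.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 0.0 when both tag sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_similar_posts(last_viewed_tags, candidate_tags, top_k=10):
    """Score candidate posts by Jaccard overlap with the tags of the user's
    last-viewed posts, drop zero-overlap posts, and return the top_k."""
    seed = set(last_viewed_tags)
    scored = [(pid, jaccard(seed, set(tags))) for pid, tags in candidate_tags.items()]
    scored = [(pid, s) for pid, s in scored if s > 0.0]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:top_k]


ranked = tag_similar_posts(
    {"coffee", "taipei"},
    {"p1": {"coffee", "taipei", "cafe"}, "p2": {"coffee"}, "p3": {"finance"}},
)
print([pid for pid, _ in ranked])  # ['p1', 'p2']  (p1 scores 2/3, p2 scores 1/2)
```

Jaccard normalizes by the union size, so a post tagged with everything does not automatically win. Plain intersection counts would favor such over-tagged posts.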
User Profile
Posts from interest category L1 tags matched to user's interests, ranked by like rate.
Posts pre-computed as relevant to user's job title classification.
Popular posts for each of the user's profile tags, with a linearly decreasing allocation across tags.
Posts viewed by similar users, merged from realtime and historical post-view data.
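One reading of "linearly-decreasing allocation per tag" works like this: with the user's n profile tags ranked by affinity, give them slot weights n, n-1, ..., 1. The sketch below follows that reading, which is an assumption. The remainder-distribution rule and the deduplication across tags are also hypothetical.

```python
def linear_quota(tags, total):
    """Split `total` slots over tags ranked by affinity with linearly
    decreasing weights n, n-1, ..., 1. Slots lost to flooring go to the
    top-ranked tags first."""
    n = len(tags)
    if n == 0:
        return {}
    weights = [n - i for i in range(n)]
    wsum = sum(weights)  # n * (n + 1) / 2
    quota = [total * w // wsum for w in weights]
    for i in range(total - sum(quota)):  # the remainder is always < n
        quota[i] += 1
    return dict(zip(tags, quota))


def profile_tag_posts(tags, popular_by_tag, total):
    """Fill each tag's quota from its popularity-ranked post list, skipping
    posts already taken for a higher-ranked tag."""
    out, seen = [], set()
    for tag, k in linear_quota(tags, total).items():
        taken = 0
        for pid in popular_by_tag.get(tag, []):
            if taken == k:
                break
            if pid not in seen:
                seen.add(pid)
                out.append(pid)
                taken += 1
    return out


print(linear_quota(["a", "b", "c"], 10))  # {'a': 6, 'b': 3, 'c': 1}
popular = {"a": [1, 2, 3, 4, 5, 6, 7], "b": [1, 8, 9, 10], "c": [11, 12]}
print(profile_tag_posts(["a", "b", "c"], popular, 10))  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]
```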
Social / Subscriber
Posts from user's subscribed forums, with weighted scores based on viewing history and DTR.
Posts from user's subscribed personas, randomly shuffled.
Posts from forums similar to user's subscriptions via static forum-similarity mapping.
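The "similar forums via static forum-similarity mapping" source can be sketched as a one-hop expansion over a precomputed table. The table shape (`forum -> [(similar_forum, similarity), ...]`, sorted by similarity) and the per-forum cap are assumptions. DTR weighting is not sketched because DTR is not defined in this document.

```python
def similar_forums(subscribed, similarity_map, per_forum=2):
    """Expand the user's subscribed forums through a static mapping
    forum -> [(similar_forum, similarity), ...] (assumed sorted by similarity).
    Already-subscribed forums are skipped. If several subscriptions point at
    the same forum, its best similarity is kept."""
    subs = set(subscribed)
    best = {}
    for f in subscribed:
        for g, sim in similarity_map.get(f, [])[:per_forum]:
            if g in subs:
                continue
            if g not in best or sim > best[g]:
                best[g] = sim
    return sorted(best, key=best.get, reverse=True)


mapping = {
    "coffee": [("tea", 0.8), ("dessert", 0.6), ("travel", 0.5)],
    "travel": [("japan", 0.9), ("tea", 0.4)],
}
print(similar_forums(["coffee", "travel"], mapping))  # ['japan', 'tea', 'dessert']
```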
Search / Topic
Posts matching user's search query history, ranked by Elasticsearch relevance score.
Posts from subscribed topics with DTR-based sorting, or top posts per topic.
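For the search-history source, a request body in Elasticsearch's Query DSL could OR together the user's recent queries. When no explicit `sort` is given, Elasticsearch orders hits by relevance `_score`. The field names `title` and `content`, the `size` default, and the assumption that a document's `_id` is its post id are hypothetical. How the body is actually sent depends on the client the service uses.

```python
def build_search_history_query(recent_queries, fields=("title", "content"), size=50):
    """Build an Elasticsearch Query DSL body that matches any recent query."""
    return {
        "size": size,
        "_source": False,  # assumes only hit _id (taken to be the post id) is needed
        "query": {
            "bool": {
                # one multi_match clause per past query; a hit needs at least one
                "should": [
                    {"multi_match": {"query": q, "fields": list(fields)}}
                    for q in recent_queries
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_search_history_query(["taipei coffee", "remote job"])
print(len(body["query"]["bool"]["should"]))  # 2
```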
Offline ML
Two-tower neural network offline predictions, pre-generated candidate list per user.
High-quality evergreen posts matched to user's interest tags, pre-generated list.
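The two-tower source serves lists that were generated offline. Conceptually, the offline job scores items for each user by the dot product of the user-tower and item-tower embeddings and stores the top k. This brute-force sketch uses plain Python and made-up 2-d embeddings for illustration only. A job at real scale would more likely rely on batched matrix products or an approximate nearest-neighbor index.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def precompute_candidates(user_embs, item_embs, k=50):
    """Score every item for every user with the dot product of user-tower and
    item-tower embeddings, and keep the top-k item ids per user."""
    out = {}
    for uid, u in user_embs.items():
        scored = ((iid, dot(u, v)) for iid, v in item_embs.items())
        out[uid] = [iid for iid, _ in heapq.nlargest(k, scored, key=lambda x: x[1])]
    return out


user_embs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_embs = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5], "d": [-1.0, 0.3]}
print(precompute_candidates(user_embs, item_embs, k=2))  # {'u1': ['a', 'c'], 'u2': ['b', 'c']}
```

Splitting the towers is what makes this precomputation possible. Item embeddings do not depend on the user, so they can be computed once and reused for every user.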
Operations / Strategy
Boosted posts with high promotion scores, ranked by strength metric.
Posts selected by operations team via Hawkeye tasks, with boosted visibility.
Posts from operations RecSys curated for personas, scored by forum CTR prediction.
③ Ranking Pipeline
After candidates enter Dice-API, DiceRankingEngineTW coordinates the whole flow,
running it through the Reader → Ranker → Model → Diversity pipeline.
flowchart TB
subgraph Engine["TW Engine (dice_api/engine/dice)"]
ETW["DiceRankingEngineTW
pinned · follow guide
job title boost · new post guarantee"]
end
subgraph Readers["Readers (dice_api/reader)"]
R1[PostsReader]
R2[ForumWhiteListReader]
R3[ForumGroupReader]
R4[CreatorBadgePostReader]
R5[PostPoliticsReader]
end
subgraph Rankers["Rankers (dice_api/ranker)"]
RK1[BaseTWRanker
multi-task NN]
RK2[DailyRetrainRanker
click · react · duration ensemble]
RK4[SlidingWindowRanker
diversity]
RK5[MMRSlidingWindowRanker
MMR diversity]
end
subgraph Models["Models (dice_api/model)"]
M1[DailyRetrainModel · ONNX]
M2[DailyRetrainModelV2 · Triton]
end
ETW --> Readers --> Rankers
Rankers --> Models
Rankers --> RK4
RK4 --> Feed["Final Feed"]
RK5 --> Feed
classDef py fill:#3776AB,stroke:#1e4d75,color:#fff
classDef data fill:#F59E0B,stroke:#B45309,color:#fff
classDef ml fill:#10B981,stroke:#047857,color:#fff
class ETW py
class R1,R2,R3,R4,R5 data
class M1,M2 ml
Engine · TW orchestration
- BaseDiceRankingEngine — Abstract base engine orchestrating member prep, ranking, and feed generation with metrics tracking.
- DiceRankingEngineTW — Taiwan engine with pinned post handling, forum follow guides, job title promotion, and new post guarantees.
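The base-engine / market-engine split reads like a template method. The base class fixes the stage order (member prep, ranking, feed generation) and records metrics, and the TW subclass overrides individual stages. The sketch below is a hypothetical outline with made-up class and method names, not the real `BaseDiceRankingEngine` API. Only pinned-post handling is shown for the TW side, and per-stage latency stands in for metrics tracking.

```python
import time
from abc import ABC, abstractmethod


class EngineSketch(ABC):
    """Hypothetical base engine: prepare member -> rank -> generate feed,
    recording per-stage latency as a stand-in for metrics."""

    def __init__(self):
        self.timings = {}

    def _timed(self, stage, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.timings[stage] = time.perf_counter() - start
        return result

    def run(self, member_id, candidates):
        member = self._timed("prepare_member", self.prepare_member, member_id)
        ranked = self._timed("rank", self.rank, member, candidates)
        return self._timed("generate_feed", self.generate_feed, member, ranked)

    @abstractmethod
    def prepare_member(self, member_id): ...

    @abstractmethod
    def rank(self, member, candidates): ...

    def generate_feed(self, member, ranked):
        return [c["post_id"] for c in ranked]


class TWEngineSketch(EngineSketch):
    """Hypothetical market subclass; only pinned-post handling is shown."""

    def __init__(self, pinned_ids):
        super().__init__()
        self.pinned_ids = list(pinned_ids)

    def prepare_member(self, member_id):
        return {"member_id": member_id}

    def rank(self, member, candidates):
        return sorted(candidates, key=lambda c: c["score"], reverse=True)

    def generate_feed(self, member, ranked):
        ids = super().generate_feed(member, ranked)
        rest = [pid for pid in ids if pid not in self.pinned_ids]
        return self.pinned_ids + rest  # pinned posts always lead the feed


engine = TWEngineSketch(pinned_ids=[9])
candidates = [
    {"post_id": 1, "score": 0.2},
    {"post_id": 2, "score": 0.9},
    {"post_id": 9, "score": 0.5},
]
feed = engine.run(1, candidates)
print(feed)  # [9, 2, 1]
```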
Readers · features from Redis / PG
- PostsReader — Reads post metadata from Redis cache including engagement metrics and content features.
- PostPoliticsReader — Reads post politics classification predictions from Redis for content filtering.
- CreatorBadgePostReader — Reads creator badge post IDs to identify verified creator posts.
- ForumWhiteListReader — Reads forum whitelist metadata including region and school status.
- ForumGroupReader — Reads forum group classifications mapping aliases to categories.
- ForumRegionReader — Reads forum region assignments for geographic classification.
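A reader that pulls features for a whole candidate pool benefits from batching: a single Redis `MGET` replaces one round trip per post. In the sketch below, the `"post:{id}"` key layout and the JSON-encoded values are assumptions, not the real cache schema. A redis-py client's `mget` accepts a list of keys and returns `None` for misses. The in-memory `FakeRedis` exists only so the example runs without a server.

```python
import json


class PostsReaderSketch:
    """Hypothetical reader that batch-fetches post features with one MGET.
    `client` is anything exposing a redis-py style mget(keys) returning a
    list of bytes-or-None in key order."""

    def __init__(self, client, key_fmt="post:{}"):
        self.client = client
        self.key_fmt = key_fmt

    def read(self, post_ids):
        if not post_ids:
            return {}
        keys = [self.key_fmt.format(pid) for pid in post_ids]
        out = {}
        for pid, raw in zip(post_ids, self.client.mget(keys)):
            if raw is not None:  # cache miss: leave the post out
                out[pid] = json.loads(raw)
        return out


class FakeRedis:
    """In-memory stand-in so the example runs without a Redis server."""

    def __init__(self, data):
        self.data = data

    def mget(self, keys):
        return [self.data.get(k) for k in keys]


reader = PostsReaderSketch(FakeRedis({"post:1": b'{"likes": 10}', "post:3": b'{"likes": 2}'}))
print(reader.read([1, 2, 3]))  # {1: {'likes': 10}, 3: {'likes': 2}}
```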
Rankers · ML scoring
- BaseTWRanker — Taiwan multi-task neural network combining member, post, and interaction features with tag-aware cross features.
- BaseTWRankerV2 — TW ranker variant preserving null values for improved model handling.
- DailyRetrainRanker — Daily-retrained ensemble combining click and reaction predictions with duration-weighted scoring.
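The document does not give the formula DailyRetrainRanker uses to combine the click, react, and duration heads. The sketch below shows one hypothetical way a "duration-weighted" blend could look. The predicted dwell time boosts the click probability through a log so that very long reads do not dominate, and a weighted reaction term is added. The form of the formula and both weights are placeholders.

```python
import math


def ensemble_score(p_click, p_react, pred_duration_s, w_react=0.5, w_dur=0.2):
    """Hypothetical blend of multi-task outputs: click probability boosted by
    log-damped predicted dwell time, plus a weighted reaction term."""
    duration_boost = 1.0 + w_dur * math.log1p(max(pred_duration_s, 0.0))
    return p_click * duration_boost + w_react * p_react


# post_id -> (p_click, p_react, predicted click_duration in seconds)
preds = {
    "short_read": (0.30, 0.02, 5.0),
    "long_read": (0.25, 0.05, 120.0),
}
ranked = sorted(preds, key=lambda pid: ensemble_score(*preds[pid]), reverse=True)
print(ranked)  # ['long_read', 'short_read']
```

In this toy example, the duration term lets a slightly less clickable post that is read much longer outrank a quick-bounce post. That is the kind of trade-off a duration-weighted score is meant to express.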
Diversity Rankers
- SlidingWindowRanker — Diversity ranker reordering feed with sliding-window rules for forum/media distribution and evergreen spreading.
- MMRSlidingWindowRanker — MMR diversity using post embeddings to maximize diversity.
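Both diversity rankers can be illustrated with small sketches, though their exact rules and parameters are not given here. The sliding-window version below greedily enforces "at most `max_per_window` posts from the same forum in any `window` consecutive slots". If nothing fits, it falls back to the best remaining post. The MMR version picks, at each step, the post maximizing λ·relevance − (1 − λ)·(max cosine similarity to posts already picked), the standard Maximal Marginal Relevance trade-off. Window sizes, λ, and the toy embeddings are hypothetical.

```python
import math


def sliding_window_rerank(posts, window=3, max_per_window=1, key="forum"):
    """Walk the ranked list; at each slot take the best remaining post whose
    `key` occurs fewer than `max_per_window` times in the previous window-1
    slots. If none fits, take the best remaining post so the feed never stalls."""
    remaining, out = list(posts), []
    while remaining:
        recent = [p[key] for p in out[-(window - 1):]] if window > 1 else []
        for i, p in enumerate(remaining):
            if recent.count(p[key]) < max_per_window:
                break
        else:
            i = 0
        out.append(remaining.pop(i))
    return out


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def _mmr_score(pid, picked, relevance, embeddings, lam):
    redundancy = max((cosine(embeddings[pid], embeddings[q]) for q in picked), default=0.0)
    return lam * relevance[pid] - (1 - lam) * redundancy


def mmr_rerank(ids, relevance, embeddings, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k posts balancing relevance
    against similarity to posts already picked."""
    picked, pool = [], list(ids)
    while pool and len(picked) < k:
        best = max(pool, key=lambda pid: _mmr_score(pid, picked, relevance, embeddings, lam))
        picked.append(best)
        pool.remove(best)
    return picked


posts = [{"id": 1, "forum": "A"}, {"id": 2, "forum": "A"}, {"id": 3, "forum": "B"},
         {"id": 4, "forum": "A"}, {"id": 5, "forum": "C"}]
print([p["id"] for p in sliding_window_rerank(posts, window=2)])  # [1, 3, 2, 5, 4]

emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
rel = {"a": 0.9, "b": 0.85, "c": 0.5}
print(mmr_rerank(["a", "b", "c"], rel, emb, k=2, lam=0.5))  # ['a', 'c']
```

In the MMR example, `b` is nearly as relevant as `a` but is a duplicate of it in embedding space, so the less relevant yet dissimilar `c` takes the second slot.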
Models
- DailyRetrainModel — Multi-task ONNX NN predicting click, react, and click_duration for TW feed ranking.
- DailyRetrainModelV2 — Pipelined multi-task NN delegating inference to Triton server with batched prediction.
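"Batched prediction" can be sketched independently of the serving backend. Feature rows are split into chunks, the model is called once per chunk, and each task head's outputs are stitched back together in input order. Here `infer` is a hypothetical stand-in for an ONNX Runtime session or a Triton request. With Triton, the chunk size would typically be kept at or below the model configuration's `max_batch_size`.

```python
def predict_in_batches(rows, infer, max_batch=64):
    """Split feature rows into chunks of at most `max_batch`, call `infer`
    once per chunk, and concatenate each task head's outputs in input order.
    `infer(batch)` is assumed to return {task_name: [one value per row]}."""
    outputs = {}
    for start in range(0, len(rows), max_batch):
        result = infer(rows[start:start + max_batch])
        for task, values in result.items():
            outputs.setdefault(task, []).extend(values)
    return outputs


calls = []


def fake_infer(batch):
    """Toy model with two heads; records each batch size it receives."""
    calls.append(len(batch))
    return {"click": [r * 2 for r in batch], "react": [r + 1 for r in batch]}


out = predict_in_batches(list(range(5)), fake_infer, max_batch=2)
print(calls, out["click"])  # [2, 2, 1] [0, 2, 4, 6, 8]
```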
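④ Recipe & Experiment System
Whether each ranker, diversity step, and feature is enabled, and with which parameters, is driven by the recipe. Multiple groups are layered in alphabetical order, which supports A/B experiments and personal (author) overrides.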
flowchart LR
D["default
configs/recipe/base.py"]
E1["experiment groups
dice_359_b1, kytu, ..."]
A["author override
configs/recipe/dice/tw/authors/{name}.py"]
F["⚙️ Final Recipe
(applied in sorted order)"]
D --> F
E1 --> F
A --> F
classDef base fill:#6366F1,stroke:#4338CA,color:#fff
classDef exp fill:#EC4899,stroke:#BE185D,color:#fff
classDef auth fill:#F59E0B,stroke:#B45309,color:#fff
classDef final fill:#10B981,stroke:#047857,color:#fff
class D base
class E1 exp
class A auth
class F final
```python
# configs/recipe/dice/tw/dice_359_a1.py (control)
def apply(config):
    return config  # keep the original settings


# configs/recipe/dice/tw/dice_359_b1.py (treatment)
def apply(config):
    config.use_forum_subscriptions_streaming = True
    return config
```
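The layering of `apply(config)` functions can be sketched as a fold. The default recipe is applied first, then each experiment group in sorted (alphabetical) order. Applying the author override last is an assumption, since the diagram does not state its position. A plain dict stands in for however the service resolves a group name to its file under `configs/recipe/dice/tw/`. The `kytu` recipe body and the `window_size` setting are invented for illustration.

```python
from types import SimpleNamespace


def build_recipe(default_apply, registry, groups, author_apply=None):
    """Apply defaults, then every group's apply() in sorted order, then an
    optional author override last (assumed position). Sorting makes the
    result independent of the order in which groups were assigned."""
    config = default_apply(SimpleNamespace())
    for name in sorted(groups):
        config = registry[name](config)
    if author_apply is not None:
        config = author_apply(config)
    return config


def base(config):
    config.use_forum_subscriptions_streaming = False
    config.window_size = 3  # hypothetical setting
    return config


def dice_359_b1(config):
    config.use_forum_subscriptions_streaming = True
    return config


def kytu(config):  # illustrative body; the real recipe is not shown here
    config.window_size = 5
    return config


registry = {"dice_359_b1": dice_359_b1, "kytu": kytu}
cfg = build_recipe(base, registry, ["kytu", "dice_359_b1"])
print(cfg.use_forum_subscriptions_streaming, cfg.window_size)  # True 5
```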
["dice_359_b1", "kytu"] → applied one after another in alphabetical order, so the resulting recipe is deterministic.
⑤ Tech Stack
🐍 Python · Dice-API
- FastAPI · Poetry · uvicorn
- OpenTelemetry tracing · Prometheus metrics
- ONNX Runtime · scikit-learn
- Triton inference client
🐹 Go · Candidate Service
- Go 1.24+ · gRPC · Protocol Buffers
- Wire DI framework
- ~25 parallel candidate sources
- pre-filter + matcher (personalized filtering)
🧊 Storage
- Redis · post / forum / feature cache
- PostgreSQL · relational data
- BigTable · high-volume feature serving
- GCS · feed snapshot
📊 Data / ML
- BigQuery · feature engineering
- ONNX / TensorFlow · model export
- Triton · GPU inference
- A/B via recipe system
🌐 TW Market
- DiceRankingEngineTW
- Multi-task NN (click / react / duration)
- Sliding-window & MMR diversity
- Recipe-driven A/B experimentation
🔁 Deployment
- Docker · GKE
- gRPC between Py ↔ Go
- Multi-region serving
- pre-commit · pytest · dockerbuild