![Page 1: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/1.jpg)
Data Engineering in Practice: The DMP System Behind SmartNews Ads
Lan
![Page 2: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/2.jpg)
Who am I
• Lan
• Veteran hacker, but new to the AD world
• someone who can make a computer do what he wants—whether the computer wants to or not. (http://paulgraham.com/gba.html)
• ex-{Rakuten, GREE}
• Distributed Systems, Information Retrieval, ML
![Page 3: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/3.jpg)
Today’s Talk
• DMP in SmartNews Ads
• #1. Prediction
• #2. Targeting
• Future Work & Summary
![Page 4: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/4.jpg)
DMP = Data Management Platform
![Page 5: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/5.jpg)
DMP in SmartNews Ads
• Private DMP (90%+ 1st-party data)
• Data collection, cleaning, aggregation
• ID Mapping
• User Profiling
• User Clustering
• CTR / CVR Prediction
• Lookalike
• Custom Audience
![Page 6: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/6.jpg)
[Architecture diagram. Components shown: AD delivery cluster, Video AD delivery cluster, AD tracker, Kinesis, AD log in S3, SmartNews log, DMP clusters (DMP streaming, Hadoop, ML, analytics), audience data in DynamoDB / RDB, models & targeting]
Small company but not small data
• Article Meta > 200K/day
• Article x {read, share, read_related, …}
• Channel x {subscribe, preview, view, …}
• Push, Live, Weather, Setting, …
• Survey results
• Audience Data > 14M (~5M MAU)
• AD Meta
• AD History
• AD Conversions
• AD Optout
• Managed/Compressed Data > 130TB
• Lookalike seeds
• ~1TB Data for training CTR prediction model
• > 1M unique features
• User Demographics
• Device
• Locations
• …
![Page 7: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/7.jpg)
#1 Prediction
![Page 8: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/8.jpg)
Pick an AD to feed here
![Page 9: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/9.jpg)
Similar to Recommendation, but DIFFERENT
• optimization goal
• accuracy of the probability
![Page 10: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/10.jpg)
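More than Ranking
• When we do an AD auction
• eCPM (effective Cost per Mille) = CTR (Click Through Rate) x CPC (Cost per Click) x 1000; per impression, expected income = CTR x CPC
• Suppose we have
• CTR_ad1 = 0.05 > CTR_ad2 = 0.04 > CTR_ad3 = 0.03
• CPC_ad1 = 10 JPY, CPC_ad2 = 13 JPY, CPC_ad3 = 20 JPY, so ad3 should win (0.03 x 20 = 0.60 JPY per impression)
• but if the predicted CTRs are: pCTR_ad1 = 0.2 (winner) > pCTR_ad2 = 0.1 > pCTR_ad3 = 0.03
• then ad1 wins (0.2 x 10 = 2.0) and we lose 0.1 JPY of potential income per impression (0.60 - 0.50)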
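The arithmetic of this example can be checked with a short sketch. The numbers are the ones on the slide; the dictionary layout and function names are made up for illustration.

```python
# Ads from the slide: true CTR, predicted CTR (pCTR), and CPC in JPY.
ads = {
    "ad1": {"ctr": 0.05, "pctr": 0.20, "cpc": 10},
    "ad2": {"ctr": 0.04, "pctr": 0.10, "cpc": 13},
    "ad3": {"ctr": 0.03, "pctr": 0.03, "cpc": 20},
}


def expected_income(ad, ctr_key):
    """Expected income per impression in JPY: CTR x CPC."""
    return ad[ctr_key] * ad["cpc"]


def pick_winner(ads, ctr_key):
    """Return the ad with the highest CTR x CPC under the given CTR column."""
    return max(ads, key=lambda name: expected_income(ads[name], ctr_key))


ideal_winner = pick_winner(ads, "ctr")    # ranking by the true CTR
actual_winner = pick_winner(ads, "pctr")  # ranking by the mispredicted CTR

# Income is realized at the true CTR, whichever ad was actually shown.
loss = (expected_income(ads[ideal_winner], "ctr")
        - expected_income(ads[actual_winner], "ctr"))
print(ideal_winner, actual_winner, round(loss, 2))
```

The ranking of ad1 over ad2 is the same under both columns; only the inflated probability changes the auction outcome, which is why the accuracy of the probability matters and not just the ordering.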
![Page 11: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/11.jpg)
The CTR (CVR) Prediction Problem
μ(a, u, c) = p(click | a,u,c)
![Page 12: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/12.jpg)
CTR Prediction v1
• Train and score daily
• One GBDT (Gradient Boosting Decision Tree) model per AD campaign
• using ~1 month of data
• Hundreds of small batches inside Hadoop Yarn
• Quick and Simple
• developed in 1 month
• picks the best features for every campaign
• minutes to 1 hour for model training
• explainable tree models
• no need for AD features
• Same approach for CVR prediction (CPC / CVR = CPA (Cost Per Acquisition))
[Diagram: delivery results and user features are used to generate samples; on Yarn, each campaign runs its own sample → model → scoring batch; the output is per-user predictions]
![Page 13: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/13.jpg)
Metrics
• NE (Normalized Cross-Entropy)
• the average log loss per impression of the predicted CTR, divided by the average log loss per impression when always predicting the average (background) CTR
• https://facebook.com//download/321355358042503/adkdd_2014_camera_ready_junfeng.pdf
• AUC (Area under the ROC curve, AUROC)
• measure ranking quality
• others: Precision/Recall, ECS(Effective catalog size), CTR / CVR / Sales, etc
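Both headline metrics fit in a few lines. This is a minimal sketch with made-up labels and predictions; the function names are illustrative, and the pairwise AUC is quadratic, so it only suits small samples.

```python
import math


def log_loss(labels, probs):
    """Average log loss per impression."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def normalized_entropy(labels, probs):
    """Model log loss divided by the log loss of always predicting the average CTR."""
    base_ctr = sum(labels) / len(labels)
    baseline = log_loss(labels, [base_ctr] * len(labels))
    return log_loss(labels, probs) / baseline


def auc(labels, probs):
    """Chance that a random click is scored above a random non-click (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = click, 0 = no click.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
probs = [0.6, 0.1, 0.2, 0.4, 0.3, 0.1, 0.05, 0.2]
print(normalized_entropy(labels, probs), auc(labels, probs))
```

NE below 1 means the model beats the constant average-CTR predictor. AUC only looks at the ordering, so a model can have a perfect AUC and still be badly calibrated.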
![Page 14: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/14.jpg)
Review of CTR Prediction v1
• Marked improvement, moderate AUC & NE
• But
• hard to do overall tuning
• hard to predict online (feature sets differ between campaigns)
• latency for new campaigns
• relatively poor performance for new campaigns (cold start)
• loses the connections between campaigns, even for the same advertiser
• …
![Page 15: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/15.jpg)
CTR Prediction v2
• One simple model for all campaigns
• AD features added
• Dynamic feature extraction
• All computation distributed
• GBDT + LogisticRegression
• Train once per day, score twice
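The slide does not say how GBDT and logistic regression are combined. One common arrangement, described in the Facebook paper linked on the Metrics slide, feeds the index of the leaf each tree selects into the logistic regression as a one-hot feature. The sketch below assumes that arrangement. The two "trees" are hand-written stand-ins for a trained GBDT, and the features and labelling rule are invented for the demo.

```python
import math
import random


# Hypothetical stand-ins for trained GBDT trees: each maps a user to a leaf index.
def tree_age(x):      # 2 leaves
    return 0 if x["age"] < 30 else 1


def tree_reads(x):    # 3 leaves
    if x["reads_per_day"] < 5:
        return 0
    return 1 if x["reads_per_day"] < 20 else 2


TREES = [(tree_age, 2), (tree_reads, 3)]


def leaf_one_hot(x):
    """Concatenate one one-hot vector per tree, marking the leaf x falls into."""
    vec = []
    for tree, n_leaves in TREES:
        one_hot = [0.0] * n_leaves
        one_hot[tree(x)] = 1.0
        vec.extend(one_hot)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(rows, labels, epochs=200, lr=0.1):
    """Plain SGD logistic regression on the leaf-indicator features."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, leaf_one_hot(x))) + b)


random.seed(0)
users = [{"age": random.randint(18, 60), "reads_per_day": random.randint(0, 40)}
         for _ in range(200)]
# Invented rule for the demo: young heavy readers click.
labels = [1 if u["age"] < 30 and u["reads_per_day"] >= 20 else 0 for u in users]
w, b = train_lr([leaf_one_hot(u) for u in users], labels)
p_hi = predict(w, b, {"age": 25, "reads_per_day": 30})
p_lo = predict(w, b, {"age": 50, "reads_per_day": 2})
print(p_hi, p_lo)
```

In this arrangement the trees do the non-linear feature engineering and the linear model on top stays cheap to retrain and score.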
![Page 16: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/16.jpg)
About the Features
• >1M unique features, sparse
• GBDT provides great feature engineering
• (sometimes) feature engineering is a matter of intuition and trial and error
• demographic, device, location, reading interests…
• AD history is helpful
• Feature Hashing, Binarization & Discretization, …
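Feature hashing, discretization and binarization can be sketched as follows. The bucket count, age bins and field names are assumptions for illustration, not values from the talk.

```python
import hashlib

N_BUCKETS = 2 ** 20  # assumed size, roughly the ">1M unique features" scale


def hash_index(name, value, n_buckets=N_BUCKETS):
    """Feature hashing: map a 'name=value' string to a stable bucket index."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def discretize(value, boundaries):
    """Discretization: index of the bin the value falls into."""
    for i, upper in enumerate(boundaries):
        if value < upper:
            return i
    return len(boundaries)


def to_sparse(user):
    """Binarized sparse vector, stored as the set of active bucket indices."""
    age_bin = discretize(user["age"], [20, 30, 40, 50])  # hypothetical bins
    active = {
        hash_index("gender", user["gender"]),
        hash_index("device", user["device"]),
        hash_index("age_bin", age_bin),
    }
    active.update(hash_index("channel", ch) for ch in user["channels"])
    return active


user = {"gender": "f", "device": "ios", "age": 34, "channels": ["travel", "tech"]}
features = to_sparse(user)
print(sorted(features))
```

Hashing keeps the vector width fixed when new channels or devices appear, at the cost of occasional collisions between unrelated features.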
![Page 17: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/17.jpg)
Performance improvement
![Page 18: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/18.jpg)
#2 Targeting
[Figure: people (Watabe, TamTam, Komiya, Takei, Ikeishi, Nagase, Lan) and interests (Niku (meat), Game, Beer, Snack, Costume, Gourmet, Princess)]
![Page 19: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/19.jpg)
It’s difficult comparing to
![Page 20: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/20.jpg)
Profiling Users by Statistics and ML
• Gender Prediction (precision: 0.90+), Age Prediction, …
• News Channel / Source Preference
• AD Slot Preference
• …
![Page 21: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/21.jpg)
Standard Targeting
• e.g. females in Kansai who subscribe to the Travel channel
![Page 22: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/22.jpg)
Lookalike Targeting
![Page 23: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/23.jpg)
Lookalike Targeting
• Our solution
• Solve it as a classification problem
• Seed users as positive samples
• All targeting candidates as negative samples (w/ random sampling)
• based on Spark MLlib Logistic Regression
• 30%~50% higher CVR compared to standard targeting
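The classification setup can be sketched without Spark. Below, a plain SGD logistic regression stands in for Spark MLlib's, the two-dimensional features and user IDs are invented, and excluding the seeds from the negative pool is an assumption the slide does not spell out.

```python
import math
import random


def score(w, b, x):
    """Predicted probability that x looks like a seed user."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def train_lr(rows, labels, epochs=100, lr=0.1):
    """Plain SGD logistic regression (a stand-in for Spark MLlib's)."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = score(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def lookalike(seed_ids, features, n_negative, top_k, rng):
    """Seeds are positives, a random sample of the other candidates are negatives."""
    seeds = set(seed_ids)
    pool = [u for u in features if u not in seeds]  # assumption: seeds are excluded
    negative_ids = rng.sample(pool, n_negative)
    rows = [features[u] for u in seed_ids] + [features[u] for u in negative_ids]
    labels = [1] * len(seed_ids) + [0] * len(negative_ids)
    w, b = train_lr(rows, labels)
    ranked = sorted(pool, key=lambda u: score(w, b, features[u]), reverse=True)
    return ranked[:top_k]


# Toy candidates: users 0-49 mostly read travel, users 50-99 mostly read sports.
rng = random.Random(0)
features = {}
for uid in range(100):
    base = (0.8, 0.1) if uid < 50 else (0.1, 0.8)
    features[uid] = [v + rng.uniform(-0.1, 0.1) for v in base]

audience = lookalike(list(range(10)), features, n_negative=30, top_k=10, rng=rng)
print(audience)
```

Some sampled negatives will in fact resemble the seeds. The model still learns which way the seeds differ from the general population, and that ranking is all the targeting needs.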
![Page 24: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/24.jpg)
Article Keyword Targeting
• Reach (unique users) is calculated in real time
• Only users who exceed a certain read-time threshold are included
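A minimal batch sketch of the reach calculation follows. It assumes the threshold applies to a user's total read time across all articles carrying the keyword; the slide does not say whether it is per article or in total. The event tuples and keyword map are invented, and a real-time version would maintain these totals incrementally.

```python
from collections import defaultdict


def reach_uu(read_events, article_keywords, keyword, min_read_seconds):
    """Count unique users whose read time on matching articles exceeds the threshold."""
    total_seconds = defaultdict(float)
    for user_id, article_id, seconds in read_events:
        if keyword in article_keywords.get(article_id, ()):
            total_seconds[user_id] += seconds
    return sum(1 for t in total_seconds.values() if t > min_read_seconds)


# Hypothetical data: article -> keywords, and (user, article, seconds read) events.
article_keywords = {"a1": {"travel", "kyoto"}, "a2": {"travel"}, "a3": {"beer"}}
read_events = [
    ("u1", "a1", 40), ("u1", "a2", 30),
    ("u2", "a1", 5), ("u2", "a2", 10),
    ("u3", "a3", 120),
]
print(reach_uu(read_events, article_keywords, "travel", 30))
```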
![Page 25: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/25.jpg)
Custom Audience
[Diagram: your service / app / site sends any custom event (S2S request, web beacon, etc.) to the SmartNews AD tracker; events build an event audience stored as a BloomFilter object, updated every several minutes; the SmartNews AD delivery cluster uses it for AD targeting / delete targeting; the audience can also feed lookalike targeting]
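A Bloom filter keeps audience membership compact enough to ship to the delivery cluster every few minutes. The sketch below is a generic Bloom filter, not the SmartNews implementation. The sizes, the device IDs and the reading of "delete targeting" as exclusion are assumptions.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting the item with the hash number.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def should_deliver(device_id, audience, exclude=False):
    """Targeting delivers to members; exclusion targeting delivers to non-members."""
    return (device_id in audience) != exclude


# Hypothetical audience built from custom events.
audience = BloomFilter()
for device_id in ["dev-001", "dev-002", "dev-003"]:
    audience.add(device_id)

print(should_deliver("dev-001", audience), should_deliver("dev-001", audience, exclude=True))
```

The trade-off is that a non-member is occasionally reported as a member, so a few users outside the audience may be targeted (or excluded), while real members are never missed.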
![Page 26: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/26.jpg)
Future Work
![Page 27: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/27.jpg)
Targeting Audience by Interests
![Page 28: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/28.jpg)
Collect Negative Signals to Optimize UX
![Page 29: SmartNews TechNight Vol.5 : AD Data Engineering in practice: SmartNews Ads裏のデータシステム](https://reader031.vdocuments.pub/reader031/viewer/2022022203/586fd3f21a28ab18428b468d/html5/thumbnails/29.jpg)
Summary of My 1st SmartNews Year
• A challenging place. We're a startup, so we can move quickly and break things.
• Learn from the industry leaders. Keep up the trial and error.
• Numbers don't lie. Don't trust your intuition over the numbers.
• But if you really doubt the numbers, look closely. There may be a hidden bug.