How Does Etsy's ML Detect 100K Content Violations?
Etsy combines multimodal machine learning with efficient architectures to manage trust and safety across 100M+ listings and 7M sellers
TL;DR
Situation
With over 100 million unique items and a vibrant community of more than 90 million active buyers and 7 million active sellers, Etsy faced challenges in enforcing policies and removing potentially violating or infringing items at scale.
Task
Etsy aimed to build an automated system that detects policy violations and complements community reporting and manual review, in order to maintain marketplace integrity.
Action
The team developed the solution in five steps:
Data Collection: Annotated examples as positives, hard negatives, and soft negatives to improve model accuracy
Feature Extraction: Extracted multimodal signals (text and images) from listings to build a comprehensive dataset
Dataset Splitting: Adopted time-based splits so that evaluation data is newer than training data, keeping the model relevant in a dynamic environment
Model Development: Built a supervised model with violation, neutral, and non-violation classes
Model Evaluation: Used progressive evaluation, offline benchmarks, and continuous monitoring to refine the model and minimize false positives.
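The time-based split in the steps above can be illustrated with a minimal Python sketch. It is not Etsy's implementation: the record layout, the cutoff dates, the sample listings, and the function name time_based_split are all assumptions made for illustration. The point it shows is that every training example predates every validation example, which in turn predates every test example, so the model is always evaluated on listings newer than anything it was trained on.

```python
from datetime import date

# Hypothetical listing records: (listing_id, created_on, label).
# The data is made up; the label names mirror the three classes listed above.
listings = [
    ("a1", date(2023, 1, 5), "non-violation"),
    ("a2", date(2023, 2, 11), "violation"),
    ("a3", date(2023, 3, 2), "neutral"),
    ("a4", date(2023, 4, 19), "violation"),
    ("a5", date(2023, 5, 23), "non-violation"),
    ("a6", date(2023, 6, 30), "violation"),
]


def time_based_split(records, train_end, val_end):
    """Split by creation date: train < train_end <= validation < val_end <= test."""
    train = [r for r in records if r[1] < train_end]
    val = [r for r in records if train_end <= r[1] < val_end]
    test = [r for r in records if r[1] >= val_end]
    return train, val, test


# Assumed cutoffs: train on Jan-Mar, validate on Apr-May, test on Jun onward.
train, val, test = time_based_split(listings, date(2023, 4, 1), date(2023, 6, 1))
print(len(train), len(val), len(test))
```

Unlike a random split, this arrangement keeps future listings out of the training set, so offline metrics are a closer proxy for how the model behaves on new listings after deployment.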
Result
Etsy’s machine learning system identified and removed over 100,000 policy-violating listings, significantly enhancing content moderation efficiency.
Use Cases
Content Moderation, Policy Compliance
Tech Stack/Framework
ALBERT, EfficientNet, EmbraceNet