How Does Etsy's ML Detect 100K Content Violations?

Etsy combines multimodal machine learning with efficient architectures to manage trust and safety across 100M+ listings and 7M sellers

Dec 22, 2024

TL;DR

Situation

With over 100 million unique items and a vibrant community of more than 90 million active buyers and 7 million active sellers, Etsy faced challenges in enforcing policies and removing potentially violating or infringing items at scale.

Task

Etsy aimed to build an automated system that detects policy violations effectively and complements community reporting and manual review, in order to maintain marketplace integrity.

Action

The team developed the solution in five steps:

Data Collection: Labeled listings as positives, hard negatives, or soft negatives to improve model accuracy.
Feature Extraction: Extracted multimodal signals, both text and images, from listings to build a comprehensive dataset.
Dataset Splitting: Adopted time-based splits so that evaluation stays relevant as the marketplace changes.
Model Development: Built a supervised model with violation, neutral, and non-violation classes.
Model Evaluation: Used progressive evaluation, offline benchmarks, and continuous monitoring to refine the model and minimize false positives.
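
To make the dataset-splitting step concrete, here is a minimal sketch of a time-based split. It is an illustration, not Etsy's pipeline: the record fields (`id`, `created`, `label`), the toy data, and the cutoff date are all assumptions. The idea is that the model trains only on listings created before a cutoff and is evaluated on listings created on or after it, which mimics how it would meet new listings in production.

```python
from datetime import date


def time_based_split(listings, cutoff):
    """Split records into (train, test) by creation date.

    Records created before `cutoff` go to train; records created on or
    after `cutoff` go to test, so the model is never evaluated on data
    older than what it was trained on.
    """
    train = [r for r in listings if r["created"] < cutoff]
    test = [r for r in listings if r["created"] >= cutoff]
    return train, test


# Hypothetical toy records; field names and labels are illustrative only.
listings = [
    {"id": 1, "created": date(2024, 1, 5), "label": "violation"},
    {"id": 2, "created": date(2024, 2, 10), "label": "non_violation"},
    {"id": 3, "created": date(2024, 3, 15), "label": "neutral"},
    {"id": 4, "created": date(2024, 4, 20), "label": "violation"},
]

train, test = time_based_split(listings, cutoff=date(2024, 3, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

A random split would mix old and new listings in both sets, which can overstate accuracy when violation patterns drift over time.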

Result

Etsy’s machine learning system identified and removed over 100,000 policy-violating listings, significantly enhancing content moderation efficiency.

Use Cases

Content Moderation, Policy Compliance

Tech Stack/Framework

ALBERT, EfficientNet, EmbraceNet
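
As a rough illustration of how these pieces might fit together, the sketch below shows an EmbraceNet-style fusion step in plain Python. It assumes the text encoder (for example ALBERT) and the image encoder (for example EfficientNet) have each already produced a feature vector of the same length; the vectors, the selection probabilities, and the `embrace` helper name are hypothetical. For each output position, the fused vector takes its value from one modality chosen at random, which is the core idea of EmbraceNet's fusion layer. This is not Etsy's actual implementation.

```python
import random


def embrace(modality_vectors, probs, rng):
    """Fuse equal-length feature vectors, one per modality.

    For every position i, pick one modality according to `probs` and
    copy that modality's value at position i into the fused vector.
    """
    dim = len(modality_vectors[0])
    if any(len(v) != dim for v in modality_vectors):
        raise ValueError("all modality vectors must have the same length")
    indices = list(range(len(modality_vectors)))
    fused = []
    for i in range(dim):
        k = rng.choices(indices, weights=probs, k=1)[0]
        fused.append(modality_vectors[k][i])
    return fused


# Hypothetical stand-ins for encoder outputs after projection to a shared size.
text_features = [1.0] * 8   # pretend this came from a text encoder
image_features = [2.0] * 8  # pretend this came from an image encoder

fused = embrace([text_features, image_features], [0.5, 0.5], random.Random(0))
print(fused)
```

Setting a modality's probability to zero removes it from the fused vector entirely, which is one way such a design can cope with listings that are missing an image or a description.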

Explained Further

Data Collection and Preparation
