Data Tinkerer

Data Engineering

Scaling Apache Flink: How Reddit Cut Memory Usage by 60%

Optimizing real-time ad validation with field filtering, tiered storage, and infrastructure enhancements.

Data Tinkerer
Feb 19, 2025

Photo by Brett Jordan on Unsplash

Situation

Reddit's advertising platform processes thousands of ad engagement events per second, necessitating real-time validation and enrichment to ensure accurate reporting and prevent budget overdelivery.

Task

Develop a scalable, real-time ad event validation system capable of efficiently handling high event volumes while maintaining performance and reliability.

Action

The engineering team built the Ad Events Validator (AEV) on Apache Flink to correlate ad server events with user engagement events. To address large state sizes and high resource demands, they implemented:

  • Field Filtering: Analyzed which fields downstream consumers actually read and established an allowlist, reducing event payload size by 90% and cutting CPU and memory usage by 25% and 60%, respectively.

  • Tiered State Storage: Integrated Apache Cassandra for external state storage, effectively reducing in-memory state size and enhancing the efficiency of checkpointing and system recovery processes.
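Both ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not Reddit's implementation: the field names in `ALLOWED_FIELDS`, the `TieredState` class, and the use of a plain dict as a stand-in for Cassandra are all assumptions made for the example.

```python
from collections import OrderedDict

# Hypothetical allowlist: the article does not list the real field names.
ALLOWED_FIELDS = frozenset({"event_id", "ad_id", "event_type", "timestamp"})


def filter_fields(event: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Keep only the fields that downstream consumers are known to read."""
    return {key: value for key, value in event.items() if key in allowed}


class TieredState:
    """Keep recently used keys in memory and spill older ones to a cold tier.

    `cold` is a plain dict standing in for an external store such as
    Cassandra (an assumption for this sketch, not a real client).
    """

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # least recently used key comes first
        self.cold = {}

    def put(self, key, value) -> None:
        self.cold.pop(key, None)  # never keep a stale copy in the cold tier
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Spill the least recently used entries once the hot tier is full.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)  # promote back into the hot tier
            return value
        return None
```

Filtering before events enter state means every stored record is smaller, while the tiered layout bounds how much of that state has to live in memory at once. A production job would also need expiry and error handling for the external store, which this sketch leaves out.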

Result

These changes made the AEV system more scalable and cost-efficient, with lower CPU and memory usage and more efficient checkpointing and recovery.

Use Cases

Real-Time Event Validation, Data Enrichment, Resource Optimization

Tech Stack/Framework

Apache Flink, Apache Kafka, Apache Cassandra


Explained Further


Background

Reddit processes thousands of ad engagement events per second. These events require validation and enrichment before being sent to downstream systems. Key components of this validation process include applying a standardized look-back window and filtering out suspected invalid traffic.
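As a rough illustration of those two checks, the sketch below accepts an engagement only if it is not flagged as suspected invalid traffic and it falls inside a look-back window measured from when the ad was served. The one-hour window, the `Engagement` type, and the `suspected_invalid` flag are assumptions for the example; the article does not give Reddit's actual values or rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed value for illustration: the article does not state the window length.
LOOKBACK_WINDOW = timedelta(hours=1)


@dataclass(frozen=True)
class Engagement:
    ad_id: str
    occurred_at: datetime
    # Hypothetical flag, assumed to be set by an upstream traffic-quality check.
    suspected_invalid: bool = False


def is_valid(engagement: Engagement, served_at: datetime,
             window: timedelta = LOOKBACK_WINDOW) -> bool:
    """Return True if the engagement passes both validation checks."""
    if engagement.suspected_invalid:
        return False
    elapsed = engagement.occurred_at - served_at
    # The engagement must happen after the ad was served and within the window.
    return timedelta(0) <= elapsed <= window
```

Using one shared window constant is what makes the look-back "standardized": every consumer of validated events applies the same cutoff.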

In addition to a batch validation pipeline, a near real-time pipeline improves budget spend accuracy and provides advertisers with real-time insights into campaign performance. This real-time component, known as the Ad Events Validator (AEV), is built using Apache Flink. AEV matches ad server events with engagement events and writes the validated results to a separate Kafka topic for downstream consumption.
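The matching step can be pictured as a keyed join between the two streams. The sketch below is plain Python rather than Flink code, and it assumes a shared `impression_id` key and a list standing in for the output Kafka topic; neither detail comes from the article.

```python
from collections import defaultdict


class EventMatcher:
    """Join ad server events with engagement events on a shared key."""

    def __init__(self):
        self.served = {}                  # impression_id -> ad server event
        self.pending = defaultdict(list)  # engagements seen before their ad server event
        self.validated = []               # stand-in for the output Kafka topic

    def on_ad_served(self, event: dict) -> None:
        key = event["impression_id"]
        self.served[key] = event
        # Flush any engagements that arrived before this ad server event.
        for engagement in self.pending.pop(key, []):
            self._emit(event, engagement)

    def on_engagement(self, engagement: dict) -> None:
        key = engagement["impression_id"]
        served = self.served.get(key)
        if served is None:
            self.pending[key].append(engagement)  # wait for the ad server event
        else:
            self._emit(served, engagement)

    def _emit(self, served: dict, engagement: dict) -> None:
        self.validated.append({
            "impression_id": served["impression_id"],
            "ad_id": served["ad_id"],
            "event_type": engagement["event_type"],
        })
```

Note that both buffers grow without bound here. Any real streaming job has to keep this kind of per-key state until a match arrives or the entry expires, which helps explain why state size becomes a concern at high event volumes.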

Overview of the real-time ad engagement event validation system (Source: Reddit)

Building and maintaining AEV, however, presented several challenges for the Reddit team.


1st Challenge: Addressing High State Size Issues
