<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Tinkerer: Data Engineering]]></title><description><![CDATA[Dive into the latest trends and updates in data engineering!]]></description><link>https://www.datatinkerer.io/s/data-engineering</link><image><url>https://substackcdn.com/image/fetch/$s_!JEdj!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png</url><title>Data Tinkerer: Data Engineering</title><link>https://www.datatinkerer.io/s/data-engineering</link></image><generator>Substack</generator><lastBuildDate>Sat, 23 May 2026 17:49:52 GMT</lastBuildDate><atom:link href="https://www.datatinkerer.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Data Tinkerer]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[datatinkerer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[datatinkerer@substack.com]]></itunes:email><itunes:name><![CDATA[Data Tinkerer]]></itunes:name></itunes:owner><itunes:author><![CDATA[Data Tinkerer]]></itunes:author><googleplay:owner><![CDATA[datatinkerer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[datatinkerer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Data Tinkerer]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How Airtable Saved Millions by Cutting Archive Storage Costs by 100x]]></title><description><![CDATA[Airtable moved petabytes of cold log data out of MySQL and built a cheaper archive layer on S3 and Parquet without sacrificing fast 
queries.]]></description><link>https://www.datatinkerer.io/p/how-airtable-saved-millions-by-cutting</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-airtable-saved-millions-by-cutting</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 23 Apr 2026 04:53:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0887cd3f-fd83-4fbd-93df-e009c31ed22b_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Airtable cut archive storage costs by 100x and saved millions.</p><p>Let&#8217;s get into how Airtable pulled it off.</p><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Airtable&#8217;s MySQL storage had grown to petabytes, with archive tables driving major cost and scale issues. Some databases were also getting close to the 64TB RDS limit.</p><h4><strong>Task</strong></h4><p>The team needed to move cold archive data out of MySQL without breaking revision history or slowing queries. They also had to keep durability, availability and enterprise requirements intact.</p><h4><strong>Action</strong></h4><p>Airtable built a two-tier system: recent data stayed in MySQL, old data moved to S3 as Parquet files. They used DataFusion for querying, plus Flink, compaction, validation, caching, indexes and bloom filters.</p><h4><strong>Result</strong></h4><p>Parquet made the archive dataset about 10x smaller and S3 was about 10x cheaper than MySQL. 
That led to roughly <strong>100x lower storage costs</strong> and <strong>millions in annual savings</strong>.</p><h4><strong>Use Cases</strong></h4><p>Archiving data, reducing storage cost, improving query latency</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache DataFusion, AWS MySQL RDS, Apache Flink, AWS SQS</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>Airtable&#8217;s storage team went into 2024 with a pretty blunt problem: too much archive data sitting in the wrong place.</p><p>Their AWS MySQL RDS footprint had grown to petabytes, some of the biggest databases were getting dangerously close to the 64TB RDS disk limit and one particular class of data was doing most of the damage: cell history and action log tables. Together, these acted as Airtable&#8217;s archive layer, powering revision history and helping with internal debugging. For some enterprise customers, that data also had to be retained for up to 10 years.</p><p>The issue was not that the data had no value. It clearly did. The issue was that MySQL was an expensive home for a workload that was mostly cold.</p><p>Most of this archive data was old, rarely touched and read-only except for hard deletion cases. When it was queried, the access patterns were fairly predictable: point selects and paginated range queries, always scoped to a specific base. That made it a poor match for row-oriented OLTP storage at this scale, especially when Airtable still needed interactive latency and strong durability and availability guarantees.</p><p>So the team built a new two-tier storage system. Recent rows would stay in MySQL. Older archive data would move to S3, be stored as Parquet files partitioned by base and queried through a new engine built on Apache DataFusion.</p><p>That shift did more than trim costs around the edges. 
The final archived dataset became 10x smaller than the original data in MySQL thanks to Parquet compression, and S3 itself was around 10x cheaper per byte than MySQL storage. Put those together and the result was a storage layer that was about 100x cheaper.</p><div><hr></div><h4>Why Airtable needed a better archive layer</h4><p>Airtable&#8217;s archive data had a few characteristics that mattered a lot:</p><ul><li><p>the overwhelming majority of it (trillions of rows) was old and infrequently accessed</p></li><li><p>most reads were point selects or range queries used for pagination</p></li><li><p>queries were always filtered to a single base</p></li><li><p>old data was effectively immutable</p></li><li><p>the data was keyed by MySQL&#8217;s <code>autoincr_id</code>, so it naturally followed insertion order</p></li></ul><p>That combination is useful. It tells you the team did not need a general-purpose database for this layer. They needed something cheaper that still handled a narrow set of read patterns well.</p><p>The first key idea was to move archive data from MySQL into S3. S3 was already much cheaper byte-for-byte. The second was to store that data in Parquet and partition it by base. The third was to place a query engine in front of it that could answer interactive requests without forcing full scans.</p><p>This became a two-tier system: hot and recent archive rows in MySQL, older rows in S3-backed Parquet. 
That let Airtable keep the user-facing experience intact while steadily pulling massive amounts of cold data out of an expensive OLTP system.</p><div><hr></div><h4>Architecture overview</h4><p>At a high level, the architecture is simple enough to explain in one sentence: archive data moved from MySQL into S3 Parquet files, and <a href="https://datafusion.apache.org/">DataFusion</a> was used to query those files directly with low enough latency to support product features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uA6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uA6C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 424w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 848w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1272w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp" width="720" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uA6C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 424w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 848w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1272w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Architecture overview (Source: Airtable)</figcaption></figure></div><p>What made it work was the alignment between storage layout and query patterns.</p><p>The team kept the Parquet schema close to the original MySQL schema. They also preserved ordering by <code>autoincr_id</code>, because a large share of their reads depended on point lookups or ranged access on that field. 
That meant the query engine could use Parquet metadata to narrow down what bytes to fetch from S3 instead of pulling full files.</p><p>They also partitioned files by base, which mattered just as much. Since queries were always scoped to a specific base, partitioning by base meant Airtable could avoid touching unrelated data entirely.</p><p>On top of that, Airtable stored S3 file location metadata in DynamoDB. This gave the client layer a clean way to register the right files with the query engine and helped support enterprise requirements such as regional data residency and encryption with customer-provided keys.</p><div><hr></div><h4>How Parquet fit the workload</h4><p>Parquet was not just a cheaper format choice. It was the thing that made interactive querying on S3 plausible.</p><p>Unlike MySQL&#8217;s row-oriented <a href="https://en.wikipedia.org/wiki/InnoDB">InnoDB</a> layout, Parquet is columnar. It groups rows into row groups and, within each row group, stores each column contiguously as a column chunk. More importantly, Parquet files include metadata that query engines can exploit for pruning. File metadata carries offsets and sizes, while row-group and page-level metadata can include min/max statistics, and column chunks can carry bloom filters.</p><p>That matters because Airtable&#8217;s queries were usually not broad analytical scans. They were targeted reads. If a query asks for a narrow range of <code>autoincr_id</code> values within one base, and the files are sorted on that column, the engine can inspect metadata and skip most row groups without reading them.</p><p>Airtable leaned directly into that. They kept the original schema mostly intact and preserved sorting by <code>autoincr_id</code>. Because of that, the engine could selectively download the relevant byte ranges from S3 rather than treat each Parquet file like a blob.</p><p>There was also a second major upside: compression. 
Thanks to the columnar layout, the archived dataset ended up about 10x smaller than the original MySQL version. That is a huge result on its own. Pair that with S3&#8217;s lower storage cost and the economics really shifted in Airtable&#8217;s favor.</p><div><hr></div><h4>Picking the right query engine</h4><p>Once the storage format was settled, Airtable benchmarked several engines capable of querying Parquet in S3:</p><ul><li><p>AWS Athena</p></li><li><p>DuckDB</p></li><li><p>StarRocks</p></li><li><p>DataFusion</p></li></ul><p>Athena was ruled out quickly for latency reasons. Its API pattern of starting a query and polling for completion made it better suited to general OLAP workloads than user-facing interactive queries. Airtable was seeing query latencies measured in seconds, which was too slow for revision history use cases. It also lacked the strong isolation Airtable cared about across bases.</p><p>DuckDB was useful, but not ideal for this workload. The team found that its query planning did not always use projection pushdowns effectively, which sometimes led to full file downloads. Simple point queries on one <code>autoincr_id</code> could still be subsecond, but overall it trailed DataFusion. The team still used DuckDB heavily during development because it was convenient for debugging Parquet contents from the command line.</p><p>StarRocks produced performance results comparable to DataFusion, but it came with the operational burden of running a full-time cluster in Kubernetes to serve relatively low-QPS cold-storage queries. 
Like Athena, it also did not give Airtable the same kind of strong base-level isolation.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2acq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2acq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 424w, https://substackcdn.com/image/fetch/$s_!2acq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 848w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1272w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp" width="720" height="105" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:105,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2acq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 424w, https://substackcdn.com/image/fetch/$s_!2acq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 848w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1272w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Query performance results (Source: Airtable)</figcaption></figure></div><p>That left DataFusion.</p><p>For Airtable, DataFusion hit the sweet spot. 
It was strong at exploiting Parquet metadata, it was extensible and because it is an embedded Rust library, the team could run it inside their existing worker architecture.</p><p>That brought a few clear benefits.</p><p>Operationally, there was no extra service to deploy and babysit. Isolation came for free because each base already had its own process boundary. And request affinity stayed high because the same workers kept serving the same bases, which later made caching hit rates excellent.</p><p>In other words, DataFusion fit Airtable&#8217;s architecture.</p><div><hr></div><h4>Migrating data out of MySQL</h4><p>Designing the new storage layer was one thing. Moving petabytes of live data into it without breaking anything was another.</p><p>Airtable wanted a one-time migration process that would start cutting MySQL storage costs quickly. To get a consistent export view, they used <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html">AWS RDS snapshot capabilities</a>, which produced large Parquet files of full tables. They had also prototyped direct SQL extraction into Parquet, but chose not to productionize that approach at scale. Snapshots were preferred because they ran against backup instances and avoided extra pressure on production systems.</p><p>The challenge was that these snapshots were massive table-level exports across many shards, while Airtable&#8217;s serving model required files partitioned by base.</p><p>So the team added a repartitioning and compaction pipeline.</p><p>First, Flink jobs parallelized across shard snapshots and repartitioned records by base into intermediate S3 directories. Then AWS Step Functions scanned those intermediate outputs and enqueued bases into SQS. 
From there, custom compactor code merged the files, merge-sorted them, deduplicated records and produced final serving Parquet files capped at 1GB each.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YQr0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YQr0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 424w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 848w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1272w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp" width="720" height="475" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26436,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YQr0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 424w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 848w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1272w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Overview of the migration process (Source: Airtable)</figcaption></figure></div><p>That 1GB size was not arbitrary. It was chosen through benchmarking as a good serving size with a useful density of page groups per file. Again, a small design detail with big latency implications.</p><div><hr></div><h4>Validating the migration</h4><p>A migration like this lives or dies on validation.</p><p>Airtable first ran bulk validation to confirm that data had not been corrupted during export, repartitioning and compaction. For this, they spun up a StarRocks cluster and compared the serving Parquet files against the original RDS snapshots, finding zero cases of data corruption.</p><p>We covered this validation approach in more detail in an earlier piece on how Airtable made archive validation work at petabyte scale. 
</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d35a32c3-a903-40ce-8426-c018883774af&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Airtable Made Archive Validation Work at Petabyte Scale&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-10T06:22:40.886Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T_KN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-airtable-made-archive-validation-work-at-petabyte-scale&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:160983286,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>The important point is that bulk validation 
gave the team confidence that the migration pipeline itself had preserved the data correctly.</p><p>That is a strong result, but it answered only one part of the problem. The bigger challenge was all the new storage client logic that now sat around the data:</p><ul><li><p>a Rust query engine built with DataFusion</p></li><li><p>integration into Node.js using napi-rs</p></li><li><p>logic to combine MySQL and S3 results</p></li><li><p>logic to identify which S3 files to read</p></li><li><p>support for enterprise features like customer keys, data residency and hard deletes</p></li></ul><p>Bulk validation could confirm the data files were right. It could not prove that the full end-to-end user experience would behave exactly as before.</p><p>So Airtable moved to shadow validation on live traffic. Requests continued reading from MySQL as normal, while the same queries were also executed in the background through the new system. That let the team compare outputs under real conditions and catch implementation issues before rollout.</p><p>The bugs they found were exactly the sort of thing that shows up when systems cross language runtimes and execution models:</p><ul><li><p>float precision mismatches between JavaScript and Rust&#8217;s serde JSON handling</p></li><li><p>a sorting issue where DataFusion used lexicographic rather than numeric ordering</p></li><li><p>a <code>SIGABRT</code> crash tied to async napi-rs and Node.js worker threads</p></li><li><p>latency problems</p></li></ul><p>They resolved these before launch and before deleting the MySQL copies of the migrated data.</p><div><hr></div><h4>Fixing latency bottlenecks</h4><p>Once the staged rollout began, latency became the main battleground.</p><p>That is not surprising. S3-backed query systems usually do not fail because storage is too expensive. 
They fail because network round trips and scan inefficiencies make them annoyingly slow.</p><p>Airtable saw a mix of problems: inefficient query plans, too many S3 requests and cases where sparse filters still caused too much data to be downloaded. They responded with a few targeted optimizations.</p><p><strong>Building a tiered cache for archive queries</strong></p><p>Caching turned out to be one of the biggest wins.</p><p>Under the hood, DataFusion translates SQL queries into S3 GET operations. It fetches Parquet footer metadata and column chunk metadata, then decides which row groups and byte ranges need to be read. If every step involves another network trip, latency stacks up quickly.</p><p>So Airtable built a tiered caching system.</p><p>The first layer used DataFusion&#8217;s built-in cache support to store Parquet file metadata and S3 <code>ListObjects</code> results.</p><p>The second layer cached additional Parquet page header metadata in memory. Combined with the first layer, this reduced how often the engine had to round-trip to S3 during query planning. Airtable wrote a custom implementation around DataFusion&#8217;s Parquet reader interfaces, which let them cache metadata results directly and add instrumentation. The result was a reported 99%+ cache hit ratio.</p><p>That number is believable in context because DataFusion ran inside per-base workers and the files themselves were partitioned by base. The system had strong locality by design.</p><p>Finally, Airtable added an on-disk cache for full Parquet files. This was reserved for a very small number of heavy bases with bad enough query patterns to justify the extra work and cost. Unlike metadata caching, downloading whole files is not something you want to do casually. 
But for outlier cases, it gave the team another escape hatch.</p><p><strong>Building custom indexes for sparse queries</strong></p><p>Not every query could be handled efficiently by the base file layout alone.</p><p>Most reads were anchored on <code>autoincr_id</code>, but Airtable also had filters on other fields that could reduce result sets dramatically. Examples included filtering by action type, filtering by row or excluding sync-generated updates.</p><p>For some bases, those additional conditions matched only a tiny slice of rows. In those cases, even if <code>autoincr_id</code> helped somewhat, reading broad sections of Parquet files was still wasteful.</p><p>So Airtable built a secondary indexing system.</p><p>Using DataFusion, they scanned Parquet files and wrote index data out as new Parquet files. The client layer knew how to query those indexes first, then use the result to build a more targeted query against the original archive files.</p><p>This was much easier to do because the data was effectively read-only. Airtable did not need to solve the usual headache of synchronizing constantly changing base tables and secondary indexes. Static data makes a lot of index ideas suddenly practical.</p><p><strong>Using bloom filters for faster lookups</strong></p><p>There was one more edge case: lower-QPS point lookups on a different unique identifier that was randomly distributed.</p><p>That broke the usual min/max pruning strategy. Because the identifier values were not ordered, each row group&#8217;s min/max range covered almost the entire value space, so Parquet statistics could not rule anything out. Without another technique, the engine would need to fetch and scan every row group before applying the filter.</p><p>Airtable could have solved this with another custom index, but they chose a simpler route: <a href="https://parquet.apache.org/docs/file-format/bloomfilter/">Parquet bloom filters</a>.</p><p>Bloom filters are probabilistic membership structures. They can tell you whether a value is definitely not present or possibly present. 
False positives are possible. False negatives are not.</p><p>That property is enough for pruning. If the bloom filter says a row group definitely does not contain the target identifier, the engine can skip it safely. DataFusion already understood Parquet bloom filter metadata, so Airtable could rely on native support instead of bolting on another indexing layer.</p><div><hr></div><h4>Conclusion</h4><p>Airtable&#8217;s storage team took a dataset that had clearly outgrown MySQL&#8217;s economics and built a system that matched the workload far better.</p><p>They moved petabytes of archive data out of MySQL, kept recent data in the transactional store, archived old rows into base-partitioned Parquet files on S3 and queried those files with an embedded DataFusion engine. Along the way, they layered in DynamoDB metadata registration, a large-scale migration pipeline, bulk and shadow validation, multiple caching layers, custom secondary indexes and bloom filter-based pruning.</p><p>The result was a storage system that stayed durable and queryable at interactive latency while cutting storage costs by around 100x and saving millions of dollars per year.</p><p>There is still more to do. Airtable&#8217;s first implementation focused on bulk migration so they could start saving money quickly. The longer-term goal is incremental archiving, likely through a CDC-style system such as Flink. That opens up a new set of engineering problems around compaction, index rebuilds and operations. There are also other log-like tables that could be migrated onto the same platform.</p><p>Still, the core idea is already proven.</p><p>If a dataset is mostly cold, mostly read-only and queried through a narrow set of predictable access patterns, keeping it in an expensive OLTP database is often just inertia dressed up as architecture. 
Airtable looked at the shape of the workload, changed the storage model to match it and got the kind of result every infra team wants: better economics without making the product worse.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://medium.com/airtable-eng/how-we-reduced-archive-storage-costs-by-100x-and-saved-millions-21754b5a6c8e">Airtable's Engineering Blog</a> post on this topic.</p><div><hr></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;1bbcb654-6113-4e9e-8400-1e3da752d647&quot;,&quot;caption&quot;:&quot;Notion scaled AI Q&amp;amp;A to millions of workspaces while increasing onboarding throughput 600x and cutting costs by up to 90%.<br /><br />Under the hood, that meant rethinking everything from sharding and indexing to embeddings generation, moving from a dual Spark + API setup to a simpler, unified pipeline.<br /><br />This piece breaks down how they handled multi-tenant vector search at scale, avoided unnecessary recomputation and rebuilt their search stack to be faster and easier to operate.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Notion Scaled AI Q&amp;A to Millions of Workspaces&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-26T04:00:33.287Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:191742179,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0dbf0d77-87fd-4655-82da-31cc841a6d73&quot;,&quot;caption&quot;:&quot;LinkedIn pushed Venice to handle 175M+ lookups per second while ingesting 230M writes per second.<br /><br />This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-19T04:00:52.353Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:187999868,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Notion Scaled AI Q&A to Millions of Workspaces]]></title><description><![CDATA[Kafka, Spark and Ray powering low-latency, high-throughput search pipelines]]></description><link>https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 26 Mar 2026 04:00:33 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Notion scaled its AI Q&amp;A to millions of users.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HsBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3840" height="2160" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2160,&quot;width&quot;:3840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a black and white block with the letter n on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a black and white block with the letter n on it" title="a black and white block with the letter n on it" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@maria_shalabaieva">Mariia Shalabaieva</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Now, with that out of the way, let&#8217;s get to Notion&#8217;s AI Q&amp;A level up!</p><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Notion launched AI Q&amp;A on top of 
vector search and quickly faced massive demand across millions of workspaces. The initial system hit limits in capacity, onboarding speed and cost.</p><h4><strong>Task</strong></h4><p>Scale onboarding, keep indexes fresh and reduce rising infrastructure costs. At the same time, simplify an increasingly complex architecture without hurting latency.</p><h4><strong>Action</strong></h4><p>They introduced dual ingestion paths, generation-based indexing and a serverless architecture, and migrated to turbopuffer. They then reduced recomputation with page state tracking and moved embeddings to Ray for unified compute.</p><h4><strong>Result</strong></h4><p>600x onboarding growth, 15x workspace growth and major cost reductions across layers. Latency improved and the system became simpler and more efficient.</p><h4><strong>Use Cases</strong></h4><p>Real-time search indexing, semantic search, document retrieval</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, AWS EMR, Apache Airflow, Apache Kafka, AWS S3, DynamoDB, Ray, turbopuffer</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>When <a href="https://www.notion.com/blog/introducing-q-and-a">Notion launched AI Q&amp;A</a> in November 2023, the core idea sounded simple enough: let people ask natural-language questions and retrieve relevant knowledge from across their workspace and connected tools. In practice, that meant building a vector search system that could ingest huge amounts of content, stay fresh as pages changed and do all of it at a cost that made sense at Notion scale.</p><p>That is the real story here. Not just &#8220;vector search powers AI&#8221; but what happens after launch, when adoption jumps faster than expected and the infrastructure underneath has to keep up. 
Over two years, the Notion team pushed that system through several big transitions: scaling onboarding, dealing with storage pressure, changing database architecture, reworking indexing logic and moving embeddings workloads onto Ray. The headline numbers are hard to ignore: 10x scale and roughly one-tenth the cost.</p><p>This is a good example of how modern AI infrastructure usually evolves. The first version gets the product live. The next few versions are about survival, then simplification, then cost, then latency, then getting rid of all the awkward bits that built up during the rush.</p><div><hr></div><h4>Vector search, explained through Notion&#8217;s lens</h4><p>Traditional keyword search is literal. It works when users type the exact words that exist in the content. It starts falling apart when the wording changes but the meaning stays the same. Someone searching for &#8220;team meeting notes&#8221; may still want a page called &#8220;group standup summary,&#8221; but keyword search does not naturally understand that those are closely related.</p><p>Vector search solves that by representing text as embeddings. Instead of storing only words, it maps text into a high-dimensional space where semantically similar ideas sit closer together. That means retrieval is based on meaning, not exact phrasing.</p><p>For Notion AI, this matters a lot. The system needs to answer questions in natural language by finding useful content across a workspace and even across connected sources like Slack and Google Drive. That is exactly the sort of setup where semantic retrieval becomes more useful than plain lexical matching. A user is not thinking about the title of the page or the exact phrasing inside a paragraph. 
They are asking a question in their own words and expecting the system to bridge the gap.</p><p>That expectation becomes expensive very quickly.</p><div><hr></div><h4>Part 1: Scaling beyond what the original system expected</h4><p>At launch, Notion&#8217;s ingestion and indexing pipeline had two paths.</p><p>The first was an offline path. Batch jobs running on Apache Spark would chunk existing documents, generate embeddings through an API and bulk-load those vectors into the vector database. This handled the heavy lifting for backfilling large amounts of existing content.</p><p>The second was an online path. Kafka consumers processed page edits in near real time so live workspaces stayed up to date with sub-minute latency.</p><p>It is a practical split. The offline side handles the backlog and large initial loads. The online side keeps things fresh once a workspace is active. Together, the two-path setup gave Notion a way to onboard workspaces at scale without sacrificing freshness for day-to-day edits.</p><p>The vector database itself ran on dedicated &#8216;pod&#8217; clusters, where storage and compute were coupled. 
The Notion team designed sharding in a way that echoed their Postgres setup: workspace ID was the partitioning key, routing used range-based partitioning and a single config referenced all shards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zNlu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">pipelines writing into sharded vector database pods (Source: Notion)</figcaption></figure></div>
<p>That all made sense on paper. Then the product launched and demand was overwhelming.</p><p>Notion quickly built up a waitlist of millions of workspaces that wanted access to Q&amp;A. The problem was no longer whether the system worked. It was how fast it could onboard people without cracking under the pressure.</p><p><strong>When the indexes started to fill up</strong></p><p>Only a month after launch, the original indexes were already nearing capacity.</p><p>That is the kind of problem that sounds good in product meetings and bad in infrastructure meetings. If the indexes filled up, Notion would have to pause onboarding. 
That would slow down rollout and delay access for everyone waiting.</p><p>The team had two obvious options.</p><p>One was to re-shard incrementally. Clone data into another index, delete half, repeat and keep doing that every couple of weeks as new customers came in.</p><p>The other was to re-shard for the final expected volume. But their vector database provider charged for uptime, so over-provisioning would have been painfully expensive.</p><p>Instead, the Notion team went with a third approach. When a set of indexes got close to full, they provisioned a new set and directed all newly onboarded workspaces there. Each set was assigned a generation ID, which determined where reads and writes should go.</p><p>It is not the prettiest long-term design, but it was a smart short-term move. It avoided repeated re-shard operations and kept onboarding moving. Sometimes the right scaling decision is not the most elegant one. It is the one that buys breathing room without stopping the business.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8zu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" width="1456" height="891" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:891,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">New index &#8216;generations&#8217; added as capacity fills, routing new workspaces without re-sharding. (Source: Notion)</figcaption></figure></div>
<p><strong>Turning onboarding into a throughput problem</strong></p><p>Even with that architecture in place, the initial onboarding rate was nowhere near enough. At launch, Notion could onboard only a few hundred workspaces per day. At that pace, clearing a waitlist of millions of workspaces would have taken decades, which was obviously not a real option.</p><p>So the team pushed hard on throughput. Using Airflow scheduling, pipelining and Spark job tuning, they dramatically increased capacity.</p><p>The results were big:</p><ul><li><p>Daily onboarding capacity increased by <strong>600x</strong></p></li><li><p>Active workspaces grew <strong>15x</strong></p></li><li><p>Vector database capacity expanded <strong>8x</strong></p></li></ul><p>By April 2024, the Q&amp;A waitlist was cleared.</p><p>That is the kind of milestone that looks clean in hindsight, but it came with a cost. Managing multiple generations of databases helped during the hypergrowth phase, but it also added operational complexity and financial overhead. The team had solved the immediate scaling problem, but the architecture was starting to feel heavy.</p><p>That set up the next phase of the story.</p><div><hr></div><h4>Part 2: Cost becomes the next constraint</h4><p>In May 2024, Notion migrated its embeddings workload from the original dedicated &#8216;pod&#8217; architecture to a serverless setup that decoupled storage from compute and charged based on usage instead of uptime.</p><p>The effect was immediate. Costs dropped by 50 percent from peak usage, translating into several million dollars in annual savings.</p><p>That alone would have made the migration worthwhile, but the serverless design also fixed two practical problems. 
First, it removed the storage capacity constraints that had become a serious scaling bottleneck. Second, it simplified operations because the team no longer had to provision capacity ahead of demand.</p><p>Even after cutting costs in half, the annual run rate for vector database spend was still in the millions. From an engineering point of view, this is where things get interesting. The easy win had already happened. Now the team had to go after deeper structural gains.</p><p><strong>A new search foundation (turbopuffer)</strong></p><p>While working on the first round of savings, Notion also evaluated alternative search engines. <a href="https://turbopuffer.com/">turbopuffer</a> stood out because it offered significantly lower projected costs.</p><p>At the time, turbopuffer was a newer player in search. Its architecture was built on object storage with a focus on cost-efficiency and performance. It also supported both managed and bring-your-own-cloud deployment models, and it made bulk modification of stored vector objects easier.</p><p>That combination lined up well with what Notion needed.</p><p>After a successful evaluation, the team decided to migrate its entire multi-billion-object workload to turbopuffer in late 2024. Since they were already making a provider switch, they used the migration as a chance to clean up the broader architecture too.</p><p>Several changes happened together.</p><p>First, they fully re-indexed the corpus, increasing write throughput in the offline indexing pipeline to rebuild everything in turbopuffer.</p><p>Second, they upgraded to a more performant embeddings model during the migration.</p><p>Third, they simplified the architecture. 
turbopuffer treats each namespace as an independent index, which removed the need to think about sharding and generation-based routing the way the old setup required.</p><p>Finally, they handled the cutover gradually, migrating one generation at a time and validating correctness before moving on.</p><p>This is a strong pattern: if a migration is painful anyway, use it to pay off other infrastructure debt at the same time.</p><p>The outcome was solid on several fronts:</p><ul><li><p><strong>60 percent cost reduction</strong> on search engine spend</p></li><li><p><strong>35 percent reduction</strong> in AWS EMR compute costs</p></li><li><p>p50 production query latency <strong>improved from 70&#8211;100ms to 50&#8211;70ms</strong></p></li></ul><p>That is a meaningful improvement across cost and performance, which is not always easy to pull off together.</p><p><strong>Avoiding full reprocessing with page state tracking</strong></p><p>The next optimization went after a very expensive inefficiency in the indexing pipeline.</p><p>Notion pages can be long, so the team chunks each page into spans, embeds each span and stores those vectors with metadata such as authors and permissions. 
In the original implementation, any edit to a page or its properties triggered a full re-chunk, full re-embed and full re-upload of all spans on that page.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ytMS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" width="1000" height="189" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:189,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2615711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Page &#8594; chunking &#8594; embedding &#8594; vector DB with full reprocessing on every edit. 
(Source: Notion)</figcaption></figure></div><p>That meant even a tiny change could trigger a lot of unnecessary work.</p><p>The team narrowed the problem down to the two kinds of change that actually mattered:</p><ol><li><p>The page text changes, which means the embeddings need updating</p></li><li><p>The metadata changes, which means only the metadata needs updating</p></li></ol><p>To detect those cases, they tracked two hashes per span: one hash for the span text and another for the metadata fields. They chose 64-bit xxHash because it offered a good balance of speed, simplicity, low collision risk and a small storage footprint.</p><p>For caching, they used DynamoDB. Each page had one record containing the state of all spans on that page, including text and metadata hashes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mj4k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" width="1396" height="858" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:1396,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Span-level hashing (text + metadata) with DynamoDB state to detect and update only changed spans. 
(Source: Notion)</figcaption></figure></div><p>The win came from using that state to avoid unnecessary work.</p><p><strong>Case 1: The page text changes</strong></p><p>Imagine Herman Melville editing <em>Moby Dick</em> halfway through a page. Before this improvement, the whole page would have been re-embedded and reloaded. After the change, the system chunks the page, fetches the previous state from DynamoDB and compares text hashes span by span. It can then detect which spans actually changed and only re-embed and reload those.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xTeN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" width="1000" height="151" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:151,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1891331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Only changed spans are re-embedded and updated using page state + text hash comparison. (Source: Notion)</figcaption></figure></div><p>This is the kind of fix where getting the balance right matters. Miss a changed span and search quality suffers. Reprocess too much and cost stays high.</p><p><strong>Case 2: The metadata changes</strong></p><p>Now imagine Melville updates permissions so the page becomes visible to everyone. The permissions metadata changes, but the text does not.</p><p>Previously, that still meant re-embedding and reloading the entire page. With the new approach, Notion compares both text and metadata hashes. If the text hashes are unchanged but metadata hashes differ, the system skips embedding entirely and issues a PATCH command to the vector database to update only the metadata. That is much cheaper than recomputing embeddings.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6qtN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" width="1000" height="197" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:197,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2162583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Metadata-only changes skip embeddings and update spans via PATCH in the vector DB. (Source: Notion)</figcaption></figure></div><p>Across these changes, the Page State Project reduced data volume by 70 percent. That saved money on both embeddings API costs and vector database write costs.</p><p><strong>Moving embeddings to Ray (indexing)</strong></p><p>In July 2025, Notion started migrating its near real-time embeddings pipeline to <a href="https://www.ray.io/">Ray</a> on <a href="https://www.anyscale.com/">Anyscale</a>.</p><p>The motivation came from several pain points in the earlier setup.</p><p>One was the <strong>&#8216;double compute&#8217; problem</strong>. Spark on EMR handled preprocessing like chunking, transformations and API orchestration, but embeddings themselves were still generated through an external provider that charged per token. So the team was paying for both preprocessing infrastructure and embedding API usage.</p><p>Another issue was <strong>endpoint reliability</strong>. Fresh search indexes depended on the stability of an external embeddings API.</p><p>The third problem was <strong>clunky pipelining</strong>. 
To smooth traffic and avoid API rate limits, the team had built a multi-step handoff process where Spark jobs passed batches through S3. It worked but it was clunky.</p><p>Ray and Anyscale gave Notion a cleaner path.</p><p>Ray let the team run open-source embedding models directly, which meant more model flexibility and less dependence on external providers. By consolidating preprocessing and inference onto a single compute layer, they could cut out the double-compute setup. Ray also supports pipelining CPU-bound work such as chunking and page-state detection with GPU-bound embedding generation on the same nodes, which helps keep utilization high.</p><p>There was also a developer productivity angle. Anyscale workspaces let engineers write and test pipelines from their preferred tools without having to provision infrastructure manually.</p><p>And on the product side, self-hosting embeddings removed a third-party API hop from the user-facing path, which helped reduce end-to-end latency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UN1z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" width="1000" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1537621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Ray natively supports pipelining CPU bound tasks (chunking, detecting page state) with GPU bound embeddings generation within the same node. (Source: Notion)</figcaption></figure></div><p>The rollout is still ongoing, but early results suggest a 90+ percent reduction in embeddings infrastructure costs. That is a major shift in how the economics of the system work.</p><p><strong>Real-time query embeddings on Ray (serving)</strong></p><p>Indexing is only half the picture. When users or agents search in Notion, queries must also be embedded on the fly before the vector database can be searched.</p><p>That makes serving latency-sensitive. The embedding has to happen fast enough that the search still feels responsive.</p><p>Hosting large embedding models is not trivial. GPU allocation, ingress routing, replication and autoscaling all matter, especially when traffic is uneven and expectations for responsiveness are high.</p><p><a href="https://docs.ray.io/en/latest/serve/index.html">Ray Serve</a> helped Notion here by handling much of that operational layer out of the box. The team could wrap open-source embedding models in persistent deployments that stay loaded on GPU, configure request batching and replication and manage the serving setup with normal Python code plus YAML-based infrastructure configuration.</p><p>That is a pretty practical endpoint for the broader journey.</p><p>What started as a vector search stack built quickly enough to launch AI Q&amp;A turned into a much more refined system: simpler in some places, more selective in others, cheaper across multiple layers and faster where users feel it. The interesting part is not any single tool choice. 
It is how the Notion team kept removing bottlenecks one by one: storage limits, awkward shard routing, redundant recomputation, external API dependence and fragmented compute layers.</p><p>That is usually what mature AI infrastructure looks like in the real world. Not one giant redesign. A sequence of sharp decisions, each fixing the thing that has become too expensive, too slow or too annoying to keep around.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://www.notion.com/blog/two-years-of-vector-search-at-notion">Notion's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0dbf0d77-87fd-4655-82da-31cc841a6d73&quot;,&quot;caption&quot;:&quot;LinkedIn pushed Venice to handle 175M+ lookups per second while ingesting 230M writes per second.<br /><br />This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-19T04:00:52.353Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:187999868,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;765aa4a7-c63b-4175-8423-aae14d8d54cb&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs]]></title><description><![CDATA[From partition strategy to adaptive throttling, the playbook behind Venice&#8217;s ingestion evolution.]]></description><link>https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 19 Feb 2026 04:00:52 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how LinkedIn ingests data at scale.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y5YD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Venice: LinkedIn&#8217;s data storage platform</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5184" height="3456" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3456,&quot;width&quot;:5184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a computer screen with a facebook page on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a computer screen with a facebook page on it" title="a computer screen with a facebook page on it" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@getswello">Swello</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Venice powers LinkedIn&#8217;s AI-driven products and has scaled to 2,600+ stores with workloads spanning bulk loads, streaming updates and active/active replication. The ingestion pipeline had to handle throughput-heavy, CPU-heavy and latency-sensitive traffic under eventual consistency.</p><h4><strong>Task</strong></h4><p>Redesign ingestion to scale to 230M writes/sec while preserving ordering and protecting read and write SLAs. Support hybrid stores, partial updates and multi&#8211;data center replication without destabilizing clusters.</p><h4><strong>Action</strong></h4><p>Scaled bulk ingestion with partition tuning, shared consumer/writer pools and direct SST writes; tuned RocksDB via compaction triggers and BlobDB to manage amplification. Optimized CPU-heavy paths using Fast-Avro and parallel processing, then enforced priority pools and adaptive throttling to protect current-version latency.</p><h4><strong>Result</strong></h4><p>Venice now handles 175M+ key lookups/sec and 230M+ writes/sec in production. It maintains a write latency SLA under 10 minutes while safeguarding read latency as the top priority.</p><h4><strong>Use Cases</strong></h4><p>Large-scale feature stores, real-time recommendation systems, hybrid data serving, low-latency notifications</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Samza, Apache Kafka, RocksDB, Fast-Avro, Adaptive Throttling</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Background</h4><p><a href="https://github.com/linkedin/venice">Venice</a> is an open-source derived data storage platform and LinkedIn&#8217;s default storage layer for online AI use cases. 
It sits behind products like People You May Know, feed, videos, ads, notifications, the A/B testing platform, LinkedIn Learning and more.</p><p>Since Venice launched internally in 2016, it has scaled from a handful of stores to more than 2,600 production stores. The workloads have also evolved a lot. It started with &#8220;just bulk load a dataset&#8221; and grew into a mix of:</p><ul><li><p>Bulk loading huge offline datasets</p></li><li><p>Nearline streaming updates</p></li><li><p>Active/active replication across data centers</p></li><li><p>Partial updates that merge fields and collections</p></li><li><p>Deterministic write latency expectations under eventual consistency</p></li></ul><p>This post walks through how the ingestion pipeline was revamped to hit <strong>230 million records per second in production</strong>, what changed across the architecture, which optimizations moved the needle and how different workload types get tuned. A lot of these ideas are portable if you run any distributed ingestion system where ordering, throughput and predictable latency all matter at once.</p><div><hr></div><h4>Venice overall ingestion pipeline</h4><p>At a high level, store owners write to Venice through three paths:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BTop!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" width="600" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Venice overall ingestion pipeline (Source: LinkedIn)</figcaption></figure></div><ol><li><p><strong>Bulk loads</strong> from an offline processing platform (example: Spark)</p></li><li><p><strong>Nearline writes</strong> from a streaming processing platform (example: Samza)</p></li><li><p><strong>Direct writes</strong> from online applications</p></li></ol><p>No matter which path you take, the writes all pass through an intermediate PubSub broker layer. From there, the Venice Storage Node (VSN) consumes messages and persists data locally using RocksDB (an embedded key-value store).</p><p>The pipeline sounds straightforward until you operate it at scale. The same ingestion path has to support very different workloads. Some are throughput-driven (bootstrapping a massive store). Some are latency-driven (current-version updates). Some are CPU-heavy (partial updates and conflict resolution). Some are I/O-heavy (compaction, SST churn).</p><p>The following sections will look at the challenges and how the LinkedIn team resolved them.</p><div><hr></div><h4>Use case 1: bootstrapping from offline dataset</h4><p>Venice users can run bulk load jobs using offline processing platforms such as Spark to push new data versions to Venice stores. The hard part is performance for large or massive stores. 
If you want to find bottlenecks you need to understand the ingestion path end to end.</p><p><strong>What happens during a bulk load</strong></p><ul><li><p>A Venice Push Job (VPJ) creates a new version topic for the new store version, split into multiple partitions</p></li><li><p>The Spark job uses a map-reduce framework to produce messages to that version topic</p></li><li><p>It keeps one reducer per topic partition so message ordering is preserved</p></li><li><p>On the other side, the VSN spins up consumers, reads messages and persists them into RocksDB</p></li><li><p>There is one RocksDB instance per topic partition</p></li></ul><p>So you can hit bottlenecks in three obvious places:</p><ol><li><p>producing</p></li><li><p>consuming</p></li><li><p>persisting</p></li></ol><p>Production experience says you will hit all three, just not on the same day.</p><p><strong>Improving producing and consuming throughput</strong></p><p>The usual first lever is increasing the number of partitions for large stores so you can use more of the PubSub cluster capacity. More partitions tend to mean more parallelism and more throughput.</p><p>But it comes with trade-offs:</p><ul><li><p>more partitions mean more management overhead across Venice and PubSub</p></li><li><p>there is a throughput ceiling per PubSub broker</p></li></ul><p>So partition count is not a free lunch. It&#8217;s a knob that buys you throughput and charges you complexity.</p><p><strong>Enhancing consumption scalability</strong></p><p>To keep up with the rate at which messages are produced, VSN uses shared consumer pools across all hosted stores.</p><p>Instead of &#8220;one store version, one set of consumers,&#8221; each store version can use multiple consumers by distributing hosted partitions among them. 
The point is to keep multiple connections per PubSub broker to speed up consumption (similar to a <a href="https://en.wikipedia.org/wiki/Download_manager">download manager</a>).</p><p>The pool approach also does something boring but important: it sets an upper limit on total consumers, which puts a ceiling on cost.</p><p><strong>Optimizing I/O performance</strong></p><p>VSN uses a shared writer pool to persist changes concurrently across multiple RocksDB instances and use local SSD capacity effectively.</p><p>Ordering is critical in Venice, so any given RocksDB instance has only one writer actively writing to it. You still get concurrency across instances, just not within one, which is the compromise that keeps ordering intact.</p><p><strong>Minimizing memory overhead</strong></p><p>Because messages for a partition are strictly ordered (thanks to the map-reduce framework), Venice uses <a href="https://github.com/facebook/rocksdb/wiki/creating-and-ingesting-sst-files">RocksDB&#8217;s SstFileWriter</a> to generate SST files directly. 
That significantly reduces memory overhead during ingestion.</p><p><strong>Ingestion workflow in Venice Server</strong></p><p>Put together, the optimized workflow is basically: use the PubSub layer for distribution, use consumer pools for scalable reads, use writer pools for SSD throughput, preserve ordering by design and avoid memory blowups by writing SST files directly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pbHX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" width="1200" height="944" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191738,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Optimised Venice pipeline (Source: LinkedIn)</figcaption></figure></div><div><hr></div><h4>Use case 2: hybrid store</h4><p>Venice supports Lambda architecture style use cases by merging updates from both <strong>bulk loads</strong> and <strong>nearline writes</strong>. 
Users query a single store and get a unified view.</p><p><strong>Venice hybrid store workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BaZo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" width="1024" height="375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Hybrid store workflow (Source: LinkedIn)</figcaption></figure></div><p>How it works:</p><ul><li><p>each bulk load creates a new store version</p></li><li><p>that version has a new Kafka topic and a new database instance</p></li><li><p>real-time updates produced by a Samza job via a real-time topic are appended to both version topics to keep them current</p></li><li><p>once the new version catches up fully, it is swapped in as the active version to serve reads</p></li></ul><p>The hybrid store is important because it gives you a clean &#8220;new version build&#8221; story without losing real-time freshness. 
But it creates a new challenge: the database transitions from <strong>read-only</strong> to <strong>read-write</strong>.</p><p>That&#8217;s where <a href="https://github.com/facebook/rocksdb/wiki">RocksDB</a> tuning matters, because duplicate keys start showing up more often: keys get updated or deleted after they were inserted. RocksDB uses <a href="https://github.com/facebook/rocksdb/wiki/Compaction">compaction</a> to remove stale entries, but compaction has overhead: it scans, merges and rewrites SST files, consuming CPU, I/O and disk space.</p><p>So the core problem becomes: tune RocksDB so you can balance <a href="https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#amplification-factors">three competing types of pain</a>:</p><ul><li><p><strong>Write amplification</strong>: bytes written to storage vs bytes written to the DB</p></li><li><p><strong>Read amplification</strong>: number of disk reads per query</p></li><li><p><strong>Space amplification</strong>: size of DB files on disk vs the actual data size</p></li></ul><p>Venice uses <a href="https://github.com/facebook/rocksdb/wiki/Leveled-Compaction">leveled compaction</a> by default and relies primarily on two methods to balance those trade-offs.</p><p><strong>1. Tuning the compaction trigger</strong></p><p>The key setting here is:</p><ul><li><p><strong>level0_file_num_compaction_trigger</strong></p></li></ul><p>This sets how many files can accumulate in Level-0 before compaction is triggered. 
Once the Level-0 file count reaches it, compaction kicks in to push SST files from Level-0 to Level-1 and onward as each level fills.</p><p>Why it matters:</p><ul><li><p>higher threshold &#8594; fewer compactions &#8594; lower write amplification</p></li><li><p>but also more Level-0 files &#8594; higher read amplification since Level-0 files can overlap in key range and a read may need to check several of them</p></li><li><p>plus higher space amplification because duplicates hang around longer</p></li></ul><p>Venice tunes this per cluster because clusters have different bottlenecks:</p><ul><li><p><strong>memory-serving clusters</strong> want data in RAM to speed up lookups. Memory is the limiting resource, so they set a <strong>lower threshold</strong> to reduce space amplification</p></li><li><p><strong>disk-serving clusters</strong> are often limited by disk I/O, so they set a <strong>higher threshold</strong> to reduce compaction frequency and lower the disk write rate</p></li></ul><p>This is a practical tuning philosophy: tune to your real bottleneck, not to a generic best practice.</p><p><strong>2. 
RocksDB BlobDB integration</strong></p><p><a href="https://github.com/facebook/rocksdb/wiki/BlobDB">BlobDB</a> is aimed at large-value workloads through key-value separation:</p><ul><li><p>Large values go into blob files</p></li><li><p>LSM tree stores small pointers</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT0h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" width="1200" height="447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:156086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">RocksDB BlobDB structure</figcaption></figure></div><p>This avoids copying large values repeatedly during compaction, reducing write amplification. The cost is additional space amplification because blobs can become unreferenced and require garbage collection.</p><p>For Venice, BlobDB integration reduced write amplification significantly in multi-tenant clusters, especially for large-value use cases. The reported impact here is big: <strong>more than a 50% reduction of disk write throughput</strong>. 
That matters because it avoided scaling out clusters when CPU and storage space were still available.</p><p>The win here is: you stop paying the compaction tax over and over on the same large payloads.</p><div><hr></div><h4>Use case 3: Active/active replication with partial update</h4><p>Venice guarantees eventual consistency, not strong consistency. That matters because it means you cannot just do read-modify-write operations directly due to write delays.</p><p>To handle this, Venice introduces <strong>partial update</strong>, a specialized operation that supports field-level updates and collection merges.</p><p><strong>Venice partial update workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ay5v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" width="840" height="1320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:279575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Venice partial update (Source: LinkedIn)</figcaption></figure></div><p>Inside the Venice server, the leader 
replica:</p><ul><li><p>decodes the incoming payload</p></li><li><p>applies the update</p></li><li><p>re-encodes the result</p></li><li><p>writes to the local database</p></li><li><p>writes to the Version Topic</p></li><li><p>follower replicas consume the merged results</p></li></ul><p>Most of that is CPU-heavy.</p><p>Then the platform evolved further with active/active replication across multiple data centers. The key mechanism is deterministic conflict resolution (DCR), similar to CRDTs. Venice tracks update timestamps at row and field levels, compares incoming timestamps with existing ones and decides to apply or skip.</p><p><strong>Venice Active/Active workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!36Hk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" width="1024" height="1516" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1516,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:510735,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice Active/Active workflow (Source: LinkedIn)</figcaption></figure></div><p>Now the leader replica has even more 
to do for DCR:</p><ul><li><p>timestamp metadata lookup</p></li><li><p>decoding</p></li><li><p>encoding</p></li></ul><p>Again: CPU heavy. So the optimizations below focus on CPU efficiency.</p><p><strong>1. Fast-Avro adoption</strong></p><p><a href="https://github.com/linkedin/avro-util">Fast-Avro</a> was originally developed by RTBHouse but LinkedIn took over maintenance under the LinkedIn namespace and introduced many optimizations.</p><p>The key idea: Fast-Avro is an alternative to Apache Avro serialization and deserialization that uses runtime code generation and performs significantly better than the native implementation. It supports multiple Avro versions at runtime and is widely adopted inside LinkedIn.</p><p>Venice fully integrated Fast-Avro and saw, in one major use case, up to a <strong>90% improvement in deserialization latency at p99</strong> on the application side.</p><p><strong>2. Parallel processing</strong></p><p>In the traditional pipeline, DCR and partial update operations were executed sequentially, record by record within the same partition. That left the CPU underutilized.</p><p>Venice introduced parallel processing so multiple records can be handled concurrently within the same partition <em>before</em> producing them to the version topic, while still preserving strict ordering in the final step.</p><p>Result: significantly improved write throughput for these complex record types.</p><div><hr></div><h4>Use Case 4: Active/active replication with deterministic write latency</h4><p>Eventually consistent systems still get judged by human expectations. People want their writes to show up and they want it to happen predictably.</p><p>Venice is versioned and can ingest backup, current and future versions concurrently in a single server instance. 
In practice though, only the current version serves reads, so deterministic write latency guarantees focus mostly there.</p><p>To improve determinism, Venice introduced a pooling strategy in ingestion with <strong>different priorities</strong> for different workload types. The Venice consumer phase is the first phase in the server ingestion pipeline and controlling the polling rate via pools is how prioritization happens.</p><p>Broad priority tiers:</p><ul><li><p>top priority: active/active and partial update workloads for the <strong>current version on the leader replica</strong> (CPU-intensive and latency-sensitive)</p></li><li><p>next: other workload types targeting the current version</p></li><li><p>then: active/active or partial update workloads for backup or future versions on the leader replica</p></li><li><p>finally: everything else in a lower-priority bucket</p></li></ul><p>This design is trying to do a few practical things:</p><ul><li><p>isolate CPU-heavy workloads so they don&#8217;t slow down lighter ones</p></li><li><p>prioritize the current version so the most up-to-date data flows smoothly</p></li><li><p>keep the number of pools limited to avoid resource management turning into a second job</p></li></ul><p>The catch is tuning. Clusters see different workloads, store behavior varies widely even within one cluster, throughput swings over time and read traffic changes throughout the day. Static configs force you to tune for the worst case, which wastes resources most of the time.</p><p>So Venice introduced adaptive throttling: dynamically adjusting ingestion based on recent performance.</p><ul><li><p>if the system is within agreed SLAs, ingestion rates are adjusted according to priorities</p></li><li><p>if an SLA is violated, ingestion is throttled back immediately</p></li></ul><p>Defining the SLAs matters. Venice focuses on two key criteria:</p><ol><li><p><strong>Read latency SLA</strong>: highest priority. 
Never violate read latency SLAs, even if it costs ingestion throughput</p></li><li><p><strong>Write latency SLA for the current version</strong>: while read latency SLAs are met, write latency for the current version becomes the top priority and pools are tuned proportionally to maximize utilization and throughput</p></li></ol><div><hr></div><h4><strong>Wrapping up</strong></h4><p>With these optimizations, Venice at LinkedIn handles:</p><ul><li><p>Over <strong>175 million key lookups per second</strong></p></li><li><p>Over <strong>230 million writes per second</strong></p></li><li><p>While maintaining a <strong>write latency SLA under 10 minutes</strong></p></li></ul><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check <a href="https://www.linkedin.com/blog/engineering/infrastructure/evolution-of-the-venice-ingestion-pipeline">LinkedIn's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others, really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7dd74b6f-84de-4b87-a0cf-3e440ec7dc65&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to 
Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread]]></title><description><![CDATA[Real-time stream validation surfaces poison records early and notifies owners with context]]></description><link>https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</link><guid 
isPermaLink="false">https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 15 Jan 2026 04:15:57 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Grab detects data issues in real-time. </p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Doc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Grab&#8217;s real-time work!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="6000" height="4000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4000,&quot;width&quot;:6000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;man riding bicycle&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="man riding bicycle" title="man riding bicycle" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@javaistan">Afif Ramdhasuma</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Grab runs critical systems on Kafka streams, where bad data can spread and break downstream consumers. 
Existing checks were slow and mostly limited to schemas, making issues hard to catch and debug.</p><h4><strong>Task</strong></h4><p>Detect bad streaming data early, cover both schema and value-level issues and give stream owners fast, actionable visibility without centralising ownership.</p><h4><strong>Action</strong></h4><p>Grab built contract-driven stream checks on Coban, turning schemas, field rules and ownership into real-time FlinkSQL tests with Slack alerts and UI-based inspection of bad records.</p><h4><strong>Result</strong></h4><p>The system now monitors 100+ Kafka topics in real time, surfaces poison data quickly and helps teams stop issues before they cascade downstream.</p><h4><strong>Use Cases</strong></h4><p>Root cause analysis, real-time monitoring, real-time alerting</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Kafka, Apache Flink, Amazon S3, Slack, LLM</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4><strong>About Grab</strong></h4><p><a href="https://www.grab.com/">Grab</a> is often called the Uber of Southeast Asia but that might be selling it short. What started as a ride-hailing app now powers food delivery, groceries, payments and even insurance, all bundled into one super app. It operates in more than 800 cities across 8 Southeast Asian countries. Behind the rides, meals and payments lies an enormous stream of events flowing through Grab&#8217;s systems.</p><div><hr></div><h4>Background</h4><p>Grab runs a lot of its business on streaming data. Kafka topics feed online systems, offline analytics and machine learning pipelines. When those streams are clean, life is good: teams can move faster, models behave, dashboards run smoothly. But when they&#8217;re not clean, it&#8217;s a major headache.</p><p>The tricky part is that &#8216;bad data&#8217; in Kafka isn&#8217;t always obvious. 
Sometimes it&#8217;s quiet: the stream still parses but key fields are wrong, missing or shaped differently from what downstream teams assume.</p><p>That&#8217;s why Grab decided to introduce a platform-level solution: Kafka stream contracts that let stream stakeholders define what &#8216;good&#8217; looks like, then automatically test streams in real time, catch issues as they happen and alert the owners quickly.</p><p>The core idea is simple:</p><ul><li><p>Let users define a data contract for a Kafka topic</p></li><li><p>Convert that contract into executable tests</p></li><li><p>Run those tests continuously</p></li><li><p>Capture the poison data plus context</p></li><li><p>Notify the right people with enough detail to act</p></li></ul><p>This supports a more decentralized, data-mesh style world where teams own their data products while still keeping the overall system reliable for everyone else.</p><div><hr></div><h4>What wasn&#8217;t working before</h4><p>Historically, Kafka stream monitoring lacked a strong, end-to-end solution for data quality validation. That created three big issues: detecting bad data, speed of detection and lack of visibility.</p><p><strong>1- Detecting bad data</strong></p><p>This can be broken down into two further categories:</p><p><strong>1.1 Schema issues</strong></p><p>These are schema mismatches between producers and consumers that can trigger deserialization errors. Even if schema backward compatibility is validated during schema evolution, the data inside the Kafka topic can still drift from the defined schema.</p><p>One concrete example: a rogue producer writes to a topic without using the expected schema. Now you&#8217;ve got a topic that &#8216;has a schema&#8217; but real events don&#8217;t match it. 
The painful bit is not just knowing something broke, it&#8217;s identifying which fields are causing the mismatch.</p><p><strong>1.2 Rule and value issues</strong><br>These are disagreements about what a field <em>means</em> or what shape it should take. Kafka stream schemas define structure but they don&#8217;t enforce rules like:</p><ul><li><p>expected length for an identifier</p></li><li><p>expected string pattern</p></li><li><p>valid numeric ranges</p></li><li><p>constant values that should never change</p></li></ul><p>There wasn&#8217;t an existing framework where stakeholders could define and enforce field-level semantic rules for streams.</p><p><strong>2- Speed of detection</strong></p><p>The second issue was speed of detection. There was no real-time mechanism to automatically validate data against predefined rules, identify issues quickly and alert stakeholders promptly.</p><p>Without real-time validation, issues could stick around for a while, quietly impacting multiple online and offline downstream systems before being discovered.</p><p><strong>3- Lack of visibility</strong></p><p>Even when teams did detect a problem, it was hard to pinpoint the exact &#8216;poison data&#8217; and understand what violated the schema or the semantic expectations.</p><p>Root cause analysis becomes painful when you cannot easily answer:</p><ul><li><p>Which records were bad?</p></li><li><p>Which fields failed?</p></li><li><p>What did the bad values look like?</p></li><li><p>When did it start and how frequent is it?</p></li></ul><div><hr></div><h4>The fix</h4><p>Grab&#8217;s Coban platform provides a standardized, platform-level data quality testing and observability setup for Kafka streams. 
It&#8217;s built around four core ideas:</p><ol><li><p><strong>Data Contract Definition: </strong>Stream stakeholders define a contract that includes schema agreements, semantic rules the topic data must follow, and ownership metadata for alerts and notifications.</p></li><li><p><strong>Automated Test Execution: </strong>A long-running test runner automatically executes real-time tests based on that contract.</p></li><li><p><strong>Real-time Data Quality Issue Identification: </strong>The system detects data issues in real time at both schema and rules/values levels.</p></li><li><p><strong>Alerts and Result Observability: </strong>It alerts the right people and makes it easier to observe issues through the platform UI and downstream tooling.</p></li></ol><p>Put simply: define the rules once, then let the platform watch the stream continuously.</p><p>The architecture has three main components:</p><ol><li><p><strong>Data contract definition</strong></p></li><li><p><strong>Test execution and data quality issue identification</strong></p></li><li><p><strong>Result observability</strong></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!opMG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" width="1456" height="543" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:543,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Real-time Kafka Stream Data Quality Monitoring Architecture (Source: Grab)</figcaption></figure></div><p>Every Flow number mentioned below (for example, Flow 1.1) refers to the numbered steps in the diagram above.</p><div><hr></div><h4><strong>Data contract definition</strong></h4><p>Coban&#8217;s contract acts as a formal agreement among Kafka stream stakeholders. It includes a few building blocks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KSXy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" width="836" height="758" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:836,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Kafka Stream Schema (Flow 1.1)</strong></p><p>The contract includes the schema used by the Kafka topic under test. This helps the Test Runner validate schema compatibility across data streams.</p><p>Importantly, this is not only about &#8220;did the schema change.&#8221; It&#8217;s also about &#8220;does the data actually match what everyone believes the schema is.&#8221;</p><p><strong>Kafka Stream Configuration (Flow 1.2)</strong></p><p>This includes essential config like endpoint and topic name. 
Coban automatically populates this so users don&#8217;t have to wire everything up manually.</p><p><strong>Observability Metadata (Flow 1.3)</strong></p><p>This is where ownership becomes real. The contract includes contact details for stream stakeholders and alert configurations so the right people get notified when issues show up.</p><p><strong>Kafka Stream Semantic Test Rules (Flow 1.5)</strong></p><p>This is the heart of the semantic side. Users can define intuitive field-level rules such as:</p><ul><li><p>string pattern checks</p></li><li><p>number range checks</p></li><li><p>constant value checks</p></li></ul><p>The point is to make the &#8220;meaning&#8221; of fields enforceable, not just their data types.</p><p><strong>LLM-Based Semantic Test Rules Recommendation (Flow 1.4)</strong></p><p>Defining dozens or hundreds of field rules can overwhelm people. To reduce that setup burden, Coban uses an LLM-based feature that recommends semantic test rules based on:</p><ul><li><p>the provided Kafka stream schema</p></li><li><p>anonymized sample data</p></li></ul><p>This feature helps users set up semantic rules efficiently, as the sample UI below shows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pu8X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w, 
https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" width="1456" height="522" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:241835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sample UI showcasing LLM-based Kafka stream schema field-level semantic test rules (Source: Grab)</figcaption></figure></div><p>The practical benefit: users get a starting point quickly, instead of staring at a schema and trying to invent rules from scratch.</p><div><hr></div><h4><strong>Data contract transformation</strong></h4><p>Once a contract is defined, Coban&#8217;s transformation engine converts it into configurations the Test Runner can interpret (Flow 2.1).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nvEa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" width="1122" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1122,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d4065d7-b4c5-4f78-8761-0addce18f606_1122x660.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p>This transformation covers four things:</p><p><strong>Kafka Stream 
Schema: </strong>The contract schema is translated into a schema reference format the Test Runner can parse.</p><p><strong>Kafka Stream Configuration: </strong>The Kafka stream is set up as a source for the Test Runner.</p><p><strong>Observability metadata: </strong>Contact information is turned into runtime configs for alerting and routing.</p><p><strong>Kafka Stream Semantic Test Rules: </strong>Human-readable semantic rules are transformed into an <strong>inverse SQL query</strong> that captures data violating the rules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SeoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:213548,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Illustration of semantic test rules being converted from human-readable formats into inverse SQL queries (Source: Grab)</figcaption></figure></div><p>&#8216;Inverse SQL&#8217; here means the query is designed to return the <em>bad rows</em>, not the good ones. 
That&#8217;s a smart design choice because it keeps the output focused on what needs investigation.</p><div><hr></div><h4>Test execution &amp; data quality issue identification</h4><p>Once the transformation engine generates the configuration, the platform automatically deploys the Test Runner.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y-bs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" width="1010" height="734" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1010,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96110,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8dc273-8996-4ec1-a825-41a85d232746_1010x734.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Test runner</strong></p><p>The Test Runner uses FlinkSQL as its compute engine. FlinkSQL was chosen because rules can be expressed as plain SQL statements, which also makes it easier for the platform to convert contracts into enforceable checks.</p><p><strong>Test execution workflow and problematic data identification</strong></p><p>The Test Runner executes a test and identifies problematic data in four steps:</p><ol><li><p><strong>Consume Kafka data (Flow 2.2)</strong><br>FlinkSQL consumes data from the Kafka topic under test using its own consumer group. 
This matters because a separate consumer group tracks its own offsets, so the test does not affect other consumers of the topic.</p></li><li><p><strong>Run inverse SQL (Flow 2.3)</strong><br>The Test Runner runs the inverse SQL query to identify:</p><ul><li><p>data that violates semantic rules</p></li><li><p>data that is syntactically incorrect in the first place</p></li></ul></li><li><p><strong>Publish data quality issue events (Flow 3.2)</strong><br>When bad data is found, the Test Runner packages it into a data quality issue event enriched with:</p><ul><li><p>a test summary</p></li><li><p>total count of bad records</p></li><li><p>sample bad data</p></li></ul><p>Then it publishes the event to a dedicated Kafka topic.</p></li><li><p><strong>Sink events to S3 (Flow 3.1)</strong><br>The platform also sinks all data quality events to an AWS S3 bucket for deeper observability and analysis.</p></li></ol><p>This combo (Kafka for real-time events, S3 for deeper inspection) gives both fast alerting and a more durable store for later analysis.</p><div><hr></div><h4>Result observability</h4><p>Grab&#8217;s in-house data quality observability platform, Genchi, consumes the problematic data captured by the Test Runner (Flow 3.3).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n2A8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" width="838" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:838,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 
424w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Alerting</strong></p><p>Genchi sends Slack notifications to stream owners listed in the contract&#8217;s observability metadata (Flow 3.5).</p><p>Those notifications include useful debugging context such as:</p><ul><li><p>links to sample data in the Coban UI</p></li><li><p>observed time windows</p></li><li><p>counts of bad records</p></li><li><p>other relevant details</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!avzo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" width="1314" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:1314,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sample Slack notifications (Source: Grab)</figcaption></figure></div><p>The key point is that alerts are not just &#8216;something broke&#8217;; they include the information you need to start investigating.</p><p><strong>Observability</strong></p><p>Users can access the Coban UI (Flow 3.4) to see:</p><ul><li><p>Kafka stream test rules</p></li><li><p>sample bad records</p></li><li><p>highlighted fields and values that violate rules</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!iqrn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" width="1456" height="456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The highlighted fields indicate violations of the semantic test rules (Source: Grab)</figcaption></figure></div><p>That UI piece matters because it shortens the path from &#8216;alert received&#8217; to &#8216;I know which field is failing and what the bad values look like.&#8217;</p><div><hr></div><h4>Results so far</h4><p>Since its deployment earlier in the year, this solution has enabled Kafka stream users to:</p><ul><li><p>define contracts with both schema and semantic rules</p></li><li><p>automate real-time test execution</p></li><li><p>alert stakeholders when problematic data is detected so they can act quickly</p></li></ul><p>It has been actively monitoring data quality across <strong>100+ critical Kafka topics</strong>.</p><p>The solution can also immediately identify and halt the propagation of 
invalid data across multiple streams.</p><div><hr></div><h4>Wrapping up</h4><p>Grab implemented and rolled out a real-time data quality monitoring solution for Kafka streams through the Coban platform.</p><p>The key outcomes include:</p><ul><li><p>engineers can define syntactic and semantic tests through a data contract</p></li><li><p>tests run automatically in real time via a long-running Test Runner based on FlinkSQL</p></li><li><p>issues trigger fast Slack alerts through Genchi using ownership metadata in the contract</p></li><li><p>teams get better visibility into exactly which data fields violate rules via the Coban UI</p></li></ul><p>In short: Coban turned data quality from a vague hope into something stream owners can specify, enforce and observe in real time.</p><div><hr></div><h3>The full scoop</h3><p>To learn more, check out <a href="https://engineering.grab.com/real-time-data-quality-monitoring">Grab's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1904c23f-5462-4150-9c60-b6ad712234b6&quot;,&quot;caption&quot;:&quot;How do you keep ML teams fast when every experiment blasts your Spark cluster with spiky workloads, huge datasets and five different file formats?<br /><br />Snap&#8217;s answer: Prism, a unified Spark platform that hides infra pain, standardises pipelines and supports everything from ad-hoc exploration to 10k+ daily jobs in production.<br /><br />This post breaks down why raw Spark wasn&#8217;t enough, what Prism fixes and how Snap rebuilt their ML data layer without 
ditching Spark.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-20T04:59:47.340Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179211962,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Uber Cut Data Lake Freshness From Hours to Minutes With 
Flink]]></title><description><![CDATA[Why Uber moved ingestion from Spark batch to Flink streaming and what it took to run thousands of jobs reliably at petabyte scale.]]></description><link>https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 02 Jan 2026 04:30:31 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Uber moved from batch to streaming in their data lake.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05-P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Uber&#8217;s streaming solution.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs]]></title><description><![CDATA[Inside Prism, the system that turned scattered Spark workflows into a unified, ML-ready platform.]]></description><link>https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 20 Nov 2025 04:59:47 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Snap unified Spark, ML workflows and 10k+ daily jobs under one platform.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBRd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Snap&#8217;s ML platform transformation.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[From Marketing to Data Engineering: How I Made the Switch]]></title><description><![CDATA[How one marketer followed the trail of tracking pixels into pipelines and built a career turning messy data into usable systems.]]></description><link>https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch</link><guid isPermaLink="false">https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 16 Oct 2025 04:01:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9f252334-7437-40b5-9f82-08c981de2f6d_761x764.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Lately, I&#8217;ve been thinking about starting a new series where people working in data share how they got here, what they&#8217;ve learned along the way and what their day-to-day looks like.</p><p>So, I&#8217;m kicking it off today with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;ee6ab692-9d69-4714-8986-9b599e2b5557&quot;}" data-component-name="MentionToDOM"></span>, Senior Data Engineer at Workpath and writer of <em>The Pipe and The Line</em> newsletter.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:1196229,&quot;name&quot;:&quot;The Pipe &amp; The 
Line&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vmrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d5b2131-da28-4621-ad6f-9574cbc41a1e_500x500.png&quot;,&quot;base_url&quot;:&quot;https://thepipeandtheline.substack.com&quot;,&quot;hero_text&quot;:&quot;Hands-on guides, tools, and experiments to sharpen your Data &amp; AI Engineering skills from someone who learned it all in the wild.&quot;,&quot;author_name&quot;:&quot;Alejandro Aboy&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#131826&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://thepipeandtheline.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!vmrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d5b2131-da28-4621-ad6f-9574cbc41a1e_500x500.png" width="56" height="56" style="background-color: rgb(19, 24, 38);"><span class="embedded-publication-name">The Pipe &amp; The Line</span><div class="embedded-publication-hero-text">Hands-on guides, tools, and experiments to sharpen your Data &amp; AI Engineering skills from someone who learned it all in the wild.</div><div class="embedded-publication-author-name">By Alejandro Aboy</div></a><form class="embedded-publication-subscribe" method="GET" action="https://thepipeandtheline.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>We talked 
about how he went from marketing to data engineering, what his workflow looks like, why he was called an <em>octopus</em> and why he thinks &#8220;big data&#8221; is a fool&#8217;s errand for most teams.</p><p>So without further ado, let&#8217;s get into it!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Shopify Uses Change Data Capture to Serve Millions of Merchants]]></title><description><![CDATA[From batch queries to streaming 100k records per second during peak load]]></description><link>https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 18 Sep 2025 07:53:42 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1730818874996-dea4bddf5554?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaG9waWZ5fGVufDB8fHx8MTc1ODE4MDY0NHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Shopify built a real-time data pipeline at 400TB scale.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K9fn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K9fn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!K9fn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!K9fn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!K9fn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K9fn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/173822667?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K9fn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!K9fn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!K9fn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!K9fn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ed777b-9a73-44d8-9095-d6c0df4aa853_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to change data capture at Shopify!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Grab Shrunk Real-Time Queries from 5 Minutes to 1 with FlinkSQL and Kafka]]></title><description><![CDATA[With SQL as the interface, analysts and engineers can now explore streams and deploy pipelines in under 10 minutes.]]></description><link>https://www.datatinkerer.io/p/how-grab-shrunk-real-time-queries-from-five-minutes-to-one</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-grab-shrunk-real-time-queries-from-five-minutes-to-one</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 21 Aug 2025 06:45:35 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1587476351660-e9fa4bb8b26c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxncmFifGVufDB8fHx8MTc1NTQ5OTAzOHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Grab made real-time processing faster for its users. 
</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4JBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4JBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4JBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif" width="800" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/171226398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4JBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!4JBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc192a623-a624-4963-a1d5-a293bf3f7e08_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to real-time processing at Grab!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-grab-shrunk-real-time-queries-from-five-minutes-to-one">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Expedia Monitors 1000+ A/B Tests in Real Time with Flink and Kafka]]></title><description><![CDATA[A look inside the pipeline that spots underperforming experiments in minutes and not days]]></description><link>https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 24 Jul 2025 07:55:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jZ_b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F961833e9-d13b-4216-a74d-8aebaa3c9fc1_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Expedia Group monitors A/B tests at scale.</p><p>But before that, I wanted to share an example of what you could unlock if you share Data Tinkerer with just <strong>2 other people</strong> and they subscribe to the newsletter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b1ZD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b1ZD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!b1ZD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b1ZD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b1ZD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b1ZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg" width="1456" height="1125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1125,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1291918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/169094273?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!b1ZD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b1ZD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b1ZD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b1ZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d63d619-0614-4aba-ae5d-81e6863b4663_1650x1275.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ more cheat sheets covering everything from Python, R, SQL and Spark to Power BI, Tableau, Git and many more. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to the real-time monitoring of A/B tests at Expedia.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Bolt Reconciles €2B in Revenue Using Airflow, Spark and dbt]]></title><description><![CDATA[A look under the hood of a multi-country finance pipeline that ingests raw data, models discrepancies and reconciles cash flows at scale.]]></description><link>https://www.datatinkerer.io/p/how-bolt-reconciles-2b-in-revenue-using-airflow-spark-and-dbt</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-bolt-reconciles-2b-in-revenue-using-airflow-spark-and-dbt</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 03 Jul 2025 09:53:52 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1715351123666-6a9c4f180c54?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8Ym9sdHxlbnwwfHx8fDE3NTE0MzU2NjN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Bolt tracks payments at scale.</p><p>But before that, I wanted to share an example of what you could unlock if you share Data Tinkerer with just <strong>2 other people </strong>and they subscribe to the newsletter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ufW-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ufW-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!ufW-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ufW-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ufW-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ufW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg" width="1456" height="1125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1125,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1357928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/167406813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ufW-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ufW-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ufW-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ufW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea22c22d-33fa-419b-a1db-9e654a2510d6_1650x1275.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ more cheat sheets covering everything from Python, R, SQL and Spark to Power BI, Tableau, Git and many more. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to how Bolt processes payments from millions of customers.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-bolt-reconciles-2b-in-revenue-using-airflow-spark-and-dbt">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Flipkart Scaled Delivery Date Calculation 10x While Slashing Latency by 90%]]></title><description><![CDATA[Optimising for 100 items in 100ms without breaking the backend (or the bank)]]></description><link>https://www.datatinkerer.io/p/how-flipkart-scaled-delivery-date-calculation-10x-while-slashing-latency-by-90-percent</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-flipkart-scaled-delivery-date-calculation-10x-while-slashing-latency-by-90-percent</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 13 Jun 2025 09:10:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e325e31-1188-4171-a6d4-9d88b490de17_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Flipkart solved the problem of calculating delivery date.</p><p>But before that, I wanted to share an example of what you could unlock if you share Data Tinkerer with just <strong>2 other people</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-CMo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-CMo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!-CMo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-CMo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-CMo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-CMo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg" width="1456" height="1125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1125,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2910331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/165748507?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-CMo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-CMo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-CMo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-CMo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf294af-e15c-41c8-9dfd-c0541054526b_3300x2550.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ more cheat sheets covering everything from Python, R, SQL and Spark to Power BI, Tableau, Git and many more. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Flipkart&#8217;s challenge of calculating delivery dates at scale.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-flipkart-scaled-delivery-date-calculation-10x-while-slashing-latency-by-90-percent">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Notion Brought Order to Its Data Chaos (And Why Their First Catalog Failed)]]></title><description><![CDATA[A behind-the-scenes look at the real challenges, missed steps and what finally made their data catalog work.]]></description><link>https://www.datatinkerer.io/p/how-notion-brought-order-to-its-data-chaos</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-notion-brought-order-to-its-data-chaos</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 22 May 2025 09:07:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6QIL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072d6737-aa00-471d-9ee2-b95adfcb8012_2520x1323.avif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Quick note before this week&#8217;s deep dive. Thanks for reading and subscribing. I really do mean it. If you&#8217;ve got feedback, just hit reply. I read every response.</p><p>Data Tinkerer has always been about sharing what actually works in data, beyond just tools and tech. The deep dives will keep coming but I want to start spotlighting the stuff we don&#8217;t talk about enough: the day-to-day challenges, the business outcomes and the learnings.</p><p>I want to feature stories from people in data roles: senior data engineers, lead analysts, heads of data, you name it. If you&#8217;ve got a story, lesson, recent technical win or even a battle scar from the data trenches, let&#8217;s get it in front of almost 1,000 smart and engaged peers.</p><p>You don&#8217;t need to be a &#8220;writer&#8221;. I&#8217;ll help your story shine. Plus, guest contributors get a shoutout in the newsletter and on LinkedIn (if you want).</p><p><strong>Keen to share your data story? 
Just reply to this email or message me on Substack and we&#8217;ll tee it up.</strong></p><div class="directMessage button" data-attrs="{&quot;userId&quot;:291590442,&quot;userName&quot;:&quot;Data Tinkerer&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><p>Now &#8230;</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-notion-brought-order-to-its-data-chaos">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Canva Rebuilt Its Data Pipelines for Billions of Events per Month]]></title><description><![CDATA[What it takes to track usage, pay creators fairly and not drown in incident recovery hell.]]></description><link>https://www.datatinkerer.io/p/how-canva-rebuilt-its-data-pipelines-for-billions-of-events-per-month</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-canva-rebuilt-its-data-pipelines-for-billions-of-events-per-month</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 01 May 2025 07:39:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4515c9b-2ca5-4dec-a2b4-98c5358d205a_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Canva solved the surprisingly messy problem of counting at scale</p><p>But before that, I wanted to share an example of what you could unlock if you share Data Tinkerer with just <strong>2 other people</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!46-k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!46-k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!46-k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 848w, https://substackcdn.com/image/fetch/$s_!46-k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!46-k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!46-k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg" width="1456" height="1125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1125,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2470943,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/162515057?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!46-k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 424w, https://substackcdn.com/image/fetch/$s_!46-k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 848w, https://substackcdn.com/image/fetch/$s_!46-k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!46-k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd46a3f23-4582-459f-8820-6b06a7d85138_3300x2550.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ more cheat sheets covering everything from Python, R, SQL and Spark to Power BI, Tableau, Git and many more. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Canva&#8217;s work to build a scalable data pipeline.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-canva-rebuilt-its-data-pipelines-for-billions-of-events-per-month">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Airtable Made Archive Validation Work at Petabyte Scale]]></title><description><![CDATA[They handled billions of rows using joins, hashes and a lot of buckets.]]></description><link>https://www.datatinkerer.io/p/how-airtable-made-archive-validation-work-at-petabyte-scale</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-airtable-made-archive-validation-work-at-petabyte-scale</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 10 Apr 2025 06:22:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!T_KN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T_KN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T_KN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!T_KN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!T_KN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!T_KN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T_KN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1708241,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/160983286?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T_KN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!T_KN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!T_KN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!T_KN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>TL;DR</h3>
      <p>
          <a href="https://www.datatinkerer.io/p/how-airtable-made-archive-validation-work-at-petabyte-scale">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How HubSpot Optimized Logging to Save Millions]]></title><description><![CDATA[By refining log storage and retention, HubSpot reduced costs by 55.7% and improved query performance by 50x]]></description><link>https://www.datatinkerer.io/p/how-hubspot-optimized-logging-to-save-millions</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-hubspot-optimized-logging-to-save-millions</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 20 Mar 2025 03:55:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ozzx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ac92-6001-4b71-88f3-b1cf0fe98b56_1146x703.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>HubSpot's backend performance team identified that Amazon S3 storage costs accounted for approximately 45% to 50% of daily expenses, with the 'hubspot-live-logs-prod' bucket alone responsible for 20% of these costs.</p><h4><strong>Task</strong></h4><p>The team aimed to reduce storage costs by addressing the inefficiencies in their logging system, particularly focusing on the large volumes of raw JSON logs that were not being efficiently compacted.</p><h4><strong>Action</strong></h4><ul><li><p><strong>Log Retention Review</strong>: They discovered that raw JSON logs were retained for 730 days, while compressed ORC logs were kept for 460 days. 
Aligning the retention period to 460 days for both formats reduced unnecessary storage.</p></li><li><p><strong>Improved Compression</strong>: By enhancing their Spark compaction process, they increased the conversion rate of raw JSON logs to the more storage-efficient ORC format, achieving a compression ratio where ORC logs were about 5% of the size of the original JSON logs.</p></li></ul><h4><strong>Result</strong></h4><p>These measures led to a 55.7% reduction in monthly JSON log storage costs, translating to annual savings in the seven-figure range. Additionally, engineers experienced faster log query times, with some reporting reductions from 30 minutes to just 36 seconds.</p><h4><strong>Use Cases</strong></h4><p>Cost monitoring, Log retention, Log volume reduction</p><h4><strong>Tech Stack/Framework</strong></h4><p>AWS Athena, Amazon S3, Apache Spark, Apache Mesos, Redash</p><div><hr></div><h3>Explained Further</h3><div><hr></div><h4><strong>Saving Millions on Logging</strong></h4>
      <p>
          <a href="https://www.datatinkerer.io/p/how-hubspot-optimized-logging-to-save-millions">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Scaling Apache Flink: How Reddit Cut Memory Usage by 60%]]></title><description><![CDATA[Optimizing real-time ad validation with field filtering, tiered storage, and infrastructure enhancements.]]></description><link>https://www.datatinkerer.io/p/scaling-apache-flink-how-reddit-cut-memory-usage-by-60-percent</link><guid isPermaLink="false">https://www.datatinkerer.io/p/scaling-apache-flink-how-reddit-cut-memory-usage-by-60-percent</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Wed, 19 Feb 2025 06:18:34 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="5184" height="3888" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3888,&quot;width&quot;:5184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;red and white 8 logo&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="red and white 8 logo" title="red and white 8 logo" srcset="https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1616509091215-57bbece93654?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxyZWRkaXR8ZW58MHx8fHwxNzM5OTQ0NTgwfDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Brett Jordan</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><div><hr></div><h4><strong>Situation</strong></h4><p>Reddit's advertising platform processes thousands of ad engagement events per second, which must be validated and enriched in real time to ensure accurate reporting and prevent budget overdelivery.</p><h4><strong>Task</strong></h4><p>Develop a scalable, real-time ad event validation system capable of efficiently handling high event volumes while maintaining performance and reliability.</p><h4><strong>Action</strong></h4><p>The engineering team built the Ad Events Validator (AEV) on Apache Flink to correlate ad server events with user engagement events. To overcome issues related to large state sizes and resource demands, they implemented:</p><ul><li><p><strong>Field Filtering:</strong> Analyzed downstream data consumption and established an allowlist of fields, which reduced the event payload size by 90% and cut CPU and memory usage by 25% and 60%, respectively.</p></li><li><p><strong>Tiered State Storage:</strong> Integrated Apache Cassandra for external state storage, reducing in-memory state size and making checkpointing and system recovery more efficient.</p></li></ul><h4><strong>Result</strong></h4><p>These enhancements resulted in a more scalable and cost-efficient AEV system, improving overall performance and operational effectiveness.</p><h4><strong>Use Cases</strong></h4><p>Real-Time Event Validation, Data Enrichment, Resource Optimization</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Flink, Apache Kafka, Apache Cassandra</p><div><hr></div><h3>Explained Further</h3><div><hr></div><h4><strong>Background</strong></h4><p>Reddit processes thousands of ad
engagement events per second. These events require validation and enrichment before being sent to downstream systems. Key components of this validation process include applying a standardized look-back window and filtering out suspected invalid traffic.</p><p>In addition to a batch validation pipeline, a near real-time pipeline improves budget spend accuracy and provides advertisers with real-time insights into campaign performance. This real-time component, known as the <strong>Ad Events Validator (AEV)</strong>, is built using Apache Flink. AEV matches ad server events with engagement events and writes the validated results to a separate Kafka topic for downstream consumption. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6utL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6utL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 424w, https://substackcdn.com/image/fetch/$s_!6utL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 848w, https://substackcdn.com/image/fetch/$s_!6utL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!6utL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6utL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp" width="1080" height="441" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:441,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6utL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 424w, https://substackcdn.com/image/fetch/$s_!6utL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 848w, https://substackcdn.com/image/fetch/$s_!6utL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!6utL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441259f5-fa8b-48c2-a11e-6a8df3128807_1080x441.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Overview of the real-time ad engagement event validation system (Source: Reddit)</figcaption></figure></div><p>Building and maintaining AEV, though, presented several challenges for the Reddit team.</p><div><hr></div><h4><strong>1st Challenge: Addressing High State Size Issues</strong></h4>
      <p>
          <a href="https://www.datatinkerer.io/p/scaling-apache-flink-how-reddit-cut-memory-usage-by-60-percent">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[ML Training Too Slow? Yelp’s 1,400x Speed Boost Fixes That]]></title><description><![CDATA[Discover the data pipeline and GPU optimisations that made it happen]]></description><link>https://www.datatinkerer.io/p/ml-training-too-slow-yelps-1400x-speed-boost-fixes-that</link><guid isPermaLink="false">https://www.datatinkerer.io/p/ml-training-too-slow-yelps-1400x-speed-boost-fixes-that</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Wed, 12 Feb 2025 05:46:32 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="6240" height="4160" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4160,&quot;width&quot;:6240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A cell phone sitting on top of a wooden table&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A cell phone sitting on top of a wooden table" title="A cell phone sitting on top of a wooden table" srcset="https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1730818876892-7e7c6ddf3dc6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHx5ZWxwfGVufDB8fHx8MTczOTMzODU5MHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">appshunter.io</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Yelp's ad revenue relies on predicting which ads users are likely to click on, using a model called "Wide and Deep Neural Network." Initially, training this model on 450 million data samples took 75 hours per cycle, which was too slow. Yelp wanted to handle 2 billion samples and reduce training time to under an hour per cycle.</p><h4><strong>Task</strong></h4><p>The goal was to speed up the training process by improving how data is stored and read, and by using multiple GPUs to handle more data at once.</p><h4><strong>Action</strong></h4><ul><li><p><strong>Data Storage</strong>: Yelp stored the training data in Parquet format on Amazon's S3 storage, which works well with their data processing system, Spark. They found that a tool called Petastorm was too slow for their needs, so they developed their own system called ArrowStreamServer. This new system reads and sends data more efficiently, reducing the time to process 9 million samples from over 13 minutes to about 19 seconds.</p></li><li><p><strong>Distributed Training</strong>: Yelp initially used a method called MirroredStrategy to train the model on multiple GPUs but found it didn't work well as they added more GPUs. They switched to a tool called Horovod, which allowed them to efficiently use up to 8 GPUs at once, significantly speeding up the training process.</p></li></ul><h4><strong>Result</strong></h4><p>By implementing these changes, Yelp achieved a total speed increase of about 1,400 times in their model training. 
This means they can now train their ad prediction models much faster, allowing them to handle more data and improve their ad services.</p><h4><strong>Use Cases</strong></h4><p>Large-Scale ML Training, ML Training Optimisation, Enhancing Data Pipeline Efficiency</p><h4><strong>Tech Stack/Framework</strong></h4><p>TensorFlow, Horovod, Keras, PyArrow, Amazon S3, Apache Spark</p><div><hr></div><h3>Explained Further</h3><div><hr></div><h4><strong>The Challenge</strong></h4>
      <p>
          <a href="https://www.datatinkerer.io/p/ml-training-too-slow-yelps-1400x-speed-boost-fixes-that">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Inside Meta's Data Flow Discovery]]></title><description><![CDATA[Discover How Meta Tracks Data Journeys to Safeguard User Privacy at Scale]]></description><link>https://www.datatinkerer.io/p/inside-metas-data-flow-discovery</link><guid isPermaLink="false">https://www.datatinkerer.io/p/inside-metas-data-flow-discovery</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Tue, 04 Feb 2025 23:39:49 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="5472" height="3648" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3648,&quot;width&quot;:5472,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;television showing man using binoculars&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="television showing man using binoculars" title="television showing man using binoculars" srcset="https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1533895328642-8035bacd565a?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8ZmFjZWJvb2t8ZW58MHx8fHwxNzM4NjQyMDg4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Glen Carrie</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Meta handles vast amounts of user data across its platforms, requiring strong privacy controls to protect sensitive information. A critical component of this effort is <strong>data lineage</strong>, which helps trace how data moves across different systems, ensuring compliance with privacy policies like <strong>purpose limitation</strong>.</p><h4><strong>Task</strong></h4><p>Meta needed a scalable and automated way to <strong>track data lineage</strong> across millions of assets, including databases, web services, and AI systems. This required moving beyond <strong>manual data flow documentation</strong> to a more robust, automated discovery process.</p><h4><strong>Action</strong></h4><ul><li><p><strong>Data Flow Collection</strong> &#8211; Used <strong>static code analysis, runtime instrumentation, and input/output matching</strong> to track data across stacks (Hack, C++, Python, SQL).</p></li><li><p><strong>Privacy Probes</strong> &#8211; Captured real-time <strong>runtime signals</strong>, identifying how and where sensitive data is logged, stored, or transformed.</p></li><li><p><strong>Automated Lineage Graphs</strong> &#8211; Created <strong>scalable data flow visualizations</strong> to streamline privacy control implementation.</p></li><li><p><strong>AI &amp; Data Warehouse Integration</strong> &#8211; Ensured <strong>end-to-end traceability</strong> across AI models, databases, and batch-processing systems.</p></li><li><p><strong>Iterative Filtering Tool</strong> &#8211; Allowed developers to <strong>refine lineage graphs</strong>, isolating relevant data flows and removing noise.</p></li></ul><h4><strong>Result</strong></h4><p>Meta&#8217;s data lineage system <strong>reduced engineering time, improved compliance accuracy, and automated privacy enforcement</strong>. 
It enabled developers to quickly identify and secure sensitive data flows while ensuring continuous monitoring at scale. These innovations enhanced user data protection across Meta&#8217;s ecosystem.</p><h4><strong>Use Cases</strong></h4><p>Privacy Enforcement, Compliance Monitoring, Data Lineage</p><h4><strong>Tech Stack/Framework</strong></h4><p>Python, SQL, C++, PyTorch, Presto, Spark</p><div><hr></div><h3>Explained Further</h3><div><hr></div><p>Meta's Privacy Aware Infrastructure (PAI) is designed to embed privacy controls within its systems, ensuring user data is handled responsibly. A foundational element of PAI is data lineage, which traces the journey of data across various platforms, providing a comprehensive view of its flow from collection to processing and storage. This capability is crucial for implementing privacy measures like purpose limitation, which restricts data usage to specific, intended purposes.</p><div><hr></div><h4><strong>Understanding Data Lineage at Meta</strong></h4><p>Data lineage involves mapping out how data moves through Meta's vast ecosystem, connecting source assets (e.g., database tables where data originates) to sink assets (e.g., tables or systems where data is stored or processed). 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cqvn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cqvn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 424w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 848w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 1272w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cqvn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp" width="1456" height="309" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:309,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cqvn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 424w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 848w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 1272w, https://substackcdn.com/image/fetch/$s_!cqvn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68826e8b-0ab0-472a-8287-e441f86a68a6_1536x326.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">PAI Workflow (Source: Meta)</figcaption></figure></div><p>This mapping is essential for:</p>
      <p>
          <a href="https://www.datatinkerer.io/p/inside-metas-data-flow-discovery">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Scaling Real-Time Analytics: How Expedia Cut Costs by 40% While Supporting 450+ Concurrent Users]]></title><description><![CDATA[Learn how the Optics Framework enabled seamless data insights with <15-second latency for global teams]]></description><link>https://www.datatinkerer.io/p/scaling-real-time-analytics-how-expedia-cut-costs-by-40-percent</link><guid isPermaLink="false">https://www.datatinkerer.io/p/scaling-real-time-analytics-how-expedia-cut-costs-by-40-percent</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Tue, 28 Jan 2025 22:20:41 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="5635" height="3757" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3757,&quot;width&quot;:5635,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a room with a large window and a couch and potted plants&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a room with a large window and a couch and potted plants" title="a room with a large window and a couch and potted plants" srcset="https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1660991473393-3612a4e49127?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxleHBlZGlhfGVufDB8fHx8MTczODAzOTc0M3ww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by Hotel Lal Garh Fort and Palace on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Expedia Group needed a scalable, cost-effective real-time analytics solution (&lt;15 seconds of latency) to process high-volume data (~4,500 events/sec) and help global service partners optimize operations and improve performance.</p><h4><strong>Task</strong></h4><p>Design a solution that processes and presents real-time data with fast query speeds while addressing the limitations of existing tools (e.g., Snowflake and Looker) in scalability, latency, and user experience.</p><h4><strong>Action</strong></h4><p>Developed a new architecture using Apache Druid for real-time ingestion, optimized microservices for data processing, and built a custom modular UI library with a Data Resolver API to deliver analytics tailored to user roles.</p><h4><strong>Result</strong></h4><p>The solution achieved a 5x increase in user base, a 30-40% reduction in costs, 15-second data latency, and 99.9% SLA uptime. It supported 1,800 users with sub-1-second response times, improving decision-making and operational efficiency globally.</p><h4><strong>Use Cases</strong></h4><p>Real-Time Insights, Operational Efficiency, Scalability for Concurrent Users</p><h4><strong>Tech Stack/Framework</strong></h4><p>Python, Apache Druid, Apache Hive, Apache Kafka, Looker, Snowflake</p><div><hr></div><h3>Explained Further</h3><div><hr></div><h4><strong>Real-Time Challenges for Expedia</strong></h4>
      <p>
          <a href="https://www.datatinkerer.io/p/scaling-real-time-analytics-how-expedia-cut-costs-by-40-percent">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>