<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Tinkerer]]></title><description><![CDATA[The latest updates on data science, data engineering and data analysis - for free!]]></description><link>https://www.datatinkerer.io</link><image><url>https://substackcdn.com/image/fetch/$s_!JEdj!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png</url><title>Data Tinkerer</title><link>https://www.datatinkerer.io</link></image><generator>Substack</generator><lastBuildDate>Wed, 13 May 2026 10:16:10 GMT</lastBuildDate><atom:link href="https://www.datatinkerer.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Data Tinkerer]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[datatinkerer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[datatinkerer@substack.com]]></itunes:email><itunes:name><![CDATA[Data Tinkerer]]></itunes:name></itunes:owner><itunes:author><![CDATA[Data Tinkerer]]></itunes:author><googleplay:owner><![CDATA[datatinkerer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[datatinkerer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Data Tinkerer]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What the Data Crowd Was Reading in April 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in April 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-april-2026</link><guid 
isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-april-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 07 May 2026 04:46:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/702ecc81-f707-4116-bd80-2196c6e889c9_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N8eu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N8eu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!N8eu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N8eu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!N8eu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaf2f9e-22b4-45bf-9119-a468a02b58b6_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid.
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for April!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://magazine.sebastianraschka.com/p/components-of-a-coding-agent?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Components of a Coding Agent</a> (19 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;ac4cf24d-aa30-4d5b-989a-12c4d2004865&quot;}" data-component-name="MentionToDOM"></span> provides a great breakdown of the six core components of coding agents, showing why tools, repo context, memory, caching and context management often matter as much as the underlying LLM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bBgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!bBgM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 424w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 848w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 1272w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bBgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp" width="1456" height="416" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26878,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bBgM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 424w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 848w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 1272w, https://substackcdn.com/image/fetch/$s_!bBgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd7ef3b4-9132-4c07-a058-867ee3a4978a_1456x416.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://hamel.dev/blog/posts/revenge/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Revenge of the Data Scientist</a> (8 minute read)<br></strong>Hamel Husain argues data science is making a comeback in the LLM era because evals, traces, metrics, labels and experimental design are exactly the skills needed to make AI systems work reliably.</p></li><li><p><strong><a href="https://ml-visualized.com/index.html">Machine Learning Visualized</a> (4 minute read)<br></strong>Gavin Hung&#8217;s Machine Learning Visualized is a free, notebook-based guide that derives ML algorithms from first principles and uses interactive visuals to show how models train, learn and converge.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SmBy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SmBy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 424w, https://substackcdn.com/image/fetch/$s_!SmBy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 848w, 
https://substackcdn.com/image/fetch/$s_!SmBy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 1272w, https://substackcdn.com/image/fetch/$s_!SmBy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SmBy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif" width="800" height="435" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:435,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4189171,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SmBy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 424w, https://substackcdn.com/image/fetch/$s_!SmBy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 848w, 
https://substackcdn.com/image/fetch/$s_!SmBy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 1272w, https://substackcdn.com/image/fetch/$s_!SmBy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa37b413-ffd5-49ae-bb46-3f5d54f6864d_800x435.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://williamoconnell.me/blog/post/ai-ide/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Your AI 
Might be Lying to Your Boss</a> (17 minute read)<br></strong>William O&#8217;Connell argues that while AI IDEs can help developers save time, their metrics on AI-generated code are inaccurate and can&#8217;t be trusted because of the self-serving interests behind them.</p></li><li><p><strong><a href="https://mfatihtuzen.github.io/posts/2026-04-16_timeseries_stationary/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Why Most Time Series Models Fail Before They Start</a> (18 minute read)</strong><br>This article explains why many time series models fail before they start, showing how stationarity checks, visual diagnostics and transformations like differencing can stop models from chasing trends instead of learning real signal.</p></li><li><p><strong><a href="https://claude.com/blog/multi-agent-coordination-patterns?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Multi-agent coordination patterns: Five approaches and when to use them</a> (5 minute read)</strong><br>Anthropic breaks down five multi-agent coordination patterns, showing when to use generator-verifier, orchestrator-subagent, agent teams, message bus or shared state instead of overcomplicating agent systems too early.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4n4L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4n4L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 424w, 
https://substackcdn.com/image/fetch/$s_!4n4L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 848w, https://substackcdn.com/image/fetch/$s_!4n4L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 1272w, https://substackcdn.com/image/fetch/$s_!4n4L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4n4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png" width="1456" height="820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:228221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!4n4L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 424w, https://substackcdn.com/image/fetch/$s_!4n4L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 848w, https://substackcdn.com/image/fetch/$s_!4n4L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 1272w, https://substackcdn.com/image/fetch/$s_!4n4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79b92a5-8d07-4e98-8407-b2ac4433eee2_1600x901.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://www.lesswrong.com/posts/dKpC6wHFqDrGZwnah/ais-can-now-often-do-massive-easy-to-verify-swe-tasks-and-i?utm_source=datatinkerer.io&amp;utm_medium=newsletter">AIs can now often do massive easy-to-verify SWE tasks and I&#8217;ve updated towards shorter timelines</a> (20 minute read)<br></strong>Ryan Greenblatt argues that AIs are now much better at massive, easy-to-verify software engineering tasks than expected, which has pushed him toward shorter AI timelines even though judgment-heavy work remains a major bottleneck.</p></li><li><p><strong><a href="https://brandonrohrer.org/ds_roles?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Being a Staff+ Data Scientist in 2026</a> (14 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Brandon Rohrer&quot;,&quot;id&quot;:2260656,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5840b407-c014-451c-bc68-33f6b8785189_6240x4160.jpeg&quot;,&quot;uuid&quot;:&quot;25ba772b-a789-4c4e-9f24-b44d77dc1e2d&quot;}" data-component-name="MentionToDOM"></span> argues staff+ data science is less about fancy models and more about navigating messy stakeholder incentives, uncertainty, self-serve myths and the human work needed to turn data into decisions.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-pinterest-used-multimodal-ai-to-help-shoppers">How Pinterest Used Multimodal AI to Help Millions of Shoppers</a> (14 minute read)<br></strong>Pinterest turned billions of products into 4.2 million shopping landing 
pages and improved search performance by 35%. This piece breaks down how Pinterest used vision-language models, contrastive learning and distributed inference to make products easier to discover.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/architectural-foundations-and-infrastructure?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Architectural Foundations &amp; Infrastructure</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;43c2f8dd-6b64-4227-890f-6335483c9bfd&quot;}" data-component-name="MentionToDOM"></span> kicks off a good data platform architecture series, arguing that good infrastructure choices start with business requirements, data shape and boring fundamentals like scalability, resilience, modularity, observability and cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7sGV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7sGV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 424w, 
https://substackcdn.com/image/fetch/$s_!7sGV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 848w, https://substackcdn.com/image/fetch/$s_!7sGV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 1272w, https://substackcdn.com/image/fetch/$s_!7sGV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7sGV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110706,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!7sGV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 424w, https://substackcdn.com/image/fetch/$s_!7sGV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 848w, https://substackcdn.com/image/fetch/$s_!7sGV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 1272w, https://substackcdn.com/image/fetch/$s_!7sGV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc9e0b80-8c7a-4c29-b868-dc3a83392d90_1456x971.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://seattledataguy.substack.com/p/the-5-silent-failures-in-data-pipelines?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The 5 Silent Failures in Data Pipelines</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;d7ff8631-7d15-4706-8fcf-38a601868fef&quot;}" data-component-name="MentionToDOM"></span> breaks down five silent pipeline failures, from schema drift to stale data and brittle logic, showing how bad data can reach dashboards without throwing a single error.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sFK4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sFK4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 424w, 
https://substackcdn.com/image/fetch/$s_!sFK4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 848w, https://substackcdn.com/image/fetch/$s_!sFK4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 1272w, https://substackcdn.com/image/fetch/$s_!sFK4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sFK4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp" width="500" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38724,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sFK4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 
424w, https://substackcdn.com/image/fetch/$s_!sFK4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 848w, https://substackcdn.com/image/fetch/$s_!sFK4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 1272w, https://substackcdn.com/image/fetch/$s_!sFK4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88911cad-2907-4473-b1c8-71bc9dec4ea9_500x500.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/data-observabilty-fundamentals-for-data-engineers?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Data Observability Fundamentals for Data Engineers</a> (18 minute read)<br></strong>Good article by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Pipeline to Insights&quot;,&quot;id&quot;:42238863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd98ddb69-fdec-4599-b3f2-906f7673c8de_408x408.png&quot;,&quot;uuid&quot;:&quot;7ac1188c-ef1b-4917-a668-daade1118ab5&quot;}" data-component-name="MentionToDOM"></span> explaining data observability as a tool-agnostic way to catch silent pipeline failures, using freshness, volume, schema, quality and lineage signals to keep data and AI systems trustworthy.</p></li><li><p><strong><a href="https://luminousmen.substack.com/p/the-power-of-data-sketches-a-comprehensive?utm_source=datatinkerer&amp;utm_medium=newsletter">The Power of Data Sketches: A Comprehensive Guide</a> (19 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;luminousmen&quot;,&quot;id&quot;:29227863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffead33a9-5e35-4522-b96e-c1a523419524_300x297.jpeg&quot;,&quot;uuid&quot;:&quot;3328d5e8-8d61-45ea-bce7-15a7c30b6596&quot;}" data-component-name="MentionToDOM"></span> explains how data sketches trade perfect accuracy for tiny memory, fast queries and bounded error, making them useful when exact counts and aggregations become too 
expensive at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!92yl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!92yl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 424w, https://substackcdn.com/image/fetch/$s_!92yl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 848w, https://substackcdn.com/image/fetch/$s_!92yl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 1272w, https://substackcdn.com/image/fetch/$s_!92yl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!92yl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65046,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!92yl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 424w, https://substackcdn.com/image/fetch/$s_!92yl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 848w, https://substackcdn.com/image/fetch/$s_!92yl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 1272w, https://substackcdn.com/image/fetch/$s_!92yl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f03b264-f6cc-43dd-a475-235e5af02190_1456x1017.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://andreaskretz.substack.com/p/stop-letting-tools-lead-your-platform?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Stop Letting Tools Lead Your Platform Decisions</a> (5 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Andreas Kretz&quot;,&quot;id&quot;:181692620,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30eec9a4-a54a-4412-b304-761478dcccb6_4000x6000.jpeg&quot;,&quot;uuid&quot;:&quot;2fa00c43-c188-4f16-8c5c-a1751a04ddd1&quot;}" data-component-name="MentionToDOM"></span> argues data platform decisions should start with use cases, constraints and users, not fashionable tools, because the simplest architecture is often the one that actually 
fits.</p></li><li><p><strong><a href="https://arpitbhayani.me/blogs/defensive-databases?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Databases Were Not Designed For This</a> (16 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Arpit Bhayani&quot;,&quot;id&quot;:5901422,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cde06b4c-2818-4e8d-a271-e3df5fbc61cb_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;ee108bd8-586d-4c31-9542-171327c240eb&quot;}" data-component-name="MentionToDOM"></span> argues agentic AI makes databases riskier, so teams need defensive patterns like least privilege, timeouts, idempotency, soft deletes and query tagging before giving agents access to production data.</p></li><li><p><strong><a href="https://arturastutkus.substack.com/p/the-last-mile-to-apache-iceberg-building?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Last Mile to Apache Iceberg - Building a Basement Data Platform</a> (13 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Arturas Tutkus&quot;,&quot;id&quot;:6157692,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd98183d-5d24-4396-935e-ad3697e8c44e_191x191.jpeg&quot;,&quot;uuid&quot;:&quot;4b4fec92-bf6e-4d28-8c55-8d9a40c3b04a&quot;}" data-component-name="MentionToDOM"></span> shows how he solved the &#8216;last mile&#8217; into Apache Iceberg by building a tiny low-cost event ingestion path for side-project analytics without Kafka, SaaS pipelines or a scary cloud bill.</p></li><li><p><strong><a href="https://www.junaideffendi.com/p/lyft-data-tech-stack">Lyft Data Tech Stack</a> (5 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Junaid 
Effendi&quot;,&quot;id&quot;:21393641,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;uuid&quot;:&quot;ff415716-01fc-41a9-81a3-4649b502adc3&quot;}" data-component-name="MentionToDOM"></span> breaks down Lyft&#8217;s data stack, showing how Kafka, Flink, Trino, Airflow, Flyte and 100+ PB on S3 support real-time analytics, ML and millions of daily rides.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XyWu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XyWu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 424w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 848w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XyWu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63602,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XyWu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 424w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 848w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!XyWu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bb8dd2f-065a-4bb6-85df-028f09c45535_1456x841.webp 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://hackernoon.com/30-bi-engineering-interview-questions-that-actually-matter-in-the-ai-era?utm_source=datatinkerer.io&amp;utm_medium=newsletter">30 BI Engineering Interview Questions That Actually Matter in the AI Era</a> (32 minute read)<br></strong>Anusha Kovi reframes BI engineering interviews for the AI era around the skills that still matter: SQL judgment, semantic modeling, data governance, trustworthy metrics and turning messy business questions into reliable analytics.</p></li><li><p><strong><a 
href="https://www.datatinkerer.io/p/how-airtable-saved-millions-by-cutting">How Airtable Saved Millions by Cutting Archive Storage Costs by 100x</a> (16 minute read)<br></strong>Airtable moved petabytes of archive data out of MySQL, made storage 100x cheaper and kept interactive query latency intact. This piece breaks down how Airtable handled the migration, why it chose DataFusion and the optimizations that made cold storage feel fast.</p></li></ul><div><hr></div><h3><strong>Data analysis and visualisation</strong></h3><ul><li><p><strong><a href="https://thaiscooke.substack.com/p/the-data-role-is-being-reborn">The data role is being reborn</a> (8 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Thais Cooke&quot;,&quot;id&quot;:61993584,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!G0X2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb384f538-870a-44af-809c-33b4fc046389_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;03dc6ca1-96af-4a2b-94b9-c10c5ff2ef5a&quot;}" data-component-name="MentionToDOM"></span> thinks the data role is being reborn, with AI shrinking old output-heavy analyst work while increasing the need for judgment, business context and decision-shaping.</p></li><li><p><strong><a href="https://nastengraph.substack.com/p/stop-coloring-retention-tables-the?utm_source=datatinkerer&amp;utm_medium=newsletter">Stop Coloring Retention Tables the Classic Way</a> (3 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Anastasiya 
Kuznetsova&quot;,&quot;id&quot;:99725349,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!2E6h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb9d9c-d4e0-4f30-bc37-73eb9ffe4d53_516x534.png&quot;,&quot;uuid&quot;:&quot;baf9405f-e466-4d4e-8e35-d8befab61b81&quot;}" data-component-name="MentionToDOM"></span> shows that retention tables become much more useful when you color cohorts by deviation from the period average, rather than using classic gradients that mostly show the obvious decline over time.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u_3F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u_3F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 424w, https://substackcdn.com/image/fetch/$s_!u_3F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 848w, https://substackcdn.com/image/fetch/$s_!u_3F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 1272w, https://substackcdn.com/image/fetch/$s_!u_3F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u_3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp" width="1456" height="789" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46232,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u_3F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 424w, https://substackcdn.com/image/fetch/$s_!u_3F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 848w, https://substackcdn.com/image/fetch/$s_!u_3F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!u_3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64bcd7c1-b1b3-40a2-8dc8-23a2708e702f_1456x789.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong><a href="https://www.counting-stuff.com/dashboard-rot-as-org-attention-grave-markers?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Dashboard rot as org attention grave markers</a> (10 minute read)<br></strong>Randy Au argues dashboard rot is not really a dashboard problem, but a sign of shifting organisational attention, stale 
priorities and abandoned decisions fossilised as BI clutter.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://joereis.substack.com/p/were-in-1905-why-electricity-not?utm_source=datatinkerer.io&amp;utm_medium=newsletter">We&#8217;re in 1905: Why Electricity (Not Dot-Com) Is the Right AI Analogy</a> (12 minute read)<br></strong>Interesting article by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;fe21d8c6-46e2-4240-bdfb-e3b0f5f54f56&quot;}" data-component-name="MentionToDOM"></span> arguing AI is less like the dot-com boom and more like early electricity: the technology works, but the real productivity gains will only come when companies redesign their workflows, org structures and data &#8220;factories&#8221; around it.</p></li><li><p><strong><a href="https://epochai.substack.com/p/keeping-up-with-the-gpts">Keeping up with the GPTs</a> (24 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Anson Ho&quot;,&quot;id&quot;:327131465,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YpJm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9a56c6b-d918-48a9-b335-58313d2bb76f_3097x3182.jpeg&quot;,&quot;uuid&quot;:&quot;6f22e237-cddd-46a8-b6b7-55af4dcc389a&quot;}" data-component-name="MentionToDOM"></span> thinks compute-poor AI labs can narrow the gap through efficiency tricks like distillation, but compute still dominates, so catching up with frontier labs is much easier than overtaking them.</p><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U5Co!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U5Co!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 424w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 848w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U5Co!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp" width="1026" height="1283" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1283,&quot;width&quot;:1026,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28198,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/196288004?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U5Co!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 424w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 848w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!U5Co!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a8d75fb-229e-4b73-b14b-03f20a7ddb4e_1026x1283.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://davidoks.blog/p/how-the-spreadsheet-reshaped-america">Seeing like a spreadsheet</a> (23 minute read)</strong><br>Interesting article by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;David Oks&quot;,&quot;id&quot;:2088240,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/553a38f8-f363-424f-8648-742af2eacc8d_1024x1024.png&quot;,&quot;uuid&quot;:&quot;6fad954b-221b-40a9-9b3a-e3776a4e571c&quot;}" data-component-name="MentionToDOM"></span> discussing how spreadsheets reshaped American business into a numbers-first optimisation machine and how AI may reorganise work and decision-making in a similar way.</p></li></ul><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div 
class="poll-embed" data-attrs="{&quot;id&quot;:506080}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from April I missed? Feel free to drop a comment or hit reply. Even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others, really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;317d3c47-fbec-470a-beeb-20020b66ccbc&quot;,&quot;caption&quot;:&quot;It&#8217;s time for another data/AI roundup and here are the highlights from March &#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Why context engineering matters more than prompt hacks<br />Bayesian statistics in plain English<br />Why most agentic AI systems fail<br />The problem with treating context like tokens<br />A visual guide to modern attention variants<br />How GPT 5.4 improves Codex<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />Why modern data stacks are getting harder to understand<br />Why ETL is losing its central role<br />What makes a strong data engineering GitHub portfolio<br />Why AI builders need data engineering fundamentals<br />A beginner&#8217;s guide to database internals<br />Why refactoring beats rebuilding data models<br /><br />Plus: what frontier AI job postings reveal about the market, Why data teams should start with the business model and Why strategy matters more than dashboards&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in March 
2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-02T03:30:16.133Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2a58f73-a2c0-4074-8db5-67153f3c5d45_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-march-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:192483521,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;79d232de-3d15-4abe-b46e-997945fb8a86&quot;,&quot;caption&quot;:&quot;It&#8217;s time for another data/AI roundup and here are the highlights from February&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Inside OpenAI&#8217;s in-house data agent<br />A practical guide to which AI to use in the agentic era<br />Why judgment may not be uniquely human 
after all<br />How Codex is being used for serious research automation<br />Why semantic linking matters for giving data meaning<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />A portable analytics stack built on DuckDB, DuckLake, dlt and SQLMesh<br />Why healing tables beat slow-motion backfill disasters<br />The case for MetadataOps engineers<br />How to use AI tools without losing data engineering fundamentals<br />Why 5-second BigQuery queries can still be expensive<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The state of machine learning competitions in 2025<br /><br />Plus: why AI is eating software&#8217;s TAM, what world models could unlock in robotics and why AI may intensify work instead of reducing it.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in February 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-12T04:00:47.277Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9b4d29f-5a76-44a8-902d-bc2983dbe445_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026&quot;,&quot;section_name&quot;:&quot;Data 
Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:190247984,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:8,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[The Bitter Lesson (of Decision Making)]]></title><description><![CDATA[Why simple rules often beat human judgment over time]]></description><link>https://www.datatinkerer.io/p/the-bitter-lesson-of-decision-making</link><guid isPermaLink="false">https://www.datatinkerer.io/p/the-bitter-lesson-of-decision-making</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 30 Apr 2026 04:30:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/43e93bab-4b07-4111-95de-dd786efcf772_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how decision making can be improved.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v6qD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!v6qD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v6qD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185492577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!v6qD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!v6qD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F077525b7-6888-4573-8bb8-89e118fc4e28_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>With that out of the way, let&#8217;s get to today&#8217;s topic of decision making!</p><div><hr></div><p>Last week I was reading <em><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a></em> by Rich Sutton again. It&#8217;s a great read, if you haven&#8217;t read it already. And it got me thinking: are there similar patterns in the world of data? Are there cases where we overvalue the human knowledge of the domain? I think there is a close analogy in decision-making.</p><p>The biggest lesson that can be read from years of judgment and decision-making research is that simple, consistent rules beat human judgment most of the time.</p><p>This can be hard to accept because it feels like an attack on expertise but the evidence is blunt: human judgment is often noisy, inconsistent and overconfident. 
<a href="https://psycnet.apa.org/PsycBOOKS/toc/11281">Paul Meehl made the argument</a> in 1954 that statistical rules often outperform expert clinical judgment. <a href="https://pubmed.ncbi.nlm.nih.gov/10752360/">Later reviews</a> found the same pattern across many fields.</p><p>We like to believe that experience gives us a special ability to see the truth of a case. We like to believe that judgment lives in the rich details, the subtle signals, the human context. But often, when we test that belief against outcomes, the details we trusted were noise, the subtle signals were distractions and the human context gave us more confidence than accuracy.</p><p><a href="https://www.amazon.com.au/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555">Daniel Kahneman&#8217;s work</a> helps explain why. Intuition is pattern recognition, not magic. It works when the environment is stable and feedback is fast and clear. Chess players can build good intuition because they see repeated patterns and receive immediate feedback. Many business, hiring, policy and strategy decisions are different. Feedback is delayed, messy and often ambiguous. A bad decision can look good because of luck. A good decision can look bad because the environment changed. In that kind of setting, intuition can feel like expertise while behaving more like confidence.</p><p>Hiring is a simple example. Managers often believe they can spot talent through conversation. They notice confidence, polish, energy and &#8216;fit&#8217;. But unstructured interviews are noisy. Two interviewers can see the same candidate and walk away with different conclusions. The same interviewer may judge differently depending on mood, fatigue or one memorable answer. A structured process feels less impressive: define criteria, ask similar questions, score each dimension and combine the scores. 
But that boring process is usually fairer and more predictive.</p><p>The same pattern appears in admissions, forecasting, insurance, performance reviews and risk assessment. Kahneman and Sunstein called this problem &#8216;noise&#8217;: unwanted variation in judgments that should be much closer together. In <em><a href="https://www.amazon.com.au/Noise-Human-Judgment-Daniel-Kahneman/dp/0316451401">Noise</a></em>, they describe an insurance company where underwriters independently priced the same fictitious cases and the median variation in premiums was 55%, far higher than executives expected.</p><p>Bias gets most of the attention because it has a story. Noise is harder to see because there is no obvious villain. No one needs to be irrational, corrupt or incompetent. People can be smart, experienced and honest and still produce wildly different judgments.</p><p>In AI, methods that scaled with computation beat methods that tried to encode human cleverness. In decision-making, methods that scale with evidence and consistency beat methods that try to preserve human intuition.</p><p>That does not mean algorithms should replace people. Humans still define the goal, decide what data is legitimate, handle ethical trade-offs and question whether history still applies. The lesson is narrower: stop using intuition for the parts of decision-making where tested, consistent rules do better. 
Use people to frame the problem, but use algorithms for repeated decision-making scenarios.</p><p>What are your thoughts?</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others, really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-flipkart-scaled-delivery-date-calculation-10x-while-slashing-latency-by-90-percent?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2NTc0ODUwNywiaWF0IjoxNzUwMzk3MzY0LCJleHAiOjE3NTI5ODkzNjQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.xMVHaKGWTC47WqhR6hcO-xxeFYygxQHHETN-klF2KxQ&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-flipkart-scaled-delivery-date-calculation-10x-while-slashing-latency-by-90-percent?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2NTc0ODUwNywiaWF0IjoxNzUwMzk3MzY0LCJleHAiOjE3NTI5ODkzNjQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.xMVHaKGWTC47WqhR6hcO-xxeFYygxQHHETN-klF2KxQ"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bb408743-6593-4eef-a3a4-809e4ed44c6f&quot;,&quot;caption&quot;:&quot;The tough part of analytics isn&#8217;t the SQL syntax or the dashboard design.<br /><br />It&#8217;s the judgment behind when to stop.<br /><br />Push too far toward speed, and your work becomes unreliable. Lean too hard into accuracy, and you&#8217;re the person who slows decisions down. Mastering that balance is what separates a good analyst from a trusted one<br /><br />In this article we look at how to strike the balance between the two!&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Data Analyst&#8217;s Dilemma: Accuracy vs Speed&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-09-25T06:30:36.780Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!SqVD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b6848e8-0008-4a8b-8952-e8bf1028cb64_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/the-data-analysts-dilemma-accuracy-vs-speed&quot;,&quot;section_name&quot;:&quot;Data Analysis&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:174416277,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;66c737e2-2235-4e7f-9fda-780220e97a03&quot;,&quot;caption&quot;:&quot;Ever delivered a great analysis and then had no idea if it changed anything?<br /><br />This one&#8217;s for you: how to track impact, prove it and actually talk about it when it matters.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to Show Impact as a Data Analyst&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head 
of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-07-31T08:15:33.990Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!2GHL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1bc544-ff07-4deb-865b-6fed89366c42_1024x1536.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-to-show-impact-as-a-data-analyst&quot;,&quot;section_name&quot;:&quot;Data Analysis&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:169717748,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Airtable Saved Millions by Cutting Archive Storage Costs by 100x]]></title><description><![CDATA[Airtable moved petabytes of cold log data out of MySQL and built a cheaper archive layer on S3 and Parquet without sacrificing fast queries.]]></description><link>https://www.datatinkerer.io/p/how-airtable-saved-millions-by-cutting</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-airtable-saved-millions-by-cutting</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 23 Apr 2026 
04:53:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0887cd3f-fd83-4fbd-93df-e009c31ed22b_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Airtable cut archive storage costs by 100x and saved millions.</p><p>Let&#8217;s get into how Airtable pulled it off.</p><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Airtable&#8217;s MySQL storage had grown to petabytes, with archive tables driving major cost and scale issues. Some databases were also getting close to the 64TB RDS limit.</p><h4><strong>Task</strong></h4><p>The team needed to move cold archive data out of MySQL without breaking revision history or slowing queries. They also had to keep durability, availability and enterprise requirements intact.</p><h4><strong>Action</strong></h4><p>Airtable built a two-tier system: recent data stayed in MySQL, old data moved to S3 as Parquet files. They used DataFusion for querying, plus Flink, compaction, validation, caching, indexes and bloom filters.</p><h4><strong>Result</strong></h4><p>Parquet made the archive dataset about 10x smaller and S3 was about 10x cheaper than MySQL. 
That led to roughly <strong>100x lower storage costs</strong> and <strong>millions in annual savings</strong>.</p><h4><strong>Use Cases</strong></h4><p>Archiving data, reducing storage cost, improving query latency</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache DataFusion, AWS MySQL RDS, Apache Flink, AWS SQS</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>Airtable&#8217;s storage team went into 2024 with a pretty blunt problem: too much archive data sitting in the wrong place.</p><p>Their AWS MySQL RDS footprint had grown to petabytes, some of the biggest databases were getting dangerously close to the 64TB RDS disk limit, and one particular class of data was doing most of the damage: cell history and action log tables. Together, these acted as Airtable&#8217;s archive layer, powering revision history and helping with internal debugging. For some enterprise customers, that data also had to be retained for up to 10 years.</p><p>The issue was not that the data had no value. It clearly did. The issue was that MySQL was an expensive home for a workload that was mostly cold.</p><p>Most of this archive data was old, rarely touched and read-only except for hard deletion cases. When it was queried, the access patterns were fairly predictable: point selects and paginated range queries, always scoped to a specific base. That made it a poor match for row-oriented OLTP storage at this scale, especially when Airtable still needed interactive latency and strong durability and availability guarantees.</p><p>So the team built a new two-tier storage system. Recent rows would stay in MySQL. Older archive data would move to S3, be stored as Parquet files partitioned by base and queried through a new engine built on Apache DataFusion.</p><p>That shift did more than trim costs around the edges. 
The final archived dataset became about 10x smaller than the original data in MySQL thanks to Parquet compression, and S3 itself was around 10x cheaper per byte than MySQL storage. Put those together and the result was a storage layer that was about 100x cheaper.</p><div><hr></div><h4>Why Airtable needed a better archive layer</h4><p>Airtable&#8217;s archive data had a few characteristics that mattered a lot:</p><ul><li><p>the overwhelming majority of it (trillions of rows) was old and infrequently accessed</p></li><li><p>most reads were point selects or range queries used for pagination</p></li><li><p>queries were always filtered to a single base</p></li><li><p>old data was effectively immutable</p></li><li><p>the data was keyed by MySQL&#8217;s <code>autoincr_id</code>, so it naturally followed insertion order</p></li></ul><p>That combination is useful. It tells you the team did not need a general-purpose database for this layer. They needed something cheaper that still handled a narrow set of read patterns well.</p><p>The first key idea was to move archive data from MySQL into S3. S3 was already much cheaper byte-for-byte. The second was to store that data in Parquet and partition it by base. The third was to place a query engine in front of it that could answer interactive requests without forcing full scans.</p><p>This became a two-tier system: hot and recent archive rows in MySQL, older rows in S3-backed Parquet. 
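</p><p>As a rough illustration of that split, here is a minimal Python sketch of how a client might divide a paginated range read between the two tiers. It assumes the boundary is a single <code>autoincr_id</code> cutoff per base; the cutoff, the <code>split_range</code> helper and the tuple shapes are hypothetical and not taken from Airtable&#8217;s implementation.</p>
<pre>
```python
from typing import Optional, Tuple

# Inclusive (first_id, last_id) range of autoincr_id values.
Range = Tuple[int, int]


def split_range(
    first_id: int, last_id: int, cutoff: int
) -> Tuple[Optional[Range], Optional[Range]]:
    """Split an inclusive id range at a hypothetical archive cutoff.

    Assumption: ids strictly below `cutoff` live in the S3 archive tier,
    and ids at or above it are still in MySQL.
    Returns (archive_range, mysql_range); either may be None.
    """
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")

    archive_last = min(last_id, cutoff - 1)
    mysql_first = max(first_id, cutoff)

    archive = (first_id, archive_last) if archive_last >= first_id else None
    mysql = (mysql_first, last_id) if last_id >= mysql_first else None
    return archive, mysql
```
</pre>
<p>For example, <code>split_range(90, 110, 100)</code> returns <code>((90, 99), (100, 110))</code>: one sub-range to read from the archive and one to read from MySQL, which a client could then concatenate in id order.</p><p>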
That let Airtable keep the user-facing experience intact while steadily pulling massive amounts of cold data out of an expensive OLTP system.</p><div><hr></div><h4>Architecture overview</h4><p>At a high level, the architecture is simple enough to explain in one sentence: archive data moved from MySQL into S3 Parquet files, and <a href="https://datafusion.apache.org/">DataFusion</a> was used to query those files directly with low enough latency to support product features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uA6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uA6C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 424w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 848w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1272w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp" width="720" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uA6C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 424w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 848w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1272w, https://substackcdn.com/image/fetch/$s_!uA6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5812c0d-7db5-4ff0-a5df-70d50bfc39e0_720x405.webp 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Architecture overview (Source: Airtable)</figcaption></figure></div><p>What made it work was the alignment between storage layout and query patterns.</p><p>The team kept the Parquet schema close to the original MySQL schema. They also preserved ordering by <code>autoincr_id</code>, because a large share of their reads depended on point lookups or ranged access on that field. 
That meant the query engine could use Parquet metadata to narrow down which bytes to fetch from S3 instead of pulling full files.</p><p>They also partitioned files by base, which mattered just as much. Since queries were always scoped to a specific base, partitioning by base meant Airtable could avoid touching unrelated data entirely.</p><p>On top of that, Airtable stored S3 file location metadata in DynamoDB. This gave the client layer a clean way to register the right files with the query engine and helped support enterprise requirements such as regional data residency and encryption with customer-provided keys.</p><div><hr></div><h4>How Parquet fit the workload</h4><p>Parquet was not just a cheaper format choice. It was the thing that made interactive querying on S3 plausible.</p><p>Unlike MySQL&#8217;s row-oriented <a href="https://en.wikipedia.org/wiki/InnoDB">InnoDB</a> layout, Parquet is columnar. It groups rows into row groups and, within each row group, stores every column contiguously as a column chunk. More importantly, Parquet files include metadata that query engines can exploit for pruning. The file footer carries offsets and sizes, while row-group and page-level metadata can include min/max statistics, and column chunks can optionally carry bloom filters.</p><p>That matters because Airtable&#8217;s queries were usually not broad analytical scans. They were targeted reads. If a query asks for a narrow range of <code>autoincr_id</code> values within one base, and the files are sorted on that column, the engine can inspect metadata and skip most row groups without reading them.</p><p>Airtable leaned directly into that. They kept the original schema mostly intact and preserved sorting by <code>autoincr_id</code>. Because of that, the engine could selectively download the relevant byte ranges from S3 rather than treat each Parquet file like a blob.</p><p>There was also a second major upside: compression. 
Thanks to the columnar layout, the archived dataset ended up about 10x smaller than the original MySQL version. That is a huge result on its own. Pair that with S3&#8217;s lower storage cost and the economics really shifted in Airtable&#8217;s favor.</p><div><hr></div><h4>Picking the right query engine</h4><p>Once the storage format was settled, Airtable benchmarked several engines capable of querying Parquet in S3:</p><ul><li><p>AWS Athena</p></li><li><p>DuckDB</p></li><li><p>StarRocks</p></li><li><p>DataFusion</p></li></ul><p>Athena was ruled out quickly for latency reasons. Its API pattern of starting a query and polling for completion made it better suited to general OLAP workloads than user-facing interactive queries. Airtable was seeing query latencies in the seconds, which was too slow for revision history use cases. It also lacked the strong isolation Airtable cared about across bases.</p><p>DuckDB was useful, but not ideal for this workload. They found that query planning did not always use projection pushdowns effectively, which sometimes led to full file downloads. Simple point queries on one <code>autoincr_id</code> could still be subsecond, but overall it trailed DataFusion. The team still used DuckDB heavily during development because it was convenient for debugging Parquet contents from the command line.</p><p>StarRocks produced performance results comparable to DataFusion, but it came with the operational burden of running a full-time cluster in Kubernetes to serve relatively low-QPS cold-storage queries. 
Like Athena, it also did not give Airtable the same kind of strong base-level isolation.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2acq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2acq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 424w, https://substackcdn.com/image/fetch/$s_!2acq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 848w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1272w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp" width="720" height="105" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:105,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2acq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 424w, https://substackcdn.com/image/fetch/$s_!2acq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 848w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1272w, https://substackcdn.com/image/fetch/$s_!2acq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b3365c-3555-4a3f-b846-885bf7347342_720x105.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Query performance results (Source: Airtable)</figcaption></figure></div><p>That left DataFusion.</p><p>For Airtable, DataFusion hit the sweet spot. 
It was strong at exploiting Parquet metadata and it was extensible. Because it is an embedded Rust library, the team could also run it inside their existing worker architecture.</p><p>That brought a few clear benefits.</p><p>Operationally, there was no extra service to deploy and babysit. Isolation came for free because each base already had its own process boundary. And request affinity stayed high because the same workers kept serving the same bases, which later made caching hit rates excellent.</p><p>In other words, DataFusion fit Airtable&#8217;s architecture.</p><div><hr></div><h4>Migrating data out of MySQL</h4><p>Designing the new storage layer was one thing. Moving petabytes of live data into it without breaking anything was another.</p><p>Airtable wanted a one-time migration process that would start cutting MySQL storage costs quickly. To get a consistent export view, they used <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html">AWS RDS snapshot exports</a>, which produced large Parquet files of full tables. They had also prototyped direct SQL extraction into Parquet, but chose not to productionize that approach at scale. Snapshots were preferred because they ran against backup instances and avoided extra pressure on production systems.</p><p>The challenge was that these snapshots were massive table-level exports across many shards, while Airtable&#8217;s serving model required files partitioned by base.</p><p>So the team added a repartitioning and compaction pipeline.</p><p>First, Flink jobs parallelized across shard snapshots and repartitioned records by base into intermediate S3 directories. Then AWS Step Functions scanned those intermediate outputs and enqueued bases into SQS. 
From there, custom compactor code merged the files, merge-sorted them, deduplicated records and produced final serving Parquet files capped at 1GB each.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YQr0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YQr0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 424w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 848w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1272w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp" width="720" height="475" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26436,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/194669935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YQr0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 424w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 848w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1272w, https://substackcdn.com/image/fetch/$s_!YQr0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303b0a8f-3f61-473c-a08c-b54e818cee02_720x475.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Overview of the migration process (Source: Airtable)</figcaption></figure></div><p>That 1GB size was not arbitrary. It was chosen through benchmarking as a good serving size with a useful density of row groups per file. Again, a small design detail with big latency implications.</p><div><hr></div><h4>Validating the migration</h4><p>A migration like this lives or dies on validation.</p><p>Airtable first ran bulk validation to confirm that data had not been corrupted during export, repartitioning and compaction. For this, they spun up a StarRocks cluster and compared the serving Parquet files against the original RDS snapshots, finding zero cases of data corruption.</p><p>We covered this validation approach in more detail in an earlier piece on how Airtable made archive validation work at petabyte scale. 
</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d35a32c3-a903-40ce-8426-c018883774af&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Airtable Made Archive Validation Work at Petabyte Scale&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-10T06:22:40.886Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T_KN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b198e-3ef1-40ee-b823-b52f80699458_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-airtable-made-archive-validation-work-at-petabyte-scale&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:160983286,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>The important point is that bulk validation 
gave the team confidence that the migration pipeline itself had preserved the data correctly.</p><p>That is a strong result, though it only answered one part of the problem. The bigger challenge was all the new storage client logic that now sat around the data:</p><ul><li><p>a Rust query engine built with DataFusion</p></li><li><p>integration into Node.js using napi-rs</p></li><li><p>logic to combine MySQL and S3 results</p></li><li><p>logic to identify which S3 files to read</p></li><li><p>support for enterprise features like customer keys, data residency and hard deletes</p></li></ul><p>Bulk validation could confirm the data files were right. It could not prove that the full end-to-end user experience would behave exactly as before.</p><p>So Airtable moved to shadow validation on live traffic. Requests continued reading from MySQL as normal, while the same queries were also executed in the background through the new system. That let the team compare outputs under real conditions and catch implementation issues before rollout.</p><p>The bugs they found were exactly the sort of things that show up when systems cross language runtimes and execution models:</p><ul><li><p>float precision mismatches between JavaScript and Rust&#8217;s serde JSON handling</p></li><li><p>a sorting issue where DataFusion used lexicographic rather than numeric ordering</p></li><li><p>a crashing <code>SIGABRT</code> issue tied to async napi-rs and Node.js worker threads</p></li><li><p>latency problems</p></li></ul><p>They resolved these before launch and before deleting the MySQL copies of the migrated data.</p><div><hr></div><h4>Fixing latency bottlenecks</h4><p>Once staged rollout began, latency became the main battleground.</p><p>That is not surprising. S3-backed query systems usually do not fail because storage is too expensive. 
They fail because network round trips and scan inefficiencies make them annoyingly slow.</p><p>Airtable saw a mix of problems: inefficient query plans, too many S3 requests and cases where sparse filters still caused too much data to be downloaded. They responded with a few targeted optimizations.</p><p><strong>Building a tiered cache for archive queries</strong></p><p>Caching turned out to be one of the biggest wins.</p><p>Under the hood, DataFusion translates SQL queries into S3 GET operations. It fetches Parquet footer metadata, column chunk metadata and then decides what row groups and byte ranges need to be read. If every step involves another network trip, latency stacks up quickly.</p><p>So Airtable built a tiered caching system.</p><p>The first layer used DataFusion&#8217;s built-in cache support to store Parquet file metadata and S3 <code>ListObjects</code> results.</p><p>The second layer cached additional Parquet page header metadata in memory. Combined with the first layer, this reduced how often the engine had to round-trip to S3 during query planning. Airtable wrote a custom implementation around DataFusion&#8217;s parquet reader interfaces, which let them cache metadata results directly and add instrumentation. The result was a reported 99%+ cache hit ratio.</p><p>That number is believable in context because DataFusion ran inside per-base workers and the files themselves were partitioned by base. The system had strong locality by design.</p><p>Finally, Airtable added an on-disk cache for full Parquet files. This was reserved for a very small number of heavy bases with bad enough query patterns to justify the extra work and cost. Unlike metadata caching, downloading whole files is not something you want to do casually. 
But for outlier cases, it gave the team another escape hatch.</p><p><strong>Building custom indexes for sparse queries</strong></p><p>Not every query could be handled efficiently by the base file layout alone.</p><p>Most reads were anchored on <code>autoincr_id</code>, but Airtable also had filters on other fields that could reduce result sets dramatically. Examples included filtering by action type, filtering by row or excluding sync-generated updates.</p><p>For some bases, those additional conditions matched only a tiny slice of rows. In those cases, even if <code>autoincr_id</code> helped somewhat, reading broad sections of Parquet files was still wasteful.</p><p>So Airtable built a secondary indexing system.</p><p>Using DataFusion, they scanned Parquet files and wrote index data out as new Parquet files. The client layer knew how to query those indexes first, then use the result to build a more targeted query against the original archive files.</p><p>This was much easier to do because the data was effectively read-only. Airtable did not need to solve the usual headache of synchronizing constantly changing base tables and secondary indexes. Static data makes a lot of index ideas suddenly practical.</p><p><strong>Using bloom filters for faster lookups</strong></p><p>There was one more edge case: lower-QPS point lookups on a different unique identifier that was randomly distributed.</p><p>That broke the usual min/max pruning strategy. Since the identifier values were not ordered, Parquet statistics were not helpful. Without another technique, the engine would need to fetch and scan every row group before applying the filter.</p><p>Airtable could have solved this with another custom index, but they chose a simpler route: <a href="https://parquet.apache.org/docs/file-format/bloomfilter/">Parquet bloom filters</a>.</p><p>Bloom filters are probabilistic membership structures. They can tell you if a value is definitely not present, or maybe present. 
False positives are possible. False negatives are not.</p><p>That property is enough for pruning. If the bloom filter says a row group definitely does not contain the target identifier, the engine can skip it safely. DataFusion already understood Parquet bloom filter metadata, so Airtable could rely on native support instead of bolting on another indexing layer.</p><div><hr></div><h4>Conclusion</h4><p>Airtable&#8217;s storage team took a dataset that had clearly outgrown MySQL&#8217;s economics and built a system that matched the workload far better.</p><p>They moved petabytes of archive data out of MySQL, kept recent data in the transactional store, archived old rows into base-partitioned Parquet files on S3 and queried those files with an embedded DataFusion engine. Along the way, they layered in DynamoDB metadata registration, a large-scale migration pipeline, bulk and shadow validation, multiple caching layers, custom secondary indexes and bloom filter-based pruning.</p><p>The result was a storage system that stayed durable and queryable at interactive latency while cutting storage costs by around 100x and saving millions of dollars per year.</p><p>There is still more to do. Airtable&#8217;s first implementation focused on bulk migration so they could start saving money quickly. The longer-term goal is incremental archiving, likely through a CDC-style pipeline built on a stream processor such as Flink. That opens up a new set of engineering problems around compaction, index rebuilds and operations. There are also other log-like tables that could be migrated onto the same platform.</p><p>Still, the core idea is already proven.</p><p>If a dataset is mostly cold, mostly read-only and queried through a narrow set of predictable access patterns, keeping it in an expensive OLTP database is often just inertia dressed up as architecture. 
Airtable looked at the shape of the workload, changed the storage model to match it and got the kind of result every infra team wants: better economics without making the product worse.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://medium.com/airtable-eng/how-we-reduced-archive-storage-costs-by-100x-and-saved-millions-21754b5a6c8e">Airtable's Engineering Blog</a> post on this topic.</p>]]></content:encoded></item><item><title><![CDATA[How Pinterest Used Multimodal AI to Help Millions of Shoppers]]></title><description><![CDATA[Inside the multimodal AI pipeline that converted images, metadata and search behavior into scalable shopping discovery.]]></description><link>https://www.datatinkerer.io/p/how-pinterest-used-multimodal-ai-to-help-shoppers</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-pinterest-used-multimodal-ai-to-help-shoppers</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 16 Apr 2026 04:15:48 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Pinterest used multimodal AI to organise products into relevant shopping collections.</p><p>Let&#8217;s get to PinLanding: Pinterest&#8217;s AI system for shopping collections.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5568" height="3712" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3712,&quot;width&quot;:5568,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a person using a laptop computer on a desk&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a person using a laptop computer on a desk" title="a person using a laptop computer on a desk" srcset="https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1698399480539-327a5f6975f3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw2fHxwaW50ZXJlc3R8ZW58MHx8fHwxNzc1OTg0MzkxfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@getswello">Swello</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><div><hr></div><h4><strong>Situation</strong></h4><p>Traditional shopping collections built from search logs and manual curation were not enough to cover the growing mix of detailed, compositional and AI-native shopping queries.</p><h4><strong>Task</strong></h4><p>Build a scalable system that could generate precise, searchable shopping collections directly from product content while still aligning with real user shopping intent.</p><h4><strong>Action</strong></h4><p>Pinterest built PinLanding, a multimodal pipeline that used search behavior to identify intent gaps, vision-language models to extract product attributes, clustering and LLM-as-judge to clean and validate a shopping vocabulary, a CLIP-style model to assign curated attributes at scale and Ray plus Spark to run large-scale inference and feed construction.</p><h4><strong>Result</strong></h4><p>The system generated 4.2 million shopping landing pages, expanded unique topics by 4x, improved average Precision@10 from 0.84 to 0.96 and delivered a 35% lift in search performance.</p><h4><strong>Use Cases</strong></h4><p>Search relevance improvement, attribute tagging, catalog enrichment, cold-start collection generation</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Ray, PyArrow, CLIP, VLM</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4><strong>Context</strong></h4><p>Online retailers and social platforms now manage catalogs with billions of items. Pinterest is one example, but the underlying problem is much broader: how do you organize massive product inventories into shopping collections that people can actually browse?</p><p>Pinterest has been dealing with exactly that. Historically, shopping collections were mostly shaped by user search history and manual curation. That worked well enough when search behavior was more predictable and the long tail was smaller. 
It works a lot less well when people start searching in full sentences, asking for vibes, aesthetics and combinations of constraints all at once.</p><p>That shift is what makes PinLanding interesting.</p><p>Instead of waiting for search logs and human curation to define the collection space, Pinterest&#8217;s team flips the process around. The system starts from the product content itself, then builds shopping collections from that foundation while still staying grounded in how people actually search. In other words, it is content-first, but not content-only.</p><p>The system is built around four components: understanding user search patterns, building and validating a shopping collection vocabulary using multimodal LLMs and LLM-as-judge, constructing feeds from attributes and evaluating the system while adapting to AI-native search behavior.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xCSb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xCSb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 424w, https://substackcdn.com/image/fetch/$s_!xCSb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 848w, https://substackcdn.com/image/fetch/$s_!xCSb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!xCSb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xCSb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp" width="720" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:462,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39524,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xCSb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 424w, https://substackcdn.com/image/fetch/$s_!xCSb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 848w, https://substackcdn.com/image/fetch/$s_!xCSb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 
1272w, https://substackcdn.com/image/fetch/$s_!xCSb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa584992-a9d9-46c6-bab5-c1c1bfb7403d_720x462.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Pinterest shopping collection (Source: Pinterest)</figcaption></figure></div><div><hr></div><h4>1. 
Mapping what shoppers actually want</h4><p>The system starts with something fairly practical: understanding what users are actually trying to do.</p><p>Pinterest aggregates signals from search history, autocomplete interactions, filter usage and browse paths to estimate the distribution of shopping intents across the catalog. The goal here is not to build a giant list of every possible topic. It is to get a grounded picture of current demand, current coverage and where the system is falling short.</p><p>Two clear patterns show up.</p><p>The first is the head of the distribution: high-volume, well-formed shopping queries like &#8216;black cocktail dress&#8217; or &#8216;white linen pants.&#8217; These are the kinds of searches traditional collection systems already handle reasonably well. They are structured, frequent and easy to tie back to existing ranking and merchandising logic.</p><p>The second is where things get more interesting. There is a growing long tail of conversational and compositional queries, especially as users get more comfortable interacting with AI systems. These are requests that blend style, occasion, constraints and subjective framing like &#8216;what to wear for Italian summer vacation&#8217; or &#8216;long red satin dress with lace trim under $200&#8217;. Rather than naming a single product, they describe a scenario or an aesthetic.</p><p>That distinction matters because it reveals the limitations of a purely query-driven collection pipeline. The head is manageable. The tail is messy, sparse and expanding.</p><p>Pinterest uses these behavioral signals in three ways. First, they highlight product spaces where user demand is strong but collection coverage is still thin. Second, they reveal which attribute dimensions users care about, including things like color, occasion, style, fit, price and brand, across 20 categories. Third, they create the baseline against which the rest of the pipeline is evaluated.</p><p>That last point is important. 
PinLanding is not trying to throw away query understanding. It is trying to improve topical coverage and precision relative to the old query-driven baseline. Search behavior still matters. The difference is that the system no longer depends on search logs alone to decide what collections can exist.</p><div><hr></div><h4>2. From raw product content to searchable shopping topics</h4><p>Once Pinterest has mapped user behavior, the next step is to describe each product in a structured way that is useful for collection generation.</p><p>Each product is modeled as a multimodal tuple made up of an image plus metadata like title, description, merchant tags and price. A vision-language model (<a href="https://en.wikipedia.org/wiki/Vision-language_model">VLM</a>) is then used to generate candidate attributes for that product. Rather than producing free-form descriptions, the model is prompted to return normalized key-value pairs. That design choice makes downstream processing far easier because it turns messy visual and textual information into something closer to structured data.</p><p>The initial output is useful in one respect: it has high recall and captures a wide range of possible attributes. The problem is that it is too messy to use directly.</p><p>The raw VLM output tends to overproduce highly specific descriptors like &#8216;black insoles&#8217; or &#8216;lace-trim hem with side slit.&#8217; It also produces near-duplicate variants such as &#8216;boho,&#8217; &#8216;bohemian&#8217; and &#8216;boho-chic&#8217; for what is basically the same shopping concept. That creates a sparse attribute space where too many labels each apply to too few products. Once that happens, collection quality starts to fall apart. 
You end up with lots of tiny fragments instead of reusable shopping topics.</p><p>Pinterest addresses this with a curation pipeline designed to build a compact and reusable attribute vocabulary.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CwAh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CwAh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 424w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 848w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 1272w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CwAh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp" width="720" height="231" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:231,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16628,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CwAh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 424w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 848w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 1272w, https://substackcdn.com/image/fetch/$s_!CwAh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a948bc5-f610-4cbf-8bf6-a73ee34ec21f_720x231.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Pinlanding Product Attribution Design (Source: Pinterest)</figcaption></figure></div><p>The first step is frequency filtering. 
Attributes that appear on only a tiny number of products are rarely useful as collection keys, so they are removed. This strips out many ultra-specific descriptors while keeping the attributes that are more likely to reflect reusable shopping concepts.</p><p>The second step is embedding-based clustering. Dense text embeddings are generated for the remaining attributes and highly similar attributes are merged. When multiple variants express the same idea, the most frequent surface form becomes the canonical one.</p><p>The final step uses an LLM-as-judge to score candidate topic-query pairs. Given an attribute tuple and its generated query, a second LLM evaluates whether the pair is semantically coherent, whether it sounds like a plausible shopping intent and whether it matches typical search phrasing. This helps rank and filter candidate topics not just for internal consistency but also for searchability.</p><p>That last part is the key difference between extracting attributes and building shopping topics. Attributes by themselves are too granular and too raw. The curation pipeline turns them into a vocabulary that is compact enough to scale and natural enough to match how people actually search.</p><p><strong>2.1 Scaling attribute assignment with a CLIP-style model</strong></p><p>Once the curated attribute vocabulary exists, the next question is how to assign those attributes across the full product catalog.</p><p>Running the original vision-language model on every product at production scale would be expensive and brittle. Instead, Pinterest trains a dual-encoder model inspired by <a href="https://openai.com/index/clip/">CLIP</a>.</p><p>One encoder takes the product image and text as input and produces a product embedding. The other encoder takes an attribute phrase and produces an attribute embedding in the same vector space. 
During training, product-attribute pairs generated from the VLM are treated as positive examples, while non-matching pairs act as negatives.</p><p>The training objective is bidirectional contrastive loss. Matching product-attribute pairs are pulled closer together in embedding space and mismatched pairs are pushed apart.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!85Zh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!85Zh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 424w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 848w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 1272w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!85Zh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp" width="720" height="57" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:57,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3002,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!85Zh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 424w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 848w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 1272w, https://substackcdn.com/image/fetch/$s_!85Zh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc780ac8e-7491-417e-a2fa-712ba101b5e0_720x57.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>At inference time, the system embeds all products and all attributes once. An attribute is assigned to a product when the similarity between the two embeddings crosses a calibrated threshold. 
That threshold can also be adjusted with frequency-based weighting to counter long-tail smoothing.</p><p>This architecture solves a few problems at once.</p><p>It is cheaper than running the VLM everywhere. It creates a denser and more consistent attribute graph. And it reduces the explosion of distinct attributes that came from raw model outputs.</p><p>Interestingly, Pinterest notes that the resulting system ends up with fewer distinct attributes overall while the average number of attributes per product increases. That is a useful tradeoff. Fewer labels do not mean less information here. It means the information is compressed into a vocabulary that is actually reusable.</p><p>That dense attribute graph becomes the foundation for everything that follows.</p><div><hr></div><h4>3. Building shopping feeds at catalog scale</h4><p>Once products have been tagged with curated attributes, Pinterest needs to construct feeds at catalog scale. That means dealing with millions of Pins and millions of possible topics.</p><p>That is where Ray and large-scale batch inference come in.</p><p>Attribute inference runs as a Ray streaming job with three main stages.</p><p>The first stage handles data loading and preprocessing. Product images and metadata are downloaded, tokenized and serialized into PyArrow tables, then sharded across a CPU cluster.</p><p>The second stage handles ML inference. Ray schedules preprocessed batches onto a GPU pool where the CLIP-based classifier performs forward passes and produces attribute scores.</p><p>Because execution is streamed, data loading, preprocessing and inference can overlap instead of waiting on one another. Heterogeneous clusters also allow CPU-heavy preprocessing and GPU-heavy inference to scale independently. The training and inference pipeline for the classifier completes in roughly 12 hours on 8 NVIDIA A100 GPUs at an estimated cost of about $500 per training run. 
For a system operating at this scale, that is pretty reasonable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FW7T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FW7T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 424w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 848w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 1272w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FW7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp" width="720" height="584" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FW7T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 424w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 848w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 1272w, https://substackcdn.com/image/fetch/$s_!FW7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974bb3d9-139b-4d04-8658-bf9c516f972c_720x584.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Ray Data Diagram (Source: Pinterest)</figcaption></figure></div><p>Once attribute assignments are available, feed construction moves to a matching layer in Spark that combines approximate nearest neighbor (ANN)-style matching with strict attribute matching.</p><p>Each shopping topic is defined as an attribute tuple. For example: category equals dress, color equals yellow, season equals summer, occasion equals party.</p><p>Spark is then used to compute relevance scores between topics and products. The score aggregates the attributes that a topic and a product share, weighted by attribute-level confidence.</p><p>To keep this tractable, candidate joins are pruned through attribute-based partitioning and minimum-overlap prefilters.</p><p>This part is easy to underestimate. Generating a topic vocabulary is one thing. 
Turning that vocabulary into feeds across millions of products is where many systems fall apart. Pinterest&#8217;s design works because the modeling and infrastructure choices line up. The attribute space is structured enough to match efficiently and the batch inference stack is built to operate at the right scale.</p><div><hr></div><h4>4. Measuring whether the system actually works</h4><p>Pinterest evaluates the system at both the attribute level and the collection level.</p><p>For attribute quality, the CLIP-based model is tested on <a href="https://huggingface.co/datasets/Marqo/fashion200k">Fashion200K</a>, a standard benchmark for fashion attribute prediction. The reported result is 99.7 percent Recall@10, which substantially exceeds prior methods that sit in the 50 percent range on the same metric. That suggests the model has learned a strong mapping between product imagery and fashion attributes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UpvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UpvK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 424w, https://substackcdn.com/image/fetch/$s_!UpvK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 848w, 
https://substackcdn.com/image/fetch/$s_!UpvK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 1272w, https://substackcdn.com/image/fetch/$s_!UpvK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UpvK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp" width="720" height="257" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:257,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7816,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UpvK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 424w, https://substackcdn.com/image/fetch/$s_!UpvK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 
848w, https://substackcdn.com/image/fetch/$s_!UpvK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 1272w, https://substackcdn.com/image/fetch/$s_!UpvK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d02438-bf9e-4248-af85-21d025bf4e4c_720x257.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Recall @ 10 on the Fashion200k dataset (Source: Pinterest)</figcaption></figure></div><p>Pinterest also looks at 
the distribution of generated attributes. Compared with raw GPT-4-V outputs, the CLIP-based system produces a much more usable attribute space. There are fewer extremely rare attributes and more attributes that consistently apply across many products.</p><p>That is important because quality here is not only about benchmark performance. It is also about whether the resulting label space can support downstream shopping tasks without collapsing into noise.</p><p>For collection quality, human raters compare feeds produced by the content-first pipeline against a traditional search-log-derived baseline. The evaluation uses Precision@10, measured as the fraction of the top ten products in a collection that match the collection&#8217;s title attributes.</p><p>Across attribute families including color, main material, fit and stretch, shape, style, season and festival, occasion and brand, the new system improves average Precision@10 from 0.84 to 0.96. Several categories, including style and brand, reach 1.00.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ozuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ozuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 424w, https://substackcdn.com/image/fetch/$s_!Ozuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Ozuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 1272w, https://substackcdn.com/image/fetch/$s_!Ozuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ozuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp" width="720" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742085?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ozuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 424w, https://substackcdn.com/image/fetch/$s_!Ozuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 
848w, https://substackcdn.com/image/fetch/$s_!Ozuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 1272w, https://substackcdn.com/image/fetch/$s_!Ozuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d13ebc2-bdea-4a38-adf0-12ef963c8054_720x408.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Precision@10 Comparison for Collection Quality (Source: Pinterest)</figcaption></figure></div><p>That is a 
meaningful jump. It suggests the collections are not just broader in coverage, but also more precise in the items they surface.</p><p>In production, the pipeline generates <strong>4.2 million</strong> shopping landing pages. That represents a <strong>fourfold increase</strong> in unique topics relative to the previous search-log-based approach and leads to a 35% improvement in search performance.</p><p>That combination is what makes PinLanding interesting because it is not only producing cleaner labels in an offline benchmark but also expanding coverage and improving relevance in production.</p><div><hr></div><h4>Summary</h4><p>PinLanding is a strong example of what multimodal AI looks like when it is pushed into production with actual constraints.</p><p>The system does not rely on a single model to do everything. Instead, Pinterest breaks the problem into stages: measure intent, generate structured attributes, compress them into a usable vocabulary, scale assignment with a dual encoder and then construct collections with distributed matching.</p><p>That decomposition is probably the most useful lesson here.</p><p>There are plenty of teams experimenting with multimodal models for product understanding. The harder part is turning those outputs into something dense, normalized and operationally stable enough to support search and discovery at web scale. Pinterest&#8217;s team shows a fairly clear path for doing that.</p><p>The broader shift is also worth noting. Search-log-derived collections are not going away, but they are no longer enough on their own. 
As people search in more conversational, compositional and trend-driven ways, systems need to infer collections from the content itself while still staying anchored to user intent.</p><p>That is the balance PinLanding is aiming for.</p><p>And for platforms sitting on billions of products, that balance is probably becoming less of a nice-to-have and more of a survival requirement.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://medium.com/pinterest-engineering/pinlanding-turn-billions-of-products-into-instant-shopping-collections-with-multimodal-ai-3489320294e9">Pinterest's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-pinterest-used-multimodal-ai-to-help-shoppers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.datatinkerer.io/p/how-pinterest-used-multimodal-ai-to-help-shoppers?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9446aeb6-d506-4011-b0ce-4c2d6c4f4464&quot;,&quot;caption&quot;:&quot;Shopify frames taxonomy at scale as three problems: volume, expertise, and consistency.<br /><br />When you&#8217;re operating a taxonomy with 10,000+ categories, manual review will not work, and by the time you react, merchants are already listing products that don&#8217;t fit.<br /><br />This piece breaks down how Shopify moved from reactive manual updates to a multi-agent system that scans taxonomy branches in parallel, proposes new categories/attributes from merchant data, detects duplicates via equivalence relationships and runs automated QA through domain-specific judges.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-26T04:00:23.437Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!tUAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:188769392,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e78745a9-87b3-4842-91b1-c47c28b3e197&quot;,&quot;caption&quot;:&quot;Production ML isn&#8217;t only about clever architectures. 
It&#8217;s about judgment, trade-offs and systems that hold up when data is messy.<br /><br />I sat down with Ahsaas Bajaj, Senior ML Engineer at Instacart, to talk about how they handle product substitutions at scale, what actually moves business metrics and what changes when you move into a senior ML role.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to Build a Recommendation System at Scale: Insights from Instacart&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:175610076,&quot;name&quot;:&quot;Ahsaas Bajaj&quot;,&quot;bio&quot;:&quot;Senior Machine Learning Engineer II at Instacart&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34dac958-9c70-4f48-89ed-6c2e0d6f197e_899x901.png&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://bajajahsaas.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://bajajahsaas.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Ahsaas 
Bajaj&quot;,&quot;primaryPublicationId&quot;:7296320}],&quot;post_date&quot;:&quot;2026-01-29T03:30:24.563Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e6c5924-ed6c-4998-8e4a-8f88d9102c8b_844x473.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:181648418,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in March 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in March 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-march-2026</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-march-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 02 Apr 2026 03:30:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e2a58f73-a2c0-4074-8db5-67153f3c5d45_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more 
person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e2U5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e2U5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif" width="800" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e2U5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). The list includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for March!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://www.newsletter.swirlai.com/p/state-of-context-engineering-in-2026">State of Context Engineering in 2026</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Aurimas Grici&#363;nas&quot;,&quot;id&quot;:14122259,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746f0396-fc7f-4690-b75c-ef482a8cb1c7_3684x3683.jpeg&quot;,&quot;uuid&quot;:&quot;d03a3318-95a8-4019-b842-e5f2c246d90e&quot;}" data-component-name="MentionToDOM"></span> argues that context engineering is evolving from prompt tinkering into a structured discipline where managing memory, retrieval and state becomes the core challenge of building reliable AI systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fhut!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!fhut!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 424w, https://substackcdn.com/image/fetch/$s_!fhut!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 848w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1272w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp" width="1456" height="827" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fhut!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 424w, https://substackcdn.com/image/fetch/$s_!fhut!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 848w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1272w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://nchagnet.pages.dev/blog/bayesian-statistics-for-confused-data-scientists">Bayesian statistics for confused data scientists</a> (15 minute read)<br></strong>Nicolas Chagnet explains Bayesian statistics in plain terms by showing that its real strength is not mathematical elegance but giving data scientists a cleaner way to reason about uncertainty and sparse real-world data.</p></li><li><p><strong><a href="https://www.decodingai.com/p/agentic-ai-engineering-guide-6-mistakes">Agentic AI Engineering Guide</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Paul Iusztin&quot;,&quot;id&quot;:110559689,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0714d360-396c-4b41-a676-1b58dc1dc5f3_1470x1470.jpeg&quot;,&quot;uuid&quot;:&quot;a40f7d2d-1ffa-4921-80fe-eb7c7912ff68&quot;}" data-component-name="MentionToDOM"></span> argues that most agentic AI systems fail not because the model is weak, but because teams make avoidable engineering mistakes around context, architecture, planning and evaluation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FSYK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FSYK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 424w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 848w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1272w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp" width="1096" height="549" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:1096,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FSYK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 424w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 848w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1272w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://jessicatalisman.substack.com/p/the-context-problem">The Context Problem</a> (29 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jessica Talisman&quot;,&quot;id&quot;:24176542,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zEsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f1fe4e-779e-4a27-be92-71fac460ee01_935x935.jpeg&quot;,&quot;uuid&quot;:&quot;0da07ecd-fbbe-4d3e-a0c8-500b6799c802&quot;}" data-component-name="MentionToDOM"></span> argues that the AI industry has turned context into a token-priced billing unit even though context should really mean the relational structure that makes information coherent and useful.</p></li><li><p><strong><a href="https://magazine.sebastianraschka.com/p/visual-attention-variants">A Visual Guide to Attention Variants in Modern LLMs</a> (26 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;a0f35f83-e9bb-408b-a811-d935e0af6c10&quot;}" data-component-name="MentionToDOM"></span> shows how modern LLM attention has evolved from standard 
multi-head attention into a growing set of variants, each designed to balance quality, memory use and inference efficiency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s-uv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s-uv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg" width="1456" height="949" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:949,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172464,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s-uv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://amandeepsp.github.io/blog/high-dims">The Boon of Dimensionality</a> (6 minute read)</strong><br>Amandeep Singh shows that high-dimensional space creates the geometric conditions that make embeddings, random projections and feature separation work in modern machine learning.</p></li><li><p><strong><a href="https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex">GPT 5.4 is a big step for Codex</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nathan 
Lambert&quot;,&quot;id&quot;:10472909,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!RihO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fedcdfb-e137-4f6a-9089-a46add6c6242_500x500.jpeg&quot;,&quot;uuid&quot;:&quot;5138e0c6-b974-4dfe-9416-12c648293080&quot;}" data-component-name="MentionToDOM"></span> writes that GPT 5.4 feels like a real step forward for Codex, with gains in usability, speed, context handling and agent reliability that matter more in practice than benchmark scores alone.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents">How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI</a> (14 minute read)<br></strong>This piece breaks down how Shopify moved from reactive manual updates to a multi-agent system that scans taxonomy branches in parallel, proposes new categories/attributes from merchant data, detects duplicates and runs automated QA through domain-specific judges.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://seattledataguy.substack.com/p/layer-by-layer-we-built-data-systems">Layer by Layer, We Built Data Systems No One Understands</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;776d9623-2ff5-4cb6-875f-8f43c8ab263f&quot;}" data-component-name="MentionToDOM"></span> writes that modern data stacks keep piling on layers in the name 
of simplicity but the result is often more sprawl, more cost and systems that are harder to understand or tie back to business outcomes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8W02!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8W02!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!8W02!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47330,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8W02!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!8W02!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.dataengineeringweekly.com/p/etl-is-dead">ETL is Dead</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ananth Packkildurai&quot;,&quot;id&quot;:3520227,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mRE-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f38fa68-8a30-4357-a48e-6833efe28c0f_989x989.jpeg&quot;,&quot;uuid&quot;:&quot;96474d3c-2a11-4127-a971-49633612ee83&quot;}" data-component-name="MentionToDOM"></span> argues that ETL is not disappearing in volume but it is fading as the core identity of data engineering as AI shifts the real work toward context and 
semantics.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/cp/189784942">The Data Engineer&#8217;s GitHub Portfolio (2026 Edition)</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Yordan Ivanov&quot;,&quot;id&quot;:40945395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ma-p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76f52904-5428-4d97-82a5-3faa722b8d46_2234x1253.jpeg&quot;,&quot;uuid&quot;:&quot;3ba92759-3c5c-463f-afd4-48c62129a1c6&quot;}" data-component-name="MentionToDOM"></span> writes that a strong data engineering GitHub portfolio should prove technical taste, system thinking and real-world problem solving, not just show a pile of tutorial projects.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wkgQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 424w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 848w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp" width="700" height="609" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 424w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 848w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 
1272w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://buildtolaunch.substack.com/cp/191744000">The Data Engineering Mindset Every AI Builder Needs</a> (14 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan 
Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rcW2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;a2ee5071-74dc-4515-829b-b04cb357605c&quot;}" data-component-name="MentionToDOM"></span> writes that most AI products do not break because of the model but because builders ignore the data foundations early on, especially data flow design, data quality and monitoring.</p></li><li><p><strong><a href="https://pthorpe92.dev/databasemaxxing/">The absolute beginners guide to databasemaxxing</a> (18 minute read)<br></strong>This article walks through database internals from a beginner&#8217;s perspective, showing how concepts like parsing, binding, scans and index seeks fit together under the hood.</p></li><li><p><strong><a href="https://ghostinthedata.info/posts/2026/2026-03-14-your-data-model-isnt-broken-part-1/">Your Data Model Isn&#8217;t Broken, Part I: Why Refactoring Beats Rebuilding</a> (12 minute read)<br></strong>Chris Hillman makes the case that most broken data models are really bundles of hard-won business knowledge and that careful refactoring is usually smarter than blowing everything up and starting again.</p></li><li><p><strong><a href="https://vutr.substack.com/p/clickhouse-real-time-insight-in-15">ClickHouse -&gt; Real-time insight in 15 minutes</a> (19 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Phi Vu 
Trinh&quot;,&quot;id&quot;:167177248,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!UWAa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4805f673-db97-4f7c-85c4-44b345a8de80_256x256.png&quot;,&quot;uuid&quot;:&quot;3babf9ab-7ba3-449f-b67e-f70c16a65ce8&quot;}" data-component-name="MentionToDOM"></span> shows that ClickHouse is built for real-time analytics, but getting that performance in production usually means handling enough operational complexity that platforms like Tinybird become appealing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNqF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNqF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1272w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38334,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QNqF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces">How Notion Scaled AI Q&amp;A to Millions of Workspaces</a> (14 minute read)<br></strong>This article walks through how Notion scaled AI Q&amp;A to millions of workspaces while increasing onboarding throughput 600x and cutting costs by up to 
90%.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://thedataecosystem.substack.com/p/issue-53-business-models-and-data">Relevance of Business Models for Data</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dylan Anderson&quot;,&quot;id&quot;:14172622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128526c2-c66d-497b-ab50-f95deb8ce0fc_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;e33430e3-f195-49f5-9e0d-371a21acad16&quot;}" data-component-name="MentionToDOM"></span> makes the case that data teams should start with the business model first, because strategy, architecture, governance and analytics all work better when they are tied to how the company actually creates and captures value.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f6qe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f6qe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 424w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 848w, 
https://substackcdn.com/image/fetch/$s_!f6qe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1272w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp" width="1351" height="652" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1351,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f6qe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 424w, 
https://substackcdn.com/image/fetch/$s_!f6qe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 848w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1272w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://hipster-data-show.ghost.io/its-about-the-strategy-stupid/">It&#8217;s about the strategy, stupid</a> (14 minute read)<br></strong>Timo Dechau makes the case that data work becomes far more useful when it starts with business strategy, not with dashboards, tracking audits or whatever tactic happens to be fashionable.</p></li><li><p><strong><a href="https://epochai.substack.com/p/what-do-frontier-ai-companies-job">What do frontier AI companies&#8217; job postings reveal about their plans?</a> (9 minute read)</strong><br>Interesting article suggesting that frontier labs&#8217; job postings reveal where the market is heading, with hiring patterns pointing to heavier go-to-market pushes, new product bets and different strategies for securing compute and data.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8UOo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8UOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 424w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 848w, 
https://substackcdn.com/image/fetch/$s_!8UOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp" width="1026" height="1283" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1283,&quot;width&quot;:1026,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8UOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 424w, 
https://substackcdn.com/image/fetch/$s_!8UOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 848w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:485645}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from April I missed? Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;79d232de-3d15-4abe-b46e-997945fb8a86&quot;,&quot;caption&quot;:&quot;It&#8217;s time for another data/AI roundup and here are the highlights from February&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Inside OpenAI&#8217;s in-house data agent<br />A practical guide to which AI to use in the agentic era<br />Why judgment may not be uniquely human after all<br />How Codex is being used for serious research automation<br />Why semantic linking matters for giving data meaning<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />A portable analytics stack built on DuckDB, DuckLake, dlt and SQLMesh<br />Why healing tables beat slow-motion backfill disasters<br />The case for MetadataOps engineers<br />How to use AI tools without losing data engineering fundamentals<br />Why 5-second BigQuery queries can still be expensive<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The state of machine learning competitions in 2025<br /><br />Plus: why AI is eating software&#8217;s TAM, what world models could unlock in robotics and why AI may intensify work instead of reducing it.&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in February 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-12T04:00:47.277Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9b4d29f-5a76-44a8-902d-bc2983dbe445_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:190247984,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:8,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1b7b5909-6ab4-4a88-98a1-d09e96554f4d&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from January&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; 
&#119808;&#119816;<br />Why &#8216;use agents or be left behind&#8217; is mostly about practical automation<br />Piecewise regression for spotting regime shifts in time series<br />Why AI benchmarks are hitting a measurement wall<br />What the data actually says about the state of open models<br />How large-scale recommendation systems are built in the real world<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />How Unity Catalog really works under the hood<br />Databricks Lakeflow vs Airflow in practice<br />End-to-end agentic data modeling with OpenMetadata<br />A candid look at the day-to-day reality of data engineering<br />How Uber cut data lake freshness from hours to minutes with Flink<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The best data visualization projects of 2025<br />Why storytelling matters more than chart tricks<br />Designing more accessible line charts<br />Practical rules for dashboard filter placement<br /><br />Plus: ontologies explained, hard lessons from building AI agents in finance and new data on who&#8217;s really buying AI compute.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in January 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-05T03:20:52.027Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:186553359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Notion Scaled AI Q&A to Millions of Workspaces]]></title><description><![CDATA[Kafka, Spark and Ray powering low-latency, high-throughput search pipelines]]></description><link>https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 26 Mar 2026 04:00:33 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Notion scaled its AI Q&amp;A to millions of users.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HsBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3840" height="2160" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2160,&quot;width&quot;:3840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a black and white block with the letter n on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a black and white block with the letter n on it" title="a black and white block with the letter n on it" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@maria_shalabaieva">Mariia Shalabaieva</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Now, with that out of the way, let&#8217;s get to Notion&#8217;s AI Q&amp;A level up!</p><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Notion launched AI Q&amp;A on top of 
vector search and quickly faced massive demand across millions of workspaces. The initial system hit limits in capacity, onboarding speed and cost.</p><h4><strong>Task</strong></h4><p>Scale onboarding, keep indexes fresh and reduce rising infrastructure costs. At the same time, simplify an increasingly complex architecture without hurting latency.</p><h4><strong>Action</strong></h4><p>They introduced dual ingestion paths, generation-based indexing and a serverless architecture, and migrated to turbopuffer. They then reduced recomputation with page state tracking and moved embeddings to Ray for unified compute.</p><h4><strong>Result</strong></h4><p>600x onboarding growth, 15x workspace growth and major cost reductions across layers. Latency improved and the system became simpler and more efficient.</p><h4><strong>Use Cases</strong></h4><p>Real-time search indexing, semantic search, document retrieval</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, AWS EMR, Apache Airflow, Apache Kafka, AWS S3, DynamoDB, Ray, turbopuffer</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>When <a href="https://www.notion.com/blog/introducing-q-and-a">Notion launched AI Q&amp;A</a> in November 2023, the core idea sounded simple enough: let people ask natural-language questions and retrieve relevant knowledge from across their workspace and connected tools. In practice, that meant building a vector search system that could ingest huge amounts of content, stay fresh as pages changed and do all of it at a cost that made sense at Notion scale.</p><p>That is the real story here: not just &#8220;vector search powers AI&#8221; but what happens after launch, when adoption jumps faster than expected and the infrastructure underneath has to keep up. 
Over two years, the Notion team pushed that system through several big transitions: scaling onboarding, dealing with storage pressure, changing database architecture, reworking indexing logic and moving embeddings workloads onto Ray. The headline numbers are hard to ignore: 10x scale and roughly one-tenth the cost.</p><p>This is a good example of how modern AI infrastructure usually evolves. The first version gets the product live. The next few versions are about survival, then simplification, then cost, then latency, then getting rid of all the awkward bits that built up during the rush.</p><div><hr></div><h4>Vector search, explained through Notion&#8217;s lens</h4><p>Traditional keyword search is literal. It works when users type the exact words that exist in the content. It starts falling apart when the wording changes but the meaning stays the same. Someone searching for &#8220;team meeting notes&#8221; may still want a page called &#8220;group standup summary,&#8221; but keyword search does not naturally understand that those are closely related.</p><p>Vector search solves that by representing text as embeddings. Instead of storing only words, it maps text into a high-dimensional space where semantically similar ideas sit closer together. That means retrieval is based on meaning, not exact phrasing.</p><p>For Notion AI, this matters a lot. The system needs to answer questions in natural language by finding useful content across a workspace and even across connected sources like Slack and Google Drive. That is exactly the sort of setup where semantic retrieval becomes more useful than plain lexical matching. A user is not thinking about the title of the page or the exact phrasing inside a paragraph. 
They are asking a question in their own words and expecting the system to bridge the gap.</p><p>That expectation becomes expensive very quickly.</p><div><hr></div><h4>Part 1: Scaling beyond what the original system expected</h4><p>At launch, Notion&#8217;s ingestion and indexing pipeline had two paths.</p><p>The first was an offline path. Batch jobs running on Apache Spark would chunk existing documents, generate embeddings through an API and bulk-load those vectors into the vector database. This handled the heavy lifting for backfilling large amounts of existing content.</p><p>The second was an online path. Kafka consumers processed page edits in near real time so live workspaces stayed up to date with sub-minute latency.</p><p>It is a practical split. The offline side handles the backlog and large initial loads. The online side keeps things fresh once a workspace is active. Together, the two-path setup gave Notion a way to onboard workspaces at scale without sacrificing freshness for day-to-day edits.</p><p>The vector database itself ran on dedicated &#8216;pod&#8217; clusters, where storage and compute were coupled. 
The Notion team designed sharding in a way that echoed their Postgres setup: workspace ID was the partitioning key, routing used range-based partitioning and a single config referenced all shards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zNlu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">pipelines writing into sharded vector database pods (Source: Notion)</figcaption></figure></div><p>That all made sense on paper. Then the product launched and demand was overwhelming.</p><p>Notion quickly built up a waitlist of millions of workspaces that wanted access to Q&amp;A. The problem was no longer whether the system worked. It was how fast it could onboard people without cracking under the pressure.</p><p><strong>When the indexes started to fill up</strong></p><p>Only a month after launch, the original indexes were already nearing capacity.</p><p>That is the kind of problem that sounds good in product meetings and bad in infrastructure meetings. If the indexes filled up, Notion would have to pause onboarding. 
That would slow down rollout and delay access for everyone waiting.</p><p>The team had two obvious options.</p><p>One was to re-shard incrementally. Clone data into another index, delete half, repeat and keep doing that every couple of weeks as new customers came in.</p><p>The other was to re-shard for the final expected volume. But their vector database provider charged for uptime, so over-provisioning would have been painfully expensive.</p><p>Instead, the Notion team went with a third approach. When a set of indexes got close to full, they provisioned a new set and directed all newly onboarded workspaces there. Each set was assigned a generation ID, which determined where reads and writes should go.</p><p>It is not the prettiest long-term design, but it was a smart short-term move. It avoided repeated re-shard operations and kept onboarding moving. Sometimes the right scaling decision is not the most elegant one. It is the one that buys breathing room without stopping the business.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8zu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" width="1456" height="891" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:891,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">New index &#8216;generations&#8217; added as capacity fills, routing new workspaces without re-sharding. (Source: Notion)</figcaption></figure></div><p><strong>Turning onboarding into a throughput problem</strong></p><p>Even with the architecture in place, the initial onboarding rate was nowhere near enough. At launch, Notion could onboard only a few hundred workspaces per day. At that pace, clearing a multi-million waitlist would have taken decades, which was obviously not a real option.</p><p>So the team pushed hard on throughput. Using Airflow scheduling, pipelining and Spark job tuning, they dramatically increased capacity.</p><p>The results were big:</p><ul><li><p>Daily onboarding capacity increased by <strong>600x</strong></p></li><li><p>Active workspaces grew <strong>15x</strong></p></li><li><p>Vector database capacity expanded <strong>8x</strong></p></li></ul><p>By April 2024, the Q&amp;A waitlist was cleared.</p><p>That is the kind of milestone that looks clean in hindsight, but it came with a cost. Managing multiple generations of databases helped during the hypergrowth phase, but it also added operational complexity and financial overhead. The team had solved the immediate scaling problem, but the architecture was starting to feel heavy.</p><p>That set up the next phase of the story.</p><div><hr></div><h4>Part 2: Cost becomes the next constraint</h4><p>In May 2024, Notion migrated its embeddings workload from the original dedicated &#8216;pod&#8217; architecture to a serverless setup that decoupled storage from compute and charged based on usage instead of uptime.</p><p>The effect was immediate. Costs dropped by 50 percent from peak usage, translating into several million dollars in annual savings.</p><p>That alone would have made the migration worthwhile, but the serverless design also fixed two practical problems. 
First, it removed the storage capacity constraints that had become a serious scaling bottleneck. Second, it simplified operations because the team no longer had to provision capacity ahead of demand.</p><p>Even after cutting costs in half, the annual run rate for vector database spend was still in the millions. From an engineering point of view, this is where things get interesting. The easy win had already happened. Now the team had to go after deeper structural gains.</p><p><strong>A new search foundation (turbopuffer)</strong></p><p>While working on the first round of savings, Notion also evaluated alternative search engines. <a href="https://turbopuffer.com/">turbopuffer</a> stood out because it offered significantly lower projected costs.</p><p>At the time, turbopuffer was a newer player in search. Its architecture was built on object storage with a focus on cost-efficiency and performance. It also supported both managed and bring-your-own-cloud deployment models, and it made bulk modification of stored vector objects easier.</p><p>That combination lined up well with what Notion needed.</p><p>After a successful evaluation, the team decided to migrate its entire multi-billion-object workload to turbopuffer in late 2024. Since they were already making a provider switch, they used the migration as a chance to clean up the broader architecture too.</p><p>Several changes happened together.</p><p>First, they fully re-indexed the corpus, increasing write throughput in the offline indexing pipeline to rebuild everything in turbopuffer.</p><p>Second, they upgraded to a more performant embeddings model during the migration.</p><p>Third, they simplified the architecture. 
turbopuffer treats each namespace as an independent index, which removed the need to think about sharding and generation-based routing in the same way as before.</p><p>Finally, they handled the cutover gradually, migrating one generation at a time and validating correctness before moving on.</p><p>This is a strong pattern: if a migration is painful anyway, use it to pay off other infrastructure debt at the same time.</p><p>The outcome was solid on several fronts:</p><ul><li><p><strong>60 percent cost reduction</strong> on search engine spend</p></li><li><p><strong>35 percent reduction</strong> in AWS EMR compute costs</p></li><li><p>p50 production query latency <strong>improved from 70&#8211;100ms to 50&#8211;70ms</strong></p></li></ul><p>That is a meaningful improvement across cost and performance, which is not always easy to pull off together.</p><p><strong>Avoiding full reprocessing with page state tracking</strong></p><p>The next optimization went after a very expensive inefficiency in the indexing pipeline.</p><p>Notion pages can be long, so the team chunks each page into spans, embeds each span and stores those vectors with metadata such as authors and permissions. 
In the original implementation, any edit to a page or its properties triggered a full re-chunk, full re-embed and full re-upload of all spans on that page.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ytMS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" width="1000" height="189" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:189,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2615711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Page &#8594; chunking &#8594; embedding &#8594; vector DB with full reprocessing on every edit. 
(Source: Notion)</figcaption></figure></div><p>That meant even a tiny change could trigger a lot of unnecessary work.</p><p>The team narrowed the problem down to two things that actually mattered:</p><ol><li><p>The page text changes which means embeddings need updating</p></li><li><p>The metadata changes which means metadata needs updating</p></li></ol><p>To detect those cases, they tracked two hashes per span: one hash for the span text and another for the metadata fields. They chose 64-bit xxHash because it offered a good balance of speed, simplicity, low collision risk and storage footprint.</p><p>For caching, they used DynamoDB. Each page had one record containing the state of all spans on that page, including text and metadata hashes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mj4k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" width="1396" height="858" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:1396,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Span-level hashing (text + metadata) with DynamoDB state to detect and update only changed spans. 
(Source: Notion)</figcaption></figure></div><p>The win came from using that state to avoid unnecessary work.</p><p><strong>Case 1: The page text changes</strong></p><p>Imagine Herman Melville editing <em>Moby Dick</em> halfway through a page. Before this improvement, the whole page would have been re-embedded and reloaded. After the change, the system chunks the page, fetches the previous state from DynamoDB and compares text hashes span by span. It can then detect which spans actually changed and only re-embed and reload those.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xTeN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" width="1000" height="151" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:151,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1891331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Only changed spans are re-embedded and updated using page state + text hash comparison. (Source: Notion)</figcaption></figure></div><p>That is the kind of fix where getting the balance right matters. Miss a changed span and search quality suffers. Reprocess too much and cost stays high.</p><p><strong>Case 2: The metadata changes</strong></p><p>Now imagine Melville updates permissions so the page becomes visible to everyone. The permissions metadata changes but the text does not.</p><p>Previously, that still meant re-embedding and reloading the entire page. With the new approach, Notion compares both text and metadata hashes. If the text hashes are unchanged but metadata hashes differ, the system skips embedding entirely and issues a PATCH command to the vector database to update only the metadata. That is much cheaper than recomputing embeddings.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6qtN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" width="1000" height="197" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:197,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2162583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Metadata-only changes skip embeddings and update spans via PATCH in the vector DB. (Source: Notion)</figcaption></figure></div><p>Across these changes, the Page State Project reduced data volume by 70 percent. That saved money on both embeddings API costs and vector database write costs.</p><p><strong>Moving embeddings to Ray (indexing)</strong></p><p>In July 2025, Notion started migrating its near real-time embeddings pipeline to <a href="https://www.ray.io/">Ray</a> on <a href="https://www.anyscale.com/">Anyscale</a>.</p><p>The motivation came from several pain points in the earlier setup.</p><p>One was the <strong>&#8216;double compute&#8217; problem</strong>. Spark on EMR handled preprocessing like chunking, transformations and API orchestration, but embeddings themselves were still generated through an external provider that charged per token. So the team was paying for both preprocessing infrastructure and embedding API usage.</p><p>Another issue was <strong>endpoint reliability</strong>. Fresh search indexes depended on the stability of an external embeddings API.</p><p>The third problem was <strong>clunky pipelining</strong>. 
To smooth traffic and avoid API rate limits, the team had built a multi-step handoff process where Spark jobs passed batches through S3. It worked but it was clunky.</p><p>Ray and Anyscale gave Notion a cleaner path.</p><p>Ray let the team run open-source embedding models directly, which meant more model flexibility and less dependence on external providers. By consolidating preprocessing and inference onto a single compute layer, they could cut out the double-compute setup. Ray also supports pipelining CPU-bound work such as chunking and page-state detection with GPU-bound embedding generation on the same nodes, which helps keep utilization high.</p><p>There was also a developer productivity angle. Anyscale workspaces let engineers write and test pipelines from their preferred tools without having to provision infrastructure manually.</p><p>And on the product side, self-hosting embeddings removed a third-party API hop from the user-facing path, which helped reduce end-to-end latency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UN1z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" width="1000" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1537621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Ray natively supports pipelining CPU bound tasks (chunking, detecting page state) with GPU bound embeddings generation within the same node. (Source: Notion)</figcaption></figure></div><p>The rollout is still ongoing, but early results suggest a 90+ percent reduction in embeddings infrastructure costs. That is a major shift in how the economics of the system work.</p><p><strong>Real-time query embeddings on Ray (serving)</strong></p><p>Indexing is only half the picture. When users or agents search in Notion, queries must also be embedded on the fly before the vector database can be searched.</p><p>That makes serving latency-sensitive. The embedding has to happen fast enough that the search still feels responsive.</p><p>Hosting large embedding models is not trivial. GPU allocation, ingress routing, replication and autoscaling all matter, especially when traffic is uneven and expectations for responsiveness are high.</p><p><a href="https://docs.ray.io/en/latest/serve/index.html">Ray Serve</a> helped Notion here by handling much of that operational layer out of the box. The team could wrap open-source embedding models in persistent deployments that stay loaded on GPU, configure request batching and replication and manage the serving setup with normal Python code plus YAML-based infrastructure configuration.</p><p>That is a pretty practical endpoint for the broader journey.</p><p>What started as a vector search stack built quickly enough to launch AI Q&amp;A turned into a much more refined system: simpler in some places, more selective in others, cheaper across multiple layers and faster where users feel it. The interesting part is not any single tool choice. 
It is how the Notion team kept removing bottlenecks one by one: storage limits, awkward shard routing, redundant recomputation, external API dependence and fragmented compute layers.</p><p>That is usually what mature AI infrastructure looks like in the real world. Not one giant redesign. A sequence of sharp decisions, each fixing the thing that has become too expensive, too slow or too annoying to keep around.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://www.notion.com/blog/two-years-of-vector-search-at-notion">Notion's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0dbf0d77-87fd-4655-82da-31cc841a6d73&quot;,&quot;caption&quot;:&quot;LinkedIn pushed Venice to handle 175M+ lookups per second while ingesting 230M writes per second.<br /><br />This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-19T04:00:52.353Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:187999868,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;765aa4a7-c63b-4175-8423-aae14d8d54cb&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in February 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in February 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 12 Mar 2026 04:00:47 GMT</pubDate><enclosure 
url="https://substack-post-media.s3.amazonaws.com/public/images/b9b4d29f-5a76-44a8-902d-bc2983dbe445_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ah3D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ah3D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif" width="800" 
height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ah3D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for February!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://openai.com/index/inside-our-in-house-data-agent">Inside OpenAI&#8217;s in-house data agent</a> (14 minute read)<br></strong>OpenAI explains how its in-house data agent combines rich internal context, live querying and self-learning memory to help employees go from vague business questions to trustworthy analysis in minutes.</p></li><li><p><strong><a href="https://www.oneusefulthing.org/p/a-guide-to-which-ai-to-use-in-the">A Guide to Which AI to Use in the Agentic Era</a> (17 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ethan Mollick&quot;,&quot;id&quot;:846835,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c05cdbc-40fd-459b-915d-f8bc8ac8bf01_3509x5263.jpeg&quot;,&quot;uuid&quot;:&quot;9368bcdb-9ff7-4426-bf98-c55a4e3f944c&quot;}" data-component-name="MentionToDOM"></span> breaks down the current AI tool landscape into a simple question: which model is best for this specific task, not which one wins the internet on any given day.</p></li><li><p><strong><a href="https://stevenadler.substack.com/p/judgment-isnt-uniquely-human">Judgment isn&#8217;t uniquely human</a> (19 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Steven 
Adler&quot;,&quot;id&quot;:7944928,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4cc0ff3-5403-4378-bee6-aded1be48a65_2317x2317.png&quot;,&quot;uuid&quot;:&quot;dcb868dd-7766-411c-9240-7e4f3a41b4fd&quot;}" data-component-name="MentionToDOM"></span> argues judgment and taste are not uniquely human, and that treating them as off-limits to AI is another case of people underestimating how quickly models can learn high-level cognitive tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bd7U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 424w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 848w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp" width="1164" height="284" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:284,&quot;width&quot;:1164,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36176,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 424w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 848w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p><strong><a href="https://x.com/kareldoostrlnck/status/2019477361557926281?utm_source=tldrai&amp;utm_medium=newsletter">I spent $10,000 to automate my research at OpenAI with Codex</a> (6 minute read)</strong><br>A researcher from OpenAI argues that people still underestimate what Codex can do in real workflows, sharing a high-usage setup and the practical lessons from using it at serious scale.</p></li><li><p><strong><a href="https://commonsensedata.substack.com/p/semantic-linking-the-aboutness-of">Semantic Linking: the Aboutness of Data</a> (12 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Juha Korpela&quot;,&quot;id&quot;:195506571,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!QAUB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19dd5ae5-a523-4e05-a139-00405295f5af_2134x1853.png&quot;,&quot;uuid&quot;:&quot;44575a9f-9137-4f91-8fdb-101dee965464&quot;}" data-component-name="MentionToDOM"></span> explains that semantic linking is the missing connection between data and meaning, where the real job is not adding labels to tables but explicitly mapping data objects to shared business concepts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Pjf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 424w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 848w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1272w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp" width="886" height="726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19040,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 424w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 848w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1272w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://vinvashishta.substack.com/p/ai-is-finally-eating-softwares-total">AI Is Finally Eating Software&#8217;s Total Market: Here&#8217;s What&#8217;s Next</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vin Vashishta&quot;,&quot;id&quot;:16324927,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4b303796-0198-4e37-9ec4-016a2f12582d_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;8c53864b-2a1a-4c48-9e00-f4e65c96daee&quot;}" data-component-name="MentionToDOM"></span> argues that AI won&#8217;t just disrupt software products, it will collapse whole layers of software value unless companies control the workflow, the customer relationship or the data moat.</p></li><li><p><strong><a href="https://joeljang.github.io/world-models-for-robotics?utm_source=tldrai&amp;utm_medium=newsletter">World Models and the Data Problem in Robotics</a> (13 minute read)<br></strong>An Nvidia researcher makes the case that robotics hits a data wall long before an algorithm wall and that world models learned from human first-person video are the most plausible route to scalable robot intelligence.</p></li><li><p><strong><a href="https://medium.com/whatnot-engineering/lessons-learned-from-scaling-data-scientists-with-ai-e7aa7b3235b4">Lessons learned from scaling data scientists with AI</a> (10 minute read)<br></strong>Whatnot&#8217;s lesson from deploying AI for data science is that LLMs don&#8217;t remove the need for data scientists, they force teams to get serious 
about semantic layers and production-grade context management.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NSuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NSuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 424w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 848w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1272w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp" width="720" height="458" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:458,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NSuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 424w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 848w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1272w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents">How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI</a> (14 minute read)<br></strong>This piece breaks down how Shopify moved from reactive manual updates to a multi-agent system that scans taxonomy branches in parallel, proposes new categories/attributes from merchant data, detects duplicates and runs automated QA through domain-specific judges.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/a-portable-analytics-stack">A Portable Analytics Stack</a> (13 minute read)<br></strong><span class="mention-wrap" 
data-attrs="{&quot;name&quot;:&quot;Yuki&quot;,&quot;id&quot;:89127157,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Y7d4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026b3d67-d3cf-4b3f-b498-7dd16df31b1e_1874x1868.png&quot;,&quot;uuid&quot;:&quot;5bc04c30-baa1-4f7d-a445-8dcf94fd6dc8&quot;}" data-component-name="MentionToDOM"></span> shows how a portable analytics stack built on DuckDB, DuckLake, dlt and SQLMesh can replace warehouse-heavy setups with lightweight, version-controlled pipelines that run locally or on cheap scheduled compute.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mc7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 424w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 848w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp" width="1456" height="836" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:836,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34544,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 424w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://ghostinthedata.info/posts/2026/2026-02-07-self-healing/">Healing Tables: When Day-by-Day Backfills 
Become a Slow-Motion Disaster</a> (24 minute read)<br></strong>Chris Hillman shows why incremental historical backfills corrupt dimensions and proposes a healing table pattern that separates change detection from period building so history can be rebuilt cleanly.</p></li><li><p><strong><a href="https://joereis.substack.com/p/2028-the-great-data-reckoning">2028 - THE GREAT DATA RECKONING</a> (16 minute read)<br></strong>A speculative but funny take by<strong> </strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;ac81ae97-b281-43c3-83f2-fa87aa69afe7&quot;}" data-component-name="MentionToDOM"></span> that imagines a 2028 data industry shakeout in which AI wipes out much of the tooling and data theater, and the people who survive are the ones with real business context and architecture skills.</p></li><li><p><strong><a href="https://www.datagibberish.com/p/data-engineers-are-becoming-metadataops-engineers">Data Engineers Are Becoming MetadataOps Engineers</a> (10 minute read)</strong><br>An interesting take by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;7bc527fa-74d9-439c-8b1c-47efa24317e3&quot;}" data-component-name="MentionToDOM"></span> that the next layer of data engineering is MetadataOps: building AI-ready metadata, semantic structure and agent-facing governance so LLMs stop guessing and start using data 
reliably.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rA0I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rA0I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 424w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 848w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1272w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp" width="1246" height="492" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:1246,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rA0I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 424w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 848w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1272w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/how-data-engineers-can-leverage-ai-tools-without-losing-fundamentals">How Data Engineers Can Leverage AI Tools Without Losing Fundamentals</a> (13 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jenny Ouyang&quot;,&quot;id&quot;:282291554,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b27a5eb-a443-4738-b205-2a29d85f00b9_1068x1068.png&quot;,&quot;uuid&quot;:&quot;6c167fee-5ac6-4e3a-9da7-b49a6b406c6e&quot;}" data-component-name="MentionToDOM"></span> makes the case that data engineers should use AI to accelerate the boilerplate, not outsource the fundamentals because the real leverage still comes from owning modeling, architecture and 
performance.</p></li><li><p><strong><a href="https://seattledataguy.substack.com/p/backfills-the-necessary-evil-of-data">Backfills - The Necessary Evil of Data Engineering</a> (12 minute read)<br></strong>A practical look by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;4dcbf161-9f88-42ae-9512-dc6d0ba9728d&quot;}" data-component-name="MentionToDOM"></span> at why backfills happen, why engineers hate them, and how better parameterization, rerunnability, and storage-aware design can make them less painful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bbl6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bbl6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 424w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 848w, 
https://substackcdn.com/image/fetch/$s_!bbl6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1272w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp" width="800" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bbl6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 424w, 
https://substackcdn.com/image/fetch/$s_!bbl6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 848w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1272w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://luminousmen.substack.com/p/why-your-5-second-bigquery-query">Why Your 5-Second BigQuery Query Isn&#8217;t Cheap</a> (13 minute read)</strong><br>A practical breakdown of BigQuery pricing by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;luminousmen&quot;,&quot;id&quot;:29227863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffead33a9-5e35-4522-b96e-c1a523419524_300x297.jpeg&quot;,&quot;uuid&quot;:&quot;8b5a60a7-c76a-40b4-9563-67e217b7c93f&quot;}" data-component-name="MentionToDOM"></span> that shows why short query runtimes are a misleading proxy for cost, and why slots are the compute metric that actually matters.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records">How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs</a> (12 minute read)<br></strong>LinkedIn pushed Venice to handle 175M+ lookups per second while ingesting 230M writes per second. 
This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.</p></li></ul><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://mlcontests.com/state-of-machine-learning-competitions-2025/">The State of Machine Learning Competitions</a> (34 minute read)</strong><br>This report maps the 2025 competition landscape and finds that winning solutions are getting more compute-hungry, transformer-heavy and increasingly shaped by Qwen in NLP while classic tabular methods still hold their ground.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QrX6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QrX6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 424w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 848w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1272w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png" width="976" height="465" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/172d9814-58ab-437c-9093-72f0429c57a1_976x465.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:976,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54108,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QrX6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 424w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 848w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1272w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it">AI Doesn&#8217;t Reduce Work - It Intensifies It</a> (9 minute read)<br></strong>Interesting findings in HBR that AI doesn&#8217;t really remove work so much as intensify it, speeding up expectations and raising the risk of burnout instead of delivering the productivity gains companies hoped for.</p></li><li><p><strong><a href="https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b">The Anthropic Hive Mind</a> (21 minute 
read)<br></strong>Steve Yegge&#8217;s take is that Anthropic&#8217;s real edge is not just Claude but a hive-mind way of working where humans and AI operate in a shared, high-speed loop that most companies aren&#8217;t built for yet.</p></li><li><p><strong><a href="https://epochai.substack.com/p/the-least-understood-driver-of-ai">The least understood driver of AI progress</a> (36 minute read)</strong><br>Anson Ho highlights that software progress, not just bigger chips or more spending, is a major and underappreciated reason AI keeps getting better faster than many people expect.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwlD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwlD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 424w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 848w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1272w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp" width="1456" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37956,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwlD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 424w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 848w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1272w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:469745}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from February I missed? 
Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1b7b5909-6ab4-4a88-98a1-d09e96554f4d&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from January&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Why &#8216;use agents or be left behind&#8217; is mostly about practical automation<br />Piecewise regression for spotting regime shifts in time series<br />Why AI benchmarks are hitting a measurement wall<br />What the data actually says about the state of open models<br />How large-scale recommendation systems are built in the real world<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />How Unity Catalog really works under the hood<br />Databricks Lakeflow vs Airflow in practice<br />End-to-end agentic data modeling with OpenMetadata<br />A candid look at the day-to-day reality of data engineering<br />How Uber cut data lake freshness from hours to minutes with Flink<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The best data visualization projects of 2025<br />Why storytelling matters more than chart tricks<br />Designing more accessible line charts<br />Practical rules for dashboard filter placement<br /><br />Plus: ontologies explained, 
hard lessons from building AI agents in finance and new data on who&#8217;s really buying AI compute.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in January 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-05T03:20:52.027Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:186553359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6e0c97bb-8be5-42ce-a02a-36b05fdd232c&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from 
December&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />The state of LLMs in 2025<br />Building a data cleaning agent with LangGraph<br />Making sense of memory in AI agents<br />Exploring TabPFN: a foundation model built for tabular data<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />Opinionated data platforms vs. open-source<br />Data quality design patterns<br />LLM for PDF data pipelines<br />DuckDB: the Swiss army knife for data engineers<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />A comprehensive guide to data visualization<br />Broken charts and 9 visualization alternatives<br /><br />Plus: The most useful skill to learn as a data professional, predictions about AI in 2026 and the next data bottleneck&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in December 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-08T05:01:52.132Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183495145,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI]]></title><description><![CDATA[From reactive manual curation to continuous taxonomy evolution grounded in merchant reality.]]></description><link>https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 26 Feb 2026 04:00:23 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!tUAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Shopify scales its product categorisation using agentic AI</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jEOH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jEOH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jEOH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Shopify&#8217;s multi-agent taxonomy</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tUAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tUAj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tUAj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!tUAj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">(Source: Shopify)</figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Shopify&#8217;s 
product classification system makes tens of millions of predictions daily across a taxonomy with 10,000+ categories and 2,000+ attributes. Commerce changes fast, so the taxonomy has to keep up or the whole stack starts drifting.</p><h4><strong>Task</strong></h4><p>Keep the taxonomy current at scale without relying on slow, reactive, manual curation. Fix volume, expertise and consistency problems before they hit merchants, customers and model quality.</p><h4><strong>Action</strong></h4><p>Built an AI multi-agent system: structural analysis + product-driven analysis, then intelligent synthesis. Added equivalence detection (category = broader category + attribute filters) plus automated QA via domain-specific AI judges.</p><h4><strong>Result</strong></h4><p>Taxonomy branches can be analyzed in parallel: hundreds of categories instead of a few per day. Quality improved via grounded merchant data + structural consistency, with judges filtering proposals (example: &#8220;MagSafe compatible&#8221; approved at 93% confidence).</p><h4><strong>Use Cases</strong></h4><p>Category discovery, attribute gap detection, taxonomy maintenance, search and filtering improvement</p><h4><strong>Tech Stack/Framework</strong></h4><p>AI agent, equivalence detection, multi-agent system</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>Last year, over 875 million people bought items from Shopify merchants. Shopify already runs a product classification system that makes tens of millions of predictions daily with a high degree of accuracy.</p><p>But classification is the easy part compared to the thing underneath it: taxonomy. The model doesn&#8217;t just need to be right; it also needs a clean, consistent set of labels to be right <em>about</em>.</p><p>That&#8217;s the challenge for Shopify: once you have 10,000+ categories and 2,000+ attributes, the taxonomy becomes its own product with its own failure modes. It can get stale. It can get inconsistent. 
It can drift away from how merchants actually describe products. And when that happens, classifier quality takes the blame for what is basically a taxonomy debt problem.</p><p>So this post is about what Shopify did next: they built an AI multi-agent system that doesn&#8217;t just classify products; it actively improves the taxonomy labels themselves so the system stays agile as commerce changes.</p><div><hr></div><h4>The challenge: scaling taxonomy without losing accuracy</h4><p>A taxonomy is a contract between three groups that rarely agree:</p><ul><li><p>Merchants describing products the way they think about them</p></li><li><p>Customers searching and filtering with their own mental model</p></li><li><p>Platform systems trying to enforce structure so everything stays queryable and comparable</p></li></ul><p>Now add the reality that commerce never sits still. New products appear. Old categories split. Entire verticals get reshaped by trends, tech and regulation. The taxonomy has to keep up or the platform drifts away from how people actually shop and sell.</p><p>Shopify frames the challenge as three problems.</p><p><strong>The volume problem: manual updates can&#8217;t keep up</strong></p><p>A global product taxonomy needs constant attention. Every new product type, emerging technology category and seasonal trend potentially triggers taxonomy updates.</p><p>Manual curation becomes a bottleneck because taxonomy work is rarely a single change. It is usually a bundle: a category addition, a hierarchy decision, a set of attributes, naming alignment and a check for duplicates or conflicts.</p><p>For example, consider the emergence of categories like smart home devices or remote work equipment. 
Each brings not just new categories but also entirely new attribute sets.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u4Rg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 424w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 848w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp" width="1456" height="497" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:497,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42484,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 424w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 848w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A new category example (Source: Shopify)</figcaption></figure></div><p>Smart home devices, for instance, need connectivity types, power requirements and compatibility. Those are specs that did not exist in the taxonomy before.</p><p>So the work isn&#8217;t a one-off. It&#8217;s continuous expansion and adjustment across a giant tree of concepts.</p><p><strong>The expertise problem: every vertical has its own rules</strong></p><p>Good taxonomy design is domain-heavy. You do not get it right by being generally smart. You get it right by knowing what matters in that product world. For example, it takes specialist knowledge to capture the nuanced differences between types of guitar pickups or to choose appropriate attributes for skincare products.</p><p>A taxonomy team can&#8217;t realistically maintain deep expertise across every vertical that merchants sell into. 
But if the taxonomy is inconsistent or poorly structured, merchants pay for it through reduced discoverability, suboptimal search results and ineffective filters for customers.</p><p><strong>The consistency problem: one concept, five different labels</strong></p><p>As the taxonomy grows organically, inconsistencies creep in:</p><ul><li><p>similar concepts represented differently across categories</p></li><li><p>inconsistent naming conventions</p></li><li><p>discrepancies between merchant categorization and customer expectations</p></li></ul><p>Those inconsistencies compound. Merchants get confused when listing. Customers get frustrated when filtering and comparing. And classifier quality drops because labels stop being reliably meaningful across the tree.</p><p>This is the part most teams underestimate. In a taxonomy, small inconsistencies behave like small data quality issues: they don&#8217;t stay small.</p><div><hr></div><h4>From manual taxonomy work to agent-led evolution</h4><p>Shopify&#8217;s taxonomy management evolved from a manual workflow into an AI-driven system.</p><p><strong>The old way: Expert review, slow throughput</strong></p><p>The traditional pattern is familiar:</p><ol><li><p>domain experts analyze product data</p></li><li><p>identify gaps or inconsistencies</p></li><li><p>propose changes</p></li><li><p>implement changes via careful review</p></li></ol><p>It ensures quality but it also creates bottlenecks.</p><p>The biggest problem was its reactive nature: Shopify would only recognize the need for new categories or attributes <em><strong>after</strong></em> merchants began listing products that didn&#8217;t fit. 
By then, the system had already missed chances to give merchants and customers a better experience.</p><p>So even when you do great manual work, you&#8217;re always late.</p><p><strong>The breakthrough: Two lenses, one system</strong></p><p>Advanced language models opened a door: not to replace human experts, but to augment them with scale and consistency.</p><p>The key insight was that taxonomy improvement comes from two different angles:</p><ul><li><p><strong>structural analysis</strong>: the logical structure of the taxonomy, gaps in hierarchies, missing relationships</p></li><li><p><strong>product-driven analysis</strong>: what real product data says merchants actually sell and how they describe it</p></li></ul><p>Each angle catches different issues. Shopify&#8217;s breakthrough was combining them into a system that can continuously propose improvements, then filter them through quality checks before human review.</p><div><hr></div><h4>Inside the system: How the agents work</h4><p>The new architecture rests on three principles:</p><ul><li><p>specialized analysis</p></li><li><p>intelligent coordination</p></li><li><p>quality assurance</p></li></ul><p>And the intent is clear: continuous evolution, not one-time taxonomy construction.</p><p><strong>What&#8217;s different: continuous evolution, not one-time creation</strong></p><p>AI has been used for product categorisation and one-off taxonomy builds for a while. The difference here is that instead of building the taxonomy once and hoping it holds, Shopify uses specialised AI agents to keep it evolving continuously. There are three core components to this approach:</p><p><strong>1- Real product grounding: </strong>The system integrates actual merchant product data so proposals reflect how merchants describe and categorize products. 
This keeps decisions grounded in commerce reality rather than only theory.</p><p>In other words: if merchants are consistently describing a differentiator, it probably belongs in the taxonomy, even if it offends someone&#8217;s idea of a &#8220;pure&#8221; category tree.</p><p><strong>2- Multi-agent specialization: </strong>Multiple specialized agents run different analyses. One focuses on structural consistency. Another focuses on product-driven insights. Then those outputs are synthesized. The claim here is that the combination finds improvements that neither agent would find alone.</p><p>That makes sense structurally. Taxonomy is both a graph problem and a language problem.</p><p><strong>3- Sophisticated equivalence discovery: </strong>This is the most interesting component: detecting equivalence relationships where a specific category equals a broader category filtered by attribute values.</p><p>This matters because merchants should be able to organize their catalogs however they want, while the platform still understands what products &#8216;mean&#8217; underneath the merchant&#8217;s choices.</p><p>So instead of forcing everyone into one rigid structure, Shopify tries to learn mappings that preserve flexibility and still support search, recommendations and analytics.</p><p><strong>Architecture flow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jtG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jtG3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 424w, 
https://substackcdn.com/image/fetch/$s_!jtG3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 848w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1272w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp" width="470" height="840" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:840,&quot;width&quot;:470,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jtG3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 
424w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 848w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1272w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AI Agent architecture flow (Source: Shopify)</figcaption></figure></div><p>The AI agent workflow works like this:</p><ul><li><p>enable agents to explore the taxonomy</p></li><li><p>run multi-stage analysis (structural + product-driven)</p></li><li><p>synthesize and resolve conflicts</p></li><li><p>detect equivalences</p></li><li><p>run automated QA using judges</p></li><li><p>send refined proposals to humans</p></li><li><p>update the taxonomy in production</p></li></ul><div><hr></div><h4>Enabling agent-taxonomy interaction</h4><p>Before agents can improve anything, they need to &#8216;read&#8217; the taxonomy like a human would.</p><p>Shopify implemented a system that allows agents to:</p><ul><li><p>search for related categories</p></li><li><p>examine hierarchical relationships</p></li><li><p>verify whether proposed changes conflict with existing elements</p></li></ul><p>A good example: an agent analyzing guitar-related categories can explore the full musical instruments hierarchy, inspect related attributes across instruments and look for patterns that suggest better structure.</p><p>In other words, the agent doesn&#8217;t just look at one node. 
It roams the neighborhood.</p><div><hr></div><h4>The pipeline: specialised agents, staged decisions</h4><p>For the system to work properly, different specialised agents each provide specific insights:</p><p><strong>Structural analysis: </strong>This agent looks at the taxonomy itself for logical consistency, completeness, gaps in category hierarchies, naming convention inconsistencies and opportunities to reorganize related concepts.</p><p>It operates purely on the taxonomy structure and aims to keep the whole thing coherent.</p><p><strong>Product-driven analysis: </strong>This agent integrates real merchant data and examines how products are described and categorized on the platform.</p><p>Specifically, it looks at patterns in product titles, product descriptions and merchant-defined categories. The goal is to find gaps between how merchants think about products and how the taxonomy represents them.</p><p>This is an important distinction. A taxonomy can be structurally perfect and still be useless if it doesn&#8217;t match merchant reality.</p><p><strong>Intelligent synthesis: </strong>Now we have two streams of recommendations:</p><ul><li><p>structure-driven improvements</p></li><li><p>product-driven improvements</p></li></ul><p>They can conflict. They can overlap. They can propose redundant changes.</p><p>The synthesis step merges insights, resolves conflicts, and eliminates redundancies. And sometimes the best answer is not to pick one but to combine both.</p><p><strong>Equivalence detection: </strong>This agent solves a practical commerce problem: merchants want flexibility but platform systems need consistency.</p><p>Consider golf shoes:</p><ul><li><p>Merchant A uses a specific &#8216;Golf Shoes&#8217; category</p></li><li><p>Merchant B uses &#8216;Athletic Shoes&#8217; with an &#8216;Activity Type = Golf&#8217; attribute</p></li></ul><p>Both are valid for the merchant. 
But search, recommendations and analytics benefit from understanding these represent the same product set.</p><p>So the system detects attribute-based equivalences of the form:</p><blockquote><p>specific category = broader category + one or more attribute filters</p></blockquote><p>This lets merchants organize however makes sense for their business while keeping platform intelligence consistent across different catalog structures.</p><p>If you&#8217;ve ever tried to do cross-merchant analytics at scale, you can probably feel why Shopify cared enough to build an entire agent for this.</p><div><hr></div><h4>Automated QA: judges before humans</h4><p>After proposals are generated, Shopify adds automated QA through specialized AI judges.</p><p>These judges evaluate proposed changes using reasoning capabilities and taxonomy design principles to filter and refine suggestions before human review.</p><p>The important detail is that evaluation differs by change type:</p><ul><li><p>adding new attributes</p></li><li><p>creating category hierarchies</p></li><li><p>modifying existing structures</p></li></ul><p>Different changes require different criteria, so one generic &#8216;judge prompt&#8217; would be weak. So instead, they use <strong>domain-specific judges</strong>.</p><p>An electronics-focused judge applies electronics expertise. A musical instruments judge applies that domain&#8217;s patterns and rules. The goal is consistent domain-aware evaluation across verticals.</p><div><hr></div><h3>Results</h3><p>The system can analyze taxonomy branches in parallel, identifying improvement opportunities that used to take weeks of manual work.</p><p>Where experts might analyze a few categories per day, the system can evaluate hundreds of categories, checking both:</p><ul><li><p>structural consistency</p></li><li><p>alignment with real product data</p></li></ul><p>This matters most for emerging product categories. 
When new product types become popular on the platform, the system can quickly identify taxonomy gaps and propose comprehensive solutions, instead of reactive patches that build up debt.</p><p><strong>Quality improvements</strong></p><p>The multi-agent design improves consistency and comprehensiveness because it combines two lenses:</p><ul><li><p>structural analysis keeps hierarchy organization logical and consistent</p></li><li><p>product-driven analysis keeps categories and attributes aligned with merchant reality</p></li></ul><p>The automated QA layer reduces iteration cycles by catching issues before human review and applying domain expertise consistently.</p><p><strong>Example: mobile phone accessories and MagSafe compatibility</strong></p><p>Product analysis identified that merchants frequently advertise &#8220;MagSafe support&#8221; for accessories such as chargers, cases and wallets.</p><p>So the agent proposed adding a boolean attribute: &#8216;MagSafe compatible.&#8217;</p><p>A specialized electronics judge evaluated the proposal and checked:</p><ul><li><p>no duplicate attribute already exists</p></li><li><p>boolean type is appropriate</p></li><li><p>while brand-specific, MagSafe is treated as a legitimate technical standard similar to Bluetooth or Qi</p></li></ul><p>The judge approved the attribute with <strong>93% confidence</strong>, noting it would improve customer filtering for MagSafe-ready products.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M4Uu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 424w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 848w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1272w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp" width="1456" height="1038" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1038,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215182,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 424w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 848w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1272w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">MagSafe example (Source: Shopify)</figcaption></figure></div><p>This example matters because it demonstrates the full loop:</p><ul><li><p>merchant reality creates a signal</p></li><li><p>the agent proposes a structured change</p></li><li><p>a domain judge validates it with rule checks and domain framing</p></li><li><p>humans get a higher quality proposal to review</p></li></ul><p><strong>Scaling development: from reactive fixes to proactive evolution</strong></p><p>The biggest shift is strategic: taxonomy development becomes proactive, not reactive.</p><p>Instead of waiting for a merchant pain point or a platform limitation to trigger a change, the system can identify and address gaps earlier.</p><p>The system can also reason over the entire taxonomy structure, which supports cross-category consistency. 
That helps avoid the fragmentation you get when teams fix issues in isolation.</p><p>To validate the approach, they applied it to a specific area: <strong>Electronics &gt; Communications &gt; Telephony</strong> (called &#8220;Telephony AI&#8221; in their analysis) and compared it against their previous manual expansion method.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3I-O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3I-O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 424w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 848w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1272w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp" width="1456" height="1088" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1088,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3I-O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 424w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 848w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1272w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AI Agent impact (Source: Shopify)</figcaption></figure></div><p>As you can see from the chart, the AI-assisted method can compress years of work into weeks for the taxonomy area if the agents are applied across all verticals.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check <a href="https://shopify.engineering/product-taxonomy-at-scale">Shopify's Engineering Blog</a> post on this topic</p>]]></content:encoded></item><item><title><![CDATA[How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs]]></title><description><![CDATA[From partition strategy to adaptive throttling, the playbook behind Venice&#8217;s ingestion evolution.]]></description><link>https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</guid><dc:creator><![CDATA[Data 
Tinkerer]]></dc:creator><pubDate>Thu, 19 Feb 2026 04:00:52 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how LinkedIn ingests data at scale.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y5YD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Venice: LinkedIn&#8217;s data storage platform</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5184" height="3456" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3456,&quot;width&quot;:5184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a computer screen with a facebook page on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a computer screen with a facebook page on it" title="a computer screen with a facebook page on it" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@getswello">Swello</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Venice powers LinkedIn&#8217;s AI-driven products and has scaled to 2,600+ stores with workloads spanning bulk loads, streaming updates and active/active replication. The ingestion pipeline had to handle throughput-heavy, CPU-heavy and latency-sensitive traffic under eventual consistency.</p><h4><strong>Task</strong></h4><p>Redesign ingestion to scale to 230M writes/sec while preserving ordering and protecting read and write SLAs. Support hybrid stores, partial updates and multi&#8211;data center replication without destabilizing clusters.</p><h4><strong>Action</strong></h4><p>Scaled bulk ingestion with partition tuning, shared consumer/writer pools and direct SST writes; tuned RocksDB via compaction triggers and BlobDB to manage amplification. Optimized CPU-heavy paths using Fast-Avro and parallel processing, then enforced priority pools and adaptive throttling to protect current-version latency.</p><h4><strong>Result</strong></h4><p>Venice now handles 175M+ key lookups/sec and 230M+ writes/sec in production. It maintains a write latency SLA under 10 minutes while safeguarding read latency as the top priority.</p><h4><strong>Use Cases</strong></h4><p>Large-scale feature stores, real-time recommendation systems, hybrid data serving, low-latency notification</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Samza, Apache Kafka, RocksDB, Fast-Avro, Adaptive Throttling</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Background</h4><p><a href="https://github.com/linkedin/venice">Venice</a> is an open-source derived data storage platform and LinkedIn&#8217;s default storage layer for online AI use cases. 
It sits behind products like People You May Know, feed, videos, ads, notifications, the A/B testing platform, LinkedIn Learning and more.</p><p>Since Venice launched internally in 2016 it has scaled from a handful of stores to over 2,600 production stores. The workloads also evolved a lot. It started with &#8220;just bulk load a dataset&#8221; and grew into a mix of:</p><ul><li><p>Bulk loading huge offline datasets</p></li><li><p>Nearline streaming updates</p></li><li><p>Active/active replication across data centers</p></li><li><p>Partial updates that merge fields and collections</p></li><li><p>Deterministic write latency expectations under eventual consistency</p></li></ul><p>This post walks through how the ingestion pipeline was revamped to hit <strong>230 million records per second in production</strong>, what changed across the architecture, which optimizations moved the needle and how different workload types get tuned. A lot of these ideas are portable if you run any distributed ingestion system where ordering, throughput and predictable latency all matter at once.</p><div><hr></div><h4>Venice overall ingestion pipeline</h4><p>At a high level, store owners write to Venice through three paths:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BTop!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" width="600" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice overall ingestion pipeline (Source: LinkedIn)</figcaption></figure></div><ol><li><p><strong>Bulk loads</strong> from an offline processing platform (example: Spark)</p></li><li><p><strong>Nearline writes</strong> from a streaming processing platform (example: Samza)</p></li><li><p><strong>Direct writes</strong> from online applications</p></li></ol><p>No matter which path you take, the writes all pass through an intermediate PubSub broker layer. From there, the Venice Storage Node (VSN) consumes messages and persists data locally using RocksDB (an embedded key-value store).</p><p>The pipeline sounds straightforward until you operate it at scale. The same ingestion path has to support very different workloads. Some are throughput-driven (bootstrapping a massive store). Some are latency-driven (current-version updates). Some are CPU-heavy (partial updates and conflict resolution). Some are I/O-heavy (compaction, SST churn).</p><p>The following sections will look at the challenges and how the LinkedIn team resolved them.</p><div><hr></div><h4>Use case 1: bootstrapping from offline dataset</h4><p>Venice users can run bulk load jobs using offline processing platforms such as Spark to push new data versions to Venice stores. The hard part is performance for large or massive stores. 
If you want to find bottlenecks, you need to understand the ingestion path end to end.</p><p><strong>What happens during a bulk load</strong></p><ul><li><p>A Venice Push Job (VPJ) creates a new version topic for the new store version, split into multiple partitions</p></li><li><p>The Spark job uses a map-reduce framework to produce messages to that version topic</p></li><li><p>It keeps one reducer per topic partition so message ordering is preserved</p></li><li><p>On the other side, the VSN spins up consumers, reads messages and persists them into RocksDB</p></li><li><p>There is one RocksDB instance per topic partition</p></li></ul><p>So you can hit bottlenecks in three obvious places:</p><ol><li><p>producing</p></li><li><p>consuming</p></li><li><p>persisting</p></li></ol><p>Production experience says you will hit all three, just not on the same day.</p><p><strong>Improving producing and consuming throughput</strong></p><p>The usual first lever is increasing the number of partitions for large stores so you can use more of the PubSub cluster capacity. More partitions tend to mean more parallelism and more throughput.</p><p>But it comes with trade-offs:</p><ul><li><p>more partitions mean more management overhead across Venice and PubSub</p></li><li><p>there is a throughput ceiling per PubSub broker</p></li></ul><p>So partition count is not a free lunch. It&#8217;s a knob that buys you throughput and charges you complexity.</p><p><strong>Enhancing consumption scalability</strong></p><p>To keep up with the rate at which messages are produced, the VSN uses shared consumer pools across all hosted stores.</p><p>Instead of &#8220;one store version, one fixed set of consumers,&#8221; each store version can use multiple consumers from the shared pool, with its hosted partitions distributed among them. 
The point is to keep multiple connections per PubSub broker to speed up consumption (much as a <a href="https://en.wikipedia.org/wiki/Download_manager">download manager</a> opens several connections to speed up a download).</p><p>The pool approach also does something boring but important: it sets an upper limit on the total number of consumers, which puts a ceiling on cost.</p><p><strong>Optimizing I/O performance</strong></p><p>The VSN uses a shared writer pool to persist changes concurrently across multiple RocksDB instances and make effective use of local SSD capacity.</p><p>Ordering is critical in Venice, so any given RocksDB instance has only one writer actively writing to it at a time. You still get concurrency across instances, just not within a single instance. That is the compromise that keeps ordering intact.</p><p><strong>Minimizing memory overhead</strong></p><p>Because messages for a partition are strictly ordered (thanks to the map-reduce framework), Venice uses <a href="https://github.com/facebook/rocksdb/wiki/creating-and-ingesting-sst-files">RocksDB&#8217;s SstFileWriter</a> to generate SST files directly rather than buffering the writes in memtables first. 
That significantly reduces memory overhead during ingestion.</p><p><strong>Ingestion workflow in Venice Server</strong></p><p>Put together, the optimized workflow is basically: use the PubSub layer for distribution, use consumer pools for scalable reads, use writer pools for SSD throughput, preserve ordering by design and avoid memory blowups by writing SST files directly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pbHX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" width="1200" height="944" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191738,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Optimized Venice pipeline (Source: LinkedIn)</figcaption></figure></div><div><hr></div><h4>Use case 2: hybrid store</h4><p>Venice supports Lambda-architecture-style use cases by merging updates from both <strong>bulk loads</strong> and <strong>nearline writes</strong>. 
Users query a single store and get a unified view.</p><p><strong>Venice hybrid store workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BaZo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" width="1024" height="375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Hybrid store workflow (Source: LinkedIn)</figcaption></figure></div><p>How it works:</p><ul><li><p>each bulk load creates a new store version</p></li><li><p>that version has a new Kafka topic and a new database instance</p></li><li><p>real-time updates produced by a Samza job via a real-time topic are appended to both version topics to keep them current</p></li><li><p>once the new version catches up fully, it is swapped in as the active version to serve reads</p></li></ul><p>The hybrid store is important because it gives you a clean &#8220;new version build&#8221; story without losing real-time freshness. 
But it creates a new challenge: the database transitions from <strong>read-only</strong> to <strong>read-write</strong>.</p><p>That&#8217;s where <a href="https://github.com/facebook/rocksdb/wiki">RocksDB</a> tuning matters, because duplicate keys start showing up more often: keys get updated or deleted after they were inserted. RocksDB uses <a href="https://github.com/facebook/rocksdb/wiki/Compaction">compaction</a> to remove stale entries, but compaction has overhead: it scans, merges and rewrites SST files, consuming CPU, I/O and disk space.</p><p>So the core problem becomes: tune RocksDB so you can balance <a href="https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#amplification-factors">three competing types of pain</a>:</p><ul><li><p><strong>Write amplification</strong>: bytes written to storage vs bytes written to the DB</p></li><li><p><strong>Read amplification</strong>: number of disk reads per query</p></li><li><p><strong>Space amplification</strong>: size of DB files on disk vs the actual data size</p></li></ul><p>Venice uses <a href="https://github.com/facebook/rocksdb/wiki/Leveled-Compaction">leveled compaction</a> by default and relies primarily on two methods to balance those trade-offs.</p><p><strong>1. Tuning the compaction trigger</strong></p><p>The key setting here is:</p><ul><li><p><strong>level0_file_num_compaction_trigger</strong></p></li></ul><p>This sets how many SST files may accumulate in Level-0 before compaction starts. 
Once Level-0 reaches that count, compaction kicks in to merge its SST files into Level-1, and onward as deeper levels fill.</p><p>Why it matters:</p><ul><li><p>higher threshold &#8594; fewer compactions &#8594; lower write amplification</p></li><li><p>but also more Level-0 files &#8594; higher read amplification, since a read may have to check several overlapping Level-0 files</p></li><li><p>plus higher space amplification because duplicates hang around longer</p></li></ul><p>Venice tunes this per cluster because clusters have different bottlenecks:</p><ul><li><p><strong>memory-serving clusters</strong> want data in RAM to speed up lookups. Memory is the limiting resource, so they set a <strong>lower threshold</strong> to reduce space amplification</p></li><li><p><strong>disk-serving clusters</strong> are often limited by disk I/O, so they set a <strong>higher threshold</strong> to reduce compaction frequency and lower the disk write rate</p></li></ul><p>This is a practical tuning philosophy: tune to your real bottleneck, not a generic best practice.</p><p><strong>2. 
RocksDB BlobDB integration</strong></p><p><a href="https://github.com/facebook/rocksdb/wiki/BlobDB">BlobDB</a> is aimed at large-value workloads through key-value separation:</p><ul><li><p>Large values go into blob files</p></li><li><p>LSM tree stores small pointers</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT0h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" width="1200" height="447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:156086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">RocksDB BlobDB structure</figcaption></figure></div><p>This avoids copying large values repeatedly during compaction, reducing write amplification. The cost is additional space amplification because blobs can become unreferenced and require garbage collection.</p><p>For Venice, BlobDB integration reduced write amplification significantly in multi-tenant clusters, especially for large-value use cases. The reported impact here is big: <strong>more than a 50% reduction of disk write throughput</strong>. 
That matters because it avoided scaling out clusters when CPU and storage space were still available.</p><p>The win here is: you stop paying the compaction tax over and over on the same large payloads.</p><div><hr></div><h4>Use case 3: Active/active replication with partial update</h4><p>Venice guarantees eventual consistency, not strong consistency. That matters because it means you cannot just do read-modify-write operations directly due to write delays.</p><p>To handle this, Venice introduces <strong>partial update</strong>, a specialized operation that supports field-level updates and collection merges.</p><p><strong>Venice partial update workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ay5v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" width="840" height="1320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:279575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice partial update (Source: LinkedIn)</figcaption></figure></div><p>Inside the Venice server, the leader 
replica:</p><ul><li><p>decodes the incoming payload</p></li><li><p>applies the update</p></li><li><p>re-encodes the result</p></li><li><p>writes to the local database</p></li><li><p>writes to the Version Topic</p></li><li><p>follower replicas consume the merged results</p></li></ul><p>Most of that is CPU-heavy.</p><p>Then the platform evolved further with active/active replication across multiple data centers. The key mechanism is deterministic conflict resolution (DCR), similar to CRDTs. Venice tracks update timestamps at row and field levels, compares incoming timestamps with existing ones and decides to apply or skip.</p><p><strong>Venice Active/Active workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!36Hk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" width="1024" height="1516" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1516,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:510735,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice Active/Active workflow (Source: LinkedIn)</figcaption></figure></div><p>Now the leader replica has even more 
to do for DCR:</p><ul><li><p>timestamp metadata lookup</p></li><li><p>decoding</p></li><li><p>encoding</p></li></ul><p>Again: CPU-heavy. So the optimizations below focus on CPU efficiency.</p><p><strong>1. Fast-Avro adoption</strong></p><p><a href="https://github.com/linkedin/avro-util">Fast-Avro</a> was originally developed by RTBHouse, but LinkedIn took over maintenance under the LinkedIn namespace and introduced many optimizations.</p><p>The key idea: Fast-Avro is an alternative implementation of Apache Avro serialization and deserialization that uses runtime code generation and performs significantly better than the native implementation. It supports multiple Avro versions at runtime and is widely adopted inside LinkedIn.</p><p>Venice fully integrated Fast-Avro and saw, in one major use case, up to a <strong>90% improvement in deserialization latency at p99</strong> on the application side.</p><p><strong>2. Parallel processing</strong></p><p>In the traditional pipeline, DCR and partial update operations were executed sequentially, record by record within the same partition. That leads to CPU underutilization.</p><p>Venice introduced parallel processing so multiple records can be handled concurrently within the same partition <em>before</em> producing them to the Version Topic, while still preserving strict ordering in the final step.</p><p>Result: significantly improved write throughput for these complex record types.</p><div><hr></div><h4>Use case 4: Active/active replication with deterministic write latency</h4><p>Eventually consistent systems still get judged by human expectations. People want their writes to show up and they want it to happen predictably.</p><p>Venice is versioned and can ingest backup, current and future versions concurrently in a single server instance. 
In practice, though, only the current version serves reads, so deterministic write latency guarantees focus mostly there.</p><p>To improve determinism, Venice introduced a pooling strategy in ingestion with <strong>different priorities</strong> for different workload types. The Venice consumer phase is the first phase in the server ingestion pipeline, and controlling the polling rate via pools is how prioritization happens.</p><p>Broad priority tiers:</p><ul><li><p>top priority: active/active and partial update workloads for the <strong>current version on the leader replica</strong> (CPU-intensive and latency-sensitive)</p></li><li><p>next: other workload types targeting the current version</p></li><li><p>then: active/active or partial update workloads for backup or future versions on the leader replica</p></li><li><p>finally: everything else in a lower-priority bucket</p></li></ul><p>This design is trying to do a few practical things:</p><ul><li><p>isolate CPU-heavy workloads so they don&#8217;t slow down lighter ones</p></li><li><p>prioritize the current version so the most up-to-date data flows smoothly</p></li><li><p>keep the number of pools limited to avoid resource management turning into a second job</p></li></ul><p>The catch is tuning. Clusters see different workloads, store behavior varies widely even within one cluster, throughput swings over time and read traffic changes throughout the day. Static configs force you to tune for the worst case, which wastes resources most of the time.</p><p>So Venice introduced adaptive throttling, which dynamically adjusts ingestion based on recent performance:</p><ul><li><p>if the system is within agreed SLAs, ingestion rates are adjusted according to priorities</p></li><li><p>if an SLA is violated, ingestion is throttled back immediately</p></li></ul><p>Defining the SLAs matters. Venice focuses on two key criteria:</p><ol><li><p><strong>Read latency SLA</strong>: highest priority. 
Never violate read latency SLAs, even if it costs ingestion throughput</p></li><li><p><strong>Write latency SLA for the current version</strong>: while read latency SLAs are met, write latency for the current version becomes the top priority, and pools are tuned proportionally to maximize utilization and throughput</p></li></ol><div><hr></div><h4><strong>Wrapping up</strong></h4><p>With these optimizations, Venice at LinkedIn handles:</p><ul><li><p>Over <strong>175 million key lookups per second</strong></p></li><li><p>Over <strong>230 million writes per second</strong></p></li><li><p>While maintaining a <strong>write latency SLA under 10 minutes</strong></p></li></ul><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check <a href="https://www.linkedin.com/blog/engineering/infrastructure/evolution-of-the-venice-ingestion-pipeline">LinkedIn's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7dd74b6f-84de-4b87-a0cf-3e440ec7dc65&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to 
Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in January 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in January 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026</link><guid 
isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 05 Feb 2026 03:20:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhar!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhar!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!hhar!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hhar!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!hhar!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for January!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://rlancemartin.github.io/2026/01/09/agent_design/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Agent design patterns</a> (8 minute read)<br></strong>An Anthropic engineer provides a grounded guide to designing AI agents that separates real, reliable architectures from overcomplicated agent hype that doesn&#8217;t survive production.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bCMy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bCMy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 424w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 848w, 
https://substackcdn.com/image/fetch/$s_!bCMy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1272w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png" width="1456" height="662" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bCMy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 424w, 
https://substackcdn.com/image/fetch/$s_!bCMy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 848w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1272w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work</a> (31 minute read)</strong><br>Tim Dettmers cuts through the agent hype, arguing the real value isn&#8217;t autonomous magic but practical agents that reliably coordinate tools, memory and execution.</p></li><li><p><strong><a href="https://theforecaster.substack.com/p/piecewise-regression-for-time-series">Piecewise Regression for Time Series Forecasting</a> (7 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Rami Krispin&quot;,&quot;id&quot;:116325603,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17d6b557-4338-48c7-ba47-12899dddc77e_3541x3541.jpeg&quot;,&quot;uuid&quot;:&quot;b8501dc7-40ac-4b98-9a94-7f007562707c&quot;}" data-component-name="MentionToDOM"></span> shares a practical walkthrough of using piecewise regression on time series to detect structural breaks, regime changes and trend shifts that single global models tend to smooth over.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jl_j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jl_j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 424w, 
https://substackcdn.com/image/fetch/$s_!jl_j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 848w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1272w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!jl_j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 424w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 848w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1272w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.artificialintelligencemadesimple.com/p/ai-is-hitting-a-measurement-wall">AI is Hitting a Measurement Wall</a> (27 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Devansh&quot;,&quot;id&quot;:8101724,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;uuid&quot;:&quot;d6db2ead-6480-40db-843a-8367916eb34a&quot;}" data-component-name="MentionToDOM"></span> makes the case that today&#8217;s AI benchmarks are saturated and misleading, masking the growing gap between model performance on tests and value in real applications.</p></li><li><p><strong><a href="https://towardsdatascience.com/drift-detection-in-robust-machine-learning-systems/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Drift Detection in Robust Machine Learning Systems</a> (18 minute read)</strong><br>The article shows how unnoticed drift can quietly degrade model performance and outlines practical techniques to detect it early in production.</p></li><li><p><strong><a href="https://www.interconnects.ai/p/8-plots-that-explain-the-state-of">8 plots that explain the state of open models</a> (7 minute read)<br></strong>Eight charts by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Interconnects 
AI&quot;,&quot;id&quot;:48206,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/robotic&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c52e8097-8f3d-4f7e-808b-2f4ad37f3b52_720x720.png&quot;,&quot;uuid&quot;:&quot;73df8696-4f8a-456d-9784-a4b832044ed9&quot;}" data-component-name="MentionToDOM"></span> cut through the noise to show that Chinese open models, led by Qwen, dominate real-world adoption and benchmarks, while Western challengers only compete at the very top end.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PO2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PO2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 848w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1272w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp" width="1456" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49346,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PO2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 848w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1272w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://aashidutt.substack.com/p/llms-as-judges-measuring-bias-hinting">LLMs as Judges: Measuring Bias, Hinting Effects, and Tier Preferences</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Aashi&quot;,&quot;id&quot;:167292575,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d539466-2556-451d-b783-13fb60c28bf0_144x144.png&quot;,&quot;uuid&quot;:&quot;f79f296c-3d8e-4e07-b2fa-9ff3df6a1323&quot;}" 
data-component-name="MentionToDOM"></span> and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sayak&quot;,&quot;id&quot;:5753925,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ea2e9c7-f95e-4da6-9971-8e75384902d1_500x500.jpeg&quot;,&quot;uuid&quot;:&quot;e48311f2-82a9-4111-b629-c0269ca0d15f&quot;}" data-component-name="MentionToDOM"></span> examine when LLMs can act as evaluators, showing how bias, prompt framing and hinting can distort model-as-judge benchmarks.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale">How to Build a Recommendation System at Scale: Insights from Instacart</a> (10 minute read)<br></strong>A practical walk-through of how large-scale recommendation systems are actually built in production by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ahsaas Bajaj&quot;,&quot;id&quot;:175610076,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34dac958-9c70-4f48-89ed-6c2e0d6f197e_899x901.png&quot;,&quot;uuid&quot;:&quot;2d33319e-2798-4159-9843-f21a33367409&quot;}" data-component-name="MentionToDOM"></span>, covering modeling choices and the tradeoffs that matter once you move past toy examples.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://vutr.substack.com/p/i-spent-5-hours-learning-unity-catalog">I spent 5 hours learning Unity Catalog. 
Here&#8217;s everything you need to know</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vu Trinh&quot;,&quot;id&quot;:167177248,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4805f673-db97-4f7c-85c4-44b345a8de80_256x256.png&quot;,&quot;uuid&quot;:&quot;c36904c1-7997-4c4b-84c1-cc07f367a13b&quot;}" data-component-name="MentionToDOM"></span> provides a breakdown of how Databricks&#8217; open-sourced Unity Catalog works under the hood.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8cL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8cL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8cL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 848w, 
https://substackcdn.com/image/fetch/$s_!a8cL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1272w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/databricks-lakeflow-vs-apache-airflow">Databricks Lakeflow 
vs Apache Airflow</a> (13 minute read)<br></strong>A candid comparison by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;45d7c0b4-a878-48a7-9c93-cf571b32ab66&quot;}" data-component-name="MentionToDOM"></span> showing how Databricks Lakeflow trades Airflow&#8217;s flexibility and openness for tighter platform integration, simpler ops and better defaults if you&#8217;re already all-in on Databricks.</p></li><li><p><strong><a href="https://www.datagibberish.com/p/the-certifications-scam">The Certifications Scam</a> (7 minute read)<br></strong>A blunt takedown of data certifications by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Yordan Ivanov&quot;,&quot;id&quot;:40945395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ma-p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76f52904-5428-4d97-82a5-3faa722b8d46_2234x1253.jpeg&quot;,&quot;uuid&quot;:&quot;d037c2d2-4a2d-4ab2-b694-509b385c8f66&quot;}" data-component-name="MentionToDOM"></span>, arguing they mostly signal marketing and gatekeeping rather than real skills, experience or on-the-job impact.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/end-to-end-agentic-data-modeling-with-openmetadata-and-mcp">End To End Agentic Data Modeling: Using AI and OpenMetadata MCP for Impact Analysis</a> (8 minute read)</strong><br>A hands-on look at building end-to-end agentic data modeling by combining OpenMetadata with MCP-style agents to automate lineage, context 
sharing and model evolution across the data stack by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;814b43cd-dbe6-4127-a76a-edfb4560b7ac&quot;}" data-component-name="MentionToDOM"></span> with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Pipeline to Insights&quot;,&quot;id&quot;:42238863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd98ddb69-fdec-4599-b3f2-906f7673c8de_408x408.png&quot;,&quot;uuid&quot;:&quot;6d531ff8-6365-44c6-98c7-9e3ba9bcc39f&quot;}" data-component-name="MentionToDOM"></span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qbBX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qbBX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 424w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 848w, 
https://substackcdn.com/image/fetch/$s_!qbBX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp" width="1297" height="782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:782,&quot;width&quot;:1297,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31526,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qbBX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 424w, 
https://substackcdn.com/image/fetch/$s_!qbBX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 848w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://dlthub.com/blog/building-semantic-models-with-llms-and-dlt?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Autofilling the Boring Semantic Layer: From Sakila to Chat-BI with dltHub</a> (9 minute read)</strong><br>Adrian Brudaru explores how LLMs can help generate and maintain semantic models on top of data pipelines, reducing manual modeling effort while keeping analytics definitions consistent.</p></li><li><p><strong><a href="https://www.ssp.sh/blog/diary-of-a-data-engineer">A Diary of a Data Engineer</a> (13 minute read)<br></strong>A candid, day-in-the-life reflection on what data engineering actually looks like in practice by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Simon Sp&#228;ti&quot;,&quot;id&quot;:27855874,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6fc84efb-1b87-4fb3-bfb1-076664f32de4_2199x2199.jpeg&quot;,&quot;uuid&quot;:&quot;36f92e57-2c7f-4348-b6a2-af9c8037e210&quot;}" data-component-name="MentionToDOM"></span>, highlighting the unglamorous but essential work that keeps data systems running day to day.</p></li><li><p><strong><a href="https://www.brentozar.com/archive/2026/01/database-development-with-ai-in-2026?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Database Development with AI in 2026</a> (11 minute read)</strong><br>Brent Ozar argues that in 2026 AI will meaningfully speed up database development tasks like query writing and troubleshooting but real impact still depends on human judgment and understanding production constraints.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink">How Uber Cut Data Lake Freshness From Hours to Minutes With Flink</a> (11 minute 
read)<br></strong>Uber rebuilt its data lake ingestion to move freshness from hours to minutes. This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KkKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KkKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 424w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 848w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1272w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp" width="768" height="349" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KkKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 424w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 848w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1272w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://flowingdata.com/2025/12/31/best-data-visualization-2025?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Best Data Visualization Projects of 2025</a> (3 minute read)</strong><br>FlowingData shares the best data visualisations of 2025.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-Ir!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 424w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 848w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png" width="750" height="579" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:579,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 424w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 848w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://joseparreogarcia.substack.com/p/storytelling-with-data-book-review">The book that finally taught me how to tell stories with data</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;id&quot;:255728031,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;uuid&quot;:&quot;d67f4c91-08f3-46e1-acaa-b45e7b2d73a7&quot;}" data-component-name="MentionToDOM"></span> reviews <em>Storytelling with Data</em>, highlighting that impact comes from framing the message and audience first, not from visualisation tricks.</p></li><li><p><strong><a href="https://nrennie.rbind.io/blog/accessible-line-chart?utm_source=datatinkerer.io&amp;utm_medium=newsletter">How to create a more accessible line chart</a> (10 minute read)<br></strong>Nicola Rennie shows how small design choices in line charts (color, contrast, labeling and annotations) dramatically improve accessibility without sacrificing clarity or insight.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TWIW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TWIW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 424w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 848w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png" width="1344" height="1008" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1008,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74549,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TWIW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 424w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 848w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong><a href="https://nastengraph.substack.com/p/5-rules-for-dashboard-filter-placement">5 Rules for Dashboard Filter Placement</a> (6 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Anastasiya Kuznetsova&quot;,&quot;id&quot;:99725349,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!2E6h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb9d9c-d4e0-4f30-bc37-73eb9ffe4d53_516x534.png&quot;,&quot;uuid&quot;:&quot;0ee8d645-8d3e-4016-a263-296df3043a06&quot;}" data-component-name="MentionToDOM"></span> breaks down five practical rules for placing dashboard filters so users understand what they&#8217;re controlling without adding cognitive load or 
breaking trust.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://williaminmon.substack.com/p/ontologies-some-perspectives">ONTOLOGIES - SOME PERSPECTIVES</a> (20 minute read)<br></strong>A great intro and explanation of ontologies by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;William Inmon&quot;,&quot;id&quot;:125217701,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c2feef8-6c8a-42f6-b044-9823cbe10e5d_144x144.png&quot;,&quot;uuid&quot;:&quot;20b045af-deb2-4334-8e3d-bf06afa00bd9&quot;}" data-component-name="MentionToDOM"></span> (Bill Inmon) and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jessica Talisman&quot;,&quot;id&quot;:24176542,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zEsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f1fe4e-779e-4a27-be92-71fac460ee01_935x935.jpeg&quot;,&quot;uuid&quot;:&quot;3094c2d1-7a16-4b32-8ce3-f2a9962adbc9&quot;}" data-component-name="MentionToDOM"></span>. 
Really worth a read if you have heard the term a lot but are not sure what it means and how it can be applied.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zdtA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zdtA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 424w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 848w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1272w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp" width="336" height="262" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:262,&quot;width&quot;:336,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zdtA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 424w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 848w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1272w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.nicolasbustamante.com/p/lessons-from-building-ai-agents-for?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Lessons from Building AI Agents for Financial Services</a> (23 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nicolas Bustamante&quot;,&quot;id&quot;:17282676,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cba30217-51b3-4192-b82a-d4f006dd8ad3_1536x2049.jpeg&quot;,&quot;uuid&quot;:&quot;ccdeb7af-9549-4253-b592-bc84780150ed&quot;}" data-component-name="MentionToDOM"></span> breaks down what building AI agents actually looks like in production, separating real engineering constraints from agent hype.</p></li><li><p><strong><a 
href="https://epoch.ai/blog/introducing-the-ai-chip-sales-data-explorer?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Introducing the AI Chip Sales Data Explorer</a> (3 minute read)</strong><br>Epoch AI introduces an interactive dataset tracking global AI chip sales, shedding light on who&#8217;s actually buying compute and how hardware demand is shaping the AI race.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H598!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H598!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!H598!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!H598!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H598!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!H598!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!H598!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:443083}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from January I missed? 
Feel free to drop a comment or hit reply, even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others, really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6e0c97bb-8be5-42ce-a02a-36b05fdd232c&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from December&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />The state of LLMs in 2025<br />Building a data cleaning agent with LangGraph<br />Making sense of memory in AI agents<br />Exploring TabPFN: a foundation model built for tabular data<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />Opinionated data platforms vs. 
open-source<br />Data quality design patterns<br />LLM for PDF data pipelines<br />DuckDB: the Swiss army knife for data engineers<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />A comprehensive guide to data visualization<br />Broken charts and 9 visualization alternatives<br /><br />Plus: The most useful skill to learn as a data professional, predictions about AI in 2026 and the next data bottleneck&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in December 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-08T05:01:52.132Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183495145,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;738bc346-aa09-4943-a656-9c97ecf88686&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from November&#128071;<br /><br />Data Science &amp;amp; AI<br />Context engineering becomes the real bottleneck for AI agents<br />Classic algorithms still beat most enterprise AI in ROI<br />A practical framework to identify true agentic use cases<br />Gemini 3 benefits from direct structured prompting<br /><br />Data Engineering<br />DuckLake revives relational metadata for lakehouses<br />Event streaming hits market saturation<br />Real-world consulting lessons point to simpler pipelines over hype<br />Dark data hoarding kills AI signal<br /><br />Data Analysis &amp;amp; BI<br />Dashboard testing gets a full end-to-end checklist<br />Guidance on balancing accuracy vs speed when answering business questions.<br /><br />Plus: AI-coded &#8220;good enough&#8221; apps shift the buy-vs-build boundary, low-tech industries become prime AI adopters as margins flip and new benchmark analysis suggests model performance is mostly general capability with a smaller &#8220;Claudiness&#8221; axis on top.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in November 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about 
AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-03T07:52:29.847Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:180567973,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:3,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How to Build a Recommendation System at Scale: Insights from Instacart]]></title><description><![CDATA[A Senior ML Engineer on production constraints, rules vs ML and the workflow behind large-scale recommender systems]]></description><link>https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 29 Jan 2026 03:30:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3e6c5924-ed6c-4998-8e4a-8f88d9102c8b_844x473.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Following on from previous posts talking to people in the field, today I will be talking with Ahsaas Bajaj, who is a Senior Machine Learning Engineer at Instacart. He works on large-scale recommendation systems that serve millions of customers.</p><p>We talked about his rise from software engineering to machine learning at Instacart, how he decides between rules-based and ML approaches and how he approaches the work now as a more senior stakeholder.</p><p>So without further ado, let&#8217;s get into it!</p><div><hr></div><h4><strong>Can you tell us a bit about your role?</strong></h4><p>I&#8217;m a Senior ML Engineer at Instacart, working across customer and shopper experiences on large-scale recommendation systems that make millions of decisions each day. For the past three years, I&#8217;ve led the technical strategy for the Product Substitutions ML system, focused on solving the out-of-stock problem. </p><p>The goal is simple: when an item isn&#8217;t available, suggest a replacement that preserves customer intent and keeps the order intact. My role spans system design, modeling and evaluation, balancing customer satisfaction, shopper efficiency and business impact at scale.</p><div><hr></div><h4><strong>How did you get into machine learning?</strong></h4><p>My path into ML wasn&#8217;t a straight line. I started as a software engineer at Samsung Research on the on-device search team, which pushed me deep into information retrieval and search system design. That work sparked an interest in research and led me to pursue a graduate degree in computer science. </p><p>It shaped how I approach ML today: less focus on models in isolation, more on how systems behave in production. 
I wanted that work to have real user impact, which took me to Walmart Labs and eventually to Instacart.</p><div class="pullquote"><p><em><strong>Ahsaas&#8217;s path</strong></em></p><p><em><strong>software engineer &#8594; data scientist &#8594; ML engineer &#8594; senior ML engineer</strong></em></p></div><h4><strong>What does a &#8216;typical&#8217; week look like for you?</strong></h4><p>As I&#8217;ve moved into a more senior role, the balance has shifted from pure coding to a mix of execution and direction. My week usually breaks down into three buckets:</p><p><strong>Alignment (30%)</strong>: The glue work. I spend time with product, backend engineering, and leadership aligning on roadmaps. The focus isn&#8217;t just <em>what</em> we&#8217;re building, but <em>why</em>, making sure ML work ties directly to business goals.</p><p><strong>Deep work (30%)</strong>: Hands-on modeling, coding and system design. Staying close to the code is non-negotiable for me, even at a senior level.</p><p><strong>Analysis and &#8220;the why&#8221; (40%)</strong>: This is where I spend the most time. I dig into model errors, read raw customer complaints about failed substitutions and sanity-check improvement ideas. This is also where I write proposal docs. In my view, the highest-leverage work a senior MLE does is deciding what problems to solve next, not just executing on what&#8217;s assigned.</p><div><hr></div><h4><strong>How do you decide when a problem actually needs ML or if rules-based is good enough?</strong></h4><p>I think about it in terms of complexity versus value.</p><p>If a problem can be solved deterministically with clear rules and those rules are stable and understandable, that&#8217;s often the right solution. Machine learning becomes useful when the space of behaviors is too large, nuanced, or context-dependent for rules to scale.</p><p>Good data is also a prerequisite. 
Without reliable signals and feedback loops, even the most sophisticated model won&#8217;t perform well in production.</p><div><hr></div><h4><strong>You have written about your work on a recommendation model at Instacart. Can you share a summary of what you have done?</strong></h4><p>I&#8217;ve spent the past three years leading the technical development of Instacart&#8217;s <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Product Substitutions system</a>, which handles millions of replacement decisions daily. The core challenge is deceptively simple: when a customer&#8217;s requested item is out of stock, what should we suggest instead?</p><p>What makes this interesting from an ML perspective is that it&#8217;s fundamentally a relevance problem, not a search problem. We&#8217;re not just matching product attributes&#8212;we&#8217;re trying to understand what the customer actually wanted and find alternatives that preserve that intent. This required rethinking how we model the relationship between items, how we define &#8220;good&#8221; substitutions, and how we evaluate success in a way that maps to real customer satisfaction.</p><p>The system has evolved significantly over time, moving from simpler heuristics to more sophisticated learned representations. 
But the north star has always been the same: keep orders complete while respecting what customers actually care about.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YXvE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YXvE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 424w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 848w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1272w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp" width="720" height="187" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:187,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8232,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YXvE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 424w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 848w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1272w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Siamese network (Source: <a 
href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Instacart</a>)</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uWDY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uWDY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 424w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 848w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1272w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp" width="720" height="447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15228,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uWDY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 424w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 848w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1272w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Product layer: one each for original and candidate product (Source: <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Instacart</a>)</figcaption></figure></div><div><hr></div><h4><strong>And what has been the impact on the business?</strong></h4><p>Substitutions sit at a critical junction in the order lifecycle. When done well, they&#8217;re invisible - customers get what they need and the order stays intact. 
When done poorly, they create friction everywhere: customers reject items or request refunds, shoppers waste time on unsuccessful suggestions, and order values drop.</p><p>Our work has meaningfully moved the needle on the metrics that matter: replacement acceptance rates, refund frequency, and what we call &#8220;perfect order fill rate&#8221;&#8212;the percentage of orders where every item was either found or successfully replaced. These improvements compound across millions of weekly orders.</p><p>Beyond the immediate transactional metrics, we&#8217;ve also seen positive signals in repeat ordering behavior and customer satisfaction scores, particularly for orders that required multiple substitutions. Instacart has <a href="https://investors.instacart.com/static-files/27fac1c6-da32-40ca-8ef4-c8261b5ee12b">referenced</a> this system publicly when discussing operational improvements at scale.</p><p>For me, the real validation is when customers don&#8217;t notice the algorithm at all - they just notice their groceries arrived complete.</p><div><hr></div><h4><strong>What does the tech stack look like for ML at Instacart?</strong></h4><p>Instacart&#8217;s ML stack is built around an internal platform called <a href="https://www.instacart.com/company/tech-innovation/griffin-how-instacarts-ml-platform-tripled-ml-applications-in-a-year">Griffin</a>, which standardizes the end-to-end ML lifecycle, from feature engineering and training to deployment and real-time inference. A core piece of this is a shared Feature Marketplace, where teams define, version and reuse batch and streaming features with strong offline-to-online consistency.</p><p>Workflows are orchestrated with Apache Airflow and model training runs through a unified abstraction that supports multiple compute backends and common ML frameworks. 
With <a href="https://tech.instacart.com/introducing-griffin-2-0-instacarts-next-gen-ml-platform-b7331e73b8d7">Griffin 2.0</a>, the platform moved to a Kubernetes-based setup and added distributed training with Ray, which significantly improved scalability and iteration speed.</p><p>Griffin also includes a centralized model registry and metadata store, making experiments easier to track and reproduce. In production, models are deployed as standardized services that handle feature loading and low-latency inference across both customer and shopper experiences.</p><p>The main benefit is focus: teams spend less time on infrastructure and more time on modeling, evaluation and trade-offs.</p><div><hr></div><h4><strong>How do you use AI in your day-to-day work and where do you find it genuinely valuable?</strong></h4><p>I&#8217;ve integrated GenAI primarily to shift my focus from execution to decision-making. It&#8217;s useful for routine tasks like scaffolding data pipelines or optimizing SQL queries, but I find the highest leverage comes from <strong>qualitative analysis</strong>.</p><p>I routinely feed thousands of customer comments and shopper notes about bad substitutions into LLM-driven pipelines that cluster feedback into coherent themes. What used to be unstructured noise becomes a prioritized list of failure modes. This allows me to spend less time parsing data and more time solving the specific problems that actually impact customer trust.</p><div><hr></div><h4><strong>How has your perspective changed moving to a more senior role? </strong></h4><p>The biggest shift is realizing that <strong>Judgment &gt; Code</strong>. Early in my career, I obsessed over the <em>how</em> - the architecture, the libraries, the latency. Now, I obsess over the <em>what </em>and the<em> why.</em> The real work is filtering ideas. 
In a sea of seemingly good ideas, my job is to find the <em>most bullish</em> one - the one with the highest ROI - and kill the others.</p><p>I&#8217;ve also learned that <strong>Writing is Engineering.</strong> You cannot build big things alone. To get buy-in from leadership and cross-functional teams, you must be able to write crisp, narrative-driven proposals that explain <em>why</em> this mathematical solution solves a human problem.</p><div class="pullquote"><p><strong>The biggest shift is realizing that Judgment &gt; Code</strong></p></div><h4><strong>What&#8217;s one thing you wish you&#8217;d known earlier about machine learning?</strong></h4><p>The value of <strong>error analysis</strong>. It&#8217;s easy to celebrate aggregate metrics like accuracy or F1 but the real breakthroughs come from studying the &#8220;horror cases,&#8221; where the model is confidently wrong. Those examples are uncomfortable to look at but they&#8217;re where the most useful ideas come from. You can&#8217;t fix what you don&#8217;t deeply understand.</p><div><hr></div><p>If you enjoyed reading this, check out Ahsaas&#8217;s <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">original article</a> about his work at Instacart</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QfuD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 424w, 
https://substackcdn.com/image/fetch/$s_!QfuD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 848w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1272w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png" width="692" height="394" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:394,&quot;width&quot;:692,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!QfuD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 424w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 848w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1272w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p>Was there a question that you would like to ask?</p><p><strong>Let me know your thoughts by replying to the email or leaving a comment below!</strong></p><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjQ3MzIyMywiaWF0IjoxNzY4ODkyNzM0LCJleHAiOjE3NzE0ODQ3MzQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.vxPR9Jc4G7L4Yjw3wvlaaj8dKYSscG1A_D7Wiblqr1o&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjQ3MzIyMywiaWF0IjoxNzY4ODkyNzM0LCJleHAiOjE3NzE0ODQ3MzQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.vxPR9Jc4G7L4Yjw3wvlaaj8dKYSscG1A_D7Wiblqr1o"><span>Share</span></a></p><div><hr></div><h3><strong>Keep reading</strong></h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2d3312e0-ea8d-4705-959d-5748abc99f31&quot;,&quot;caption&quot;:&quot;Today I will be talking with Jose Parre&#241;o Garcia who is a Senior Data Science Manager at 
Skyscanner and writer of the Senior Data Science Lead newsletter.<br /><br />We talked about his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;From Data Analyst to Senior DS Manager at Skyscanner&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:255728031,&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;bio&quot;:&quot;I write about Data Science, Machine Learning and leading data teams. I have built teams from scratch and lead 50+ data scientists @Skyscanner. 
Now, I share my experience with you.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://joseparreogarcia.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://joseparreogarcia.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Senior Data Science Lead&quot;,&quot;primaryPublicationId&quot;:2833541}],&quot;post_date&quot;:&quot;2025-11-13T03:54:26.969Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06735d58-e8f2-4106-88ae-efe0658c217c_764x661.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:176541975,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c93130ad-2195-48c8-b16d-9ee951675f0b&quot;,&quot;caption&quot;:&quot;Check out the breakdown of Ahsaas's original article which we published last year!&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Art of Substitution: Instacart&#8217;s ML Model for Better Shopping Choices&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-01-12T23:01:15.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!9h_o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00a24c5b-b7a6-4d5c-9bc2-0e7b691d7d75_4800x2700.webp&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/the-art-of-substitution-instacarts-ml-model&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:154057578,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How DoorDash Saves Tens of Millions of Dollars Per Year by Detecting Fraud 30× Faster]]></title><description><![CDATA[A daily anomaly detection system that cut discovery 
time from 100+ days to under three.]]></description><link>https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 23 Jan 2026 05:56:24 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how DoorDash uses anomaly detection to save millions of dollars by flagging fraud trends early. </p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Roe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/175671629?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to DoorDash&#8217;s fraud detection!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5320" height="3377" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3377,&quot;width&quot;:5320,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a close up of a cell phone on a table&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a close up of a cell phone on a table" title="a close up of a cell phone on a table" srcset="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@querysprout">Marques Thomas</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Fraud trends at DoorDash often blended into normal delivery noise and went unnoticed for weeks, causing avoidable losses. Existing detection was reactive and too slow.</p><h4><strong>Task</strong></h4><p>Detect emerging fraud trends early across millions of users and segments, before they materially impact top-line metrics.</p><h4><strong>Action</strong></h4><p>Build a daily anomaly detection platform that segments key fraud metrics across millions of overlapping dimensions, applies time-series z-score detection, clusters related anomalies and routes them into an ops investigation workflow.</p><h4><strong>Result</strong></h4><p>Cut average fraud detection time from 100+ days to under 3 days, surfaced 60%+ of new fraud trends early, and saved tens of millions annually.</p><h4><strong>Use Cases</strong></h4><p>Anomaly detection, fraud detection, payment monitoring, policy change impact monitoring</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Airflow, DuckDB, Apache Spark, Python</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Fraud trend detection before it becomes a headline</h4><p>Fraud doesn&#8217;t always kick the door down. Sometimes it slips in through the side window and blends into the noise of millions of legitimate deliveries.</p><p>A small spike in refund claims. A pattern in high-risk charges linked to a specific bank. A subtle shift in behavior that looks like randomness until it isn&#8217;t. Left alone, those early signals can snowball into a large trend with real top-line impact.</p><p>DoorDash&#8217;s fraud team wanted to flip the script. 
Instead of reacting after a new fraud trend has had weeks to grow unchecked, how could they spot it as early as possible, before significant damage is done?</p><p>This post shares how the DoorDash team built an anomaly detection platform that scans for emerging patterns across millions of user segments and surfaces the ones that matter before they spiral into major losses.</p><div><hr></div><h4>Terminology</h4><p>&#8216;Anomaly detection&#8217; is a broad term. Even within fraud, people can mean very different things by it. For this system, DoorDash defined two categories up front:</p><p><strong>Anomalous trend detection</strong></p><p>Looking for anomalous behavior in a <em>collection</em> of users that may represent a new fraud or false-positive trend.</p><p>Here, no single datapoint needs to be weird. The anomaly is the time-series pattern that emerges from many points together, like a growing fraud segment over time.</p><p><strong>Anomalous outlier detection</strong></p><p>Looking for <em>individual</em> outliers, like a specific user or transaction that is rare or deviates sharply from normal behavior.</p><p>In this case, the datapoint is the anomaly. 
It might be part of a broader trend, or it might be a one-off.</p><p>This post focuses on how DoorDash built a system to detect <strong>anomalous trends</strong>.</p><p>Here are some terms used in the article, with definitions and examples to make them easier to understand.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a0Fu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 424w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 848w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png" width="1456" height="507" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:507,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 424w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 848w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Terminology table (Source: DoorDash)</figcaption></figure></div><div><hr></div><h4><strong>Designing the system around real fraud failures</strong></h4><p>The DoorDash team started the way you&#8217;d hope a fraud platform starts: by talking to the people who have to use it.</p><p>They met with frontline fraud teams responsible for tracking and fighting new fraud trends and asked for concrete historical examples of trends that simmered longer than ideal before being discovered and mitigated. 
These became the positive test cases.</p><p>Next, the teams were asked for:</p><ul><li><p>Their most useful early-warning indicator <strong>metrics</strong></p></li><li><p>The <strong>dimensions</strong> they commonly use to slice data when investigating a new fraud trend</p></li></ul><p>That produced a working set of:</p><ul><li><p>Positive examples (historical missed or late-found fraud trends)</p></li><li><p>A set of metrics that act as early-warning signals</p></li><li><p>A set of dimensions that represent how investigators naturally segment the world</p></li></ul><p>Then the DoorDash team built the system and backtested it. Tuning came next, but the tuning goal was very specific:</p><p>1- Maintain 100% recall on the test trends<br>2- Minimise the number of non-fraudulent anomalies per day</p><p>One observation stood out from this phase. The system was fairly insensitive to exact tuning values. What mattered more was upstream: choosing thoughtful metrics and dimensions that can actually capture fraud trends in the first place.</p><p>In other words: the math is important but the slices you choose decide what you can even see.</p><div><hr></div><h4>Architecture overview</h4><p>The anomaly detection platform runs as a daily job coordinated by Airflow. 
It looks for fraud trends growing on a day-to-week timescale.</p><p>DoorDash currently runs anomaly detection jobs for both <strong>consumer fraud</strong> and <strong>Dasher fraud</strong>, with plans to expand to more applications over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Osej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Osej!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 424w, https://substackcdn.com/image/fetch/$s_!Osej!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 848w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1272w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp" width="1024" height="268" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:268,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Osej!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 424w, https://substackcdn.com/image/fetch/$s_!Osej!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 848w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1272w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DoorDash anomaly detection platform (Source: DoorDash)</figcaption></figure></div><p>The platform has five steps:</p><ol><li><p>Preparing daily fraud snapshots</p></li><li><p>Metric aggregation on multi-dimensional segments</p></li><li><p>Time-series anomaly detection</p></li><li><p>Hierarchical clustering on anomalous segments</p></li><li><p>Turning clusters into investigations and containment</p></li></ol><div><hr></div><h4>Step 1: Preparing daily fraud snapshots</h4><p>DoorDash chose a daily batch job for the initial implementation because the fraud trends they historically missed developed over <strong>a few days to a few weeks</strong>.</p><p>An Airflow DAG prepares a dataset for each anomaly detection job containing the day&#8217;s data snapshot in a wide-table format.</p><p>If the trends you historically 
missed unfold across days and weeks, you do not need sub-second streaming to get meaningful wins. You need consistency, coverage and a reliable cadence.</p><div><hr></div><h4>Step 2: Metric aggregation on multi-dimensional segments</h4><p>This is the scale step. Once the daily snapshot is ready, DoorDash loads the single date&#8217;s data into a Python environment via Spark, then computes metric aggregates across segments.</p><p>For each metric, they track both:</p><ul><li><p><strong>Absolute value</strong> of the metric</p><ul><li><p>Example: dollar value of credit and refund claims</p></li></ul></li><li><p><strong>Relative (normalized) value</strong> of the metric</p><ul><li><p>Example: credit and refund claims divided by dollar value of orders</p></li></ul></li></ul><p>Why both? Because absolute values catch &#8216;this is costing real money&#8217; and relative values catch &#8216;this is spiking compared to what is normal for this slice&#8217;.</p><p>Then comes segmentation. Segments are formed from single, double and triple product combinations of all dimensions. That quickly becomes huge: at DoorDash scale it can run into hundreds of millions of segments, so compute efficiency becomes important.</p><p><strong>DuckDB for aggregation</strong></p><p>DoorDash computes metric aggregates using DuckDB, an in-process analytical database with a Python API, optimised for fast OLAP-style operations.</p><p>They chose DuckDB over Pandas because it was:</p><ul><li><p>Much faster (running in less than 10 minutes)</p></li><li><p>More memory efficient</p></li></ul><p>The system also excludes dimensional products with cardinality greater than 10^7 to reduce the total number of segments to a manageable size.</p><p>Finally, the storage format.</p><p>The day&#8217;s metrics aggregated across hundreds of millions of segments are stored in the data warehouse in <strong>sparse tall table format</strong>.</p><p>In plain English: if a segment has a metric value of zero, DoorDash drops it. 
That cuts storage and keeps both DuckDB and the downstream warehouse from filling up with rows that say &#8216;nothing happened here.&#8217;</p><div><hr></div><h4>Step 3: Time-series anomaly detection</h4><p>After Step 2, DoorDash has daily metric aggregates by segment. They keep the previous 28 days of data in the data warehouse, so the platform now has several hundred million metric time series, each of length 28.</p><p>DoorDash chose a simple <strong>moving-window z-score</strong> approach because it performed well in testing and detected all of the historical fraud trends they used as positive examples.</p><p><strong>Baseline and test setup</strong></p><ul><li><p>First <strong>21 days</strong> form the baseline</p></li><li><p>The <strong>28th day</strong> is the test day</p></li><li><p>There is a <strong>7-day gap</strong> between the end of the baseline and the test day, so days 22 to 27 are used for neither</p></li></ul><p>That gap exists for a very specific reason. The team noticed many historical fraud trends had a noisy phase when they first started scaling. If those noisy days sat inside the baseline, they would inflate its standard deviation and shrink the z-score of the test day. By leaving a gap, the baseline variance better reflects &#8216;normal before the trend&#8217;, which reduces missed trends.</p><p><strong>What counts as an anomaly</strong></p><p>A segment&#8217;s time series is flagged as anomalous if it meets both:</p><ol><li><p><strong>Statistical significance: </strong>The 28th-day <em>relative</em> metric is more than X standard deviations above the mean of the 21-day baseline. DoorDash found that X = <strong>6 standard deviations</strong> worked well empirically.</p></li><li><p><strong>Business significance: </strong>The 28th-day <em>absolute</em> metric exceeds the 21-day baseline by a dollar value and/or count that is meaningful for that metric. Thresholds vary by metric and were chosen with operations partners.</p></li></ol><p>That two-part rule matters. Statistical significance alone finds weirdness. 
Business significance filters it down to weirdness that&#8217;s worth a human&#8217;s time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-sSP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-sSP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 424w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 848w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1272w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp" width="1024" height="595" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-sSP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 424w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 848w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1272w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Anomaly calculation example (Source: DoorDash)</figcaption></figure></div><div><hr></div><h4><strong>Step 4: Hierarchical clustering on anomalous segments</strong></h4><p>Real fraud trends rarely show up as a single clean segment anomaly. A single trend often triggers anomalous increases across many partially overlapping segments. 
Example:</p><p>A spike in credit and refund claims at &#8216;Retailer One&#8217; could cause anomalies in segments like:</p><ul><li><p><code>{business_name='Retailer One'}</code></p></li><li><p><code>{country='US', business_name='Retailer One'}</code></p></li><li><p><code>{business_vertical='retail', business_name='Retailer One'}</code></p></li></ul><p>So Step 4 exists to shrink &#8216;thousands of anomalies&#8217; into &#8216;a few dozen things to look at&#8217;.</p><p><strong>Segment graph structure</strong></p><p>Dimensional segments have a natural structure that can be represented as a three-layer graph:</p><ul><li><p><strong>Top layer:</strong> singlets</p><ul><li><p><code>{business_name='Retailer One'}</code></p></li></ul></li><li><p><strong>Middle layer:</strong> pairs</p><ul><li><p><code>{business_name='Retailer One', country='US'}</code></p></li></ul></li><li><p><strong>Bottom layer:</strong> triplets</p><ul><li><p><code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code></p></li></ul></li></ul><p>DoorDash further partitions the graph by <code>METRIC_NAME</code> so clustering happens within a metric type.</p><p><strong>Clustering rules</strong></p><p>To connect anomalies within the same metric type:</p><ol><li><p><strong>Connect parent anomalies with child anomalies</strong></p><ul><li><p><code>{business_name='Retailer One'}</code> is the parent of <code>{country='US', business_name='Retailer One'}</code></p></li><li><p><code>{country='US', business_name='Retailer One'}</code> is the parent of <code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code></p></li></ul></li><li><p><strong>Connect sibling anomaly triplets</strong> if they share <strong>two of their three</strong> key-value pairs</p><ul><li><p><code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code><br>connects with<br><code>{business_name='Retailer One', country='US', business_vertical='retail'}</code></p></li></ul></li></ol><p>Then 
DoorDash runs a graph partition algorithm to find connected anomaly clusters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FXWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FXWL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 424w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 848w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1272w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp" width="912" height="317" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2331571-5361-4687-9c9e-661654117e83_912x317.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:317,&quot;width&quot;:912,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23418,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FXWL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 424w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 848w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1272w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Red circles indicate anomalous segments, while grey circles indicate non-anomalous segments. (Source: DoorDash)</figcaption></figure></div><p><strong>Picking a representative segment</strong></p><p>Ops teams review a cluster starting from a single representative segment chosen using a fitness function:</p><pre><code><code>fitness = abs_anom_amt * rel_amt / level^1.2
</code></code></pre><p>Where:</p><ul><li><p><code>abs_anom_amt</code> = 28th-day metric minus the 21-day baseline</p></li><li><p><code>rel_amt</code> = relative (normalized) 28th-day metric within the segment</p></li><li><p><code>level</code> = 0 for singlets, 1 for pairs, 2 for triplets (a literal 0 would make the denominator zero for singlets, so some offset is presumably applied in practice)</p></li></ul><p>The intuition:</p><ul><li><p><code>abs_anom_amt</code> behaves a bit like &#8216;how much impact&#8217; (think recall)</p></li><li><p><code>rel_amt</code> behaves a bit like &#8216;how concentrated&#8217; (think precision)</p></li><li><p>dividing by a weak function of <code>level</code> biases toward simpler segments</p></li></ul><p>So the representative is usually a segment that is impactful, concentrated and not needlessly specific.</p><p><strong>What volume looks like in practice</strong></p><p>In production, DoorDash typically sees anomalies in several thousand segments per day. Clustering reduces that to <strong>20 to 60 anomalous clusters per day</strong> across consumer and Dasher fraud areas, which is a volume the operations team can realistically investigate.</p><div><hr></div><h4><strong>Step 5: Turning clusters into investigations and containment</strong></h4><p>Detection is not the finish line; it is just the trigger.</p><p>The representative anomalous segments, along with all other segments in the cluster and example events (deliveries and Dasher assignments), are accessible in a workflow tool for ops investigation.</p><p>Ops agents review example deliveries or assignments within the representative segment, looking for patterns that may represent a new fraud trend.</p><p>Sometimes the pattern is non-fraudulent, like a new promotion causing a spike in refunds. 
Other times it is fraudulent.</p><p>When a trend is deemed fraudulent:</p><ul><li><p>the root cause is identified in partnership with engineering and product teams so it can be fixed</p></li><li><p>a separate containment team runs queries to identify and stop fraudsters matching the trend pattern until product fixes land</p></li></ul><p>So the system is not just detection. It&#8217;s detection wired into investigation, containment and longer-term remediation.</p><div><hr></div><h4>Results</h4><p>DoorDash now uses the anomaly detection platform as its primary early-warning source for new fraud trends.</p><p>Key results reported by the team:</p><ul><li><p>More than <strong>60%</strong> of all new fraud trends today are found through anomaly detection, and that share is growing as coverage expands.</p></li><li><p>Average time-to-detect new fraud trends dropped from <strong>more than 100 days</strong> to <strong>less than three days</strong> over the past year.</p></li><li><p>The platform saves <strong>tens of millions of dollars per year</strong> by flagging small but growing fraud trends before they get out of control.</p></li></ul><div><hr></div><h3>The full scoop</h3><p>To learn more, check out the <a href="https://careersatdoordash.com/blog/doordash-anomaly-detection-platform-to-catch-fraud-trends">DoorDash Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input 
type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;61ab4b4f-02dc-4ef8-a0eb-c6193f2bb650&quot;,&quot;caption&quot;:&quot;How do you handle search queries like &#8220;low-carb spicy chicken wrap with gluten-free tortilla&#8221; at scale?<br /><br />DoorDash rebuilt its search pipeline to better understand both user intent and product metadata. The result? 
A 30% increase in relevant results and measurable gains across key engagement metrics.<br /><br />This post breaks down the hybrid approach they used; combining LLMs, structured taxonomies and real-time retrieval without sacrificing speed or accuracy.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How DoorDash Used LLMs to Trigger 30% More Relevant Results&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-06-26T09:37:56.405Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!8K0n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f9be1c-e138-41d4-9596-b4cd02897f95_432x860.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-doordash-used-llms-to-trigger&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:166857110,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b7387bad-e21d-43d0-9227-adbed3439e2b&quot;,&quot;caption&quot;:&quot;Behind every 'smart' answer is a chain of fallible steps: retrieval, ranking, prompting and others.<br /><br />Dropbox Dash turned that complexity into a testable, measurable system.<br /><br />Here&#8217;s how they made their evaluation as rigorous as code.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Dropbox Made AI Evaluation Work at Scale&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-09T07:14:50.996Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!oMNY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a58eb26-c9ac-492d-96f2-343a7f503ddc_800x450.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-dropbox-made-ai-evaluation-work-at-scale&quot;,&quot;section_name&quot;:&quot;Data 
Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:175671629,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread]]></title><description><![CDATA[Real-time stream validation surfaces poison records early and notifies owners with context]]></description><link>https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 15 Jan 2026 04:15:57 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Grab detects data issues in real-time. 
</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Doc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" width="800" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid.
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Grab&#8217;s real-time work!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="6000" height="4000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4000,&quot;width&quot;:6000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;man riding bicycle&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="man riding bicycle" title="man riding bicycle" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@javaistan">Afif Ramdhasuma</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Grab runs critical systems on Kafka streams, where bad data can spread and break downstream consumers.
Existing checks were slow and mostly limited to schemas, making issues hard to catch and debug.</p><h4><strong>Task</strong></h4><p>Detect bad streaming data early, cover both schema-level and value-level issues and give stream owners fast, actionable visibility without centralising ownership.</p><h4><strong>Action</strong></h4><p>Grab built contract-driven stream checks on Coban, turning schemas, field rules and ownership into real-time FlinkSQL tests with Slack alerts and UI-based inspection of bad records.</p><h4><strong>Result</strong></h4><p>The system now monitors 100+ Kafka topics in real time, surfaces poison data quickly and helps teams stop issues before they cascade downstream.</p><h4><strong>Use Cases</strong></h4><p>Root cause analysis, real-time monitoring, real-time alerting</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Kafka, Apache Flink, Amazon S3, Slack, LLM</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4><strong>About Grab</strong></h4><p><a href="https://www.grab.com/">Grab</a> is often called the Uber of Southeast Asia but that might be selling it short. What started as a ride-hailing app now powers food delivery, groceries, payments and even insurance, all bundled into one super app. It operates in more than 800 cities across 8 Southeast Asian countries. Behind the rides, meals and payments lies an enormous stream of events flowing through Grab&#8217;s systems.</p><div><hr></div><h4>Background</h4><p>Grab runs a lot of its business on streaming data. Kafka topics feed online systems, offline analytics and machine learning pipelines. When those streams are clean, life is good: teams can move faster, models behave, dashboards run smoothly. But when they&#8217;re not clean, it&#8217;s a major headache.</p><p>The tricky part is that &#8216;bad data&#8217; in Kafka isn&#8217;t always obvious.
Sometimes it&#8217;s quiet: the stream still parses but key fields are wrong, missing or shaped differently from what downstream teams assume.</p><p>That&#8217;s why Grab decided to introduce a platform-level solution: Kafka stream contracts that let stream stakeholders define what &#8216;good&#8217; looks like, then automatically test streams in real time, catch issues as they happen and alert the owners quickly.</p><p>The core idea is simple:</p><ul><li><p>Let users define a data contract for a Kafka topic</p></li><li><p>Convert that contract into executable tests</p></li><li><p>Run those tests continuously</p></li><li><p>Capture the poison data plus context</p></li><li><p>Notify the right people with enough detail to act</p></li></ul><p>This supports a more decentralized, data-mesh-style world where teams own their data products while still keeping the overall system reliable for everyone else.</p><div><hr></div><h4>What wasn&#8217;t working before</h4><p>Historically, Grab&#8217;s Kafka stream monitoring lacked a strong, end-to-end solution for data quality validation. That created three big issues: detecting bad data, speed of detection and lack of visibility.</p><p><strong>1- Detecting bad data</strong></p><p>This can be broken down into two categories:</p><p><strong>1.1 Schema issues</strong></p><p>These are schema mismatches between producers and consumers that can trigger deserialization errors. Even if schema backward compatibility is validated during schema evolution, the data inside the Kafka topic can still drift from the defined schema.</p><p>One concrete example: a rogue producer writes to a topic without using the expected schema. Now you&#8217;ve got a topic that &#8216;has a schema&#8217; but real events don&#8217;t match it.
The painful bit is not just knowing something broke, it&#8217;s identifying which fields are causing the mismatch.</p><p><strong>1.2 Rule and value issues</strong><br>These are disagreements about what a field <em>means</em> or what shape it should take. Kafka stream schemas define structure but they don&#8217;t enforce rules like:</p><ul><li><p>expected length for an identifier</p></li><li><p>expected string pattern</p></li><li><p>valid numeric ranges</p></li><li><p>constant values that should never change</p></li></ul><p>There wasn&#8217;t an existing framework where stakeholders could define and enforce field-level semantic rules for streams.</p><p><strong>2- Speed of detection</strong></p><p>The second issue was speed of detection. There was no real-time mechanism to automatically validate data against predefined rules, identify issues quickly and alert stakeholders promptly.</p><p>Without real-time validation, issues could stick around for a while, quietly impacting multiple online and offline downstream systems before being discovered.</p><p><strong>3- Lack of visibility</strong></p><p>Even when teams did detect a problem, it was hard to pinpoint the exact &#8216;poison data&#8217; and understand what violated the schema or the semantic expectations.</p><p>Root cause analysis becomes painful when you cannot easily answer:</p><ul><li><p>Which records were bad?</p></li><li><p>Which fields failed?</p></li><li><p>What did the bad values look like?</p></li><li><p>When did it start and how frequent is it?</p></li></ul><div><hr></div><h4>The fix</h4><p>Grab&#8217;s Coban platform provides a standardized, platform-level data quality testing and observability setup for Kafka streams. 
It&#8217;s built around four core ideas:</p><ol><li><p><strong>Data Contract Definition: </strong>Stream stakeholders define a contract that includes schema agreements, semantic rules the topic data must follow, and ownership metadata for alerts and notifications.</p></li><li><p><strong>Automated Test Execution: </strong>A long-running test runner automatically executes real-time tests based on that contract.</p></li><li><p><strong>Real-time Data Quality Issue Identification: </strong>The system detects data issues in real time at both schema and rules/values levels.</p></li><li><p><strong>Alerts and Result Observability: </strong>It alerts the right people and makes it easier to observe issues through the platform UI and downstream tooling.</p></li></ol><p>Put simply: define the rules once, then let the platform watch the stream continuously.</p><p>The architecture has three main components:</p><ol><li><p><strong>Data contract definition</strong></p></li><li><p><strong>Test execution and data quality issue identification</strong></p></li><li><p><strong>Result observability</strong></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!opMG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" width="1456" height="543" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:543,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Real-time Kafka Stream Data Quality Monitoring Architecture (Source: Grab)</figcaption></figure></div><p>All Flow references from here on (such as Flow 1.1) point to the labelled steps in the diagram above.</p><div><hr></div><h4><strong>Data contract definition</strong></h4><p>Coban&#8217;s contract acts as a formal agreement among Kafka stream stakeholders. It includes a few building blocks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KSXy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" width="836" height="758" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:836,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Kafka Stream Schema (Flow 1.1)</strong></p><p>The contract includes the schema used by the Kafka topic under test. This helps the Test Runner validate schema compatibility across data streams.</p><p>Importantly, this is not only about &#8220;did the schema change.&#8221; It&#8217;s also about &#8220;does the data actually match what everyone believes the schema is.&#8221;</p><p><strong>Kafka Stream Configuration (Flow 1.2)</strong></p><p>This includes essential config like endpoint and topic name.
Coban populates this automatically so users don&#8217;t have to wire everything up manually.</p><p><strong>Observability Metadata (Flow 1.3)</strong></p><p>This is where ownership becomes real. The contract includes contact details for stream stakeholders and alert configurations so the right people get notified when issues show up.</p><p><strong>Kafka Stream Semantic Test Rules (Flow 1.5)</strong></p><p>This is the heart of the semantic side. Users can define intuitive field-level rules such as:</p><ul><li><p>string pattern checks</p></li><li><p>number range checks</p></li><li><p>constant value checks</p></li></ul><p>The point is to make the &#8220;meaning&#8221; of fields enforceable, not just their data types.</p><p><strong>LLM-Based Semantic Test Rules Recommendation (Flow 1.4)</strong></p><p>Defining dozens or hundreds of field rules can overwhelm people. To reduce that setup burden, Coban uses an LLM-based feature that recommends semantic test rules based on:</p><ul><li><p>the provided Kafka stream schema</p></li><li><p>anonymized sample data</p></li></ul><p>This feature helps users set up semantic rules efficiently, as demonstrated below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pu8X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w,
https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" width="1456" height="522" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:241835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Sample UI showcasing LLM-based Kafka stream schema field-level semantic test rules (Source: Grab)</figcaption></figure></div><p>The practical benefit: users get a starting point quickly, instead of staring at a schema and trying to invent rules from scratch.</p><div><hr></div><h4><strong>Data contract transformation</strong></h4><p>Once a contract is defined, Coban&#8217;s transformation engine converts it into configurations the Test Runner can interpret (Flow 2.1).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nvEa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w,
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" width="1122" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1122,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d4065d7-b4c5-4f78-8761-0addce18f606_1122x660.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p>This transformation covers four things:</p><p><strong>Kafka Stream 
Schema: </strong>The contract schema is translated into a schema reference format the Test Runner can parse.</p><p><strong>Kafka Stream Configuration: </strong>The Kafka stream is set up as a source for the Test Runner.</p><p><strong>Observability metadata: </strong>Contact information is turned into runtime configs for alerting and routing.</p><p><strong>Kafka Stream Semantic Test Rules: </strong>Human-readable semantic rules are transformed into an <strong>inverse SQL query</strong> that captures data violating the rules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SeoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:213548,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 
1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Illustration of semantic test rules being converted from human-readable formats into inverse SQL queries (Source: Grab)</figcaption></figure></div><p>&#8216;Inverse SQL&#8217; here means the query is designed to return the <em>bad rows</em>, not the good ones. 
That&#8217;s a smart design choice because it keeps the output focused on what needs investigation.</p><div><hr></div><h4>Test execution &amp; data quality issue identification</h4><p>Once the transformation engine generates the configuration, the platform automatically deploys the Test Runner.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y-bs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" width="1010" height="734" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1010,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96110,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8dc273-8996-4ec1-a825-41a85d232746_1010x734.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Test runner</strong></p><p>The Test Runner uses FlinkSQL as its compute engine. FlinkSQL was chosen because rules can be expressed as plain SQL statements, which also makes it easier for the platform to convert contracts into enforceable checks.</p><p><strong>Test execution workflow and problematic data identification</strong></p><p>The Test Runner executes the test and identifies problematic data in four steps:</p><ol><li><p><strong>Consume Kafka data (Flow 2.2)</strong><br>FlinkSQL consumes data from the Kafka topic under test using its own consumer group. 
This is important because it avoids impacting other consumers.</p></li><li><p><strong>Run inverse SQL (Flow 2.3)</strong><br>The Test Runner runs the inverse SQL query to identify:</p><ul><li><p>data that violates semantic rules</p></li><li><p>data that is syntactically incorrect to begin with</p></li></ul></li><li><p><strong>Publish data quality issue events (Flow 3.2)</strong><br>When bad data is found, the Test Runner packages it into a data quality issue event enriched with:</p><ul><li><p>a test summary</p></li><li><p>a total count of bad records</p></li><li><p>sample bad data</p></li></ul><p>Then it publishes the event to a dedicated Kafka topic.</p></li><li><p><strong>Sink events to S3 (Flow 3.1)</strong><br>The platform also sinks all data quality events to an AWS S3 bucket for deeper observability and analysis.</p></li></ol><p>This combination (Kafka for real-time events, S3 for deeper inspection) gives both fast alerting and a more durable store for later analysis.</p><div><hr></div><h4>Result observability</h4><p>Grab&#8217;s in-house data quality observability platform, Genchi, consumes the problematic data captured by the Test Runner (Flow 3.3).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n2A8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" width="838" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:838,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 
424w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Alerting</strong></p><p>Genchi sends Slack notifications to stream owners listed in the contract&#8217;s observability metadata (Flow 3.5).</p><p>Those notifications include useful debugging context such as:</p><ul><li><p>links to sample data in the Coban UI</p></li><li><p>observed time windows</p></li><li><p>counts of bad records</p></li><li><p>other relevant details</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!avzo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" width="1314" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:1314,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Sample Slack notifications (Source: Grab)</figcaption></figure></div><p>The key point is that alerts do not just say &#8216;something broke&#8217;; they include the information you need to start investigating.</p><p><strong>Observability</strong></p><p>Users can access the Coban UI (Flow 3.4) to see:</p><ul><li><p>Kafka stream test rules</p></li><li><p>sample bad records</p></li><li><p>highlighted fields and values that violate rules</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!iqrn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" width="1456" height="456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The highlighted fields indicate violations of the semantic test rules (Source: Grab)</figcaption></figure></div><p>That UI piece matters because it shortens the path from &#8216;alert received&#8217; to &#8216;I know which field is failing and what the bad values look like.&#8217;</p><div><hr></div><h4>Results so far</h4><p>Since its deployment earlier in the year, this solution has enabled Kafka stream users to:</p><ul><li><p>define contracts with both schema and semantic rules</p></li><li><p>automate real-time test execution</p></li><li><p>alert stakeholders when problematic data is detected so they can act quickly</p></li></ul><p>It has been actively monitoring data quality across <strong>100+ critical Kafka topics</strong>.</p><p>The solution can also immediately identify and halt the propagation of 
invalid data across multiple streams.</p><div><hr></div><h4>Wrapping up</h4><p>Grab implemented and rolled out a real-time data quality monitoring solution for Kafka streams through the Coban platform.</p><p>The key outcomes include:</p><ul><li><p>engineers can define syntactic and semantic tests through a data contract</p></li><li><p>tests run automatically in real time via a long-running Test Runner based on FlinkSQL</p></li><li><p>issues trigger fast Slack alerts through Genchi using ownership metadata in the contract</p></li><li><p>teams get better visibility into exactly which data fields violate rules via the Coban UI</p></li></ul><p>In short: Coban turned data quality from a vague hope into something stream owners can specify, enforce and observe in real time.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://engineering.grab.com/real-time-data-quality-monitoring">Grab's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1904c23f-5462-4150-9c60-b6ad712234b6&quot;,&quot;caption&quot;:&quot;How do you keep ML teams fast when every experiment blasts your Spark cluster with spiky workloads, huge datasets and five different file formats?<br /><br />Snap&#8217;s answer: Prism, a unified Spark platform that hides infra pain, standardises pipelines and supports everything from ad-hoc exploration to 10k+ daily jobs in production.<br /><br />This post breaks down why raw Spark wasn&#8217;t enough, what Prism fixes and how Snap rebuilt their ML data layer without 
ditching Spark.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-20T04:59:47.340Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179211962,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in December 
2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in December 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 08 Jan 2026 05:01:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4SpS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4SpS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4SpS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for December!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://magazine.sebastianraschka.com/p/state-of-llms-2025?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The State Of LLMs 2025: Progress, Problems, and Predictions</a> (34 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;eb6310b4-4c18-45bc-a961-994bb151f1ac&quot;}" data-component-name="MentionToDOM"></span> provides a great recap of the main developments in 2025 and a couple of predictions for 2026 (like classical RAG slowly fading away).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XAEE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XAEE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 424w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 848w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1272w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp" width="1456" height="892" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XAEE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 424w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 848w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1272w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://read.futureproofds.com/p/building-a-data-cleaning-agent-with-langgraph?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Building a Data Cleaning Agent with LangGraph</a> (7 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Andres Vourakis&quot;,&quot;id&quot;:135808578,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ebf6fc-4ed6-47e1-938e-a1fa37a2347e_1601x1646.jpeg&quot;,&quot;uuid&quot;:&quot;c2d6e3c0-dced-406a-a9d3-3c6a5205ac34&quot;}" data-component-name="MentionToDOM"></span> shows how to build a LangGraph-based data cleaning agent that auto-generates, executes, and fixes Python cleaning code to cut down manual data prep.</p></li><li><p><strong><a href="https://www.leoniemonigatti.com/blog/memory-in-ai-agents.html?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Making Sense of Memory in AI Agents</a></strong> <strong>(10 minute read)<br></strong>This post breaks down how different memory types (short-term, long-term, and structured) let AI agents retain context across steps so they can act coherently instead of responding statelessly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!OOyv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OOyv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 424w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 848w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1272w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png" width="1455" height="1009" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1009,&quot;width&quot;:1455,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:425978,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OOyv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 424w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 848w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1272w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://towardsdatascience.com/exploring-tabpfn-a-foundation-model-built-for-tabular-data/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Exploring TabPFN: A Foundation Model Built for Tabular Data</a> (12 minute read)<br></strong>The article explores TabPFN, a foundation model pretrained on synthetic tasks that delivers strong tabular ML performance with near-zero tuning by reframing tabular prediction as conditional inference.</p></li><li><p><strong><a href="https://towardsdatascience.com/how-to-use-simple-data-contracts-in-python-for-data-scientists?utm_source=datatinkerer.io&amp;utm_medium=newsletter">How to Use Simple Data Contracts in Python for Data Scientists</a> (8 minute read)<br></strong>Eirik Berge walks through a lightweight data contract implementation in Python to catch schema 
breakages early.</p></li><li><p><strong><a href="https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic">How AI is transforming work at Anthropic</a> (35 minute read)<br></strong>An interesting look at how engineers and researchers at Anthropic actually use AI day to day and which parts of their work it genuinely helps with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UkqY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UkqY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 424w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 848w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UkqY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 424w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 848w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><ul><li><p><strong><a href="https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools?utm_source=datatinkerer.io&amp;utm_medium=newsletter">We removed 80% of our agent&#8217;s tools</a> (4 minute read)<br></strong>Vercel rebuilt their text-to-SQL agent by stripping away complex tooling and giving Claude direct file-system access, discovering that fewer tools, better documentation and &#8216;doing less&#8217; made the agent faster, cheaper and more reliable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0zIw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0zIw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 424w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 848w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1272w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png" width="650" height="325" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/321887da-7beb-4a82-987f-df165d353918_650x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:650,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26756,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0zIw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 424w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 848w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1272w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://www.ssp.sh/blog/omakase-data-stack/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Opinionated Data Platforms vs. 
Open-Source</a> (18 minute read)<br></strong>Good article by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Simon Sp&#228;ti&quot;,&quot;id&quot;:27855874,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6fc84efb-1b87-4fb3-bfb1-076664f32de4_2199x2199.jpeg&quot;,&quot;uuid&quot;:&quot;81c59bed-93c1-4b47-8b9b-26bc07109243&quot;}" data-component-name="MentionToDOM"></span> breaking down the tradeoffs between open-source and &#8216;opinionated&#8217; data platforms and when it makes sense to go for the latter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V-4O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V-4O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 424w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 848w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1272w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1018856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V-4O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 424w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 848w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1272w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/llms-for-pdf-data-pipelines?utm_source=datatinkerer.io&amp;utm_medium=newsletter">LLMs for {PDF} Data Pipelines</a> (8 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel 
Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;a6ac8240-80c4-4811-b274-3c28f4f96ad7&quot;}" data-component-name="MentionToDOM"></span> experiments with using LLMs as part of a data pipeline, showing that agent-style PDF-to-JSON extraction can work in practice despite slowness and may be good enough for real-world automation.</p></li><li><p><strong><a href="https://seattledataguy.substack.com/p/snowflake-vs-databricks-is-the-wrong?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Snowflake vs Databricks Is the Wrong Debate</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;b95e0031-6145-49e2-aae2-1898e3288485&quot;}" data-component-name="MentionToDOM"></span> argues that the Snowflake vs Databricks debate is a distraction, with Databricks deliberately expanding role by role to own the full data stack and compete with cloud and enterprise platforms.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/data-quality-design-patterns-wap-awap?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Data Quality Design Patterns</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan 
Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rcW2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;024d3eec-b14c-4f30-92fd-7a250d572cdd&quot;}" data-component-name="MentionToDOM"></span> breaks down practical data quality design patterns like WAP, AWAP, TAP and signal tables, showing how teams balance safety, cost and speed to keep bad data out of production pipelines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2TeV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2TeV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 424w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 848w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1272w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp" width="1423" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1423,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14380,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2TeV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 424w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 848w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1272w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://thepipeandtheline.substack.com/p/duckdb-the-swiss-army-knife-for-data?utm_source=datatinkerer.io&amp;utm_medium=newsletter">DuckDB: The Swiss Army Knife For Data Engineers</a> (8 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro 
Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;f97d74b7-0d12-49b2-929c-072c0400b984&quot;}" data-component-name="MentionToDOM"></span> argues that DuckDB can replace most pandas, Spark, and Airflow workflows by letting data engineers run fast, scalable analytics and ETL directly in SQL, with zero infrastructure and minimal complexity.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs">How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs</a> (14 minute read)<br></strong>This post breaks down how Snap rebuilt its ML platform with a unified Spark layer to tame spiky workloads, standardise pipelines, and reliably run 10,000+ production jobs a day without blowing up clusters.</p></li></ul><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://www.scientificdiscovery.dev/p/salonis-guide-to-data-visualization?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Saloni&#8217;s guide to data visualization</a> (41 minute read)<br></strong>Great and comprehensive post by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Saloni Dattani&quot;,&quot;id&quot;:4267654,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bc76721-fe9b-4edc-bd5b-de3869518c08_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;8ad6a4c6-c6ca-4229-b0cd-fc658c40e0ad&quot;}" data-component-name="MentionToDOM"></span> where she distils data visualization down to first principles, showing how to choose charts, reduce clutter and design visuals that communicate insight 
instead of just decorating dashboards.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vUTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vUTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 424w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 848w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1272w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp" width="1456" height="1447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1447,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160262,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vUTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 424w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 848w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1272w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong><a href="https://dominicroye.github.io/blog/2025-12-14-broken-charts/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Broken Chart: discover 9 visualization alternatives</a> (10 minute read)<br></strong>Dominic Roy&#233; breaks down how common chart design mistakes distort interpretation, showing why many broken charts mislead viewers and how to fix them with clearer scales, context, and visual discipline.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://sqlpatterns.com/p/the-most-powerful-timeless-skill?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Most Useful, Timeless Skill to Learn as a Data Professional</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ergest 
Xheblati&quot;,&quot;id&quot;:245231,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60c1d86e-f97d-4deb-8991-0662b9a07922_1024x1536.png&quot;,&quot;uuid&quot;:&quot;0790edb2-ac76-4182-89e9-6124a782d1dc&quot;}" data-component-name="MentionToDOM"></span> makes the case that real impact in data comes from using leverage and not just more lines of code.</p></li><li><p><strong><a href="https://joereis.substack.com/p/2026-general-thoughts-on-whats-ahead?utm_source=datatinkerer.io&amp;utm_medium=newsletter">2026 - General Thoughts on What&#8217;s Ahead</a> (6 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;d09c89e5-6573-494f-8c0d-f91c443c3063&quot;}" data-component-name="MentionToDOM"></span> thinks that 2026 will be a deliberately &#8216;boring&#8217; year where AI hype cools off and teams are forced to focus on fundamentals that actually make AI work.</p></li><li><p><strong><a href="https://wrongbutuseful.substack.com/p/the-next-data-bottleneck?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The next data bottleneck</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Katie Bauer&quot;,&quot;id&quot;:5505029,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/25c27590-5e07-44b1-958d-0aaa70195e65_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;bf47ca09-d373-4704-86a7-af9669aab428&quot;}" data-component-name="MentionToDOM"></span> argues that as tools and models get better, the real constraint shifts to human bottlenecks like 
decision-making, ownership and organisational ability to turn data into action.</p></li></ul><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:428078}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from November I missed? Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;738bc346-aa09-4943-a656-9c97ecf88686&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from November&#128071;<br /><br />Data Science &amp;amp; AI<br />Context engineering becomes the real bottleneck for AI agents<br />Classic algorithms still beat most enterprise AI in ROI<br />A practical framework to identify true agentic use cases<br />Gemini 3 benefits from direct structured prompting<br /><br />Data Engineering<br />DuckLake revives relational metadata for lakehouses<br />Event streaming hits market saturation<br />Real-world consulting lessons point to simpler pipelines over hype<br />Dark data hoarding kills AI signal<br /><br />Data Analysis &amp;amp; BI<br />Dashboard testing gets a full end-to-end checklist<br />Guidance on balancing accuracy vs speed when answering business questions.<br /><br />Plus: AI-coded &#8220;good enough&#8221; apps shift the buy-vs-build boundary, low-tech industries become prime AI adopters as margins flip and new benchmark analysis suggests model performance is mostly general capability with a smaller &#8220;Claudiness&#8221; axis on top.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in November 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data 
Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-03T07:52:29.847Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:180567973,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:3,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;27385512-431a-4d66-acc8-78b85b942c01&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from October&#128071;<br /><br />Data Science &amp;amp; AI<br />How Gradient Descent Works<br />Recursive Language Models<br />The Continual Learning Problem<br />Why Analytics Agents Break Differently<br /><br />Data Engineering<br />How Kafka Works<br />Data Modeling for the Agentic Era<br />You&#8217;ll Never Have a FAANG Data Infrastructure<br />Getting Started with OpenMetadata<br /><br />Data Analysis 
&amp;amp; BI<br />Jobs-to-be-Done: Designing dashboards for what users need to achieve.<br />From Dental Cleaning to Data Cleaning: Pivoting into healthcare analytics.<br /><br />Plus: Real AI Agents and Real Work, Taking the Bitter Lesson Seriously: Let AI optimize compute, not humans, OpenAI Is a Consumer Company, Import AI 431: Technological optimism meets appropriate fear&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in October 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-06T07:22:24.105Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a00481e6-bc3b-4419-9304-ed408b193853_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:178132882,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Uber Cut Data Lake Freshness From Hours to Minutes With Flink]]></title><description><![CDATA[Why Uber moved ingestion from Spark batch to Flink streaming and what it took to run thousands of jobs reliably at petabyte scale.]]></description><link>https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 02 Jan 2026 04:30:31 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Uber moved from batch to streaming in their data lake.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05-P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Uber&#8217;s streaming solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4000" height="6000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:6000,&quot;width&quot;:4000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;person holding black iphone 5&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="person holding black iphone 5" title="person holding black iphone 5" 
srcset="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@tingeyinjurylawfirm">Tingey Injury Law Firm</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Batch-based ingestion meant data arrived hours to days late, slowing experimentation, analytics and ML across Uber&#8217;s core business domains.</p><h4><strong>Task</strong></h4><p>Move ingestion to minutes-level freshness at petabyte scale while lowering compute cost and keeping operations reliable across thousands of datasets.</p><h4><strong>Action</strong></h4><p>Built IngestionNext using Flink streaming from Kafka to Hudi, plus a control plane for operating ingestion at scale. Solved streaming bottlenecks (small files, partition skew, checkpoint vs commit alignment) to keep performance and correctness intact.</p><h4><strong>Result</strong></h4><ul><li><p>Freshness improved from hours to <strong>minutes-level</strong>.</p></li><li><p>Compute usage fell by <strong>~25%</strong> vs batch ingestion.</p></li><li><p>Compaction performance improved by <strong>~10x</strong> with row-group merging.</p></li></ul><h4><strong>Use Cases</strong></h4><p>Near-real-time analytics, personalisation, operational analytics</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Kafka, Apache Flink, Apache Hudi, Apache Parquet</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Why data freshness became a platform priority at Uber</h4><p>Uber&#8217;s data lake sits underneath a lot of the company&#8217;s analytics and machine learning. 
If a team wants to measure an experiment, monitor performance, train a model or sanity-check a business change, it usually starts with one question: &#8220;Is the data in the lake yet?&#8221;</p><p>Historically, ingestion into the lake was batch-based. Freshness was measured in hours. That was fine when decisions moved at daily report speed. It starts to hurt when the business wants near-real-time loops: faster experiments, faster model iteration, faster detection of issues.</p><p>Over the past year, the team built and validated <strong>IngestionNext</strong>, a new ingestion system that switches the default mindset from batch to streaming. It&#8217;s centered on Apache Flink, reads events from Kafka, writes to the data lake in Apache Hudi format and operates at petabyte scale. Along the way, they had to solve the problems that make streaming painful in practice: small files, partition skew, checkpoint vs commit alignment and the operational challenge of running thousands of jobs reliably.</p><div><hr></div><h4><strong>Why batch ingestion became a bottleneck</strong></h4><p>Two main reasons: <strong>freshness</strong> and <strong>efficiency</strong>.</p><p><strong>Freshness</strong></p><p>As the business sped up, teams across Delivery, Rider, Mobility, Finance and Marketing Analytics kept asking the same thing: &#8220;Can we get the data sooner?&#8221;</p><p>Batch ingestion creates delays measured in hours and sometimes days. That lag slows down iteration and decision-making. In a world of continuous experimentation and fast model cycles, hours of latency is basically a tax on everything.</p><p>By moving ingestion to Flink-based streaming, the team cut data latency from hours to minutes. That directly supports faster model launches, quicker experiments and more accurate analytics because the lake stays closer to what&#8217;s happening now.</p><p><strong>Efficiency</strong></p><p>Batch ingestion with Apache Spark is heavy by nature. 
Jobs run on a schedule, kick off distributed work at fixed intervals and keep doing that even when the workload is uneven. At Uber&#8217;s scale, with thousands of datasets and hundreds of petabytes, that adds up to hundreds of thousands of CPU cores running daily.</p><p>Streaming smooths this out. Instead of repeatedly spinning up large batch work, resources can scale with traffic in a more continuous way. The result is less scheduling overhead, less big-bang compute and more efficient usage overall.</p><div><hr></div><h4><strong>IngestionNext: A streaming ingestion platform built for scale</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AYPR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AYPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 424w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 848w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1272w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif" width="768" height="349" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15051,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AYPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 424w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 848w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1272w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">IngestionNext architecture (Source: Uber)</figcaption></figure></div><p>In the data plane, events land in Apache Kafka. Flink jobs consume those events and write them into the data lake using Apache Hudi. Hudi provides transactional behavior like commits, rollbacks and time travel. Freshness and completeness are measured end-to-end from source to sink, not just &#8220;did the job run.&#8221;</p><p>Operating ingestion at this scale is not a set-it-and-forget-it situation. So the team built a control plane focused on automation and safety. 
It manages the ingestion job lifecycle (create, deploy, restart, stop, delete), handles config changes and runs health verification. The goal is simple: run thousands of ingestion jobs consistently without turning the platform into a giant manual babysitting exercise.</p><p>The system also supports regional failover and fallback strategies. If there&#8217;s an outage, ingestion can shift across regions. If needed, jobs can temporarily fall back to batch mode so ingestion stays available and data is not lost.</p><div><hr></div><h4><strong>Solving the hard parts of streaming ingestion</strong></h4><p>Streaming buys freshness but it also introduces new failure modes. The team highlighted three major ones: <strong>small files</strong>, <strong>partition skew</strong> and <strong>checkpoint/commit synchronization</strong>.</p><p><strong>Small files</strong></p><p>Streaming writes data continuously. That tends to create lots of small Parquet files. Small files are a classic way to make query performance worse while also increasing metadata and storage overhead. You get fresher data, then you pay for it every time someone queries.</p><p>The common compaction approach merges Parquet files record by record. That means each file gets decompressed, decoded from columnar format into rows, merged, then encoded and compressed again. 
It works but it&#8217;s expensive and slow because you keep doing encode/decode work over and over.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!25HV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!25HV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 424w, https://substackcdn.com/image/fetch/$s_!25HV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 848w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1272w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif" width="768" height="527" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:527,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31057,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!25HV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 424w, https://substackcdn.com/image/fetch/$s_!25HV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 848w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1272w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Parquet file merging row by row (Source: Uber)</figcaption></figure></div><p>To fix this, the team introduced row-group-level merging. Instead of dropping down into row format, the merge operates directly on Parquet&#8217;s native columnar structure. 
That avoids the expensive recompression path and improves compaction performance by roughly an order of magnitude (around 10x).</p><p>There are open-source efforts exploring schema-evolution-aware merging, using padding and masking to align schemas, but that comes with added implementation complexity and maintenance risk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eGhg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eGhg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 424w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 848w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1272w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif" width="768" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5298,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eGhg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 424w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 848w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1272w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Row-group merging with data masking (Source: Uber)</figcaption></figure></div><p>So the team took a simpler path: enforce schema consistency during merging. Only files with identical schema are merged together. 
No masking, no low-level code modifications, less engineering overhead and still faster, more efficient and more reliable compaction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vAV6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vAV6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 424w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 848w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1272w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif" width="768" height="375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vAV6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 424w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 848w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1272w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simplified row-group merging by grouping schema (Source: Uber)</figcaption></figure></div><p><strong>Partition skew</strong></p><p>Streaming ingestion depends on steady consumption from Kafka across Flink subtasks. The messy reality is that short-lived downstream slowdowns, like garbage collection pauses, can unbalance consumption. Some partitions get read more than others. You end up with skew.</p><p>Skew doesn&#8217;t just look ugly on a dashboard. 
It can reduce compression efficiency and lead to slower queries downstream.</p><p>The fixes came from three angles:</p><ul><li><p><strong>Operational tuning:</strong> aligning Flink parallelism with Kafka partitions and adjusting fetch parameters.</p></li><li><p><strong>Connector-level fairness:</strong> adding mechanisms like round-robin polling, pause/resume for heavy partitions and per-partition quotas.</p></li><li><p><strong>Observability:</strong> exposing per-partition lag metrics, adding skew-aware autoscaling and setting targeted alerts.</p></li></ul><p>This is a good reminder that streaming issues often show up first as weird lag and then become &#8220;why are queries slower now?&#8221; If you can&#8217;t see skew clearly, you&#8217;ll chase symptoms forever.</p><p><strong>Checkpoint and commit synchronization</strong></p><p>Flink and Hudi each track progress but they track different things.</p><ul><li><p><strong>Flink checkpoints</strong> track consumed offsets.</p></li><li><p><strong>Hudi commits</strong> track writes.</p></li></ul><p>If failures happen and these drift out of sync, the system can skip data or duplicate it. In ingestion, either outcome is a serious problem.</p><p>The team solved this by extending Hudi commit metadata to embed Flink checkpoint IDs. With that linkage, recovery becomes deterministic during rollbacks or failovers. 
The system can reason about which checkpoint corresponds to which commit and recover without guessing.</p><div><hr></div><h4><strong>Production results: faster data with lower cost</strong></h4><p>The team onboarded datasets to the Flink-based ingestion platform and validated performance on some of Uber&#8217;s largest datasets.</p><p>The early results:</p><ul><li><p><strong>Freshness:</strong> improved from hours to <strong>minutes-level freshness</strong>.</p></li><li><p><strong>Efficiency:</strong> <strong>25% reduction in compute usage</strong> compared to batch ingestion.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HbzO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HbzO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 424w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 848w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1272w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif" width="768" height="210" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:210,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7326,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HbzO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 424w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 848w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1272w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Before and after streaming ingestion (Source: Uber)</figcaption></figure></div><div><hr></div><h4><strong>Extending real-time beyond ingestion</strong></h4><p>IngestionNext improves ingestion latency from online Kafka into the offline raw data lake. That&#8217;s a big step but it&#8217;s not the full story.</p><p>Freshness still stalls downstream in transformation and analytics layers. If ingestion takes minutes but transformation is still slow, the data at the point of decision is still stale.</p><p>The next frontier for Uber is extending real-time capability end-to-end: <strong>ingestion &#8594; transformation &#8594; real-time insights and analytics</strong>. This matters because Uber&#8217;s lake powers a long list of domains: Delivery, Mobility, Machine Learning, Rider, Marketplace, Maps, Finance and Marketing Analytics. Freshness is a cross-cutting requirement.</p><div><hr></div><h4><strong>Conclusion</strong></h4><p>Uber&#8217;s shift from batch to streaming ingestion is a meaningful platform milestone. By re-architecting ingestion around Apache Flink, IngestionNext delivers fresher data, stronger reliability and scalable efficiency across a petabyte-scale lake.</p><p>The design is not just &#8220;run Flink jobs&#8221;. It includes operational foundations like an automated control plane, resiliency strategies and streaming-specific engineering work: faster compaction via row-group merging, skew controls and deterministic recovery by linking Flink checkpoints to Hudi commits.</p><p>The bigger idea is the mindset shift: treating freshness as a first-class dimension of data quality. 
With IngestionNext proven in production, the next push is clear: bring streaming into downstream transformation and analytics so the company can close the real-time loop, not just ingest data faster.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://www.uber.com/en-AU/blog/from-batch-to-streaming-accelerating-data-freshness-in-ubers-data-lake/">Uber's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div 
class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1904c23f-5462-4150-9c60-b6ad712234b6&quot;,&quot;caption&quot;:&quot;How do you keep ML teams fast when every experiment blasts your Spark cluster with spiky workloads, huge datasets and five different file formats?<br /><br />Snap&#8217;s answer: Prism, a unified Spark platform that hides infra pain, standardises pipelines and supports everything from ad-hoc exploration to 10k+ daily jobs in production.<br /><br />This post breaks down why raw Spark wasn&#8217;t enough, what Prism fixes and how Snap rebuilt their ML data layer without ditching Spark.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-20T04:59:47.340Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179211962,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8b717c15-913f-4e54-91a7-fb3f26e15721&quot;,&quot;caption&quot;:&quot;How do you keep data fresh for millions of merchants when you&#8217;re streaming from 100+ MySQL shards?<br /><br />Shopify&#8217;s answer: a 400TB Change Data Capture platform that pushes up to 100k events a second.<br /><br />This post dives into the trade-offs, the challenges and the lessons learned from building CDC at scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Shopify Uses Change Data Capture to Serve Millions of Merchants&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-09-18T07:53:42.206Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1730818874996-dea4bddf5554?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaG9waWZ5fGVufDB8fHx8MTc1ODE4MDY0NHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:173822667,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What is Data Governance? 
 A Practical Guide to Building Trustworthy Data in the Age of AI]]></title><description><![CDATA[From unclear ownership to missing standards, Charlotte Ledoux breaks down the simple governance practices that help organisations trust their data and ship faster.]]></description><link>https://www.datatinkerer.io/p/what-is-data-governance-a-practical-guide</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-is-data-governance-a-practical-guide</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 11 Dec 2025 04:01:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/71bf29d1-b0c7-4d21-9ce4-22fa2a91c4dd_760x546.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today I will be talking with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Charlotte Ledoux&quot;,&quot;id&quot;:30007326,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!TWOB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9f229-e6b2-4839-bc95-ddb75482793e_750x750.jpeg&quot;,&quot;uuid&quot;:&quot;c9425cfe-0db2-443a-8739-d6932d500d30&quot;}" data-component-name="MentionToDOM"></span> who writes <em>The Data Governance Playbook</em> newsletter and works with companies on implementing data governance.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2433880,&quot;name&quot;:&quot;The Data Governance Playbook&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!74lv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6350a0-688c-4368-9643-0cb5b7adc910_383x383.png&quot;,&quot;base_url&quot;:&quot;https://thedatagovernanceplaybook.substack.com&quot;,&quot;hero_text&quot;:&quot;Your go-to resource for mastering Data Governance through practical tips, 
expert insights, and a touch of humour !&quot;,&quot;author_name&quot;:&quot;Charlotte Ledoux&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://thedatagovernanceplaybook.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!74lv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6350a0-688c-4368-9643-0cb5b7adc910_383x383.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">The Data Governance Playbook</span><div class="embedded-publication-hero-text">Your go-to resource for mastering Data Governance through practical tips, expert insights, and a touch of humour !</div><div class="embedded-publication-author-name">By Charlotte Ledoux</div></a><form class="embedded-publication-subscribe" method="GET" action="https://thedatagovernanceplaybook.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>I discovered her work through the <a href="https://www.whoisthebestcdo.com">CDO game</a> (worth trying if you haven&#8217;t!). 
It reminded me how often data governance is misunderstood, despite becoming essential as AI takes off.</p><p>We talked about her move from analytics to governance, how the real value comes from clarity and ownership rather than tools and why the smartest governance programs start with listening long before they start with policies.</p><p>So without further ado, let&#8217;s get into it!</p><div><hr></div><h4><strong>Can you tell us about your role?</strong></h4><p>I&#8217;m a Data &amp; AI Governance expert. In practice, that means I make sure an organisation&#8217;s data is trustworthy, secure and responsibly used, especially as AI adoption accelerates. I help define the roles, responsibilities, processes and tools that govern how data is collected, shared, protected and used so that teams can innovate with confidence rather than chaos.</p><div><hr></div><h4><strong>How did you break into data governance?</strong></h4><p>Before specializing in data governance, I worked more hands-on in the data ecosystem: collaborating with data teams on data science, analytics and data strategy. Over time, I realized that the biggest blockers to effective data use weren&#8217;t tools or skills but rather unclear ownership, missing standards and a lack of trust.</p><p>Governance drew me in because it sits at the intersection of strategy, quality, ethics and business value. It&#8217;s the discipline that creates the structure needed for data to actually deliver impact.</p><div class="pullquote"><p><em><strong>Charlotte&#8217;s path</strong></em></p><p><em><strong>data analytics &#8594; data strategy &#8594; data governance</strong></em></p></div><h4><strong>So what is data governance? How do you explain it simply?</strong></h4><p>Data governance is the framework that ensures data is reliable, secure and used appropriately. It defines the rules, responsibilities and processes that allow an organization to manage data (and now AI!) 
in a controlled and value-driven way.</p><p>A simpler version I often use: it&#8217;s about enabling people to do great things with data.</p><div class="pullquote"><p><em><strong>It defines the rules, responsibilities and processes that allow an organization to manage data in a controlled and value-driven way.</strong></em></p></div><h4><strong>What&#8217;s a common misunderstanding about data governance?</strong></h4>
      <p>
          <a href="https://www.datatinkerer.io/p/what-is-data-governance-a-practical-guide">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in November 2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in November 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Wed, 03 Dec 2025 07:52:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k7yZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
The collection includes videos, courses and projects and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for November.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs]]></title><description><![CDATA[Inside Prism, the system that turned scattered Spark workflows into a unified, ML-ready platform.]]></description><link>https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 20 Nov 2025 04:59:47 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Snap unified Spark, ML workflows and 10k+ daily jobs under one platform.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBRd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, 
https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Snap&#8217;s ML platform transformation.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[From Data Analyst to Senior DS Manager at Skyscanner]]></title><description><![CDATA[How a mechanical engineer found data through robotics. Data led to modelling. Modelling led to managing teams at Skyscanner.]]></description><link>https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner</link><guid isPermaLink="false">https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 13 Nov 2025 03:54:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/06735d58-e8f2-4106-88ae-efe0658c217c_764x661.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Following on from previous posts talking to people in the field, today I will be talking with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;id&quot;:255728031,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;uuid&quot;:&quot;fe93d583-b52c-4160-878b-e32c4f822419&quot;}" data-component-name="MentionToDOM"></span> who is a Senior Data Science Manager at Skyscanner and writer of the <em>Senior Data Science Lead</em> newsletter.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2833541,&quot;name&quot;:&quot;Senior Data Science 
Lead&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!t4IN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbe3704e-4589-40b2-bbb8-007336c4f09a_990x990.png&quot;,&quot;base_url&quot;:&quot;https://joseparreogarcia.substack.com&quot;,&quot;hero_text&quot;:&quot;Helping managers build world-class teams, data professionals master storytelling and guiding those looking to break into Data Science. I have built teams from scratch and lead 50+ data scientists. Now, I share my experience with you.&quot;,&quot;author_name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://joseparreogarcia.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!t4IN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbe3704e-4589-40b2-bbb8-007336c4f09a_990x990.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">Senior Data Science Lead</span><div class="embedded-publication-hero-text">Helping managers build world-class teams, data professionals master storytelling and guiding those looking to break into Data Science. I have built teams from scratch and lead 50+ data scientists. 
Now, I share my experience with you.</div><div class="embedded-publication-author-name">By Jose Parre&#241;o Garcia</div></a><form class="embedded-publication-subscribe" method="GET" action="https://joseparreogarcia.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>We talked about his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.</p><p>So without further ado, let&#8217;s get into it!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in October 2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in October 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 06 Nov 2025 07:22:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a00481e6-bc3b-4419-9304-ed408b193853_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OR1N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OR1N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!OR1N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/178132882?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OR1N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!OR1N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis).
They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for October.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>