What the Data Crowd Was Reading in January 2026
Tools, techniques and deep dives worth reading that I came across in January 2026.
Fellow Data Tinkerers
It’s time for another round-up on all things data and AI!
But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just 1 more person.
There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!
Without further ado, let’s get to the round-up for January!
Data science & AI
Agent design patterns (8 minute read)
An Anthropic engineer provides a grounded guide to designing AI agents that separates real, reliable architectures from overcomplicated agent hype that doesn’t survive production.
Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work (31 minute read)
Tim Dettmers cuts through the agent hype, arguing the real value isn’t autonomous magic but practical agents that reliably coordinate tools, memory and execution.
Piecewise Regression for Time Series Forecasting (7 minute read)
Rami Krispin shares a practical walkthrough of using piecewise regression on time series to detect structural breaks, regime changes and trend shifts that single global models tend to smooth over.
AI is Hitting a Measurement Wall (27 minute read)
Devansh makes the case that today’s AI benchmarks are saturated and misleading, masking the growing gap between model performance on tests and value in real applications.
Drift Detection in Robust Machine Learning Systems (18 minute read)
The article shows how unnoticed drift can quietly degrade model performance and outlines practical techniques to detect it early in production.
8 plots that explain the state of open models (7 minute read)
Eight charts by Interconnects AI cut through the noise to show that Chinese open models, led by Qwen, dominate real-world adoption and benchmarks, while Western challengers only compete at the very top end.
LLMs as Judges: Measuring Bias, Hinting Effects, and Tier Preferences (10 minute read)
Aashi and Sayak examine when LLMs can act as evaluators, showing how bias, prompt framing and hinting can distort model-as-judge benchmarks.
How to Build a Recommendation System at Scale: Insights from Instacart (10 minute read)
A practical walk-through by Ahsaas Bajaj of how large-scale recommendation systems are actually built in production, covering modeling choices and the tradeoffs that matter once you move past toy examples.
Data engineering
I spent 5 hours learning Unity Catalog. Here’s everything you need to know (10 minute read)
Vu Trinh provides a breakdown of how Databricks’ open-sourced Unity Catalog works under the hood.
Databricks Lakeflow vs Apache Airflow (13 minute read)
A candid comparison by Daniel Beach showing how Databricks Lakeflow trades Airflow’s flexibility and openness for tighter platform integration, simpler ops and better defaults if you’re already all-in on Databricks.
The Certifications Scam (7 minute read)
A blunt takedown of data certifications by Yordan Ivanov, arguing they mostly signal marketing and gatekeeping rather than real skills, experience or on-the-job impact.
End To End Agentic Data Modeling: Using AI and OpenMetadata MCP for Impact Analysis (8 minute read)
A hands-on look by Alejandro Aboy, with Pipeline to Insights, at building end-to-end agentic data modeling by combining OpenMetadata with MCP-style agents to automate lineage, context sharing and model evolution across the data stack.
Autofilling the Boring Semantic Layer: From Sakila to Chat-BI with dltHub (9 minute read)
Adrian Brudaru explores how LLMs can help generate and maintain semantic models on top of data pipelines, reducing manual modeling effort while keeping analytics definitions consistent.
A Diary of a Data Engineer (13 minute read)
A candid, day-in-the-life reflection by Simon Späti on what data engineering actually looks like in practice, highlighting the unglamorous but essential work that keeps data systems running day to day.
Database Development with AI in 2026 (11 minute read)
Brent Ozar argues that in 2026 AI will meaningfully speed up database development tasks like query writing and troubleshooting, but real impact still depends on human judgment and understanding production constraints.
How Uber Cut Data Lake Freshness From Hours to Minutes With Flink (11 minute read)
Uber rebuilt its data lake ingestion to move freshness from hours to minutes. This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.
Data analysis and visualisation
Best Data Visualization Projects of 2025 (3 minute read)
FlowingData shares the best data visualisations of 2025.
The book that finally taught me how to tell stories with data (12 minute read)
Jose Parreño Garcia reviews Storytelling with Data, highlighting that impact comes from framing the message and audience first, not from visualisation tricks.
How to create a more accessible line chart (10 minute read)
Nicola Rennie shows how small design choices in line charts (color, contrast, labeling and annotations) dramatically improve accessibility without sacrificing clarity or insight.
5 Rules for Dashboard Filter Placement (6 minute read)
Anastasiya Kuznetsova breaks down five practical rules for placing dashboard filters so users understand what they’re controlling without adding cognitive load or breaking trust.
Other interesting reads
ONTOLOGIES - SOME PERSPECTIVES (20 minute read)
A great intro and explanation of ontologies by William Inmon (Bill Inmon) and Jessica Talisman. Really worth a read if you have heard the term a lot but are not sure what it means and how it can be applied.
Lessons from Building AI Agents for Financial Services (23 minute read)
Nicolas Bustamante breaks down what building AI agents actually looks like in production, separating real engineering constraints from agent hype.
Introducing the AI Chip Sales Data Explorer (3 minute read)
Epoch AI introduces an interactive dataset tracking global AI chip sales, shedding light on who’s actually buying compute and how hardware demand is shaping the AI race.
Quick favor - need your take
Was there any standout article or topic from January I missed? Feel free to drop a comment or hit reply, even a quick line helps.
If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it 🙏












