What the Data Crowd Was Reading in December 2025
Tools, techniques and deep dives worth reading that I came across in December 2025.
Fellow Data Tinkerers,
It’s time for another round-up on all things data and AI!
But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just 1 more person.
There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!
Without further ado, let’s get to the round-up for December!
Data science & AI
The State Of LLMs 2025: Progress, Problems, and Predictions (34 minute read)
Sebastian Raschka, PhD provides a great recap of main developments in 2025 and a couple of predictions for 2026 (like classical RAG slowly fading away).
Building a Data Cleaning Agent with LangGraph (7 minute read)
Andres Vourakis shows how to build a LangGraph-based data cleaning agent that auto-generates, executes, and fixes Python cleaning code to cut down manual data prep.
Making Sense of Memory in AI Agents (10 minute read)
This post breaks down how different memory types (short-term, long-term, and structured) let AI agents retain context across steps so they can act coherently instead of responding statelessly.
Exploring TabPFN: A Foundation Model Built for Tabular Data (12 minute read)
The article explores TabPFN, a foundation model pretrained on synthetic tasks that delivers strong tabular ML performance with near-zero tuning by reframing tabular prediction as conditional inference.
How to Use Simple Data Contracts in Python for Data Scientists (8 minute read)
Eirik Berge walks through a lightweight data contract implementation in Python to catch schema breakages early.
How AI is transforming work at Anthropic (35 minute read)
An interesting look at how engineers and researchers at Anthropic actually use AI day to day and which parts of their work it genuinely helps with.
We removed 80% of our agent’s tools (4 minute read)
Vercel rebuilt their text-to-SQL agent by stripping away complex tooling and giving Claude direct file-system access, discovering that fewer tools, better documentation and ‘doing less’ made the agent faster, cheaper and more reliable.
Data engineering
Opinionated Data Platforms vs. Open-Source (18 minute read)
Good article by Simon Späti breaking down the tradeoffs between open-source and ‘opinionated’ data platforms, and when it makes sense to go for the latter.
LLMs for {PDF} Data Pipelines (8 minute read)
Daniel Beach experiments with using LLMs as part of a data pipeline, showing that agent-style PDF-to-JSON extraction can work in practice despite slowness and may be good enough for real-world automation.
Snowflake vs Databricks Is the Wrong Debate (9 minute read)
SeattleDataGuy argues that the Snowflake vs Databricks debate is a distraction, with Databricks deliberately expanding role by role to own the full data stack and compete with cloud and enterprise platforms.
Data Quality Design Patterns (11 minute read)
Erfan Hesami breaks down practical data quality design patterns like WAP, AWAP, TAP and signal tables, showing how teams balance safety, cost and speed to keep bad data out of production pipelines.
DuckDB: The Swiss Army Knife For Data Engineers (8 minute read)
Alejandro Aboy argues that DuckDB can replace most pandas, Spark, and Airflow workflows by letting data engineers run fast, scalable analytics and ETL directly with SQL, zero infrastructure and minimal complexity.
How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs (14 minute read)
This post breaks down how Snap rebuilt its ML platform with a unified Spark layer to tame spiky workloads, standardise pipelines, and reliably run 10,000+ production jobs a day without blowing up clusters.
Data analysis and visualisation
Saloni’s guide to data visualization (41 minute read)
Great and comprehensive post by Saloni Dattani where she distils data visualization down to first principles, showing how to choose charts, reduce clutter and design visuals that communicate insight instead of just decorating dashboards.
Broken Chart: discover 9 visualization alternatives (10 minute read)
Dominic Royé breaks down how common chart design mistakes distort interpretation, showing why many broken charts mislead viewers and how to fix them with clearer scales, context, and visual discipline.
Other interesting reads
The Most Useful, Timeless Skill to Learn as a Data Professional (11 minute read)
Ergest Xheblati makes the case that real impact in data comes from using leverage and not just more lines of code.
2026 - General Thoughts on What’s Ahead (6 minute read)
Joe Reis thinks that 2026 will be a deliberately ‘boring’ year where AI hype cools off and teams are forced to focus on fundamentals that actually make AI work.
The next data bottleneck (11 minute read)
Katie Bauer argues that as tools and models get better, the real constraint shifts to human bottlenecks like decision-making, ownership and organisational ability to turn data into action.
Quick favor - need your take
Was there any standout article or topic from December I missed? Feel free to drop a comment or hit reply, even a quick line helps.
If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. Really appreciate it 🙏
Keep learning
What the Data Crowd Was Reading in November 2025
It's time for another data/AI roundup and here are the highlights from November👇
Data Science & AI
Context engineering becomes the real bottleneck for AI agents
Classic algorithms still beat most enterprise AI in ROI
A practical framework to identify true agentic use cases
Gemini 3 benefits from direct structured prompting
Data Engineering
DuckLake revives relational metadata for lakehouses
Event streaming hits market saturation
Real-world consulting lessons point to simpler pipelines over hype
Dark data hoarding kills AI signal
Data Analysis & BI
Dashboard testing gets a full end-to-end checklist
Guidance on balancing accuracy vs speed when answering business questions.
Plus: AI-coded “good enough” apps shift the buy-vs-build boundary, low-tech industries become prime AI adopters as margins flip and new benchmark analysis suggests model performance is mostly general capability with a smaller “Claudiness” axis on top.
What the Data Crowd Was Reading in October 2025
It's time for another data/AI roundup and here are the highlights from October👇
Data Science & AI
How Gradient Descent Works
Recursive Language Models
The Continual Learning Problem
Why Analytics Agents Break Differently
Data Engineering
How Kafka Works
Data Modeling for the Agentic Era
You’ll Never Have a FAANG Data Infrastructure
Getting Started with OpenMetadata
Data Analysis & BI
Jobs-to-be-Done: Designing dashboards for what users need to achieve.
From Dental Cleaning to Data Cleaning: Pivoting into healthcare analytics.
Plus: Real AI Agents and Real Work, Taking the Bitter Lesson Seriously: Let AI optimize compute, not humans, OpenAI Is a Consumer Company, Import AI 431: Technological optimism meets appropriate fear











