What the Data Crowd Was Reading in July 2025
Tools, techniques and deep dives worth reading that I came across last month.
Fellow Data Tinkerers!
As I mentioned last week, I’m making a few changes to the newsletter based on your feedback. Here’s what’s new:
1- Data roundup: I’m trialing a new format that focuses more on deep dives, practical how-tos, and longer-form content to help you sharpen your craft. Less hype about the latest model, more substance. Each month, I’ll share a curated roundup of the most useful pieces I’ve come across in AI and data across data science, data engineering and data analysis.
So: less news, more depth.
2- Referral prizes: I’m changing the referral prizes to:
1 share - 100+ resources to learn all things data (science, engineering, analysis). It includes videos, courses, projects and can be filtered by skill, level or free/paid
2 shares - 100+ cheatsheets for all things data (science, engineering, analysis)
5 shares - 1 month paid membership, which gives you access to subscriber-only posts and a list of Data Tinkerer articles broken down by industry, sector, tech stack and more
As always, happy to hear your feedback. Just reply back to this email or leave a comment below to let me know what you think.
Without further ado, let’s get to the roundup for July.
Data science & AI
The Big LLM Architecture Comparison (30 minute read)
A great breakdown of 2025 open-source models and the differences in their architectures.
Kimi K2 is the most important model of the year (26 minute read)
The author puts out great breakdowns (and great memes) in his articles, like this one about Kimi K2 and how they ditched the usual scaling formula in favor of the MuonClip optimizer.
The Impact of Prompt Bloat on LLM Output Quality (11 minute read)
Soham breaks down how prompt bloat actually degrades LLM output clarity, reasoning and relevance even before hitting token limits.
The evolution of Grab's machine learning feature store (12 minute read)
The Grab team go through how they rebuilt their ML feature store with a “feature table” approach using AWS Aurora, fixing atomic updates, read/write isolation and table-centric schemas to boost performance and consistency.
Context Engineering: 2025’s #1 Skill in AI (11 minute read)
This piece breaks down why stuffing your LLM context like a Thanksgiving turkey is a terrible idea and how context engineering fixes it with memory, structure and strategy.
From Linear Regression to XGBoost: A Side-by-Side Performance Comparison (6 minute read)
A solid side-by-side on linear regression vs. XGBoost, with code, charts and a clear takeaway: ensembles win when data gets messy.
MLE-STAR: A state-of-the-art machine learning engineering agent (8 minute read)
Google’s MLE-STAR is like AutoML on performance enhancers because it searches, builds, tunes, debugs and even ensemble-blends ML models without whining for a human in the loop.
How to approach DS in 2025 (4 minute read)
This guide is the “if I had to do it again” playbook for breaking into data science, focusing on practical stuff and mercifully light on unnecessary theory.
Data engineering
The company that created Kafka is replacing it with a new solution (11 minute read)
Agentic AI for Dummies (12 minute read)
The author strips the magic from Agentic AI and shows with an example that it’s modular components stitched into a workflow with LangGraph and LangChain. (Just don’t tell him you prefer Snowflake over Databricks or there will be blood)
Boring Semantic Layer + MCP =🔥 (5 minute read)
The author plugs a semantic layer into Claude using MCP and ends up with a query-literate LLM.
Survival Tips for Data Engineers in the Age of Generative AI (11 minute read)
A good survival guide for data engineers in the LLM era: stay sharp on fundamentals, evaluate AI like a product and stop giving autopilot mode a free pass.
Has Self-Serve BI Finally Arrived Thanks to AI? (19 minute read)
Simon argues that if dashboards were the first draft of self-serve BI, conversational BI powered by MCP feels like the rewrite that finally gets it right.
How Bolt Reconciles €2B in Revenue Using Airflow, Spark and dbt (9 minute read)
Breakdown of Bolt’s payments pipeline: ingestion, standardisation and reconciliation done at scale with Airflow, Spark and dbt.
Data analysis and visualisation
MCP and the reshaping of data visualisation & business intelligence (7 minute read)
A thoughtful take on why BI folks shouldn't panic yet but should definitely start learning what MCP is before it learns their job.
The Metric Tree Trap (7 minute read)
Paul argues that Metric Trees might look smart on slides but can mislead in real-world decision-making.
The Ultimate List of BI Product Metrics (16 minute read)
Treat your BI like a product: this guide covers measuring its value across usage, trust and business outcomes.
How to Show Impact as a Data Analyst (7 minute read)
Most analysts do the work but few track what changed. This piece shows how to connect your analysis to real outcomes.
Miscellaneous
After the ChatGPT Moment: Measuring AI’s Adoption (14 minute read)
If you thought ChatGPT was just hype, this deep dive shows how AI is being adopted at breakneck speed, and it’s only getting faster.
Who's the best CDO
Fancy playing a Chief Data Officer game? This interactive game lets you give it a shot and see if you can maximise company profits with a limited budget (welcome to real life!)
Soham Simulator
Or you fancy being “overemployed”? Then try this game where you can be the next Soham and land job offers from multiple places at once!
Quick favor - need your take
Two things I’d love to hear from you:
What did you think of the new deep dive format?
Was there any standout article or topic from July I missed?
Feel free to drop a comment or hit reply, even a quick line helps.
If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it 🙏