Gemini Builds, Claude Browses and ChatGPT Reads Your GitHub
Google drops I/O goodies early, OpenAI adds repo-reading functionality and Anthropic makes Claude internet-smart.
Fellow Data Tinkerers!
It’s time for this week’s round-up on all things data and AI. Before that, a quick note: you can get access to 100+ cheat sheets if you share Data Tinkerer with just 2 other people.
So if you enjoy reading Data Tinkerer, share with friends and earn referral rewards!
Now, with that out of the way, let’s get to this week’s news round-up.
The Buzz 🐝
Google has released an early preview of Gemini 2.5 Pro (I/O edition), enhancing its coding capabilities, particularly for building interactive web applications.
Another update from Google is the upcoming launch of a standalone NotebookLM app, built around its podcast-style AI product. You can join the waitlist now on the App Store and Play Store.

OpenAI also dropped a GitHub connector for ChatGPT, letting users link their repos so ChatGPT can read through source code and PRs, then generate a detailed, citation-rich report using its Deep Research feature.
They also provided more information about the sycophantic behavior problem with GPT-4o, which we highlighted last week. The update added a new reward signal based on user thumbs-up/down feedback but in doing so, OpenAI effectively put their thumb on the scale (See what I did there? My dad would be proud). You can read the full update here if interested.
Anthropic has launched a Web Search API for Claude 3.7 Sonnet and 3.5 models, enabling real-time internet access for up-to-date responses with source citations. Priced at $10 per 1,000 searches plus token costs, it supports agentic multi-step queries and domain filtering for enterprise control.
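To get a feel for what that pricing means in practice, here is a rough back-of-the-envelope sketch. Only the $10 per 1,000 searches figure comes from the announcement; the function name and the per-million-token prices are placeholders I made up for illustration, so swap in the real rates for whichever model you use.

```python
SEARCH_PRICE_PER_1000 = 10.0  # USD, from the announcement


def estimate_web_search_cost(
    num_searches,
    input_tokens=0,
    output_tokens=0,
    input_price_per_mtok=0.0,   # hypothetical: USD per 1M input tokens
    output_price_per_mtok=0.0,  # hypothetical: USD per 1M output tokens
):
    """Rough estimate: search fee plus token costs, all in USD."""
    search_cost = num_searches / 1000 * SEARCH_PRICE_PER_1000
    token_cost = (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
    return search_cost + token_cost


if __name__ == "__main__":
    # 2,500 searches with token costs ignored
    print(estimate_web_search_cost(2500))
    # Same searches with made-up token prices, purely illustrative
    print(estimate_web_search_cost(2500, 1_000_000, 200_000, 3.0, 15.0))
```

Note that an agentic multi-step query can trigger several searches for a single user question, so the search count is usually higher than the number of prompts.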
Data Science & AI
Anomaly Detection in Time Series Using Statistical Analysis
Learn how Booking.com built a homegrown anomaly detection service using z-scores, percentiles, and outlier exclusion, then wrapped it in Grafana for real-time, human-readable alerts and faster debugging.
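If you want to see how those three ingredients fit together, here is a minimal sketch. It is not Booking.com's code: the function names, the 5th/95th percentile trim and the z-score threshold of 3 are all assumptions picked for illustration.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def detect_anomaly(history, new_value, z_threshold=3.0, trim_low=5, trim_high=95):
    """Return (is_anomaly, z_score) for new_value against a trimmed baseline."""
    ordered = sorted(history)
    low = percentile(ordered, trim_low)
    high = percentile(ordered, trim_high)
    # Outlier exclusion: past spikes should not inflate the baseline
    baseline = [v for v in history if low <= v <= high]
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)
    if std == 0:
        # Flat baseline: any deviation at all counts as anomalous
        z = 0.0 if new_value == mean else float("inf")
    else:
        z = (new_value - mean) / std
    return abs(z) > z_threshold, z


if __name__ == "__main__":
    # Made-up metric history containing one old spike (500)
    history = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101]
    print(detect_anomaly(history, 130))
    print(detect_anomaly(history, 101))
```

The trimming step is the important part: without it, the old spike of 500 would stretch the standard deviation so much that a new value of 130 would look normal.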
How Walmart Automated 400+ Forecasts and Cut Runtime by Half
No model works for every SKU. Walmart's solution? An autotuning framework with rolling validation, automated feature engineering, and real-time model selection. If you want to learn more about it, check the article!
Improving Search for 1B+ LinkedIn Users with GenAI
Learn how LinkedIn used OpenAI's GPT to automate search quality scoring for 1B+ users, boosting typeahead suggestion quality by 6.8% and slashing eval time from days to hours.
Data Engineering
How Canva Rebuilt Its Data Pipelines for Billions of Events per Month
Canva had to track billions of events to pay creators fairly and their old system couldn’t keep up. Curious how they rebuilt it? This article is for you.
Husky: Efficient compaction at Datadog scale
Learn how Datadog’s Husky storage engine uses smart compaction strategies like size-tiering, locality-aware merging and columnar fragment pruning to cut query latency and reduce query worker usage by 30%.
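For anyone who hasn't met size-tiering before, here is a toy sketch of the general idea: group fragments of similar size into tiers and merge a tier once it holds enough fragments. This is the generic textbook technique, not Husky's actual implementation, and the function names, the base of 4 and the merge threshold of 4 are assumptions chosen for illustration.

```python
from collections import defaultdict


def size_tier(size, base=4):
    """Tier index: sizes within the same power-of-`base` range share a tier."""
    tier = 0
    while size >= base:
        size //= base
        tier += 1
    return tier


def plan_compactions(fragment_sizes, min_merge=4, base=4):
    """Return groups of similarly sized fragments that are ready to merge."""
    tiers = defaultdict(list)
    for size in fragment_sizes:
        tiers[size_tier(size, base)].append(size)

    plans = []
    for tier in sorted(tiers):
        group = tiers[tier]
        # Merge in batches of min_merge; leftovers wait for more fragments
        while len(group) >= min_merge:
            plans.append(group[:min_merge])
            group = group[min_merge:]
    return plans


if __name__ == "__main__":
    # Four small fragments share a tier and get merged; the big ones wait
    print(plan_compactions([10, 12, 9, 11, 100, 5000]))
```

Merging only similarly sized fragments keeps the system from repeatedly rewriting large files just to absorb tiny ones.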
We Shut Down Snowflake - And Here’s Why
This article dives into why the team shut down Snowflake (hint: it’s not because it failed). It was more that its design didn’t fit their schema-less, semi-structured pipeline.
Data Analysis and Visualisation
When the Metric Becomes the Monster
Learn how Goodhart’s Law quietly wrecks your metrics. When the target becomes the game, teams start optimizing for the number, not the outcome. Here’s how to spot it and what to do instead.
Data Art: Happy Mother's Day
Beautiful visualisation by Jennifer Dawes (Happy Mother’s Day!) (Source: Tableau)
The horror … the horror!

If you enjoyed reading this week’s round-up, please give this post a like and share it with others.