Google's Big Moves, Fake "Real" Influencers and Self-Driving Goes Open Source
Google’s Gemini 2.0 Flash drops, AI-powered influencers hit the market and Hugging Face releases a massive self-driving dataset
Fellow Tinkerers!
It’s time for another round-up of noteworthy AI/Data stuff. But before we do that, I wanted to share a new free resource with you. We have reviewed, collected and curated 120+ technical articles from more than 70 companies like Netflix, Apple, Microsoft, Google, and others for data scientists and data engineers. All you need to do is subscribe to Data Tinkerer and you will receive the link to the list.
If there is an issue with the file, feel free to reply to this email and I will sort it out. Now with that out of the way, let’s get to this week’s update on all things AI and data!
The Buzz 🐝
Google launched Gemini 2.0 Flash, introducing native image generation for more advanced AI creativity. Gemini Flash also allows conversational image editing, and of course some users have started putting the feature to good use, curing loneliness.

Captions has launched Mirage, an AI model that generates realistic influencers to shill … I mean sell products. So we will be moving from “fake” real influencers to “real” fake influencers. Word on the street is that the max length of the video is 4 seconds for now (which to be fair is 3 seconds longer than what a real ‘influencer’ can come up with).
Hugging Face has teamed up with Yaak to introduce Learning to Drive (L2D), the world's largest open-source self-driving dataset, collected over three years across 30 German cities using 60 electric vehicles. The dataset includes both expert and learner driver behaviors, providing a comprehensive resource for training AI models in autonomous driving scenarios.

Data Science & AI
Google Gemma 3.0
Google introduced Gemma 3, its most capable model that can run on a single GPU or TPU. The Gemma 3 models are designed to run fast and directly on devices such as phones or laptops. Google also published the technical paper on Gemma 3, which you can read here.

RoBERTa Model for Merchant Categorization at Square
Square implemented a RoBERTa-based machine learning model to improve how it categorizes merchants by analyzing business names and descriptions. This led to a ~30% increase in categorization accuracy, allowing Square to offer more personalized product experiences, make better business decisions, and ensure correct interchange fees during payment processing.

eBay’s e-Llama: AI Trained on 1 Trillion Tokens, Boosting E-Commerce Accuracy by 25%
Learn how eBay optimized AI to deliver better product matches, faster support, and more accurate pricing
Data Engineering
Solving Latency Spikes & Locking in a Distributed PostgreSQL Query
Learn how SafetyCulture optimized their distributed PostgreSQL queries by splitting a multi-shard update into two separate queries, reducing average query time from 48ms to 12ms.

Airbnb’s Platform: Real-Time Data Meets Personalisation
Discover how Airbnb handles 1 million events per second for scalable personalisation
Data Analysis and Visualisation
What I Wish I Knew as a Data Analyst
A practical guide to connecting data analysis with business needs
Internet Access Across the World, 2003 to 2022
Interesting visualisation of internet access around the world

Happy Ending
