<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:spotify="https://www.spotify.com/ns/rss">
  <channel>
    <generator>Fame Host (https://fame.so)</generator>
    <title>The Data Engineering Show</title>
    <link>https://podcasts.fame.so/the-data-engineering-show</link>
    <itunes:new-feed-url>https://feeds.fame.so/the-data-engineering-show</itunes:new-feed-url>
    <description>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io</description>
    <copyright>© 2024 The Firebolt Data Bros</copyright>
    <language>en</language>
    <pubDate>Fri, 06 Sep 2024 07:58:15 +0000</pubDate>
    <lastBuildDate>Mon, 18 May 2026 04:36:55 +0000</lastBuildDate>
    <image>
      <url>https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg</url>
      <title>The Data Engineering Show</title>
      <link>https://podcasts.fame.so/the-data-engineering-show</link>
      <description>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io</description>
    </image>
    <googleplay:author>The Firebolt Data Bros</googleplay:author>
    <googleplay:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
    <itunes:category text="Technology"/>
    <itunes:category text="Business">
      <itunes:category text="Management"/>
    </itunes:category>
    <googleplay:summary>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io</googleplay:summary>
    <googleplay:explicit>No</googleplay:explicit>
    <googleplay:block>No</googleplay:block>
    <itunes:type>episodic</itunes:type>
    <itunes:author>The Firebolt Data Bros</itunes:author>
    <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
    <itunes:summary>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io</itunes:summary>
    <itunes:subtitle>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io</itunes:subtitle>
    <itunes:keywords>data engineering, analytics, data, Firebolt, AI, cloud data, Benjamin Wagner, Computer Science</itunes:keywords>
    <itunes:owner>
      <itunes:name>The Data Bros</itunes:name>
      <itunes:email>team-frb@fame.so</itunes:email>
    </itunes:owner>
    <itunes:complete>No</itunes:complete>
    <itunes:explicit>No</itunes:explicit>
    <itunes:block>No</itunes:block>
    <item>
      <title>AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme</title>
      <link>https://podcasts.fame.so/e/x8y74778-ai-won-t-replace-engineers-but-this-framework-will-change-how-they-build-with-rohit-girme</link>
      <itunes:title>AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme</itunes:title>
      <itunes:episode>56</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">l04r9rr0</guid>
      <description>What if you could build AI features with confidence while moving at the pace of innovation? In this episode, Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how to evaluate generative AI in production, why breaking down complex problems into smaller chunks accelerates development, and the key strategies for scaling AI-powered products beyond zero-to-one. Whether you're shipping AI features or transforming your engineering workflow, this conversation offers practical insights on building reliable AI systems, leveraging LLMs as orchestration tools, and the future of software development. Tune in to discover why humans remain essential in the scaling phase and how your team can move faster without sacrificing quality.</description>
      <content:encoded><![CDATA[<div>Scaling AI from proof-of-concept to production requires more than just deploying models; it demands robust evaluation frameworks, human oversight, and a fundamental shift in how engineering teams approach development.<br><br>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin">Benjamin Wagner</a> sits down with <a href="https://www.linkedin.com/in/rohitgirme/">Rohit Girme</a>, Staff Software Engineer at Airbnb, to explore how Airbnb built a Gen AI evaluation platform to assess LLM outputs across product surfaces, from customer support bots to search and booking experiences. Rohit shares insights into Airbnb's infrastructure choices, evaluation workflows, and lessons learned about leveraging AI tools while maintaining human orchestration.<br><br></div><div><br><strong>What You'll Learn:<br></strong><br></div><div><br></div><div>- How to architect a multi-layer Gen AI evaluation platform using Python, vLLM, Kubernetes, and DAG-based workflows to systematically test LLM outputs in production<br><br></div><div>- Why splitting monolithic "virtual judges" into specialized LLM-powered metrics (content relevance, hallucination detection, policy adherence) dramatically improves evaluation accuracy and debugging<br><br></div><div>- The critical distinction between real-time evaluation (lightweight, sub-second latency) and offline evaluation (comprehensive, human-in-the-loop) and how to route outputs accordingly<br><br></div><div>- How to shift from traditional software engineering (deterministic, rule-based testing) to probabilistic AI evaluation where you validate outputs against golden datasets and human judgment benchmarks<br><br></div><div>- The framework for breaking down problems into smaller chunks and using AI tools as collaborators rather than end-to-end problem solvers, which is critical when working with codebases at massive scale<br><br></div><div>- Why documentation becomes infrastructure in 
an AI-driven workflow: LLMs need comprehensive, well-formatted docs to scale tribal knowledge across entire organizations<br><br></div><div>- The hard truth about AI and scaling: zero-to-one innovation is now commoditized, but one-to-n execution (the scaling part) still demands human judgment, orchestration, and product sense<br><br></div><div>- How to measure AI tool adoption beyond token usage: instrument your development workflow to capture whether LLM suggestions actually made it into shipped code and added real value<br><br></div><div><br></div><div><strong><br>About the Guest(s)</strong></div><div><br><br>Rohit Girme is a Staff Software Engineer at Airbnb, where he has spent the last seven and a half years building infrastructure and platforms at scale. With deep expertise in search and machine learning infrastructure, Rohit leads efforts in GenAI evaluation and has pioneered Airbnb's approach to ensuring AI-powered features work reliably in production. In this episode, Rohit shares practical insights on building evaluation platforms for large language models, orchestrating AI in product workflows, and leveraging AI tools effectively in software development. His work on integrating LLMs into customer-facing products while maintaining quality and performance provides actionable strategies for engineering teams navigating the rapid adoption of AI, making this conversation essential for data engineers and platform builders looking to scale AI responsibly.<br><br></div><div><br></div><div><strong><br>Quotes</strong></div><div><br></div><div>"Zero to one is easy now, but the one to n, which is the scaling part, I think we still haven't figured that out. You still need humans for that." - Rohit Girme<br><br>"With AI, it's a black box to us as well. We don't know how it's working underneath, so we have to figure out another way to evaluate the surface." - Rohit Girme<br><br>"Humans should be the orchestrators of these tools and not just hand off everything to these tools." 
- Rohit Girme<br><br>"If we hand off everything to the LLM, it will make a lot of assumptions because context is limited, and it doesn't know the code enough." - Rohit Girme<br><br>"Documentation has become even more relevant because now LLMs need to know everything so everyone can scale up." - Rohit Girme<br><br>"Measuring productivity in LLMs is not just about how many tokens people are using—you need to figure out if they're actually building something on top." - Rohit Girme<br><br>"Internet democratized information, and I think with LLMs, it's capability that would be democratized. If you have a good idea, you can build it very quickly." - Rohit Girme<br><br>"There's always going to be blind spots for every person, but with AI, it'll become even faster because you have this very short cycle of talking to the AI instead of talking to five humans." - Rohit Girme<br><br>"Shipping products or shipping features would become even faster—where earlier it took weeks or months, now it will be days." - Rohit Girme<br><br><br><br></div><div>"I have supercharged my workflow day to day either at work or at home with access to information that's so easy to get." - Rohit Girme<br><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. 
Instructions on how to do this are here: <a href="https://www.fame.so/follow-rate-review">https://www.fame.so/follow-rate-review</a></div><div><br></div><div><strong><br>Resources</strong></div><div><br></div><div><strong>LinkedIn Profiles:<br></strong><br></div><ul><li>Rohit Girme's LinkedIn: <a href="https://www.linkedin.com/in/rohitgirme/">https://www.linkedin.com/in/rohitgirme/</a></li><li>Benjamin's LinkedIn: <a href="https://www.linkedin.com/in/wagjamin">https://www.linkedin.com/in/wagjamin<br></a><br></li></ul><div><strong>Company Websites:<br></strong><br></div><ul><li>Airbnb: <a href="https://airbnb.com">airbnb.com</a></li><li>Firebolt: <a href="https://firebolt.io">firebolt.io<br></a><br></li></ul><div><strong>Tools &amp; Platforms:<br></strong><br></div><ul><li><strong>vLLM</strong> – Open-source library for hosting and serving LLM inference</li><li><strong>Kubernetes</strong> – Container orchestration platform used for serving infrastructure</li><li><strong>Apache Airflow</strong> – DAG-based workflow orchestration tool (originated at Airbnb)</li><li><strong>GitHub Copilot</strong> – AI-powered code completion tool for software development</li><li><strong>Claude</strong> – LLM tool referenced for code generation and development assistance<br><br></li></ul><div><strong>Cloud Services:<br></strong><br></div><ul><li><strong>Azure</strong> – Hosted LLM services used at Airbnb</li><li><strong>AWS</strong> – Hosted LLM services used at Airbnb<br><br></li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and 
Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 07 May 2026 12:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/83lzr5lw.mp3" length="43823072" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1095</itunes:duration>
      <itunes:summary>What if you could build AI features with confidence while moving at the pace of innovation? In this episode, Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how to evaluate generative AI in production, why breaking down complex problems into smaller chunks accelerates development, and the key strategies for scaling AI-powered products beyond zero-to-one. Whether you're shipping AI features or transforming your engineering workflow, this conversation offers practical insights on building reliable AI systems, leveraging LLMs as orchestration tools, and the future of software development. Tune in to discover why humans remain essential in the scaling phase and how your team can move faster without sacrificing quality.</itunes:summary>
      <itunes:subtitle>What if you could build AI features with confidence while moving at the pace of innovation? In this episode, Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how to evaluate generative AI in production, why breaking down complex problems into smaller chunks accelerates development, and the key strategies for scaling AI-powered products beyond zero-to-one. Whether you're shipping AI features or transforming your engineering workflow, this conversation offers practical insights on building reliable AI systems, leveraging LLMs as orchestration tools, and the future of software development. Tune in to discover why humans remain essential in the scaling phase and how your team can move faster without sacrificing quality.</itunes:subtitle>
      <itunes:keywords/>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Framework Canva Uses for 200M+ Designers with Paul Tune</title>
      <link>https://podcasts.fame.so/e/r8kl55jn-the-framework-canva-uses-for-200m-designers-with-paul-tune</link>
      <itunes:title>The Framework Canva Uses for 200M+ Designers with Paul Tune</itunes:title>
      <itunes:episode>55</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">71wjkk90</guid>
      <description>In this episode of The Data Engineering Show, Benjamin sits down with Paul Tune, Staff Research Scientist at Canva, to explore the advancement of machine learning at one of the world's leading design platforms. Learn how Canva is transitioning from traditional ML like recommendation engines for templates to cutting-edge agentic workflows that allow users and AI to collaborate on complex design tasks. Whether you're interested in the infrastructure behind distributed training or the nuances of post-training LLMs for aesthetic tasks, this deep dive offers a masterclass in scaling ML for millions of creative users.</description>
      <content:encoded><![CDATA[<div>AI agents are moving beyond simple automation into collaborative design workflows, which require fundamentally different approaches to user experience, model training, and infrastructure than traditional ML systems.<br><br></div><div>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a> sits down with <a href="https://www.linkedin.com/in/paul-tune-0ba18116">Paul Tune</a>, Staff Research Scientist at Canva, to explore how the design platform is building agentic workflows, managing multimodal data pipelines, and tackling the unique challenge of teaching machines to understand aesthetic taste alongside functional design.<br><br></div><div><br><strong>What You'll Learn:<br></strong><br></div><ul><li><strong>How to architect user experiences that match intent across expertise levels</strong>, from seventh graders to professional designers, by constraining uncertainty through progressive disclosure rather than forcing upfront specification</li><li><strong>Why reinforcement learning infrastructure for creative tasks demands different optimization priorities </strong>than supervised fine-tuning, with network latency to external API services often mattering more than compute efficiency</li><li><strong>The shift in modern ML workflows </strong>from "source data → train → deploy" to a verification- and evaluation-first paradigm, especially for generative models where training cycles are measured in weeks, not hours</li><li><strong>How to split ML team responsibilities across data sourcing, </strong>supervised fine-tuning, distributed systems tuning, and evaluation, with evaluation becoming the critical path as model capabilities scale</li><li><strong>The difference between LLM inference bottlenecks </strong>(token throughput is rarely limiting) and image-based ML pipelines (where data movement and GPU saturation drive entirely different optimization equations)</li><li><strong>Why 
aesthetic evaluation remains harder than mathematical verification</strong> and why meaningful progress is still likely in applying LLMs beyond verifiable domains like mathematics and coding into subjective areas like design taste<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br></div><div><strong><br>About the Guest(s)</strong></div><div><br></div><div>Paul Tune is a Staff Research Scientist at Canva, bringing nine years of experience building machine learning systems that empower millions of users worldwide. With deep expertise in large language models, reinforcement learning, and generative AI applications, Paul leads Canva's post-training efforts on LLMs designed for agentic design workflows. In this episode, Paul shares insights into how modern ML teams balance competing priorities, from data efficiency and GPU optimization to evaluation frameworks for subjective tasks like design aesthetics. His work bridging the gap between casual users and professional designers offers valuable lessons for data engineers and ML practitioners looking to scale AI systems across diverse user bases and complex product surfaces.<br><br></div><div><br></div><div><strong>Quotes</strong></div><div><br></div><div>"What Canva is is that online graphic design platform for you to be able to design and kinda have this whole end-to-end process from the designing, the brainstorming, and using all sorts of tools in order to create a graphic design." - Paul Tune<br><br>"The whole vision really is to empower the world to design, and what that entails is to then have this entire end-to-end experience of designing on the platform." 
- Paul Tune<br><br>"I think a lot of it has to do with matching intent, so even for yourself, if you're using Claude Code, some folks go down the side of, I want to plan very specific things about my design." - Paul Tune<br><br>"Whereas for a more casual user, they probably do come in without really having an idea of what they actually do want in the first place, and I think having a few options to kind of show, okay, these are kind of like a few designs that you might like as part of that, so that sort of helps to then eventually narrow down the intent." - Paul Tune<br><br>"I think there's a lot of very strong momentum around generative tools right now, and as part of that, Canva is also experimenting with adding generative tools within the product." - Paul Tune<br><br>"I think one of the bigger trends this year is there's been quite a bit of buzz around agents in particular, and Canva is no different in that aspect: we are working towards agentic workflows." - Paul Tune<br><br>"I think for us, the biggest challenge is that every time we do a rollout by an RL algorithm where we do have a sample that needs to be scored and then some level of feedback goes back to the model to then update its weights, we have to hit specific APIs within different services at Canva." - Paul Tune<br><br>"I think the change has definitely shifted from when we started work on machine learning, where you kind of source the data and then train, to really like how do you evaluate because large language models have so many capabilities." - Paul Tune<br><br>"I think I try to keep focus because I don't think it's very feasible for me to cover every paper out there, even though there are lots and lots of exciting things that happen every day." 
- Paul Tune<br><br>"I think what I'm particularly excited about is applying these sorts of models into domains that are beyond what is very strongly verifiable, like mathematics and coding, because progress outside these domains has been a bit slower, but I do see at least some progress over time." - Paul Tune<br><br></div><div><br><strong>Resources</strong></div><div><br></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li>Paul Tune - <a href="https://www.linkedin.com/in/paul-tune-0ba18116">https://www.linkedin.com/in/paul-tune-0ba18116</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin<br></a><br></li></ul><div><strong>Websites:</strong></div><ul><li><strong>Canva: </strong><a href="https://www.canva.com">https://www.canva.com</a></li><li><strong>Canva Engineering Blog: </strong><a href="https://www.canva.dev/blog/">https://www.canva.dev/blog/</a></li></ul><div><br></div><div><strong>Tools &amp; Platforms:</strong></div><ul><li>Ray – Distributed training framework for machine learning on Kubernetes clusters</li><li>Argo – Workflow orchestration tool for managing data pipelines and model training</li><li>Snowflake – Data warehouse for structured data storage and event management</li><li>AWS S3 – Object storage for media files and unstructured data</li><li>Kubernetes – Container orchestration platform for managing distributed training clusters</li><li>RDS with MySQL – Relational database service for backing services</li><li>Canva Magic Studio – Generative AI tools suite within Canva, including image generation and LLM-powered writing assistance</li></ul><div><br><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a 
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 28 Apr 2026 11:27:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wrjn1z7w.mp3" length="53180080" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1329</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, Benjamin sits down with Paul Tune, Staff Research Scientist at Canva, to explore the advancement of machine learning at one of the world's leading design platforms. Learn how Canva is transitioning from traditional ML like recommendation engines for templates to cutting-edge agentic workflows that allow users and AI to collaborate on complex design tasks. Whether you're interested in the infrastructure behind distributed training or the nuances of post-training LLMs for aesthetic tasks, this deep dive offers a masterclass in scaling ML for millions of creative users.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, Benjamin sits down with Paul Tune, Staff Research Scientist at Canva, to explore the advancement of machine learning at one of the world's leading design platforms. Learn how Canva is transitioning from traditional ML like recommendation engines for templates to cutting-edge agentic workflows that allow users and AI to collaborate on complex design tasks. Whether you're interested in the infrastructure behind distributed training or the nuances of post-training LLMs for aesthetic tasks, this deep dive offers a masterclass in scaling ML for millions of creative users.</itunes:subtitle>
      <itunes:keywords/>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Llama 2 &amp; 3 Safety: Soumya Batra on Agentic AI Training</title>
      <link>https://podcasts.fame.so/e/28xzrww8-llama-2-3-safety-soumya-batra-on-agentic-ai-training</link>
      <itunes:title>Llama 2 &amp; 3 Safety: Soumya Batra on Agentic AI Training</itunes:title>
      <itunes:episode>54</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">60mkq880</guid>
      <description>What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively agentic AI means for the next frontier of AI development. Whether you're curious about the model training lifecycle or wondering what comes after large language models, this conversation unpacks the technical strategies and vision shaping tomorrow's AI systems.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin">Benjamin Wagner</a> sits down with <a href="https://in.linkedin.com/in/soumyabatra">Soumya Batra</a>, founder and CEO of WisePort AI and former tech lead at Meta, where she led safety efforts for Llama 2 and Llama 3, to explore the evolution of NLP, the complete lifecycle of foundation model training, and why the next AI frontier lies in natively agentic systems rather than simply scaling larger transformers.<br><br></div><div><br><strong>What You'll Learn:<br></strong><br></div><ul><li><strong>Why historical NLP work becomes obsolete with each paradigm shift: </strong>Understand how Bayesian networks, RNNs, and LSTMs each dominated until they were replaced, and why the current transformer-scaling dogma will likely face the same fate</li><li><strong>How to structure the foundation model training lifecycle for safety:</strong> Learn the three critical phases, namely pretraining (data mix optimization), supervised fine-tuning (instruction alignment), and reinforcement learning (human preference integration), and where safety interventions deliver maximum leverage</li><li><strong>The counterintuitive data strategy for pretraining safety:</strong> Discover why removing all toxic content actually weakens model robustness, and how maintaining a precise balance preserves the model's ability to classify and refuse harmful requests</li><li><strong>How dual reward models maximize both helpfulness and safety:</strong> See why combining helpfulness and safety objectives (as done in Llama 3) ensures every training sample reinforces both capabilities simultaneously rather than creating trade-offs</li><li><strong>What "natively agentic" means and why it matters more than LLM-powered agents: </strong>Learn how foundational agentic models dynamically explore action spaces at inference time instead of relying on fixed developer-defined scaffolding, unlocking 
domain-agnostic workflows</li><li><strong>How to build a foundational AI startup without massive training datasets: </strong>Understand why synthetic data generation, deterministic task validation, and deep domain expertise can substitute for Internet-scale language corpora in the agentic space<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br></div><div><strong><br>About the Guest(s)</strong></div><div><br></div><div>Soumya Batra is the Founder and CEO of WisePort AI, a foundational AI company specializing in agentic AI systems. With over twelve years of expertise in NLP and machine learning, she previously served as a Tech Lead and Applied Research Scientist at Meta, where she led safety and controllability efforts for both Llama 2 and Llama 3. Her career spans foundational work at Carnegie Mellon University, Microsoft, and Meta, establishing her as a pioneering voice in conversational AI and foundation model development. In this episode, Soumya demystifies the journey from traditional NLP to large language models, revealing how safety and controllability are embedded across the entire model lifecycle—from pretraining through reinforcement learning. Her insights on the future of agentic AI and the limitations of current scaling-only approaches provide essential perspective for data engineers and ML practitioners navigating the rapidly evolving AI landscape.<br><br></div><div><br></div><div><strong>Quotes</strong></div><div><br></div><div>"I did not know then that this would become my career for the next decade." - Soumya<br><br></div><div>"Whatever work that I've done in the past becomes irrelevant all of a sudden." - Soumya<br><br></div><div>"There is always a notion of, yes, this is the big thing, and then no, it's not anymore." 
- Soumya<br><br></div><div>"I really think that we are going to be proven wrong once again about scaling transformers being the only way to achieve general intelligence." - Soumya<br><br></div><div>"Safety was an issue even back then, even though we were training in such controlled settings." - Soumya<br><br></div><div>"If you don't put some toxic content there, then it will lose the ability to classify it and it'll be much easier to break the safety later on." - Soumya<br><br></div><div>"In the post training phase, we are giving it that ability to be able to answer users' questions." - Soumya<br><br></div><div>"The next unlock will now come from foundational agent models that are natively agentic, which will unlock use cases that look unimaginable to us right now." - Soumya<br><br></div><div>"Natively agentic means the foundational model itself needs to dynamically explore the action space, rather than scaffolding around existing LLMs." - Soumya<br><br></div><div>"The real unlock comes from creating your own use cases, creating your own synthetic data, and going deep into a few workflows." 
- Soumya<br><br></div><div><br></div><div><strong>Resources</strong></div><div><br></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li>Soumya Batra - <a href="https://in.linkedin.com/in/soumyabatra">https://in.linkedin.com/in/soumyabatra</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin<br></a><br></li></ul><div><strong>Websites:</strong></div><ul><li><strong>WisePort AI </strong>– <a href="https://www.wiseport.ai/">https://www.wiseport.ai</a></li><li><strong>Firebolt </strong>- <a href="https://www.firebolt.io/">https://www.firebolt.io</a></li></ul><div><br></div><div><strong>Articles &amp; Research Papers:</strong></div><ul><li><strong>LLaMA: Open and Efficient Foundation Language Models – </strong>Meta AI Research</li><li><strong>LIMA: Less Is More for Alignment – </strong>Meta AI Research</li></ul><div><br></div><div><strong>Educational Institutions:</strong></div><ul><li><strong>Carnegie Mellon University - </strong>Language Technologies Institute (LTI)</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a 
href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 08 Apr 2026 09:59:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8yqyqr58.mp3" length="54012864" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1350</itunes:duration>
      <itunes:summary>What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively agentic AI means for the next frontier of AI development. Whether you're curious about the model training lifecycle or wondering what comes after large language models, this conversation unpacks the technical strategies and vision shaping tomorrow's AI systems.</itunes:summary>
      <itunes:subtitle>What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively agentic AI means for the next frontier of AI development. Whether you're curious about the model training lifecycle or wondering what comes after large language models, this conversation unpacks the technical strategies and vision shaping tomorrow's AI systems.</itunes:subtitle>
      <itunes:keywords>Soumya Batra, Benjamin Wagner, foundation models, LLM safety, Llama 2, Llama 3, agentic AI, reinforcement learning from human feedback, supervised fine-tuning, pretraining, post-training, conversational AI, AI controllability</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Data Fusion Secret &amp; Why Custom Query Engines Fail with Nikita Lapkov</title>
      <link>https://podcasts.fame.so/e/28xz46q8-data-fusion-secret-custom-query-engines-nikita-lapkov</link>
      <itunes:title>The Data Fusion Secret &amp; Why Custom Query Engines Fail with Nikita Lapkov</itunes:title>
      <itunes:episode>53</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">60mkrv30</guid>
      <description>What if building a distributed SQL engine meant rethinking everything about how query execution works at scale? In this episode, Benjamin sits down with Nikita, Senior Software Engineer at Cloudflare, to explore how R2 SQL leverages object storage and distributed computing to power analytics across 300 global locations, why backward compatibility becomes critical when you can't control infrastructure rollouts, and the key strategies for handling joins and adaptive query execution in a stateless, point-to-point network architecture. Whether you're designing distributed systems or curious about how Cloudflare processes petabytes of data, this conversation reveals the real-world engineering challenges and innovations shaping the future of cloud data platforms.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin">Benjamin Wagner</a> sits down with <a href="https://www.linkedin.com/in/nikitalapkov">Nikita Lapkov</a>, Senior Software Engineer at Cloudflare, to explore the architecture, design decisions, and future roadmap of R2 SQL, Cloudflare's new R2-based distributed query engine launched in September 2025.<br><br><br><strong>What You'll Learn:<br><br></strong><br></div><ul><li><strong>How to leverage existing query engines strategically</strong>: Why Cloudflare chose Apache DataFusion for single-node query processing rather than building an analytical engine from scratch, freeing engineering resources for distributed orchestration challenges.</li><li><strong>The stateless architecture pattern for global infrastructure</strong>: How to design compute nodes that hold zero persistent state by storing all metadata in a distributed catalog (Iceberg), enabling per-query worker provisioning across 300+ geographically dispersed data centers.</li><li><strong>Why filter pushdown and metadata-driven pruning are non-negotiable optimizations</strong>: How to reduce data scanned from object storage before query execution begins by leveraging catalog statistics and range filtering - the foundation of R2 SQL's performance gains.</li><li><strong>How to solve version compatibility at infrastructure scale</strong>: Why backward compatibility matters more than cross-version support when you can't control individual node upgrade timing, and how this constraint drives architectural decisions.</li><li><strong>The shuffle strategy for point-to-point distributed joins</strong>: How to implement in-memory and disk-based shuffles within ephemeral worker clusters using network-addressable worker IDs, so that stateless workers retain nothing once a query completes.</li><li><strong>Why adaptive query execution is the next frontier for petabyte-scale 
analytics</strong>: How collecting runtime data distribution statistics during execution enables mid-flight plan reconfiguration - a technique worth the overhead when queries run for minutes or hours rather than milliseconds.<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here: <a href="https://www.fame.so/follow-rate-review">https://www.fame.so/follow-rate-review</a></div><div><br></div><div><strong><br>About the Guest(s)<br></strong><br></div><div><br></div><div>Nikita is a Senior Software Engineer at Cloudflare, specializing in distributed query engines and data platform architecture. With extensive experience in database internals gained through roles at ClickHouse, Yandex, and MongoDB, Nikita has developed deep expertise in query optimization and system design at scale. At Cloudflare, he leads the development of R2 SQL, a distributed analytical query engine built on Apache DataFusion, serving as a critical component of Cloudflare's data platform. In this episode, Nikita discusses the architecture, design decisions, and technical challenges of building a stateless, distributed SQL engine across Cloudflare's unique 300-location infrastructure, offering valuable insights for engineers working on large-scale data systems. His work demonstrates how thoughtful architectural choices and infrastructure constraints drive innovation in distributed database systems.<br><br></div><div><br></div><div><strong>Quotes<br></strong><br></div><div><br></div><div>"It was my crash course into OS engineering. We encountered every possible bug in this project. It was very painful and very hard." - Nikita Lapkov<br><br>"Collecting a stack trace is very hidden, especially if you're not writing in C or C++. It is actually a very complicated and involved process." - Nikita Lapkov<br><br>"What excites me is that it has free egress. 
Usually, you would pay per gigabyte to load your data. You don't have that with R2." - Nikita Lapkov<br><br>"What we explicitly wanted to avoid when building R2 SQL is building an analytical query engine again. We would much rather use something off the shelf and work on the interesting distributed parts." - Nikita Lapkov<br><br>"No matter how complex the query is, you can make a case that, with extreme cases, the throughput for a single load operation is relatively constant, no matter how complex the query is." - Nikita Lapkov<br><br>"We try to be as stateless as possible. All our state lives in the catalog itself, so we only need what's in the catalog and the query that comes from the request." - Nikita Lapkov<br><br>"The shuffles cannot really be reused unless you do some very fancy heuristics. Once we have picked the workers for a particular query, we can think of them as our little cluster." - Nikita Lapkov<br><br>"Joins consume your entire roadmap, and this is pretty much what will be happening with us at some point. We need to make sure that distributed joins work really well, no matter what your data distribution is like." - Nikita Lapkov<br><br>"We have potentially minutes to spare, and optimizing some even subparts of the query is worthy investigation because it could shave hours or something like that." - Nikita Lapkov<br><br>"Finding the safe points for replanning and doing this distributed coordination while we have 50 different workers working on different parts of the query is definitely the area we want to look at in the coming year." 
- Nikita Lapkov<br><br></div><div><br></div><div><strong>Resources<br></strong><br></div><div><br></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li><strong>Nikita Lapkov</strong> - <a href="https://www.linkedin.com/in/nikitalapkov">https://www.linkedin.com/in/nikitalapkov</a></li><li><strong>Benjamin Wagner</strong> - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin<br></a><br></li></ul><div><strong>Websites:<br></strong><br></div><ul><li><strong>Firebolt </strong>– <a href="http://firebolt.io">firebolt.io</a></li><li><strong>Cloudflare </strong>–<strong> </strong><a href="http://cloudflare.com">cloudflare.com</a></li><li><strong>Apache Arrow DataFusion </strong>–<strong> </strong><a href="http://datafusion.apache.org">datafusion.apache.org</a></li></ul><div><br></div><div><strong>Tools &amp; Platforms:<br></strong><br></div><ul><li><strong>R2 SQL</strong> – Cloudflare's R2-based query engine for analytical queries</li><li><strong>Apache Arrow DataFusion</strong> – Analytical query engine used for single-node number crunching</li><li><strong>Arroyo </strong>– Rust-based streaming solution built on DataFusion</li><li><strong>R2</strong> – S3-compatible object storage with free egress</li><li><strong>Apache Iceberg</strong> – Open table format whose catalog holds R2 SQL's state</li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna 
Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 24 Mar 2026 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/87p9pn6w.mp3" length="43652701" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1091</itunes:duration>
      <itunes:summary>What if building a distributed SQL engine meant rethinking everything about how query execution works at scale? In this episode, Benjamin sits down with Nikita, Senior Software Engineer at Cloudflare, to explore how R2 SQL leverages object storage and distributed computing to power analytics across 300 global locations, why backward compatibility becomes critical when you can't control infrastructure rollouts, and the key strategies for handling joins and adaptive query execution in a stateless, point-to-point network architecture. Whether you're designing distributed systems or curious about how Cloudflare processes petabytes of data, this conversation reveals the real-world engineering challenges and innovations shaping the future of cloud data platforms.</itunes:summary>
      <itunes:subtitle>What if building a distributed SQL engine meant rethinking everything about how query execution works at scale? In this episode, Benjamin sits down with Nikita, Senior Software Engineer at Cloudflare, to explore how R2 SQL leverages object storage and distributed computing to power analytics across 300 global locations, why backward compatibility becomes critical when you can't control infrastructure rollouts, and the key strategies for handling joins and adaptive query execution in a stateless, point-to-point network architecture. Whether you're designing distributed systems or curious about how Cloudflare processes petabytes of data, this conversation reveals the real-world engineering challenges and innovations shaping the future of cloud data platforms.</itunes:subtitle>
      <itunes:keywords>Nikita Lapkov, Benjamin Wagner, Cloudflare, R2 SQL, DataFusion, database internals, distributed systems, SQL query engine, serverless storage, zero-egress, ClickHouse, Rust, Apache Arrow, data engineering, cloud architecture, adaptive query execution</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha</title>
      <link>https://podcasts.fame.so/e/l8qwvkz8-zipline-ai-minutes-of-sql-ft-nikhil-simha</link>
      <itunes:title>How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha</itunes:title>
      <itunes:episode>52</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80nvr7n0</guid>
      <description>What if you could deploy ML features and real-time data pipelines without building complex infrastructure from scratch? 

In this episode, host Benjamin sits down with Nikhil Simha, CTO at Zipline AI and co-author of Chronon, to explore how Chronon, an open-source system that generates data infrastructure from simple queries, is transforming feature engineering at companies like OpenAI and Airbnb. Learn why iteration speed matters for fraud detection, how to serve thousands of signals at massive scale, and what the future of analytical databases looks like in an AI-first world. Whether you're scaling real-time ML systems or building customer-facing analytics, this conversation is packed with practical insights on bridging the gap between data scientists and ML engineers.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin/">Benjamin</a> sits down with <a href="https://www.linkedin.com/in/nikhilsimha/">Nikhil Simha</a>, CTO of Zipline AI and co-author of Chronon, to explore how a declarative feature platform solves the speed-vs-scale paradox in modern ML infrastructure, from fraud detection at Airbnb to powering OpenAI's recommendation systems.<br><br></div><div><br></div><div><strong>What You'll Learn:<br></strong><br></div><div><br></div><ul><li><strong>How to eliminate the data scientist-to-ML engineer bottleneck</strong> by generating Spark, Flink, and orchestration pipelines automatically from simple SQL queries, enabling data scientists to ship features independently without waiting for engineering resources</li></ul><div><br></div><ul><li><strong>Why fraud detection demands real-time feature iteration: </strong>The adversarial nature of fraud requires companies to build and deploy new detection models in days, not months - a timeline impossible to meet with manual pipeline engineering</li></ul><div><br></div><ul><li><strong>The "precompute everything" optimization principle for serving latency: </strong>Chronon minimizes query response time by batching feature computation upstream through stream and batch processing, then delivering pre-aggregated signals to models in milliseconds</li></ul><div><br></div><ul><li><strong>How to safely ship feature versions in production</strong> using dual-write strategies that keep old and new feature versions running simultaneously, enabling A/B testing and instant rollbacks without service disruption</li></ul><div><br></div><ul><li><strong>Why context engineering, not just RAG, powers modern LLM applications:</strong> ML model predictions (fraud risk scores, user signals, embeddings) feed directly into LLM prompts as structured context, improving decision quality for both human and AI 
agents</li></ul><div><br></div><ul><li><strong>The critical gap in open-source data infrastructure: </strong>Modern systems need query engines that scale seamlessly from single-machine to distributed clusters - today's choice between lightweight tools (DuckDB) and heavyweight platforms (Spark) leaves mid-scale and product-embedded analytics underserved</li></ul><div><br><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are here: <a href="https://www.fame.so/follow-rate-review">https://www.fame.so/follow-rate-review</a></div><div><br></div><div><strong><br>About the Guest(s)<br></strong><br></div><div><br>Nikhil Simha is the CTO at Zipline AI, bringing extensive experience from leadership roles at Airbnb and Facebook. He is a co-author of Chronon, an open-source feature engineering platform that automates the generation of ML infrastructure from declarative queries. With deep expertise in real-time data systems, fraud detection, and feature engineering at scale, Nikhil has architected solutions powering recommendation systems and risk detection across billions of user interactions. In this episode, he shares insights on building scalable ML infrastructure, integrating LLMs with real-time feature contexts, and the evolving data engineering landscape. His work has directly impacted how organizations from early-stage startups to Fortune 500 companies approach feature engineering and real-time ML serving, making this conversation essential for engineers building production AI systems.</div><div><br></div><div><strong><br>Quotes<br></strong><br></div><div><br>"Fraud is adversarial. Right? 
Like, someone comes up with a new way to do fraud somewhere around the world, and people at Airbnb need to react to it very quickly." - Nikhil</div><div><br>"Chronon, at its core, generates these systems from queries. So users write queries on Chronon, and we generate all of these under the hood." - Nikhil</div><div><br>"Chronon allows data scientists to operate independently." - Nikhil</div><div><br>"The main problem there was that the traditional model of data scientists writing some logic and ML engineers going and building systems out for that logic, that was too slow for fraud detection." - Nikhil</div><div><br>"They have to come up with a new model in a matter of days. They don't have, like, this three to five month period where they can sit and create the new model, build all of these pipelines." - Nikhil</div><div><br>"There is a real gap in the industry for an engine that goes all the way from single machine scale to thousands of machine scale seamlessly." - Nikhil</div><div><br>"Most people, for 95% of their queries, don't need Spark in RPA. Right? But there is that 5% usually, like, a lot of ML falls into that." - Nikhil</div><div><br>"We are handling query fragments. Right? We take query fragments, generate very specialized logic for that, and run that through Spark's distributed processing topologies." - Nikhil</div><div><br>"The new trend in the industry would be, like, towards these engines that can work at any scale and be useful for interactive and large processing workloads." - Nikhil</div><div><br>"I think Iceberg is great that way because you're not fragmenting to different proprietary data formats, different proprietary engines." 
- Nikhil</div><div><br></div><div><strong><br>Resources</strong></div><div><strong>&nbsp;<br></strong><br></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li>Nikhil Simha - <a href="https://www.linkedin.com/in/nikhilsimha/">https://www.linkedin.com/in/nikhilsimha</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin<br></a><br></li></ul><div><br></div><div><strong>Websites:<br></strong><br></div><ul><li><strong>Zipline AI</strong> – <a href="http://zipline.ai">zipline.ai</a></li><li><strong>Firebolt</strong> – <a href="http://firebolt.io">firebolt.io</a></li></ul><div><br><strong><br>Tools &amp; Platforms:<br></strong><br></div><ul><li><strong>Chronon </strong>– Feature engineering and real-time ML infrastructure platform for generating data pipelines from queries</li><li><strong>Apache Spark </strong>– Distributed data processing engine for batch and large-scale processing workloads</li><li><strong>Apache Flink</strong> – Stream processing engine for real-time data transformations</li><li><strong>Redis</strong> – In-memory key-value store for feature serving</li><li><strong>Apache Iceberg</strong> – Open table format for data lake storage</li><li><strong>Airflow</strong> – Workflow orchestration platform for pipeline scheduling</li><li><strong>DuckDB</strong> – Open-source analytical database for single-machine to moderate-scale processing</li><li><strong>BigQuery</strong> – Google Cloud data warehouse</li><li><strong>Snowflake</strong> – Cloud-based data warehouse platform</li><li><strong>Kubernetes</strong> – Container orchestration platform</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a 
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of Linkedin, Metthew Weingarten of Disney, Joe Reis and Matt Housely, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 10 Mar 2026 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w16712p8.mp3" length="58340831" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1458</itunes:duration>
      <itunes:summary>What if you could deploy ML features and real-time data pipelines without building complex infrastructure from scratch? 

In this episode, host Benjamin sits down with Nikhil Simha, CTO at Zipline AI and co-author of Chronon, to explore how Chronon, an open-source system that generates data infrastructure from simple queries, is transforming feature engineering at companies like OpenAI and Airbnb. Learn why iteration speed matters for fraud detection, how to serve thousands of signals at massive scale, and what the future of analytical databases looks like in an AI-first world. Whether you're scaling real-time ML systems or building customer-facing analytics, this conversation is packed with practical insights on bridging the gap between data scientists and ML engineers.</itunes:summary>
      <itunes:subtitle>What if you could deploy ML features and real-time data pipelines without building complex infrastructure from scratch? 

In this episode, host Benjamin sits down with Nikhil Simha, CTO at Zipline AI and co-author of Chronon, to explore how Chronon, an open-source system that generates data infrastructure from simple queries, is transforming feature engineering at companies like OpenAI and Airbnb. Learn why iteration speed matters for fraud detection, how to serve thousands of signals at massive scale, and what the future of analytical databases looks like in an AI-first world. Whether you're scaling real-time ML systems or building customer-facing analytics, this conversation is packed with practical insights on bridging the gap between data scientists and ML engineers.</itunes:subtitle>
      <itunes:keywords>Chronon, feature engineering, real-time ML pipelines, Zipline AI, fraud detection, online feature serving, data engineering infrastructure, feature store, query engine scalability, open-source data infrastructure, Iceberg adoption, Nikhil Simha, Benjamin Wagner</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck</title>
      <link>https://podcasts.fame.so/e/q80v4wp8-the-geo-data-problem-nobody-talks-about-and-how-voi-solved-it-ft-magnus-dahlbaeck</link>
      <itunes:title>The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck</itunes:title>
      <itunes:episode>51</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">p0knm8v1</guid>
      <description>What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how to balance FinOps costs while processing billions of IoT events daily. Whether you're building data infrastructure for a high-growth company or rethinking how your organization consumes data, this conversation is packed with practical strategies for unlocking data value and preparing your platform for AI. Tune in to discover how Voi ditched traditional BI tools and revolutionized their approach to enterprise analytics.</description>
      <content:encoded><![CDATA[<div><br>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin/">Benjamin</a> sits down with <a href="https://www.linkedin.com/in/magnusdahlback/">Magnus Dahlbäck</a>, Senior Director of Data and Platform at Voi, to explore how a rapidly scaling European e-scooter company transformed its data infrastructure, adopted a metrics-first approach to analytics, and is now leveraging AI to solve real-time operational challenges across 150 cities and 150,000 vehicles.<br><br></div><div><br></div><div><strong>What You'll Learn:</strong></div><div><br></div><ul><li><strong>How to escape the "dashboard chaos" trap</strong> by adopting a metrics-first architecture with a semantic layer, reducing confusion from hundreds of conflicting dashboards to a single source of truth across the organization</li></ul><div><br></div><ul><li><strong>Why replacing Tableau with Steep (a metrics-centric BI tool) unlocked self-service analytics</strong> for non-technical users, empowering teams to answer their own data questions without waiting months for custom dashboard builds</li></ul><div><br></div><ul><li><strong>The real-world cost optimization challenge</strong> of managing Snowflake expenses that scale 1:1 with ride volume—and why data leaders must constantly rethink architecture to control FinOps in high-growth environments</li></ul><div><br></div><ul><li><strong>How to architect for IoT at scale:</strong> processing billions of daily events from connected vehicles using micro-batch pipelines (5-minute intervals) while keeping real-time machine learning inference separate through cross-functional product teams</li></ul><div><br></div><ul><li><strong>The decision framework for choosing traditional ML vs. 
LLMs:</strong> use traditional methods for accuracy-critical workloads (supply-demand forecasting for vehicle positioning) and LLMs for pattern discovery where 100% precision isn't required (analyzing rider feedback)</li></ul><div><br></div><ul><li><strong>How to build proactive customer support powered by data and AI:</strong> leverage sensor data and ride telemetry to detect poor user experiences and reach out before customers complain, rather than waiting for refund requests</li></ul><div><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong><br>About the Guest(s)<br></strong><br></div><div><br>Magnus Dahlbäck is Senior Director of Data and Platform at Voi, a leading European micro-mobility company, where he oversees the data analytics team, platform infrastructure, and AI initiatives. With over four years at Voi, Magnus has scaled the data organization from three people to a comprehensive team of platform engineers, data analysts, and data scientists while architecting a modern data stack centered on metrics-first analytics and semantic layers. In this episode, Magnus shares insights on building scalable data platforms for IoT-heavy, real-world products, including strategies for managing billions of daily events, implementing self-service analytics, and balancing traditional machine learning with large language models. 
His work at Voi—where the data platform powers both internal analytics and customer-facing product features—demonstrates how thoughtful data architecture drives measurable business impact, making this conversation essential for data leaders navigating AI integration and data democratization.</div><div><br></div><div><strong><br>Quotes<br></strong><br></div><div><br>"There are hundreds of dashboards, and I'm looking for some data, some metrics, and there are 10 dashboards that contain that, and they all show different numbers." - Magnus</div><div><br>"Metrics is a very natural way of interacting with data rather than dashboards that are named something randomly." - Magnus</div><div><br>"We're basically throwing man hours on slicing and dicing data, trying to find patterns, anomalies that we often miss, right, because it just takes too much time." - Magnus</div><div><br>"The way we work with data hasn't really changed that much in the last ten, twenty years to be completely fair, but now we're seeing new technologies, new approaches to it." - Magnus</div><div><br>"It comes down to the use case. What's the accuracy we need?" - Magnus</div><div><br>"We can see from the sensor data, from the IoT, from other data points during your ride if it was a good or bad experience, so why don't we reach out to you?" - Magnus</div><div><br>"Building software around physical objects is really cool when you're a techie guy like me, working at a company where it's a combination of software, B to C, hardware, IoT." - Magnus</div><div><br>"The biggest dataset that we process is IoT data—billions of events every day, basically, that we process." - Magnus</div><div><br>"We have cross functional teams where all the product teams have everything from back end to front end to data people, designers, and so on." 
- Magnus</div><div><br>"Metrics is kind of the business language that we use—we talk about rides, average ride charge, active vehicles—so metrics is a very natural way of interacting with data." - Magnus</div><div><br></div><div><strong><br>Resources</strong></div><div><strong>&nbsp;<br></strong><br></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li>Magnus Dahlbäck - <a href="https://www.linkedin.com/in/magnusdahlback/">https://www.linkedin.com/in/magnusdahlback/</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin/<br></a><br></li></ul><div><strong><br>Websites:</strong></div><ul><li>Guest's Company: Voi Technologies Website (<a href="http://voi.com">voi.com</a>)</li><li>Host's Company: Firebolt Website (<a href="http://firebolt.io">firebolt.io</a>)</li></ul><div><strong><br>Tools &amp; Platforms:</strong></div><ul><li>Snowflake – Data warehouse for analytics and machine learning workloads</li><li>dbt (data build tool) – Data transformation and modeling</li><li>Apache Airflow – Workflow orchestration</li><li>Steep – Metrics-first BI tool with semantic layer (Swedish startup)</li><li>GCP Vertex AI – Machine learning platform for model training and deployment</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, 
Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 19 Feb 2026 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wyqj26lw.mp3" length="38676897" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>966</itunes:duration>
      <itunes:summary>What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how to balance FinOps costs while processing billions of IoT events daily. Whether you're building data infrastructure for a high-growth company or rethinking how your organization consumes data, this conversation is packed with practical strategies for unlocking data value and preparing your platform for AI. Tune in to discover how Voi ditched traditional BI tools and revolutionized their approach to enterprise analytics.</itunes:summary>
      <itunes:subtitle>What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how to balance FinOps costs while processing billions of IoT events daily. Whether you're building data infrastructure for a high-growth company or rethinking how your organization consumes data, this conversation is packed with practical strategies for unlocking data value and preparing your platform for AI. Tune in to discover how Voi ditched traditional BI tools and revolutionized their approach to enterprise analytics.</itunes:subtitle>
      <itunes:keywords>Voi, Magnus Dahlbäck, Data Engineering, IoT, Snowflake, DBT, Airflow, BI, Semantic Layer, Metrics-First, Mobility, AI, Machine Learning, Vertex AI, Urban Tech, FinOps, Data Scalability, Startup Growth</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That</title>
      <link>https://podcasts.fame.so/e/28xzmj68-why-99-of-data-teams-give-up-on-real-time-and-how-artie-changes-that</link>
      <itunes:title>Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That</itunes:title>
      <itunes:episode>50</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">60mk5pv0</guid>
      <description>What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowflake pipelines, and why "theoretically correct" architectures often fail in practice.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, <a href="https://www.linkedin.com/in/wagjamin/">Benjamin</a> sits down with Artie CTO and co-founder <a href="https://www.linkedin.com/in/tang8330/">Robin Tang</a> to explore the complexities of high-performance data movement. Robin shares his journey from building Maxwell at Zendesk to scaling data systems at Opendoor, highlighting the gap between business-oriented SaaS connectors and the rigorous demands of production database replication.<br><br></div><div>Robin dives deep into Artie’s architecture, explaining how they leverage a split-plane model (Control Plane and Data Plane) to provide a "Bring Your Own Cloud" (BYOC) experience that engineering teams actually trust. You’ll hear about the technical nuances of CDC, from handling Postgres TOAST columns to the "economy of scale" challenges of processing billions of rows for Substack, Artie’s first customer. Whether you're struggling with real-time ingestion costs or curious about the future of platform-agnostic partitioning, this conversation provides a masterclass in modern data movement.<br><br></div><div><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li><strong>Why the data movement market is bifurcating:</strong> Managed vendors like Fivetran excel at SaaS integrations (hundreds of connectors), while specialized vendors like Artie focus on production databases at high volume, a fundamentally different job to be done that requires expertise in failure recovery, observability, and advanced use cases.</li><li><strong>How to design CDC architecture that doesn't break production databases:</strong> Use online backfill strategies (the DBLog framework) instead of long-running transactions that hold write locks; implement table-level parallelism so a single table error doesn't halt the entire pipeline.</li><li><strong>The split-plane architecture pattern for flexible deployment models:</strong> Build control plane and 
data plane separation from day one, allowing customers to choose between fully managed cloud deployments or bring-your-own-cloud (BYOC) without compromising UX or architecture.</li><li><strong>Why database-specific expertise matters more than breadth:</strong> SQL Server CDC requires reverse engineering undocumented code; Postgres has TOAST columns; MongoDB allows invalid timestamp values - each data source has hidden complexity that justifies deep specialization over connector sprawl.</li><li><strong>How to build trust with early-stage customers on mission-critical workloads:</strong> Walk prospects through architecture and failure modes before implementation; encourage them to stress-test with real data volumes; establish deep engineering partnerships where both teams debug problems together (not sales-driven relationships).</li><li><strong>The platform-specific optimization trap and how to solve it:</strong> Instead of requiring customers to understand nuances of BigQuery time partitioning vs. Snowflake's lack thereof, build platform-agnostic features (like soft partitioning) that work consistently across destinations while handling platform-specific optimizations under the hood.</li></ul><div><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br><strong><br>About the Guest(s)<br></strong><br></div><div><br>Robin is the CTO and cofounder of Artie, a data movement platform built for high-volume, low-latency production database replication. 
With over a decade of experience building large-scale data systems, including early work on Maxwell (an open-source CDC framework at Zendesk) and database architecture at venture-backed startups, Robin identified a critical gap: existing tools optimize for SaaS integrations, not production databases at scale. In this episode, Robin shares hard-won lessons from building mission-critical infrastructure, including architectural innovations that prevent data loss and failure modes that only surface under real-world production load. His work at Artie has powered reliable data replication for companies like Substack, making this conversation essential for engineering teams building or evaluating real-time data movement solutions.</div><div><br></div><div><strong><br>Quotes<br></strong><br></div><div><br>"Artie helps companies make data streaming accessible." - Robin</div><div><br></div><div><br>"I didn't want to make any sort of compromises and it just turned out to be a really hard problem, so then we started a company around this." - Robin</div><div><br></div><div><br>"The complexity is not just at the destination level, the complexity is also at the source level." - Robin</div><div><br></div><div><br>"Every pipeline that we touch is mission critical for customers, or else they would just use either their existing pipeline or a managed vendor that's out there." - Robin</div><div><br></div><div><br>"We handle the whole thing, whereas other vendors more or less provide a component and expect engineers to either build or attach additional pieces." - Robin</div><div><br></div><div><br>"I think the biggest bottleneck for real time right now is accessibility. When people think about real time, they immediately think it's not worth it because they implicitly have a cost associated with it." - Robin</div><div><br></div><div><br>"We use Kafka transactions, so we do not commit offsets until the destination tells us the data has actually been flushed." 
- Robin</div><div><br></div><div><br>"There's so much nuance with every single data source that it becomes a whack-a-mole problem." - Robin</div><div><br></div><div><br>"When there's sufficient pain on the other side and they buy into your vision, it's easier to overcome obstacles during technical implementation." - Robin</div><div><br></div><div><br>"We're spending more time developing platform-agnostic solutions so customers don't have to understand platform nuances." - Robin</div><div><br><br><br><strong>Resources</strong></div><div><strong>&nbsp;<br></strong><br></div><div><strong>Connect on LinkedIn:</strong></div><ul><li>Robin Tang - <a href="https://www.linkedin.com/in/tang8330/">https://www.linkedin.com/in/tang8330/</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin/</a><br></li></ul><div><strong><br>Websites:</strong></div><ul><li>Artie: <a href="https://www.artie.com/">https://www.artie.com/</a></li><li>Fivetran: <a href="https://www.fivetran.com">https://www.fivetran.com</a></li><li>Estuary: <a href="https://www.estuary.dev">https://www.estuary.dev</a></li><li>Airbyte: <a href="https://airbyte.com">https://airbyte.com</a></li><li>Debezium: <a href="https://debezium.io">https://debezium.io</a></li></ul><div><strong><br>Tools &amp; Platforms:</strong></div><ul><li>Maxwell – Open-source CDC framework that reads the MySQL binlog into Kafka</li><li>Kafka – Distributed event streaming platform for data movement</li><li>WarpStream – Cost-optimized Kafka alternative using object storage</li><li>Strimzi – Kubernetes-native Kafka deployment tool</li><li>Apache Iceberg – Open table format for data lakehouse architecture</li><li>Delta Live Tables – Databricks' data movement and transformation tool</li><li>ClickPipes – ClickHouse's native data ingestion platform</li><li>Snowpipe Streaming – Snowflake's real-time data ingestion service</li><li>Google Datastream – Google Cloud's CDC and data movement service</li><li>AWS MSK Tiered Storage – Amazon managed Kafka with tiered storage capabilities</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 03 Feb 2026 02:46:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wj0m45vw.mp3" length="56254078" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1757</itunes:duration>
      <itunes:summary>What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowflake pipelines, and why "theoretically correct" architectures often fail in practice.</itunes:summary>
      <itunes:subtitle>What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowflake pipelines, and why "theoretically correct" architectures often fail in practice.</itunes:subtitle>
      <itunes:keywords>Robin Tang, Benjamin Wagner, data engineering show, data movement platform, CDC (Change Data Capture), real-time data ingestion, Postgres to Snowflake connector, data streaming, Artie data platform, mission-critical pipelines, low-latency data sync, database replication, data warehouse integration</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft</title>
      <link>https://podcasts.fame.so/e/489mzy78-the-100m-problem-how-lyft-s-data-platform-prevents-ml-failures-with-ritesh-varyani-at-lyft</link>
      <itunes:title>The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft</itunes:title>
      <itunes:episode>49</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">v17rky60</guid>
      <description>What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host <a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a> sits down with <a href="https://www.linkedin.com/in/riteshvaryani/">Ritesh Varyani</a>, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics.<br><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li><strong>How to architect a polyglot data platform</strong> that serves fundamentally different workloads (Spark for ML training and massively parallel processing, Trino for dashboarding and medium-scale ETL, and ClickHouse for sub-second OLAP queries) without creating operational chaos</li><li><strong>Why unification matters more than expansion:</strong> Lyft's 2026 strategy prioritizes consolidating and simplifying the data stack rather than adding new tools, reducing maintenance burden and improving reliability for end users</li><li><strong>The dual-layer AI strategy</strong> that enhances user analytics (semantic layer v2 with AI-native support) while automating platform operations (intelligent job failure diagnosis, adaptive resource allocation, and agentic workflow optimization)</li><li><strong>How to fund innovation from the bottom up</strong>: Lyft's model encourages individual engineers to experiment with AI on their own time, prove business value through POCs, and secure leadership buy-in through demonstrated alignment with company strategy</li><li><strong>Why vendor selection now includes AI explainability and debuggability</strong> as standard RFP requirements, even when AI isn't the primary driver of a purchasing decision</li><li><strong>The framework for deciding open-source investment vs. managed services</strong>: Prioritize business-critical goals first, then determine whether in-house ownership or vendor solutions accelerate that mission; AI becomes the accelerant, not the decision driver<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><strong><br><br>About the Guest(s)</strong></div><div><br><br>Ritesh is a Staff Software Engineer at Lyft, bringing six years of experience architecting and scaling the company's data platform. He previously worked on Microsoft's data and cloud infrastructure, including Hadoop, Azure, and SaaS products. Ritesh leads Lyft's critical data systems, including Trino, Spark, and ClickHouse. In this episode, Ritesh shares insights on building scalable, AI-native data platforms that serve diverse organizational needs, from batch processing and analytics to real-time marketplace operations. His strategic approach to unifying complex data stacks while integrating AI-driven reliability and user experience improvements provides actionable guidance for data engineers and platform leaders navigating infrastructure modernization at scale.</div><div><strong><br><br>Quotes</strong></div><div><br><br>"The goal of our platform is to give our users access to the data as fast as possible so that they can drive the meaning from the data that they are getting and take better data driven decisions." - <strong>Ritesh</strong></div><div><br>"We are a Hive format shop. We are going to be moving to other open table formats in the future, but at this point, we are a hive table format." 
- <strong>Ritesh</strong></div><div><br>"Our main goal at this point is primarily understanding how we see the data platform running five years from now, three years from now, and how we are able to future proof it." - <strong>Ritesh</strong></div><div><br>"In this world of AI, we should not be falling behind in any way, and bringing AI in the right places within our platform." - <strong>Ritesh</strong></div><div><br>"We want to make our semantic layer ready for the AI native side of things so that our teams are able to drive the best meaning possible from the data that they see." - <strong>Ritesh</strong></div><div><br>"Big data systems are distributed systems by nature, and where AI can help you is very clearly understand how the patterns are changing and what is a good action to take." - <strong>Ritesh</strong></div><div><br>"Rather than thinking of this as an AI versus an open source thing, it's about a question of what work is the most business critical and how do you go 100% behind it." - <strong>Ritesh</strong></div><div><br>"Not everybody is working on AI initiatives at this point, but where it makes sense according to our business strategy, if it aligns with it, then obviously we go and invest." - <strong>Ritesh</strong></div><div><br>"If you are the one who's going to take on the initiative, probably spend a few hours outside of what you're already working on, and that is how you will discover AI and the tooling for it." - <strong>Ritesh</strong></div><div><br>"We are trying to consolidate into a single direction of providing different kinds of models so that you are easily able to integrate and focus on the value you want to provide to your customers." 
- <strong>Ritesh</strong></div><div><br></div><div><br><strong>Resources</strong></div><div><strong>&nbsp;<br><br>Connect on LinkedIn:<br></strong><br></div><ul><li>Ritesh Varyani - <a href="https://www.linkedin.com/in/riteshvaryani/">https://www.linkedin.com/in/riteshvaryani/</a></li><li>Benjamin Wagner - <a href="https://www.linkedin.com/in/wagjamin/">https://www.linkedin.com/in/wagjamin/</a></li><li>Eldad Farkash - <a href="https://www.linkedin.com/in/eldadfarkash/">https://www.linkedin.com/in/eldadfarkash/<br></a><br></li></ul><div><strong>Websites:</strong></div><ul><li>Lyft -&nbsp;<a href="https://www.lyft.com">https://www.lyft.com</a></li></ul><div><strong><br>Tools &amp; Platforms:</strong></div><ul><li><strong>Apache Spark</strong> – Batch processing engine for ML training jobs, large-scale data processing, and GDPR operations</li><li><strong>Trino</strong> – Query engine for BI dashboarding, ETL workflows, and SQL-based data access</li><li><strong>ClickHouse</strong> – Columnar database for sub-second query latency and real-time analytics</li><li><strong>Amazon S3</strong> – Data lake storage for parquet tables and offline data processing</li><li><strong>AWS EKS</strong> (Elastic Kubernetes Service) – Kubernetes infrastructure for hosting Spark and Trino</li><li><strong>ClickHouse Cloud</strong> – Managed ClickHouse offering used by Lyft</li><li><strong>Hive Table Format</strong> – Current table format for organizing parquet files in S3</li><li><strong>Kubernetes Operators</strong> – Infrastructure for managing ClickHouse deployments</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: 
Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 16 Dec 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8x9yn37w.mp3" length="61848554" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1546</itunes:duration>
      <itunes:summary>What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.</itunes:summary>
      <itunes:subtitle>What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.</itunes:subtitle>
      <itunes:keywords>Lyft, Ritesh Varyani, Data Platform, AI, Apache Spark, Trino, ClickHouse, Data Engineering, Semantic Layer, Kubernetes, AWS EKS, Future-Proofing, Reliability, Open Source, Managed Services</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu</title>
      <link>https://podcasts.fame.so/e/286qwwyn-60-billion-predictions-daily-inside-credit-karmas-agentic-data-layer</link>
      <itunes:title>60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu</itunes:title>
      <itunes:episode>48</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">j12rjjw1</guid>
      <description>What does MLOps look like when you are serving 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the highest-volume data environments in FinTech. With a 100-person team serving 140 million members, Credit Karma operates at a scale where standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: the move from "Information" to "Agency."</description>
      <content:encoded><![CDATA[<div>What does MLOps look like when you are serving <strong>60 billion machine learning predictions a day</strong>?<br><br></div><div><a href="https://www.linkedin.com/in/maddie-daianu/">Maddie Daianu</a>, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the highest-volume data environments in FinTech. With a 100-person team serving 140 million members, Credit Karma operates at a scale where standard data practices break down.<br><br></div><div>Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: the move from "Information" to "Agency."<br><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li><strong>Extreme Scale:</strong> How to architect a system that handles 60 billion daily predictions without sacrificing latency.</li><li><strong>The Unified Consumer Profile:</strong> The hackathon project that unlocked real-time personalization across Credit Karma and TurboTax.</li><li><strong>The "Done-For-You" Future:</strong> Why they are building an "Agentic Data Layer" to move from recommending financial products to actively managing them for the user.<br><br></li></ul><div>If you want to know what the future of high-scale AI infrastructure looks like, this is the blueprint.<br><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. 
Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><strong><br>About the Guest(s)</strong></div><div><br>Maddie Daianu is the Head of Data and AI at Intuit Credit Karma, where she leads the teams responsible for AI science, machine learning engineering, data engineering, and the experimentation platform. She brings a background that spans academic research in biomedical engineering and machine learning, and experience at both smaller companies and Meta. Her current focus is on building the data and AI infrastructure that drives highly personalized financial experiences for Credit Karma's 140 million members and contributes to Intuit's broader consumer ecosystem.</div><div><strong><br>Quotes</strong></div><div><br>"The key elements and ingredients of making this app successful is data and AI." - Maddie</div><div><br>"We have and we process and transform multiple terabytes of information daily for our 140,000,000 members every single day." - Maddie</div><div><br>"We have our models that essentially lead to almost 60,000,000,000 daily predictions for our 140,000,000 member base every single day." - Maddie</div><div><br>"We want to take this to the next level. So Intuit as a whole believes... in creating done-for-you experiences for our users." - Maddie</div><div><br>"If you don't structure your data in a semantically, well structured way, you are not likely able to provide the most highly relevant and personalized experiences for users." - Maddie</div><div><br>"One thing that we've been building in the last year or so, it's called the unified consumer profile." - Maddie</div><div><br>"Intuit has been investing in tremendously over the last few years... the generative AI operating system... to move fast and continuously disrupt ourselves, especially in the age of AI." 
- Maddie</div><div><br></div><div><strong>Resources</strong></div><div><strong>&nbsp;</strong></div><div><strong>Connect on LinkedIn:<br></strong><br></div><ul><li><a href="https://www.linkedin.com/in/maddie-daianu/">Maddie Daianu</a></li><li><a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a></li><li><a href="https://www.linkedin.com/in/eldadfarkash/">Eldad Farkash</a></li></ul><div><br></div><div><strong>Websites:<br></strong><br></div><ul><li><a href="https://www.creditkarma.com/">Credit Karma</a></li></ul><div><br></div><div><strong>Tools &amp; Platforms:<br></strong><br></div><ul><li>BigQuery – Data warehouse for processing multiple terabytes of information daily</li><li>Bigtable – Operational serving layer for real-time data access</li><li>Vertex AI – Machine learning platform for model training and deployment</li><li>Alchemy – Online feature store for real-time transformations and aggregations</li><li>Generative AI Operating System – Centralized platform for democratizing Gen AI adoption across Intuit products</li></ul><div><br></div><div><strong>Products &amp; Services Mentioned:<br></strong><br></div><ul><li><a href="https://turbotax.intuit.com/">TurboTax</a> – Tax preparation and filing software</li><li>Debt Agent – AI-powered tool for debt consolidation and management assistance</li><li>Unified Consumer Profile – Semantic graph depicting members' financial journeys across Credit Karma and TurboTax</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals 
of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 19 Nov 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8qym2lq8.mp3" length="15777431" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1195</itunes:duration>
      <itunes:summary>What does MLOps look like when you are serving 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the highest-volume data environments in FinTech. With a 100-person team serving 140 million members, Credit Karma operates at a scale where standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: the move from "Information" to "Agency."</itunes:summary>
      <itunes:subtitle>What does MLOps look like when you are serving 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the highest-volume data environments in FinTech. With a 100-person team serving 140 million members, Credit Karma operates at a scale where standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: the move from "Information" to "Agency."</itunes:subtitle>
      <itunes:keywords>Maddie Daianu, Benjamin Wagner, Eldad Farkash, Intuit Credit Karma, Intuit, BigQuery, Bigtable, Data Engineering, AI, Machine Learning, Generative AI, GenAI, Consumer Ecosystem, Financial Services, Data Stack, Personalization, Unified Consumer Profile, Feature Store, Vertex AI</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Block Bad Data Before the Write with Nike’s Ashok Singamaneni</title>
      <link>https://podcasts.fame.so/e/xnvlwy4n-block-bad-data-before-the-write-with-nike-s-ashok-singamaneni</link>
      <itunes:title>Block Bad Data Before the Write with Nike’s Ashok Singamaneni</itunes:title>
      <itunes:episode>47</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">71ypv4j0</guid>
      <description>Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, runs data quality checks before the data is written to the final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams, ultimately saving significant recompute costs and engineering effort in mission-critical data pipelines.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, <a href="https://www.linkedin.com/in/wagjamin/">Benjamin</a> and <a href="https://www.linkedin.com/in/eldadfarkash/">Eldad</a> are joined by <a href="https://www.linkedin.com/in/ashok-singamaneni-193b1a32/">Ashok Singamaneni</a>, a Principal Data Engineer at Nike. Ashok dives deep into his work on the open-source projects BrickFlow and Spark Expectations. He shares his journey from mechanical engineering to data engineering and the lessons learned over a decade of tackling production data quality issues that lead to costly recomputes.</div><div><br></div><div>Ashok explains the philosophy behind Spark Expectations: treating the ingestion and transformation layers of a data pipeline (Bronze/Silver) as a software product rather than just a data engineering product. This means implementing rigorous checks like data quality, unit testing, and integration testing before the data is written to the final layer. He details the implementation using a Python decorator pattern within Spark jobs, allowing engineers to define rules that check for everything from basic column validation to complex referential integrity and aggregation consistency. 
The discussion also covers the trade-offs of using generative AI tools like Cursor for data engineering and the growing industry trend of prioritizing upfront data quality due to the rise of AI-powered analytics and direct leadership access to data.</div><div><br><strong>What You'll Learn:<br></strong><br></div><ul><li>Why the ingestion and transformation layers (Bronze/Silver) of a data pipeline should be treated as a software product with rigorous testing.</li><li>How Spark Expectations runs data quality checks before data is written to the final tables to prevent mission-critical failures and recomputes.</li><li>The three types of checks in Spark Expectations: row-level, aggregation-level, and query DQ (for referential integrity).</li><li>How the tool handles failures with options to ignore, drop the record, or fail the entire job.</li><li>Why data quality is becoming a prime focus across the industry due to AI integrations and direct executive-level access to data.</li><li>Ashok’s lessons on using generative AI tools (like Cursor and Claude Code) in data engineering projects and the necessity of restrictive permissions.<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><strong><br>About the Guest(s)</strong></div><div><br>Ashok Singamaneni is a Principal Data Engineer at Nike, with over twelve years of experience in the data space across the banking, healthcare, and retail domains. He is the creator of the popular open-source frameworks Spark Expectations and BrickFlow, which focus on improving data quality and pipeline reliability. 
Ashok advocates for treating data ingestion and transformation as a software product, ensuring checks and balances are in place early in the pipeline. He has a background in mechanical engineering.</div><div><strong><br>Quotes</strong></div><div><br>"DLT expectations gave an idea to the industry that you can do data quality before actually writing the data into your final tables." - Ashok</div><div><br>"I think over the time, in my experience, what I learned is this ingestion layer and the transformation layer, you should treat that as a software product, not like a data engineering product." - Ashok</div><div><br>"If it's mission critical, then you fail the job, not process the data, and don't put that data into the final table so that you don't need to recompute that again." - Ashok</div><div><br>"As the scale of the product increases, it becomes even more difficult for us to find exactly where the issue went wrong... it takes time for you to debug and see, like, lot of human effort also involved." - Ashok</div><div><br>"Data observability and quality is becoming prime because of AI integrations that are happening." - Ashok</div><div><br>"Ultimately, at the end of the day, you are responsible when you're checking in the code. It's not Claude or Cursor that will be blamed if something goes wrong." - Ashok</div><div><br>"The leadership is directly looking at the data and if there is something wrong in the data, then there can be some serious repercussions happening on the business decisions." - Ashok</div><div><br>"Rather than having bad data in the tables and then recomputing or reclarifying things, let's not put that data first in the first place." - Ashok</div><div><br>"You can drop the record and put that in an error table and give that alert to the engineering team that there is some error in the error table you can look at." - Ashok</div><div><br>"The row DQ checks that happen are very fast. 
It should happen as a pretty standard checks that happens on the scale." - Ashok</div><div><strong><br>Resources</strong></div><div><strong><br>Projects:<br></strong><br></div><ul><li>Spark Expectations - Data quality framework</li><li>BrickFlow - Open source project for data pipelines<br><br></li></ul><div><strong>Tools &amp; Technologies:<br></strong><br></div><ul><li>Apache Spark</li><li>Databricks DLT (Delta Live Tables)</li><li>Great Expectations - Post-processing data quality tool</li><li>Cursor / Claude Code - Generative AI coding tools</li><li>SQLMesh<br><br></li></ul><div><strong>For Feedback &amp; Discussions on Firebolt Core:<br></strong><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Firebolt Core GitHub Repository</a></li><li><a href="mailto:Benjamin@Firebolt.io">Benjamin@Firebolt.io</a></li><li><a href="mailto:Eldad@Firebolt.io">Eldad@Firebolt.io</a></li></ul><div><br></div><div><strong>&nbsp;Primary Speakers:<br></strong><br></div><ul><li><a href="https://www.linkedin.com/in/ashok-singamaneni-193b1a32/">Ashok Singamaneni</a></li><li><a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a></li><li><a href="https://www.linkedin.com/in/eldadfarkash/">Eldad Farkash</a>&nbsp; <a href="https://www.linkedin.com/in/eldadfarkash/"><br></a><br></li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a 
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 07 Oct 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8yqj6yl8.mp3" length="48817631" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1220</itunes:duration>
      <itunes:summary>Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, runs data quality checks before the data is written to the final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams, ultimately saving significant recompute costs and engineering effort in mission-critical data pipelines.</itunes:summary>
      <itunes:subtitle>Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, runs data quality checks before the data is written to the final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams, ultimately saving significant recompute costs and engineering effort in mission-critical data pipelines.</itunes:subtitle>
      <itunes:keywords>Ashok Singamaneni, Nike, Spark Expectations, BrickFlow, data quality, data engineering, Apache Spark, Databricks, open source, ELT, data governance, data observability, Gen AI, software engineering, medallion architecture, data pipeline, big data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal</title>
      <link>https://podcasts.fame.so/e/l8qw9wj8-postgres-vs-elasticsearch-the-unexpected-winner-in-high-stakes-search-for-instacart-with-ankit-mittal</link>
      <itunes:title>Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal</itunes:title>
      <itunes:episode>46</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80nvlvz0</guid>
      <description>Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about how the company modernized its search infrastructure by moving single-retailer search from Elasticsearch to PostgreSQL.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, <a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a> sits down with <a href="https://www.linkedin.com/in/ankitml/">Ankit Mittal</a>, former Senior Engineer at Instacart, to explore how Instacart rebuilt its search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performance search capabilities, and leveraged PostgreSQL extensions for complex retrieval operations. Whether you're scaling search functionality or optimizing database performance, this deep dive offers valuable insights into building robust, production-ready search systems using PostgreSQL.</div><div><br></div><ul><li>Discover why Instacart moved from Elasticsearch to PostgreSQL for single-retailer search</li><li>Learn about handling real-time inventory updates and search optimization</li><li>Explore PostgreSQL extensions, sharding strategies, and data flow architecture</li><li>Understand the trade-offs between different search infrastructure approaches</li></ul><div><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li>How Instacart managed fast-moving grocery inventory data by consolidating search, ranking, and filtering into a single PostgreSQL cluster</li><li>Why pushing compute closer to the data layer can significantly improve search performance and reduce network calls</li><li>The architecture decisions behind using PostgreSQL extensions like pgvector and custom solutions for search functionality</li><li>How to implement efficient data ingestion through S3-based pipelines and bulk writes instead of real-time updates</li><li>Why table maintenance operations like pg_repack are crucial for optimizing read throughput in production environments</li><li>The trade-offs between traditional search engines and relational databases for complex search implementations</li><li>The 
challenges of maintaining self-hosted PostgreSQL in a predominantly cloud-managed environment<br><br></li></ul><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><strong><br>About the Guest(s)</strong></div><div><br>Ankit is a Software Engineer at ParadeDB and former Senior Engineer at Instacart, where he specialized in PostgreSQL infrastructure and search systems. With extensive experience in database optimization and search architecture, he played a key role in modernizing Instacart's search infrastructure by transitioning from Elasticsearch to a custom PostgreSQL solution. In this episode, Ankit shares deep insights into building and scaling high-performance search systems for e-commerce, particularly focusing on the unique challenges of grocery retail's fast-moving inventory. His work at Instacart revolutionized their single-retailer search functionality, demonstrating how traditional relational databases can be adapted for complex search operations. His expertise in database systems and their practical applications in high-scale environments makes this conversation particularly valuable for engineers interested in modern search architecture and database optimization.</div><div><strong><br>Quotes</strong></div><div><br>"Think about it. If there's a lot of things that you can get the database to do, then the applications become simpler." - Ankit</div><div><br>"My non-Instacart experience has largely been in pre-PMF startups where the approach of abuse your database to its absolute limits works wonders." 
- Ankit</div><div><br>"Almost everything that we got retrieved had to be filtered out. So we go back to Elasticsearch again." - Ankit</div><div><br></div><div><br>"We traded off the quality of retrieval, hardcore core retrieval, with the whole system reducing the network calls." - Ankit</div><div><br>"It's a place to go to find what item is available, in what store, what item is available, at what price, including full product taxonomy graph and product and ontology." - Ankit</div><div><br>"The grand theme here is that we wanted more control over the cluster, how to spin it off, what kind of disks it would have." - Ankit</div><div><br>"We tell teams who want to have their data in this cluster, create an s3 home, create either a bucket or a home, whatever they want to do, and tell us that we would sync ourselves." - Ankit</div><div><br>"What we found is that the read throughput, we can throw more data if the tables are repacked nicely." - Ankit</div><div><br>"Most engineers who want to work on search, they are more used to the Elasticsearch shape of the query." - Ankit</div><div><br>"The relevance is better because they could join more things in the database. They also saw the cost of the normalized data reduced." 
- Ankit</div><div><strong><br>Resources</strong></div><div><br>Company Websites:</div><div><br>- Instacart - Grocery delivery platform</div><div><br>- ParadeDB - Database technology company</div><div><br>- <a href="https://www.firebolt.io/">Firebolt</a> - Cloud data warehouse (<a href="https://www.firebolt.io/">firebolt.io</a>)</div><div><br></div><div><strong>Tools &amp; Technologies:<br></strong><br></div><div>- PostgreSQL - Database system<br><br></div><div>- Elasticsearch - Search engine<br><br></div><div>- PgCat/PgDog - PostgreSQL proxy tools<br><br></div><div>- pgvector - PostgreSQL vector extension<br><br></div><div>- pg_repack - PostgreSQL table repacking tool<br><br></div><div>- ClickHouse - Column-oriented DBMS<br><br></div><div>- Tantivy - Rust-based search engine library<br><br></div><div><strong>Articles:<br></strong><br></div><div>- Instacart Search Modernization Blog Posts (Series on hybrid retrieval)<br><br></div><div>- Target's AlloyDB Migration Blog Post<br><br></div><div><br></div><div><strong>For Feedback &amp; Discussions on Firebolt Core:<br></strong><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Firebolt Core GitHub Repository</a></li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div><strong>&nbsp;Primary Speakers:<br></strong><br></div><ul><li><a href="https://www.linkedin.com/in/ankitml/">Ankit Mittal</a></li><li><a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a><br><br></li></ul><div>The Data Engineering Show is 
brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 17 Sep 2025 16:32:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8j0mk048.mp3" length="51954415" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1298</itunes:duration>
      <itunes:summary>Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about the company's innovative approach to modernizing their search infrastructure by transitioning from Elasticsearch to PostgreSQL for single-retailer search functionality.</itunes:summary>
      <itunes:subtitle>Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about the company's innovative approach to modernizing their search infrastructure by transitioning from Elasticsearch to PostgreSQL for single-retailer search functionality.</itunes:subtitle>
      <itunes:keywords>Benjamin Wagner, Instacart, Postgres, Elasticsearch, Ankit Mittal, data engineering, search, hybrid retrieval, databases, ParadeDB, software engineering, data stack, dev productivity, scalability, search indexing, modernization</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So</title>
      <link>https://podcasts.fame.so/e/m84lw638-is-self-service-bi-a-false-promise-lei-tang-of-fabi-ai-thinks-so</link>
      <itunes:title>Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So</itunes:title>
      <itunes:episode>45</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">x16rvk81</guid>
      <description>AI is reshaping business intelligence by enabling true self-service analytics and transforming how organizations interact with their data through natural language processing. In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meaningful insights from complex datasets.</description>
      <content:encoded><![CDATA[<div>Explore the future of AI-powered business intelligence with <a href="https://www.linkedin.com/in/lei-tang-ai/">Lei Tang</a>, CTO and Co-founder of <a href="https://www.fabi.ai/">Fabi.ai</a>, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-generated insights to creating intelligent semantic layers, discover how modern BI platforms are bridging the gap between data teams and business stakeholders. Tune in to understand why static dashboards are becoming obsolete and how AI agents will soon proactively surface business opportunities and insights.</div><div><br></div><div><strong>Key points:</strong></div><div><br></div><ul><li>The limitations of traditional self-service BI and how AI is addressing them</li><li>Building secure, context-aware AI systems for data analysis</li><li>The future of human-AI interaction in business intelligence</li><li>Technical insights into modern BI platform architecture</li><li>Vision for proactive, AI-driven business insights</li></ul><div><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li>Why traditional self-service BI has failed to deliver on its promises and how AI can bridge the gap</li><li>How to build an AI-native BI platform that combines SQL, Python, and natural language processing</li><li>The framework for implementing "Vibe-analytics" - a new paradigm of AI-powered visual analytics</li><li>Why context engineering and semantic understanding are crucial for accurate AI-driven analysis</li><li>How to balance security and accessibility when deploying AI-powered analytics tools</li><li>The future of BI platforms as proactive insight generators rather than passive dashboards</li><li>Why caching and stateful environments are essential for responsive AI-powered analytics</li><li>How to 
leverage AI to translate business questions into accurate technical queries while maintaining data integrity</li></ul><div><br></div><div><strong>About the Guest(s)</strong></div><div><br></div><div>Lei is the Co-founder and CTO of Fabi.ai, where he leads the development of AI-native business intelligence solutions. With a PhD in machine learning and over a decade of experience in the data domain, Lei has held significant roles, including positions at Yahoo, Walmart, Lyft (as Director of Data Science), and Clari (as Chief Data Scientist). His expertise spans machine learning, data engineering, and business analytics, with a particular focus on making data analysis more accessible and efficient. In this episode, Lei shares insights on the evolution of self-service BI and how AI is transforming business intelligence, drawing from his experience building Fabi.ai, a platform that combines SQL, Python, and AI to democratize data analysis. His work in developing "Vibe AI" (AI-powered BI) represents a significant advancement in making complex data analysis accessible to non-technical users while maintaining data accuracy and trust.<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong>Quotes</strong></div><div><br>"For the past decade, it's really difficult to make sure the self-service BI can work. And then now with AI, the worst part is that it can run properly, but the numbers are wrong." - Lei</div><div><br>"If you talk to anybody working in the BI space, like self-service BI, that has been termed for maybe for the past decade. 
But I have to say that is a false promise." - Lei</div><div><br>"We're saying that we really want those data team to be able to, like, say, what type of data is exposed to, like, say, less technical folks." - Lei</div><div><br>"In order to build AI native BI, I would say the focus should be how human interact with AI." - Lei</div><div><br>"We believe that, essentially, this BI system or, like, AI BI system would be more like a agent, and then it'll actually looking for, like, business opportunities and insight and surface to you." - Lei</div><div><br>"The one common theme I have been experiencing is that normally would work with other business stakeholders, could be marketing, could be operations, could be sales." - Lei</div><div><br>"We strongly believe that BI should be stored as code." - Lei</div><div><br>"Enterprise data tends to be very noisy, very complex." - Lei</div><div><br>"The semantics of itself becomes part of the context for the AI engine." - Lei</div><div><br>"Most organizations, the data, like the schema, the kind of business, like metrics and logic, has been constantly evolving." 
- Lei</div><div><strong><br>Resources<br></strong><br></div><ul><li><a href="https://www.fabi.ai/">Fabi.ai</a> - AI-native BI platform</li><li><a href="https://www.firebolt.io/">Firebolt</a> (<a href="https://www.firebolt.io/">firebolt.io</a>) - Cloud data warehouse platform<br><br></li></ul><div><strong>Tools &amp; Technologies:<br></strong><br></div><ul><li><a href="https://www.firebolt.io/core">Firebolt Core</a> - Free self-hosted query engine</li><li>Looker - BI Platform</li><li>Tableau - BI Platform</li><li>Sisense - BI Platform</li><li>Snowflake - Data Warehouse</li><li>BigQuery - Data Warehouse</li><li>PostgreSQL - Database</li><li>SQLAlchemy - Database toolkit</li><li>pandas - Data analysis library<br><br></li></ul><div><strong>For Feedback &amp; Discussions on Firebolt Core:<br></strong><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core?utm_medium=podcast&amp;utm_source=famehost&amp;utm_campaign=the-data-engineering-show">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div><strong>&nbsp;Primary Speakers:<br></strong><br></div><ul><li><a href="https://www.linkedin.com/in/lei-tang-ai/">Lei Tang</a> &nbsp;</li><li><a href="https://www.linkedin.com/in/wagjamin/">Benjamin Wagner</a><br><br></li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a 
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 28 Aug 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/895jq948.mp3" length="50692178" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1267</itunes:duration>
      <itunes:summary>AI is reshaping business intelligence by enabling true self-service analytics and transforming how organizations interact with their data through natural language processing. In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meaningful insights from complex datasets.</itunes:summary>
      <itunes:subtitle>AI is reshaping business intelligence by enabling true self-service analytics and transforming how organizations interact with their data through natural language processing. In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meaningful insights from complex datasets.</itunes:subtitle>
      <itunes:keywords>Lei Tang, Benjamin Wagner, Firebolt, Firebolt Core, AI for Business Intelligence, Data Analytics, BI Platform, AI-native BI, Fabi.ai CTO, Vibe Analytics, Proactive BI, Human-AI Interaction, Semantic Understanding, Data Warehouse, SQL, Python, Data Accessibility, BI Agents</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber</title>
      <link>https://podcasts.fame.so/e/x812ymzn-building-uber-s-ai-assistant-how-genie-revolutionizes-on-call-support-with-paarth-chothani-from-uber</link>
      <itunes:title>Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber</itunes:title>
      <itunes:episode>44</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">713rqk50</guid>
      <description>In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.</description>
      <content:encoded><![CDATA[<div>Journey inside Uber's innovative AI assistant "Genie" with <a href="https://www.linkedin.com/in/paarthchothani">Paarth Chothani</a>, Staff Engineer at <a href="https://www.uber.com/">Uber</a>, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants at scale. Get insights into the evolution from traditional chatbots to agent-based solutions, and learn practical lessons about staying current in the rapidly evolving AI landscape. Whether you're building AI-powered tools or scaling data infrastructure, this episode offers valuable perspectives on balancing innovation with real-world implementation.</div><div><br></div><div>• Building and scaling RAG pipelines at enterprise scale</div><div>• Evolution from traditional chatbots to AI agents</div><div>• Practical insights on data processing and vector search implementation</div><div>• Leveraging open-source technologies in production environments</div><div>• Navigating rapid technological changes in AI development</div><div><br></div><div><strong>What You'll Learn:<br></strong><br></div><ul><li>How Uber transformed its on-call support system by building an AI assistant that searches across internal documentation, wikis, and code</li><li>Why combining multiple data sources with vector databases creates more accurate and contextual responses for enterprise support</li><li>The evolution from basic RAG implementation to agent-based architecture for handling complex support scenarios</li><li>How to scale AI processing pipelines using Apache Spark for large-scale data chunking and embedding generation</li><li>Why customization and internal data sources are crucial for enterprise AI assistant effectiveness</li><li>The future of AI assistants: moving from documentation lookup to automated problem 
resolution through multi-agent systems</li><li>How to balance rapid AI innovation with setting realistic customer expectations in fast-moving tech environments<br><br></li></ul><div>Paarth is a Staff Engineer at Uber, where he works on Michelangelo, Uber's machine learning platform. With over four years at Uber, he specializes in feature store development, online serving at scale, and GenAI implementations. He has been instrumental in developing Genie, an AI-powered on-call assistant that revolutionizes how Uber's engineering teams handle support requests and documentation access. In this episode, Paarth shares valuable insights on building and scaling RAG-based systems, vector search implementations, and the evolution of AI assistants from traditional chatbots to sophisticated agent-based solutions. His experience spanning both AWS chatbot development and current GenAI innovations at Uber offers listeners a unique perspective on the rapid advancement of AI-powered enterprise solutions.<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong>Quotes</strong></div><div><br></div><div>"Think of Genie as your on-call assistant. Different infra teams have their Slack channels, and because these technologies are widely used, you have to wait a lot." - Paarth<br><br>"What we realized is for our engineers to really get help, data sources really should be internal only because we customize lot of these open source engines for making it work at Uber scale." 
- Paarth<br><br>"Instead of building a mega scale pipeline that just ingest all data sources and then keeps a central data source solution, we instead are giving users the flexibility to ingest what data sources they want." - Paarth<br><br>"We had to scale, you can say, the whole infra layer to chunk data faster to be able to create embeddings at scale." - Paarth<br><br>"It almost felt like they're doing what EMR was doing. You have your Hadoop and big data technology, and we needed these pipelines to basically process all this data quickly." - Paarth<br><br>"We've even evolved from just giving you the right documentation to starting to evolve into a situation where we'll also start taking actions on your behalf." - Paarth<br><br>"That intuition that comes from building this kind of bot, I feel like that intuition came again as we were starting to see this technology come, and we're like, hey, this looks like where you can pretty much fit all these pieces together." - Paarth<br><br>"What we have seen with several use cases is agentic Genie works well when designed well, when you've analyzed the problem of which type of subproblems the bot should resolve per channel, per use case." - Paarth<br><br>"I think having a problem in mind always helps that way, the energy is little bit focused and directed." - Paarth<br><br>"Whatever you're building is not enough because the expectation has already gone to the next level, so the pace is too fast right now." 
- Paarth<br><br><br></div><div><strong>Resources</strong></div><div><strong>Companies &amp; Platforms:<br></strong><br></div><ul><li><a href="https://www.uber.com/">Uber</a> - ML Platform &amp; Engineering</li><li><a href="https://www.firebolt.io/">Firebolt</a> - Cloud Data Warehouse (firebolt.io)<br><br></li></ul><div><strong>Tools &amp; Technologies:<br></strong><br></div><ul><li>Michelangelo - Uber's ML Platform&nbsp;</li><li>Genie - Uber's On-Call Assistant Bot</li><li>Cursor - Developer IDE</li><li>OpenSearch - Vector Database</li><li>LangGraph - Agent Framework<br><br></li></ul><div><strong>Notable Projects Mentioned:<br></strong><br></div><ul><li>MetaMate (Meta)</li><li>Query Copilot (Uber)</li><li>Scale at AI (Meta Meetup)<br><br></li></ul><div><strong>Company Blogs:<br></strong><br></div><ul><li><a href="https://www.uber.com/en-ID/blog/engineering">Uber Engineering Blog</a> - Genie and Query Optimization articles<br><br></li></ul><div><strong>&nbsp;Primary Speakers:<br></strong><br></div><ul><li><a href="https://www.linkedin.com/in/paarthchothani">Paarth Chothani</a> - Staff Engineer, <a href="https://www.uber.com/">Uber</a></li><li>Benjamin - <a href="https://www.firebolt.io/">Firebolt</a></li><li>Eldad - <a href="https://www.firebolt.io/">Firebolt</a></li></ul><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a 
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 22 Jul 2025 12:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8vyzlyqw.mp3" length="49014628" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1531</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.</itunes:subtitle>
      <itunes:keywords/>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta</title>
      <link>https://podcasts.fame.so/e/1n2rz14n-from-zero-to-100m-users-inside-notion-s-data-stack-and-ai-strategy-with-sumit-gupta</link>
      <itunes:title>From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta</itunes:title>
      <itunes:episode>43</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">2199vp71</guid>
      <description>Dive into the future of data engineering with Sumit Gupta, Lead BI Engineer at Notion, as he shares insights with the bros on navigating the AI revolution in modern data stacks. From leveraging tools like Snowflake and dbt to automating content creation with AI, discover how traditional technical skills are evolving alongside the rise of AI. Whether you're a seasoned data professional or just starting your journey, learn why embracing AI isn't optional and how to balance technical expertise with crucial soft skills in this rapidly changing landscape. Get an insider's perspective on working at tech giants like Notion, Snowflake, and Dropbox, while exploring practical applications of AI in both professional and personal contexts.</description>
      <content:encoded><![CDATA[<div>AI's transformative impact on data engineering and analytics is reshaping how professionals create value, shifting focus from technical skills to strategic thinking and communication.</div><div><br></div><div>In this episode of The Data Engineering Show, the bros talk with <a href="https://www.linkedin.com/in/sumonigupta/">Sumit Gupta</a>, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern data stacks, and how AI is revolutionizing data workflows and professional development.</div><div><br></div><div><strong>What You'll Learn:</strong></div><div><br></div><ul><li>How modern data stacks are evolving with tools like Snowflake, dbt, Iceberg, and Hex</li><li>Why transferable skills are becoming more crucial than technical expertise in the AI era</li><li>How to leverage AI tools strategically</li><li>The framework for automating content creation workflows using AI tools and APIs</li><li>Why this is "the worst AI will ever be" and how to prepare for accelerating change</li><li>How to balance AI automation with authentic human connection in content creation</li><li>Why modern data professionals must embrace AI while maintaining ethical considerations</li><li>How companies like Notion are implementing AI for improved customer insights and engagement</li></ul><div><br></div><div>This episode offers valuable insights into the practical application of AI in data workflows, content creation, and professional development, while addressing both the opportunities and challenges in the evolving tech landscape.</div><div><br><strong>Highlights</strong>:<br><br>[03:19] - Modern Data Stack Evolution for Scale<br><br>[10:05] - AI-Powered Customer Intelligence Platform<br><br>[15:17] - Future of Data Careers in the AI Era<br><br>[18:49] - Automated Content Creation Workflow<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a 
href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong>About the Guest</strong></div><div><br></div><div>Sumit Gupta is a Lead BI Engineer at Notion, where he spearheads reporting and dashboarding initiatives for marketing and sales teams. With over a decade of experience in data and analytics, including notable roles at industry leaders like Snowflake and Dropbox, he brings deep expertise in modern data stack implementation and AI integration. In this episode, Sumit shares valuable insights on the evolution of data engineering, the impact of AI on analytics workflows, and how he leverages various AI tools to enhance productivity both professionally and as a content creator with 21,000+ Instagram followers. His unique perspective on balancing technical expertise with transferable skills in the age of AI, combined with his experience at "Bay Area Darlings" like Notion, Snowflake, and Dropbox, makes this conversation particularly relevant for data professionals navigating the rapidly evolving tech landscape.</div><div><br></div><div><strong>Quotes</strong></div><div><br></div><div>"The scariest part about the whole AI boom is this is the worst AI will ever be." - Sumit</div><div><br></div><div>"If you are someone who's starting new in data field, the value of your technical skills that used to be very valuable until 2021 is not as much - your transferable skills or soft skills comes into picture." - Sumit</div><div><br></div><div>"Every bit is expensive - all the servers are cheap, but when you're dealing with hundred million users and trillions of rows of data a day, you have to find that one percent saving." 
- Sumit</div><div><br></div><div>"AI has made me a lot more productive, but at the same time, it has also made me dumber." - Sumit</div><div><br></div><div>"If you are someone new, especially in data, scared of AI or skeptic of AI, I would say jump in - if you don't jump onto the bandwagon right now, you might be left out in a year or so." - Sumit</div><div><br></div><div><strong>Resources</strong></div><div><br></div><ul><li><a href="https://www.linkedin.com/in/sumonigupta/">Sumit Gupta LinkedIn</a></li><li><a href="https://www.notion.com/">Notion Website</a></li><li><a href="https://www.instagram.com/sumonigupta/">Sumit Gupta Instagram</a></li><li><a href="https://www.firebolt.io/">Firebolt Website</a></li></ul><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most 
downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 10 Jun 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/80v4yrv8.mp3" length="53342040" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1333</itunes:duration>
      <itunes:summary>Dive into the future of data engineering with Sumit Gupta, Lead BI Engineer at Notion, as he shares insights with the bros on navigating the AI revolution in modern data stacks. From leveraging tools like Snowflake and dbt to automating content creation with AI, discover how traditional technical skills are evolving alongside the rise of AI. Whether you're a seasoned data professional or just starting your journey, learn why embracing AI isn't optional and how to balance technical expertise with crucial soft skills in this rapidly changing landscape. Get an insider's perspective on working at tech giants like Notion, Snowflake, and Dropbox, while exploring practical applications of AI in both professional and personal contexts.</itunes:summary>
      <itunes:subtitle>Dive into the future of data engineering with Sumit Gupta, Lead BI Engineer at Notion, as he shares insights with the bros on navigating the AI revolution in modern data stacks. From leveraging tools like Snowflake and dbt to automating content creation with AI, discover how traditional technical skills are evolving alongside the rise of AI. Whether you're a seasoned data professional or just starting your journey, learn why embracing AI isn't optional and how to balance technical expertise with crucial soft skills in this rapidly changing landscape. Get an insider's perspective on working at tech giants like Notion, Snowflake, and Dropbox, while exploring practical applications of AI in both professional and personal contexts.</itunes:subtitle>
      <itunes:keywords>Sumit Gupta, data engineering, Notion, AI implementation, modern data stack, business intelligence, data analytics, Firebolt</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Rising Wave Is Redefining Real-Time Data with Postgres Power</title>
      <link>https://podcasts.fame.so/e/58z733pn-how-rising-wave-is-redefining-real-time-data-with-postgres-power</link>
      <itunes:title>How Rising Wave Is Redefining Real-Time Data with Postgres Power</itunes:title>
      <itunes:episode>42</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81zn55r1</guid>
      <description>In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how Rising Wave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.</description>
      <content:encoded><![CDATA[<div><br>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with <a href="https://www.linkedin.com/in/yingjun-wu">Yingjun Wu</a>, founder and CEO of <a href="https://risingwave.com/">Rising Wave</a>, to explore the evolution of stream processing systems and the innovations his company is bringing to the space.</div><div><br><strong>What you’ll learn:</strong><br><br></div><ul><li>Yingjun's journey from academic research in stream processing to founding Rising Wave, and the challenges of building trust in a new database system.<br><br></li><li>How Rising Wave's architecture, using S3 as primary storage, delivers second-level scalability, while other systems can take hours to scale.<br><br></li><li>The competitive landscape of stream processing, with Rising Wave's Postgres compatibility providing a significant advantage in ease of use.<br><br></li><li>How one major company reduced its CPU requirements from 20,000 to just 600 by switching from a traditional stream processing system to Rising Wave.<br><br></li><li>The rising importance of Apache Iceberg as a destination for stream processing output, helping companies avoid vendor lock-in.<br><br></li><li>How streaming systems fit into modern data stacks, especially as companies seek to avoid being locked into proprietary systems.<br><br></li></ul><div>Yingjun Wu is the founder and CEO of Rising Wave, a stream processing system built in Rust and designed with a cloud-native architecture. With a PhD focused on stream processing and database systems, Yingjun previously worked at Redshift and IBM Research before founding Rising Wave. 
His company has developed a system that achieves significant performance and resource efficiency advantages over traditional stream processing solutions, while maintaining Postgres compatibility for ease of use.<br><br></div><div><strong>Episode Highlights:<br><br>The Origins of Rising Wave (00:30)<br></strong><br></div><div>Yingjun shares his background in stream processing from his PhD days and explains how his experience at Redshift revealed the need for better stream processing solutions, especially since many data warehouse workloads involve data ingested from streaming sources like Kinesis or Kafka.<br><br><strong>Building a System from Scratch (04:10)<br></strong><br></div><div>Yingjun describes the challenging first 2-3 years of developing Rising Wave without customers, highlighting how trust is a major barrier for new database systems. After 2.5 years, they secured their first customers, including a startup and several larger companies, which helped establish Rising Wave's credibility.<br><br><strong>The Current Stream Processing Landscape (07:47)<br></strong><br></div><div>Benjamin asks about the current stream processing space, with Yingjun positioning Rising Wave as a leader, particularly for SQL-based workloads. He highlights several key advantages of Rising Wave, including its Rust-based implementation and S3-based storage architecture.<br><br><strong>S3 as Primary Storage (10:27)<br></strong><br></div><div>Yingjun explains their decision to use S3 as primary storage from day one, despite its slowness and expense. He discusses how they've optimized for these challenges and would still make the same architectural choice today due to benefits like simplified state management and superior elastic scaling.<br><br><strong>The Business Model (13:52)<br></strong><br></div><div>Rising Wave offers open-source, cloud, and on-premise versions of its product. 
Yingjun notes that many highly regulated industries require on-premise deployment, including customers in the banking and aerospace sectors.<br><br><strong>Typical Users and Competitive Advantages (15:01)<br></strong><br></div><div>When asked about their typical users, Yingjun explains they directly compete with Flink but have advantages in ease of use due to Postgres compatibility. Their users are either new to stream processing or are migrating from systems like Spark Streaming or Flink due to performance issues or development complexity.<br><br><strong>Apache Iceberg Integration (19:25)<br></strong><br></div><div>Yingjun discusses how Apache Iceberg is emerging as an important destination for Rising Wave output, as companies seek to avoid vendor lock-in with proprietary data warehouses. He explains how Rising Wave typically performs ETL functions before data is sent to Iceberg tables.<br><br><strong>The Future of Data Management (32:06)<br></strong><br></div><div>The conversation concludes with a discussion about Iceberg becoming a "single source of truth" for data, with multiple specialized query engines potentially accessing the same data. Yingjun and Eldad share perspectives on how this shift away from proprietary data lock-in is changing the data ecosystem.<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. 
Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong>Episode Resources:<br></strong><br></div><ul><li><a href="https://risingwave.com/">Rising Wave Website</a></li><li><a href="https://www.linkedin.com/in/yingjun-wu">Yingjun Wu LinkedIn</a></li></ul><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data 
Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 07 May 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8l4xn468.mp3" length="75841827" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1895</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how Rising Wave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how Rising Wave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.</itunes:subtitle>
      <itunes:keywords>Yingjun Wu, Stream Processing, Apache Iceberg, PostgreSQL Compatible Database, S3-based Architecture, Rust Database Systems, Real-time Data Processing, Elastic Scaling, Data Warehousing, ETL vs ELT, Vendor Lock-in</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach</title>
      <link>https://podcasts.fame.so/e/r87yxv58-revolutionizing-data-governance-with-datastrato-unified-open-source-approach</link>
      <itunes:title>Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach</itunes:title>
      <itunes:episode>41</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">k08m3n91</guid>
      <description>In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources.&nbsp;</div><div><br></div><div><strong>What You’ll Learn:</strong></div><div><br></div><ul><li>How Apache Gravitino differs from other catalogs such as Unity Catalog and Polaris in its ability to support multiple catalog systems.</li><li>What the “Push-Down Permission Management” security model is and how to implement it across different data systems.&nbsp;</li><li>How to maintain consistent governance across various query engines like Spark, Trino, and Flink.</li><li>Why interoperability, flexibility, and open-source ecosystems, rather than performance benchmarks, are becoming the key dynamics of data infrastructure.</li><li>How to evaluate new data tools based on their real-world adoption rather than social media hype.</li></ul><div><br></div><div>Lisa Cao is a Product Manager at DataStrato, specializing in AI/ML product partnerships and developer relations. With deep expertise in data catalog technologies and open-source ecosystems, she plays a key role in developing Apache Gravitino, an ASF incubating project that provides a unified governance and security layer for diverse data systems. Her work in developing extensible catalog frameworks has helped organizations manage complex data environments across multiple platforms.</div><div><br><strong>Episode Highlights:</strong></div><div><br></div><ul><li>What is Apache Gravitino? (01:24)</li></ul><div>Apache Gravitino is a meta-catalog that serves as a unified data governance and security layer used to manage different data systems. 
Lisa shares that Gravitino was the first to release an Iceberg REST catalog and open-sourced it for the general community; over time, Polaris and Unity Catalog were also released as open source. She notes that although Gravitino, Polaris, and Unity Catalog are very similar, Gravitino differs in its ability to support multiple catalogs.</div><div><br></div><ul><li>Unifying AI/ML and Big Data Stack (03:15)</li></ul><div>One interesting aspect of Gravitino is that it offers more than a catalog of data: its model catalogs are a first step toward merging the big data and AI/ML worlds. Lisa describes the goal of effective model management: a system that can store and manage different types of models, track changes to them, and control access to them.</div><div><br></div><ul><li>Simplifying Data Governance (10:49)</li></ul><div>Think of Gravitino as a “traffic cop” that helps manage and secure data from multiple sources. A system that provides unified access control across all data sources lets teams manage access and governance centrally, so that ML teams don't have to worry about access. Lisa explains that Apache Gravitino makes data accessible to different teams and users while ensuring that it is secured and governed appropriately.&nbsp;</div><div><br></div><ul><li>Gravitino’s Query Engine Solution (21:34)</li></ul><div>Every query engine has its own way of managing data, which makes it difficult to switch between engines: you have to reconfigure everything. 
Lisa highlights that Gravitino solves the problem by providing a single layer of data governance that works across multiple query engines.</div><div><br></div><ul><li>Navigating the Fast-Paced World of Data Engineering (24:41)</li></ul><div>Lisa talks about how fast the data engineering space is moving and shares some tips for keeping up:</div><div><br></div><ul><li>Don’t try to learn everything at once.</li><li>Don’t go too deep into every tool.</li><li>Look for real-world adoption.</li></ul><div><br></div><div>She warns against social media hype, which can amplify the messaging around new tools and make it seem that everyone is using them, when in reality that adoption can’t easily be verified.<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. 
Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br><strong>Episode Resources:</strong></div><div><br></div><ul><li>Apache Gravitino <a href="https://gravitino.apache.org">website</a></li></ul><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 08 Apr 2025 10:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wj07910w.mp3" length="56659052" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1416</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.</itunes:subtitle>
      <itunes:keywords>Data Catalogs, Open Source Catalogs, Iceberg Rest Catalog, Data Governance, Data Security Frameworks, Data Access Control, Data Engineering Solutions, Apache Gravitino, RBAC Systems</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen</title>
      <link>https://podcasts.fame.so/e/18pv9vz8-beyond-database-optimization-with-ai</link>
      <itunes:title>Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen</itunes:title>
      <itunes:episode>40</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">z1r3w3m0</guid>
      <description>In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB.</div><div><br></div><div><strong>Together, they:</strong><br><br></div><ul><li>Talk about the journey of DuckDB, an open-source analytical database system designed as a universal data wrangling tool.</li><li>Explain how DuckDB differs from SQLite, contrasting analytical and transactional use cases.</li><li>Discuss DuckDB’s component-based architecture and its approach to innovation, including building its own Parquet reader.</li><li>Explore DuckDB’s simple and efficient extension ecosystem, which lets developers add custom functionality without compromising the stability of the core.</li><li>Consider Hannes' perspective on the role of AI in databases.</li><li>Delve into the system’s infrastructure, its design choices, and the team’s dedication to building a continuous, reliable database system.</li></ul><div><br></div><div>Hannes Mühleisen is the CEO of DuckDB Labs and a professor in the Netherlands, renowned for co-creating DuckDB, an open-source analytical database system. With a background in database architecture research at the CWI Database Architectures group, he has pioneered the development of DuckDB as a universal data wrangling tool that can run everywhere from phones to space satellites. Under his leadership, DuckDB has achieved remarkable success, reaching 10 million downloads monthly and becoming a go-to solution for analytical database needs. His commitment to keeping DuckDB lightweight, portable, and hardware-agnostic while maintaining high performance has revolutionized how developers approach analytical database solutions. 
As both an academic and technology leader, Hannes brings unique insights into database architecture, open-source development, and the future of analytical data processing.</div><div><br></div><div><strong>Episode Highlights:</strong></div><div><br></div><ul><li>The Purpose of DuckDB (01:04)</li></ul><div>Hannes describes what DuckDB is and what it is designed to do: a tool that understands SQL and is built specifically to simplify complex analytical use cases.</div><div><br></div><ul><li>SQLite vs DuckDB (02:53)</li></ul><div>Hannes compares the two tools: SQLite is an amazing system, but it is built for transactional use cases rather than analytical queries, while DuckDB is designed specifically for analytical use cases.</div><div><br></div><ul><li>The Importance of Collaboration (08:14)</li></ul><div>Hannes stresses the need for community collaboration, since only around a hundred people worldwide work on database engines, all trying to solve the same problems. He expresses deep admiration for a team in Munich, praising them for actually implementing concepts that had previously only been described in papers.</div><div><br></div><ul><li>The Component-Based Architecture of DuckDB (11:25)</li></ul><div>Hannes highlights a distinctive feature of DuckDB: it can be used as a component. He explains that the in-process architecture is a success because of the in-memory data sharing it makes possible.</div><div><br></div><ul><li>The Parquet Reader Journey (17:51)</li></ul><div>Hannes explains that he built his own Parquet reader out of necessity, although he would have preferred not to. He describes how Uwe Korn, a developer from Germany, contributed a Parquet reader to the Apache Arrow project, where the two became so intertwined that using the Parquet reader independently of Arrow became a problem. 
Hannes adds that a competent Parquet reader has no choice but to become a database engine, which is one of the interesting lessons of its development.</div><div><br></div><ul><li>The Role of AI in Database Interaction (22:41)</li></ul><div>Hannes states that he doesn’t think AI has a place in a database engine; rather, it is needed for optimization, because the researchers who built their careers on optimization are out of jobs. He explains that AI should be used for assistance tasks rather than taking over execution entirely.</div><div><br></div><ul><li>SQL: A Defined Interface (29:20)</li></ul><div>Hannes introduces the relational API, a tool for building queries programmatically, and says it simplifies the programmer's task. Although Hannes agrees that a well-defined interface is important for components like databases, he also argues that SQL provides relatively well-defined behavior within a single system.</div><div><br></div><ul><li>The Golden Age of Databases (38:57)</li></ul><div>Hannes concludes the episode by praising Firebolt and others for taking on core engine work. He shares his excitement about the current golden age of databases, which is showcasing what is possible.<br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.<br><br></div><div><strong>Quotes:</strong></div><div><br></div><ol><li><em>“DuckDB is a universal data wrangling tool. It is a relational data management system that speaks SQL designed to do well on analytical use cases.”</em></li><li><em>“We call ourselves the SQLite for analytics because it explains the original design goal of DuckDB very well.”</em></li><li><em>“Within the database engine space, we are all working to solve the same problems, and that's like, a hundred of us on the planet.”</em></li><li><em>“It actually turns out in order to make a competent Parquet reader, you do need query execution. There is just no way around it.”</em></li><li><em>“I really like this golden age of databases we are in and personally, as somebody who really likes tables and SQL, I'm quite happy to see things like Firebolt and others really working on core engine stuff.”</em></li></ol><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens 
Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 19 Mar 2025 11:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8x9jj2jw.mp3" length="74083264" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1852</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.</itunes:subtitle>
      <itunes:keywords/>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma</title>
      <link>https://podcasts.fame.so/e/6nrrxkyn-ai-and-data-movement-trends-and-best-practices-with-estuary-s-daniel-palma</link>
      <itunes:title>AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma</itunes:title>
      <itunes:episode>39</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81562zp0</guid>
      <description>In this episode of The Data Engineering Show, the bros sit down with Daniel Pálma, Head of Marketing at Estuary, to explore where data engineering and marketing meet. Daniel shares how he moved from data engineering into marketing and how he uses his technical background to market to engineers. The conversation covers the role of AI in data movement, the future of data engineering, real-time data integration challenges, and the evolution of data integration.</description>
      <content:encoded><![CDATA[<div>In this episode of <em>The Data Engineering Show</em>, the bros sit down with Daniel Pálma, Head of Marketing at Estuary.</div><div><br></div><div><strong>Join them as they:</strong></div><div><br></div><ul><li>Talk about Daniel’s career transition from data engineering to marketing and how his data engineering background has strengthened his marketing work.</li><li>Discuss how AI is changing data movement, making data pipelines faster and easier to build.</li><li>Shed light on the challenges of vector databases and structured data in AI applications.</li><li>Delve into the future of Apache Iceberg and data lakehouses, highlighting their current challenges.</li><li>Share insights on the golden age of data, stressing the need for more data engineers, data analysts, and data practitioners.</li></ul><div><br></div><div>Daniel Pálma serves as Head of Marketing at Estuary, bringing a unique blend of technical expertise and marketing acumen to the data integration space. With nearly a decade of experience as a data engineer across startups, enterprises, and consulting roles, Daniel made a strategic pivot to marketing to help bridge the gap between complex technical solutions and their practical applications for data practitioners. His background in data engineering enables him to understand customers' challenges in depth and to create authentic, education-focused marketing content that resonates with technical audiences.
Daniel’s thought leadership and content creation in the data engineering space, combined with his hands-on technical experience, position him as a valuable voice in conversations about the evolution of data infrastructure and integration technologies. <br><br>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a
href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 11 Feb 2025 10:16:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/821vjzrw.mp3" length="73342695" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1833</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, the bros sit down with Daniel Pálma, Head of Marketing at Estuary, to explore where data engineering and marketing meet. Daniel shares how he moved from data engineering into marketing and how he uses his technical background to market to engineers. The conversation covers the role of AI in data movement, the future of data engineering, real-time data integration challenges, and the evolution of data integration.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, the bros sit down with Daniel Pálma, Head of Marketing at Estuary, to explore where data engineering and marketing meet. Daniel shares how he moved from data engineering into marketing and how he uses his technical background to market to engineers. The conversation covers the role of AI in data movement, the future of data engineering, real-time data integration challenges, and the evolution of data integration.</itunes:subtitle>
      <itunes:keywords>Data Engineering Marketing, Data Movement Integration, ETL Tools Comparison, Data Warehouse Transition, AI Data Movement, Streaming Data Pipelines, Real-Time Data Integration, Data Movement Solutions, SQL Optimization, Data Warehouse Migration</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>AI and Data Change Management with Chad Sanderson, CEO Gable AI</title>
      <link>https://podcasts.fame.so/e/2861r0rn-ai-and-data-change-management-with-chad-sanderson-ceo-gable-ai</link>
      <itunes:title>AI and Data Change Management with Chad Sanderson, CEO Gable AI</itunes:title>
      <itunes:episode>38</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">j122vyv1</guid>
      <description>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the evolution of data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.</description>
      <content:encoded><![CDATA[<div>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Chad Sanderson, CEO and co-founder of Gable AI, to explore the world of data change management.</div><div><br></div><div>Join them as they:</div><ul><li>Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks performed at the “last mile” of the data supply chain.</li><li>Talk about how Gable works through a 3-layer technology flow: identifying where data is produced, tracing how it flows, and communicating the impact of changes before they reach production.</li><li>Explain why the gap between data producers and consumers needs to be bridged and why Gable emphasizes effective communication and a shared understanding of data change management across teams.</li><li>Shed light on how AI can enhance data management by extracting semantics from code and managing the resulting translation output.</li><li>Discuss Chad’s vision for 2025: helping companies start to care about their data and about how changes to that data affect other people.</li></ul><div><br></div><div>Chad Sanderson is the CEO and co-founder of Gable AI, a data change management platform. Chad has over a decade of experience in the data engineering and infrastructure space, holding significant roles at major companies such as Microsoft, Oracle, and Sephora, where he focused on data quality and governance challenges. He is the former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp. His journey from data scientist to data engineer and ultimately to CEO was driven by a desire to transform how organizations manage and use data.
Gable AI addresses the complexities of the data supply chain by providing tools for code scanning, data contracts, and governance as code, enabling teams to proactively manage data changes and their impact.<br><br></div><div>If you enjoyed this episode, make sure to subscribe, rate, and review it on <a href="https://podcasts.apple.com/gb/podcast/the-data-engineering-show/id1561927688">Apple Podcasts</a>, <a href="https://open.spotify.com/show/6hMdnrFKlPbia2k6MkFs8U">Spotify</a>, and <a href="https://www.youtube.com/@thedataengineeringshow">YouTube Podcasts</a>. Instructions on how to do this are <a href="https://www.fame.so/follow-rate-review">here</a>.</div><div><br><strong>Episode Resources</strong></div><ul><li>Gable AI on <a href="https://www.linkedin.com/company/gable-ai/">LinkedIn</a></li><li>Chad Sanderson on <a href="https://www.linkedin.com/in/chad-sanderson/">LinkedIn</a></li></ul><div><br><strong>For Feedback &amp; Discussions on Firebolt Core:</strong><br><br></div><ul><li><a href="https://discord.com/invite/UpMPDHActM">Join Firebolt Discord Community</a></li><li><a href="https://github.com/firebolt-db/firebolt-core/discussions">Join Firebolt GitHub Discussions</a></li><li><a href="https://github.com/firebolt-db/firebolt-core">Firebolt Core GitHub Repository</a>&nbsp;</li><li>Benjamin@Firebolt.io</li></ul><div><br><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna
Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 07 Jan 2025 10:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wvy297v8.mp3" length="88151770" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2203</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the evolution of data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the evolution of data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.</itunes:subtitle>
      <itunes:keywords>Chad Sanderson, gable, 3-layer flow of technology, technology, Benjamin Wagner, understanding data, data, AI, SQL, effective communication, Data Engineering</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success</title>
      <link>https://podcasts.fame.so/e/qn0q69q8-tech-stacks-and-tradeoffs-xudo-s-founder-on-picking-the-right-tools-for-bi-success</link>
      <itunes:title>Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success</itunes:title>
      <itunes:episode>37</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">p1kpzyp0</guid>
      <description>Wouter Trappers, founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects.</description>
      <content:encoded><![CDATA[<div><br>Wouter Trappers, founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of <em>The Data Engineering Show</em>. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects.<br><br></div><div><br>They discuss:<br><br></div><ul><li><strong>From Excel to Expert:</strong> From basic Excel tasks to full mastery of BI tools like QlikView, Wouter has blended his technical and philosophical approaches to data to become a bona fide expert.</li><li><strong>Data Strategy as Transformation:</strong> A BI project only bears fruit if it follows good change management principles. Focus on leadership alignment, KPI clarity, and user empowerment instead of simply implementing software.</li><li><strong>Challenges of Starting Small:</strong> Wouter offers tips for smaller companies bootstrapping their data journey with existing tools, practical education, and even Gen AI.</li><li><strong>Balancing Scales:</strong> Small startups and large enterprises face very different sets of challenges.</li></ul><div><br>Wouter’s combination of philosophy and pragmatism brings a fresh take on building effective data solutions.<br><br></div><div><br><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley,
authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 26 Nov 2024 10:15:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8vy255kw.mp3" length="59873697" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1496</itunes:duration>
      <itunes:summary>Wouter Trappers, founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects.</itunes:summary>
      <itunes:subtitle>Wouter Trappers, founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects.</itunes:subtitle>
      <itunes:keywords>Wouter Trappers, Benjamin Wagner, Data, AI, Data Engineering, business intelligence, Xudo, Excel, Access, Data Strategy</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan</title>
      <link>https://podcasts.fame.so/e/08j0y398-data-rewind-conversation-highlights-from-zach-wilson-matthew-housley-joe-reis-and-krishnan-viswanathan</link>
      <itunes:title>Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan</itunes:title>
      <itunes:episode>39</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">41pmqj80</guid>
      <description>This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.</description>
      <content:encoded><![CDATA[<div><br>In this special roundup episode of <em>The Data Engineering Show</em>, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.<br><br></div><div><br>Topics include:<br><br></div><ul><li><strong>Foundations of Data Engineering:</strong> Zach Wilson emphasizes the importance of core, tech-agnostic skills in data modeling, quality assurance, and storytelling. Drawing on his experiences at Airbnb and in education, he shows that effective data engineering hinges on creating robust data models, quality controls, and persuasive narratives rather than on expertise in any single tool or language.</li><li><strong>Bridging Academia and Practice:</strong> Matthew Housley and Joe Reis delve into the need for better data education, emphasizing hands-on experience and data fundamentals over tool-specific training, and advocate for apprenticeships and real-world collaborations in educational settings.</li><li><strong>Legacy Meets Modern in Data Engineering:</strong> Krishnan Viswanathan reflects on recurring themes in data engineering and the importance of adapting legacy approaches to new data needs, underscoring the challenges and benefits of vendor-built versus in-house solutions.</li></ul><div><br>Join the Bros for a well-rounded exploration of current themes in data engineering, filled with practical advice for data professionals at any stage of their journey.<br><br></div><div><br><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a
href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 31 Oct 2024 13:41:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8163x5jw.mp3" length="67294280" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1682</itunes:duration>
      <itunes:summary>This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.</itunes:summary>
      <itunes:subtitle>This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.</itunes:subtitle>
      <itunes:keywords>Data engineering skills, Data quality in engineering, Data modeling techniques, Data observability trends, Data engineering boot camps, ML ops in data engineering, SQL and Python for data engineers, Challenges in data engineering, Data engineering curriculum, Cloud data warehouse analytics</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn</title>
      <link>https://podcasts.fame.so/e/p8m557w8-the-resurgence-of-sql-insights-from-ryanne-dolan-from-linkedin</link>
      <itunes:title>The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn</itunes:title>
      <itunes:episode>38</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">70vll590</guid>
      <description>In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.</description>
      <content:encoded><![CDATA[<div><br>In this episode of <em>The Data Engineering Show</em>, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative <a href="https://github.com/linkedin/Hoptimator">Hoptimator</a> (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows.<br><br></div><div><br>Together they cover:<br><br></div><ul><li><strong>Automated Data Pipelines:</strong> Ryanne explains how Hoptimator allows users to create and manage data pipelines with a simple SQL SELECT query, streamlining the process of setting up Kafka topics, Flink jobs, and schemas.</li><li><strong>Integration with Kubernetes:</strong> The project uses Kubernetes to handle infrastructure tasks, treating Kubernetes as a database for managing state. This integration simplifies the orchestration of data workflows and automates routine tasks.</li><li><strong>Consumer-Driven Model:</strong> Ryanne discusses the shift from a producer-driven to a consumer-driven data model, emphasizing the importance of understanding and addressing consumer needs to reduce engineering complexity and optimize data systems.</li><li><strong>Future of Data Engineering:</strong> The conversation touches on the ongoing experimental nature of Hoptimator and its potential to transform data engineering practices, highlighting its impact on LinkedIn's data infrastructure.</li></ul><div><br><br></div><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley,
authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 24 Sep 2024 10:00:00 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w3l03668.mp3" length="79094612" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1977</itunes:duration>
      <itunes:summary>In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.</itunes:summary>
      <itunes:subtitle>In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.</itunes:subtitle>
      <itunes:keywords>Data pipelines automation, Kafka and Flink integration, SQL-driven data workflows, Kubernetes in data engineering, Multi-hop data pipelines, Ryanne Dolan, Optimizing data with Apache Flink, Consumer-driven data architecture, SQL to YAML transformation, LinkedIn data infrastructure, Automating data workflows with Kubernetes</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Vector Databases Won’t Replace SQL - Andy Pavlo</title>
      <link>https://podcasts.fame.so/e/v8555xq8</link>
      <itunes:title>Vector Databases Won’t Replace SQL - Andy Pavlo</itunes:title>
      <itunes:episode>37</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81qqq6n1</guid>
      <description>SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later, when the hype dies down, that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.</description>
      <content:encoded><![CDATA[<p>SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later, when the hype dies down, that SQL is actually a good idea. </p><p>In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. </p><p>Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering, </em>Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 04 Jun 2024 00:25:06 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w95zy9rw.mp3" length="45029028" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2579</itunes:duration>
      <itunes:summary>SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later, once the hype dies down, that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.</itunes:summary>
      <itunes:subtitle>SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later, once the hype dies down, that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How ZoomInfo transitioned from data graveyards to ROI-driven data projects</title>
      <link>https://podcasts.fame.so/e/xn144qx8</link>
      <itunes:title>How ZoomInfo transitioned from data graveyards to ROI-driven data projects</itunes:title>
      <itunes:episode>36</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">703pp6j1</guid>
      <description>Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data &amp; Analytics at ZoomInfo, met the bros to talk about adopting product management principles so that data projects deliver value, and to give an unfiltered peek into ZoomInfo’s data stack and unique tech culture. </description>
      <content:encoded><![CDATA[<p>Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data &amp; Analytics at ZoomInfo, met the bros to talk about adopting product management principles so that data projects deliver value, and to give an unfiltered peek into ZoomInfo’s data stack and unique tech culture. </p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 16 Apr 2024 03:49:13 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w7p6zyk8.mp3" length="41665006" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2386</itunes:duration>
      <itunes:summary>Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data &amp; Analytics at ZoomInfo, met the bros to talk about adopting product management principles so that data projects deliver value, and to give an unfiltered peek into ZoomInfo’s data stack and unique tech culture. </itunes:summary>
      <itunes:subtitle>Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data &amp; Analytics at ZoomInfo, met the bros to talk about adopting product management principles so that data projects deliver value, and to give an unfiltered peek into ZoomInfo’s data stack and unique tech culture. </itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Matthew Weingarten from Disney Streaming about Data Quality Best Practices</title>
      <link>https://podcasts.fame.so/e/x8ymmwl8</link>
      <itunes:title>Matthew Weingarten from Disney Streaming about Data Quality Best Practices</itunes:title>
      <itunes:episode>35</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">l04nnkw0</guid>
      <description>Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.</description>
      <content:encoded><![CDATA[<p>Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 26 Mar 2024 00:54:45 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8py76kqw.mp3" length="26467073" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1641</itunes:duration>
      <itunes:summary>Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.</itunes:summary>
      <itunes:subtitle>Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices</title>
      <link>https://podcasts.fame.so/e/rnkmmyvn</link>
      <itunes:title>Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices</itunes:title>
      <itunes:episode>34</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">70w77vw0</guid>
      <description>Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. </description>
      <content:encoded><![CDATA[<p>Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. </p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 29 Feb 2024 01:52:57 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/86lnk258.mp3" length="25309713" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1559</itunes:duration>
      <itunes:summary>Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. </itunes:summary>
      <itunes:subtitle>Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. </itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Professors Joe Hellerstein and Joseph Gonzalez on LLMs</title>
      <link>https://podcasts.fame.so/e/4892256n</link>
      <itunes:title>Professors Joe Hellerstein and Joseph Gonzalez on LLMs</itunes:title>
      <itunes:episode>33</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">v17vv8q0</guid>
      <description>Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley, and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.</description>
      <content:encoded><![CDATA[<p>Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley, and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. </p><p>They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded.</p><p>If you consider yourself a hardcore engineer, this episode is for you.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 24 Jan 2024 04:44:14 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8mk7jvy8.mp3" length="44971877" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2767</itunes:duration>
      <itunes:summary>Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley, and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.</itunes:summary>
      <itunes:subtitle>Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley, and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Megan Lieu on powerful notebooks that enable collaboration</title>
      <link>https://podcasts.fame.so/e/rn7yyrrn</link>
      <itunes:title>Megan Lieu on powerful notebooks that enable collaboration</itunes:title>
      <itunes:episode>32</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">k18mm880</guid>
      <description>There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.</description>
      <content:encoded><![CDATA[<p>There are two types of data influencers on LinkedIn:</p><p>1. Those who talk directly about the products and companies they work for<br>2. Those who provide more general guidance, tips, and opinions </p><p>Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? </p><p>We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode.</p><p>Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. </p><p>She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Mon, 01 Jan 2024 06:43:29 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8k470rrw.mp3" length="32705384" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1891</itunes:duration>
      <itunes:summary>There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.</itunes:summary>
      <itunes:subtitle>There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Transitioning from software engineering to data engineering</title>
      <link>https://podcasts.fame.so/e/1npvvqxn</link>
      <itunes:title>Transitioning from software engineering to data engineering</itunes:title>
      <itunes:episode>31</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">z0r33851</guid>
      <description>Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, our guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills, you’ll be limited to the rigid capabilities of your stack. But without data engineering skills, you’ll find it hard to be cost-effective and see the bigger picture.</description>
      <content:encoded><![CDATA[<p>Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, our guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. </p><p>She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important.</p><p>Without software engineering skills, you’ll be limited to the rigid capabilities of your stack. But without data engineering skills, you’ll find it hard to be cost-effective and see the bigger picture.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 22 Nov 2023 06:50:27 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8z71y04w.mp3" length="29047064" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1788</itunes:duration>
      <itunes:summary>Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, our guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills, you’ll be limited to the rigid capabilities of your stack. But without data engineering skills, you’ll find it hard to be cost-effective and see the bigger picture.</itunes:summary>
      <itunes:subtitle>Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, our guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills, you’ll be limited to the rigid capabilities of your stack. But without data engineering skills, you’ll find it hard to be cost-effective and see the bigger picture.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Vin Vashishta explains why we should stop using dashboards</title>
      <link>https://podcasts.fame.so/e/58zxx1y8</link>
      <itunes:title>Vin Vashishta explains why we should stop using dashboards</itunes:title>
      <itunes:episode>30</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81z77w80</guid>
      <description>Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually deliver knowledge. It’s no longer just about data volume; it’s about quality and relevance.</description>
      <content:encoded><![CDATA[<p>Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually deliver knowledge. It’s no longer just about data volume; it’s about quality and relevance.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 04 Oct 2023 03:59:27 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wj07x1mw.mp3" length="34352064" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2145</itunes:duration>
      <itunes:summary>Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually deliver knowledge. It’s no longer just about data volume; it’s about quality and relevance.</itunes:summary>
      <itunes:subtitle>Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually deliver knowledge. It’s no longer just about data volume; it’s about quality and relevance.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Joe Reis and Matt Housley on the fundamentals of data engineering</title>
      <link>https://podcasts.fame.so/e/v8w441mn</link>
      <itunes:title>Joe Reis and Matt Housley on the fundamentals of data engineering</itunes:title>
      <itunes:episode>29</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80x77m41</guid>
      <description>After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?</description>
      <content:encoded><![CDATA[<p>After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a>.<br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 06 Sep 2023 04:38:25 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w0vrj2vw.mp3" length="41729501" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2531</itunes:duration>
      <itunes:summary>After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?</itunes:summary>
      <itunes:subtitle>After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Bill Inmon, the Godfather of Data Warehousing</title>
      <link>https://podcasts.fame.so/e/p8lxxy18</link>
      <itunes:title>Bill Inmon, the Godfather of Data Warehousing</itunes:title>
      <itunes:episode>28</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">x1lnnrm1</guid>
      <description>As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.</description>
      <content:encoded><![CDATA[<p>As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 08 Aug 2023 04:07:23 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8x9jl3rw.mp3" length="29353677" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1832</itunes:duration>
      <itunes:summary>As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.</itunes:summary>
      <itunes:subtitle>As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Large-scale data engineering at Momentive.ai - Meenal Iyer</title>
      <link>https://podcasts.fame.so/e/xnvqq1v8</link>
      <itunes:title>Large-scale data engineering at Momentive.ai - Meenal Iyer</itunes:title>
      <itunes:episode>27</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">71y77qr1</guid>
      <description>As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.</description>
      <content:encoded><![CDATA[<p>As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 12 Jul 2023 01:06:52 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/821v4j7w.mp3" length="37154257" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2320</itunes:duration>
      <itunes:summary>As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.</itunes:summary>
      <itunes:subtitle>As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Data engineering from the early 2000s till today - BlackRock</title>
      <link>https://podcasts.fame.so/e/18200w78</link>
      <itunes:title>Data engineering from the early 2000s till today - BlackRock</itunes:title>
      <itunes:episode>25</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">209qqrn0</guid>
      <description>When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, having led engineering teams at Cisco, Oracle, and Greenplum and now serving as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.</description>
      <content:encoded><![CDATA[<p>When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, having led engineering teams at Cisco, Oracle, and Greenplum and now serving as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 08 Jun 2023 06:55:59 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wl4x53qw.mp3" length="40179709" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2509</itunes:duration>
      <itunes:summary>When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, having led engineering teams at Cisco, Oracle, and Greenplum and now serving as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.</itunes:summary>
      <itunes:subtitle>When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, having led engineering teams at Cisco, Oracle, and Greenplum and now serving as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Zach Wilson on what makes a great data engineer</title>
      <link>https://podcasts.fame.so/e/6nrrrlzn</link>
      <itunes:title>Zach Wilson on what makes a great data engineer</itunes:title>
      <itunes:episode>24</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81566840</guid>
      <description>How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading knowledge through EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling matter more than the actual tech, why data engineering is going to see more job growth than data science, and what led him to start creating content, reaching over 250K followers on LinkedIn.</description>
      <content:encoded><![CDATA[<p>How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading knowledge through EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling matter more than the actual tech, why data engineering is going to see more job growth than data science, and what led him to start creating content, reaching over 250K followers on LinkedIn.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 27 Apr 2023 01:59:31 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w21v4078.mp3" length="35199104" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2042</itunes:duration>
      <itunes:summary>How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading knowledge through EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling matter more than the actual tech, why data engineering is going to see more job growth than data science, and what led him to start creating content, reaching over 250K followers on LinkedIn.</itunes:summary>
      <itunes:subtitle>How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading knowledge through EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling matter more than the actual tech, why data engineering is going to see more job growth than data science, and what led him to start creating content, reaching over 250K followers on LinkedIn.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How ZipRecruiter and Yotpo power self-service data platforms that work</title>
      <link>https://podcasts.fame.so/e/1n3mm1qn</link>
      <itunes:title>How ZipRecruiter and Yotpo power self-service data platforms that work</itunes:title>
      <itunes:episode>23</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">m1j22841</guid>
      <description>Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.</description>
      <content:encoded><![CDATA[<p>Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm.</p><p>They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 23 Mar 2023 05:57:24 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8l4x50q8.mp3" length="47701261" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2748</itunes:duration>
      <itunes:summary>Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.</itunes:summary>
      <itunes:subtitle>Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Data Observability with Millions of Users - Barr Moses</title>
      <link>https://podcasts.fame.so/e/2n611248</link>
      <itunes:title>Data Observability with Millions of Users - Barr Moses</itunes:title>
      <itunes:episode>22</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">j0222kz0</guid>
      <description>Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate when so many different teams are accessing it.</description>
      <content:encoded><![CDATA[<p>Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate when so many different teams are accessing it.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 08 Feb 2023 05:50:54 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8qy7j298.mp3" length="37089464" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2316</itunes:duration>
      <itunes:summary>Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate when so many different teams are accessing it.</itunes:summary>
      <itunes:subtitle>Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate when so many different teams are accessing it.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Amplitude Engineers Process 5 Trillion Real-time Events</title>
      <link>https://podcasts.fame.so/e/08j00708</link>
      <itunes:title>How Amplitude Engineers Process 5 Trillion Real-time Events</itunes:title>
      <itunes:episode>21</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">41pmmwm0</guid>
      <description>Weichen Wang, Senior Engineering Manager at Amplitude, met the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 trillion real-time events while dealing with mutable data at massive scale.</description>
      <content:encoded><![CDATA[<p>Weichen Wang, Senior Engineering Manager at Amplitude, met the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 trillion real-time events while dealing with mutable data at massive scale.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 05 Jan 2023 01:39:23 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wx9jlvr8.mp3" length="26896133" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1679</itunes:duration>
      <itunes:summary>Weichen Wang, Senior Engineering Manager at Amplitude, met the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 trillion real-time events while dealing with mutable data at massive scale.</itunes:summary>
      <itunes:subtitle>Weichen Wang, Senior Engineering Manager at Amplitude, met the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 trillion real-time events while dealing with mutable data at massive scale.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Making Observability a Key Business Driver</title>
      <link>https://podcasts.fame.so/e/l8q221wn</link>
      <itunes:title>Making Observability a Key Business Driver</itunes:title>
      <itunes:episode>20</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80nkkpv1</guid>
      <description>80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, where he managed and scaled the Seattle site to over 6,000 engineers (!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time, ever-changing metadata? What are Vijaye’s main takeaways from engineering at Facebook? Tune in.</description>
      <content:encoded><![CDATA[<p>80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, where he managed and scaled the Seattle site to over 6,000 engineers (!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time, ever-changing metadata? What are Vijaye’s main takeaways from engineering at Facebook? Tune in.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of EcZachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 29 Nov 2022 02:44:27 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8vy26rpw.mp3" length="47063274" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2939</itunes:duration>
      <itunes:summary>80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, where he managed and scaled the Seattle site to over 6,000 engineers(!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time, ever-changing metadata? What are Vijaye’s main takeaways from engineering at Facebook? Tune in.</itunes:summary>
      <itunes:subtitle>80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, where he managed and scaled the Seattle site to over 6,000 engineers(!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time, ever-changing metadata? What are Vijaye’s main takeaways from engineering at Facebook? Tune in.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>A ClickHouse Review from a Practitioner’s Point of View</title>
      <link>https://podcasts.fame.so/e/m84xxm08</link>
      <itunes:title>A ClickHouse Review from a Practitioner’s Point of View</itunes:title>
      <itunes:episode>19</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">x16ll671</guid>
      <description>Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days, and walks Boaz through his experience with the platform: how, on the one hand, it handled 2B events per minute, and how, on the other, it required rollups that compromised granularity when extending time windows.

Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use cases he’s working on at Salesforce.</description>
      <content:encoded><![CDATA[<p>Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days, and walks Boaz through his experience with the platform: how, on the one hand, it handled 2B events per minute, and how, on the other, it required rollups that compromised granularity when extending time windows.</p><p>Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use cases he’s working on at Salesforce.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 01 Sep 2022 03:05:05 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/853qvz68.mp3" length="33362094" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2083</itunes:duration>
      <itunes:summary>Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days, and walks Boaz through his experience with the platform: how, on the one hand, it handled 2B events per minute, and how, on the other, it required rollups that compromised granularity when extending time windows.

Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use cases he’s working on at Salesforce.</itunes:summary>
      <itunes:subtitle>Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days, and walks Boaz through his experience with the platform: how, on the one hand, it handled 2B events per minute, and how, on the other, it required rollups that compromised granularity when extending time windows.

Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use cases he’s working on at Salesforce.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Creator of Airflow About His Recipe for Smart Data-Driven Companies</title>
      <link>https://podcasts.fame.so/e/qn0qq1y8</link>
      <itunes:title>The Creator of Airflow About His Recipe for Smart Data-Driven Companies</itunes:title>
      <itunes:episode>18</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">p1kppl70</guid>
      <description>According to Maxime Beauchemin, CEO &amp; Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not straightforward. So how did he do it?
 
Choosing the right systems and services is key to a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
 
Max walks the Bros through his recipe for a smart data-driven company and the genesis of Airflow, Superset &amp; Presto (with some great tidbits about Airflow's old-school marketing approach and how the open source platform took on a life of its own).</description>
      <content:encoded><![CDATA[<p>According to Maxime Beauchemin, CEO &amp; Founder at Preset and Creator of Apache Superset and Apache Airflow, it's not straightforward to understand what you're really getting into, or the breadth of skills required to build a thriving company.</p><p>Picking the right systems and services is key to a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.</p><p>Plus, Max walks the bros through the genesis of Airflow, Superset &amp; Presto, and Airflow's old-school marketing approach that won the hearts of developers across the world. And just like the Terminator, once the machine takes over, you can't stop it.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 03 Aug 2022 00:43:50 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8163jvzw.mp3" length="44141205" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2756</itunes:duration>
      <itunes:summary>According to Maxime Beauchemin, CEO &amp; Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not straightforward. So how did he do it?
 
Choosing the right systems and services is key to a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
 
Max walks the Bros through his recipe for a smart data-driven company and the genesis of Airflow, Superset &amp; Presto (with some great tidbits about Airflow's old-school marketing approach and how the open source platform took on a life of its own).</itunes:summary>
      <itunes:subtitle>According to Maxime Beauchemin, CEO &amp; Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not straightforward. So how did he do it?
 
Choosing the right systems and services is key to a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
 
Max walks the Bros through his recipe for a smart data-driven company and the genesis of Airflow, Superset &amp; Presto (with some great tidbits about Airflow's old-school marketing approach and how the open source platform took on a life of its own).</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Similarweb Delivers Customer Facing Analytics Over 100s of TBs</title>
      <link>https://podcasts.fame.so/e/pnm55pjn</link>
      <itunes:title>How Similarweb Delivers Customer Facing Analytics Over 100s of TBs</itunes:title>
      <itunes:episode>17</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">71vllq71</guid>
      <description>According to Yoav Shmaria, VP R&amp;D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, and running ETL job, giving you fine-grained visibility into the cost of every feature.

Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.

Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.</description>
      <content:encoded><![CDATA[<p>According to Yoav Shmaria, VP R&amp;D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, and running ETL job, giving you fine-grained visibility into the cost of every feature.</p><p>Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.</p><p>Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 13 Jul 2022 23:56:28 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/83l0q42w.mp3" length="35752620" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2231</itunes:duration>
      <itunes:summary>According to Yoav Shmaria, VP R&amp;D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, and running ETL job, giving you fine-grained visibility into the cost of every feature.

Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.

Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.</itunes:summary>
      <itunes:subtitle>According to Yoav Shmaria, VP R&amp;D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, and running ETL job, giving you fine-grained visibility into the cost of every feature.

Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.

Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Klarna Designed a New Data Platform in the Cloud</title>
      <link>https://podcasts.fame.so/e/2nx0096n</link>
      <itunes:title>How Klarna Designed a New Data Platform in the Cloud</itunes:title>
      <itunes:episode>16</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">61m66xv0</guid>
      <description>Klarna is one of the leading fintech companies in the world, valued at $45B. 

While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer, tells Boaz what this new, modernized stack looks like.</description>
      <content:encoded><![CDATA[<p>Klarna is one of the leading fintech companies in the world, valued at $45B. </p><p>While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer, tells Boaz what this new, modernized stack looks like.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 09 Jun 2022 04:51:21 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wk470vr8.mp3" length="39037105" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2437</itunes:duration>
      <itunes:summary>Klarna is one of the leading fintech companies in the world, valued at $45B. 

While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer, tells Boaz what this new, modernized stack looks like.</itunes:summary>
      <itunes:subtitle>Klarna is one of the leading fintech companies in the world, valued at $45B. 

While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer, tells Boaz what this new, modernized stack looks like.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Eventbrite is Modernizing its Data Stack</title>
      <link>https://podcasts.fame.so/e/vn555vqn</link>
      <itunes:title>How Eventbrite is Modernizing its Data Stack</itunes:title>
      <itunes:episode>15</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80qqq8n0</guid>
      <description>Archana Ganapathi, Head of Data &amp; Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies, like dbt, that may be outside their comfort zone.</description>
      <content:encoded><![CDATA[<p>Archana Ganapathi, Head of Data &amp; Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies, like dbt, that may be outside their comfort zone.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Mon, 23 May 2022 02:46:02 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wz71y448.mp3" length="22534779" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1405</itunes:duration>
      <itunes:summary>Archana Ganapathi, Head of Data &amp; Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies, like dbt, that may be outside their comfort zone.</itunes:summary>
      <itunes:subtitle>Archana Ganapathi, Head of Data &amp; Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies, like dbt, that may be outside their comfort zone.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>A Deep Dive into Slack's Data Architecture</title>
      <link>https://podcasts.fame.so/e/x8144wxn</link>
      <itunes:title>A Deep Dive into Slack's Data Architecture</itunes:title>
      <itunes:episode>14</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">713ppxj0</guid>
      <description>Growing from a startup to a publicly traded company, and then to an acquired one, meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering, explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.


Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt</description>
      <content:encoded><![CDATA[<p>Growing from a startup to a publicly traded company, and then to an acquired one, meant that Slack’s sales org was scaling rapidly.<br>Apun Hiran, Slack’s Director of Software Engineering, explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.</p><p><br>Speaker: Apun Hiran, Director of Software Engineering (Data), Slack<br>Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 10 May 2022 23:15:55 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w163jyz8.mp3" length="32788224" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2046</itunes:duration>
      <itunes:summary>Growing from a startup to a publicly traded company, and then to an acquired one, meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering, explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.


Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt</itunes:summary>
      <itunes:subtitle>Growing from a startup to a publicly traded company, and then to an acquired one, meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering, explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.


Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack</title>
      <link>https://podcasts.fame.so/e/xnymm5ln</link>
      <itunes:title>Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack</itunes:title>
      <itunes:episode>13</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">l14nnyw1</guid>
      <description>Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers. 

His team of Analytics Engineers is in the final stages of transitioning 5.5 PB of data, including 15B events per day, to the modern data stack. Tune in to learn how they did it.</description>
      <content:encoded><![CDATA[Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers. 

His team of Analytics Engineers is in the final stages of transitioning 5.5 PB of data, including 15B events per day, to the modern data stack. Tune in to learn how they did it.<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 12 Apr 2022 05:08:42 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wvy264p8.mp3" length="30640447" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1912</itunes:duration>
      <itunes:summary>Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers. 

His team of Analytics Engineers is in the final stages of transitioning 5.5 PB of data, including 15B events per day, to the modern data stack. Tune in to learn how they did it.</itunes:summary>
      <itunes:subtitle>Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers. 

His team of Analytics Engineers is in the final stages of transitioning 5.5 PB of data, including 15B events per day, to the modern data stack. Tune in to learn how they did it.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Getting rid of raw data with Jens Larsson</title>
      <link>https://podcasts.fame.so/e/r8kmm2v8</link>
      <itunes:title>Getting rid of raw data with Jens Larsson</itunes:title>
      <itunes:episode>12</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">71w77rw1</guid>
      <description>Why would you create ugly data? According to Jens Larsson, you shouldn’t even go near raw data. Jens started off at Google, went on to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP of Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.</description>
      <content:encoded><![CDATA[<p>Why would you create ugly data? According to Jens Larsson, you shouldn’t even go near raw data. Jens started off at Google, went on to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP of Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 22 Mar 2022 00:32:47 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wmk7j4yw.mp3" length="27899959" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1741</itunes:duration>
      <itunes:summary>Why would you create ugly data? According to Jens Larsson, don’t even go near raw data. Jens started off at Google, continued to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.</itunes:summary>
      <itunes:subtitle>Why would you create ugly data? According to Jens Larsson, don’t even go near raw data. Jens started off at Google, continued to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Zendesk engineers manage customer-facing data applications</title>
      <link>https://podcasts.fame.so/e/4n922l68</link>
      <itunes:title>How Zendesk engineers manage customer-facing data applications</itunes:title>
      <itunes:episode>11</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">v07vvjq1</guid>
      <description>This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.

Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly. 

He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.</description>
      <content:encoded><![CDATA[<p>This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.</p><p>Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly. He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.</p><p>INTERVIEWER: Boaz Farkash.</p><p>ZENDESK GUEST: Ananth Packkildurai - Principal Software Engineer.</p><p><br></p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 17 Feb 2022 06:15:21 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w53qvr6w.mp3" length="32180633" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2008</itunes:duration>
      <itunes:summary>This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.

Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly. 

He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.</itunes:summary>
      <itunes:subtitle>This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.

Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly. 

He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How are those data intensive customer facing apps engineered at Gong?</title>
      <link>https://podcasts.fame.so/e/r87yy6r8</link>
      <itunes:title>How are those data intensive customer facing apps engineered at Gong?</itunes:title>
      <itunes:episode>10</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">k08mm281</guid>
      <description>Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs. 

The Data Bros met Yarin Benado, Gong’s engineering manager, to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.</description>
      <content:encoded><![CDATA[<p>Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.</p><p>The Data Bros met Yarin Benado, Gong’s engineering manager, to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 20 Jan 2022 05:37:19 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wrj7xykw.mp3" length="25265960" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1576</itunes:duration>
      <itunes:summary>Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs. 

The Data Bros met Yarin Benado, Gong’s engineering manager, to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.</itunes:summary>
      <itunes:subtitle>Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs. 

The Data Bros met Yarin Benado, Gong’s engineering manager, to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Bolt Engineers Are Designing Its Next-Gen Data Platform</title>
      <link>https://podcasts.fame.so/e/18pvvxx8</link>
      <itunes:title>How Bolt Engineers Are Designing Its Next-Gen Data Platform</itunes:title>
      <itunes:episode>9</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">z1r33550</guid>
      <description>Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day. 

Erik Heintare, along with Bolt's engineering team, is in the midst of designing a next-gen data platform and shares how it's going to solve their biggest data challenges.

Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros</description>
      <content:encoded><![CDATA[<p>Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day.</p><p>Erik Heintare, along with Bolt's engineering team, is in the midst of designing a next-gen data platform and shares how it's going to solve their biggest data challenges.</p><p>Guest: Erik Heintare - Senior Analytics Engineer at Bolt<br>Hosts: Eldad and Boaz Farkash, AKA The Data Bros</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 14 Dec 2021 02:03:59 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w3l0q928.mp3" length="34531733" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2155</itunes:duration>
      <itunes:summary>Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day. 

Erik Heintare, along with Bolt's engineering team, is in the midst of designing a next-gen data platform and shares how it's going to solve their biggest data challenges.

Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros</itunes:summary>
      <itunes:subtitle>Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day. 

Erik Heintare, along with Bolt's engineering team, is in the midst of designing a next-gen data platform and shares how it's going to solve their biggest data challenges.

Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How did Agoda scale its data platform to support 1.5T events per day?</title>
      <link>https://podcasts.fame.so/e/5nzxxwyn</link>
      <itunes:title>How did Agoda scale its data platform to support 1.5T events per day?</itunes:title>
      <itunes:episode>8</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80z77381</guid>
      <description>Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?

Guests: 
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda

Hosts: 
The Data Bros - Eldad and Boaz Farkash</description>
      <content:encoded><![CDATA[<p>Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?</p><p>Guests: <br>Amir Arad, Director of Machine Learning, Agoda<br>Shaun Sit, Senior Dev Manager, Agoda</p><p>Hosts: <br>The Data Bros - Eldad and Boaz Farkash</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 23 Nov 2021 05:49:12 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8nn76q98.mp3" length="37177299" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2320</itunes:duration>
      <itunes:summary>Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?

Guests: 
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda

Hosts: 
The Data Bros - Eldad and Boaz Farkash</itunes:summary>
      <itunes:subtitle>Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?

Guests: 
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda

Hosts: 
The Data Bros - Eldad and Boaz Farkash</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Diving Into GitHub's Data Stack</title>
      <link>https://podcasts.fame.so/e/vnw44wm8</link>
      <itunes:title>Diving Into GitHub's Data Stack</itunes:title>
      <itunes:episode>7</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">81x77k40</guid>
      <description>It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on the Data Engineering Show – a deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.</description>
      <content:encoded><![CDATA[It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on the Data Engineering Show – a deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 21 Oct 2021 05:46:21 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/wpy76zq8.mp3" length="33408755" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2085</itunes:duration>
      <itunes:summary>It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on the Data Engineering Show – a deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.</itunes:summary>
      <itunes:subtitle>It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on the Data Engineering Show – a deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>Building Data Products For Data Engineers</title>
      <link>https://podcasts.fame.so/e/pnlxxr1n</link>
      <itunes:title>Building Data Products For Data Engineers</itunes:title>
      <itunes:episode>6</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">x0lnnwm0</guid>
      <description>What does a tech stack that always needs to be at the forefront of technology look like?

Roy Miara from Explorium talks about building data products for the audience that can’t be fooled – Data Engineers.</description>
      <content:encoded><![CDATA[What does a tech stack that always needs to be at the forefront of technology look like?

Roy Miara from Explorium talks about building data products for the audience that can’t be fooled – Data Engineers.<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Thu, 09 Sep 2021 04:55:32 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8j07x3m8.mp3" length="38305505" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2391</itunes:duration>
      <itunes:summary>What does a tech stack that always needs to be at the forefront of technology look like?

Roy Miara from Explorium talks about building data products for the audience that can’t be fooled – Data Engineers.</itunes:summary>
      <itunes:subtitle>What does a tech stack that always needs to be at the forefront of technology look like?

Roy Miara from Explorium talks about building data products for the audience that can’t be fooled – Data Engineers.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Vimeo Keeps Data Intact with 85B Events Per Month</title>
      <link>https://podcasts.fame.so/e/x8vqqwvn</link>
      <itunes:title>How Vimeo Keeps Data Intact with 85B Events Per Month</itunes:title>
      <itunes:episode>5</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">70y77vr0</guid>
      <description>How does the Vimeo data team deal with 2 PB of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction? Guest: Lior Solomon, VP Data Engineering at Vimeo.</description>
      <content:encoded><![CDATA[<p>How does the Vimeo data team deal with 2 PB of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction?<br>Guest: Lior Solomon, VP Data Engineering at Vimeo.</p><div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Wed, 18 Aug 2021 07:10:08 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8yq97098.mp3" length="38662098" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2413</itunes:duration>
      <itunes:summary>How does the Vimeo data team deal with 2 PB of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction? Guest: Lior Solomon, VP Data Engineering at Vimeo.</itunes:summary>
      <itunes:subtitle>How does the Vimeo data team deal with 2 PB of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction? Guest: Lior Solomon, VP Data Engineering at Vimeo.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Substack's Data Stack Supports 500K Paying Subscribers</title>
      <link>https://podcasts.fame.so/e/1n200m7n</link>
      <itunes:title>How Substack's Data Stack Supports 500K Paying Subscribers</itunes:title>
      <itunes:episode>4</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">219qqyn1</guid>
      <description>Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?

Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</description>
      <content:encoded><![CDATA[Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?

Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>The Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 03 Aug 2021 07:11:54 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/87p6zvkw.mp3" length="23472357" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1464</itunes:duration>
      <itunes:summary>Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?

Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:summary>
      <itunes:subtitle>Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?

Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>A Technical Deep Dive into Yelp's Data Infrastructure - With Steven Moy</title>
      <link>https://podcasts.fame.so/e/68rrr4z8</link>
      <itunes:title>A Technical Deep Dive into Yelp's Data Infrastructure - With Steven Moy</itunes:title>
      <itunes:episode>3</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">80566941</guid>
      <description>As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years. 

Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</description>
      <content:encoded><![CDATA[As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years. 

Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 11 May 2021 23:50:24 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/895zy4r8.mp3" length="48189173" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>3009</itunes:duration>
      <itunes:summary>As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years. 

Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:summary>
      <itunes:subtitle>As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years. 

Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How Canva's Data Engineers and Analysts Support 55M Active Users</title>
      <link>https://podcasts.fame.so/e/183mmvq8</link>
      <itunes:title>How Canva's Data Engineers and Analysts Support 55M Active Users</itunes:title>
      <itunes:episode>2</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">m0j22r40</guid>
      <description>Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering $16 billion valuation, after seeing even stronger growth during the pandemic. With 55 million active users and around $500 million in annual revenue, Canva seems unstoppable.

So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?

Guest: Krishna Naidu, Data Engineer at Canva
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</description>
      <content:encoded><![CDATA[Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering $16 billion valuation, after seeing even stronger growth during the pandemic. With 55 million active users and around $500 million in annual revenue, Canva seems unstoppable.

So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?

Guest: Krishna Naidu, Data Engineer at Canva
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 11 May 2021 23:47:05 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/w6lnkx5w.mp3" length="41613749" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>2598</itunes:duration>
      <itunes:summary>Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering $16 billion valuation, after seeing even stronger growth during the pandemic. With 55 million active users and around $500 million in annual revenue, Canva seems unstoppable.

So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?

Guest: Krishna Naidu, Data Engineer at Canva
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:summary>
      <itunes:subtitle>Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering $16 billion valuation, after seeing even stronger growth during the pandemic. With 55 million active users and around $500 million in annual revenue, Canva seems unstoppable.

So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?

Guest: Krishna Naidu, Data Engineer at Canva
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, big data, canva</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>How AppsFlyer Delivers Sub-Second BI to 1000 Looker Users - With Alexandra Sudilovsky</title>
      <link>https://podcasts.fame.so/e/2861134n</link>
      <itunes:title>How AppsFlyer Delivers Sub-Second BI to 1000 Looker Users - With Alexandra Sudilovsky</itunes:title>
      <itunes:episode>1</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">j1222xz1</guid>
      <description>AppsFlyer has exploded in size, growing from a small company of 200 people to 1,000 people in just three years. Dealing with a huge amount of data every day, while also growing quickly as a company, can come with many challenges.

Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</description>
      <content:encoded><![CDATA[AppsFlyer has exploded in size, growing from a small company of 200 people to 1,000 people in just three years. Dealing with a huge amount of data every day, while also growing quickly as a company, can come with many challenges.

Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Tue, 11 May 2021 23:45:33 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/8rj7x2k8.mp3" length="30542835" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>1906</itunes:duration>
      <itunes:summary>AppsFlyer has exploded in size, growing from a small company of 200 people to 1,000 people in just three years. Dealing with a huge amount of data every day, while also growing quickly as a company, can come with many challenges.

Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:summary>
      <itunes:subtitle>AppsFlyer has exploded in size, growing from a small company of 200 people to 1,000 people in just three years. Dealing with a huge amount of data every day, while also growing quickly as a company, can come with many challenges.

Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data, appsflyer, looker</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
    <item>
      <title>The Data Engineering Show - Coming Soon...</title>
      <link>https://podcasts.fame.so/e/0nj00v0n</link>
      <itunes:title>The Data Engineering Show - Coming Soon...</itunes:title>
      <itunes:episode>0</itunes:episode>
      <itunes:block>No</itunes:block>
      <googleplay:block>No</googleplay:block>
      <guid isPermaLink="false">40pmmym1</guid>
      <description>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory and learn from the biggest influencers in tech about their practical, day-to-day data challenges and solutions in a casual and fun setting.</description>
      <content:encoded><![CDATA[The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory and learn from the biggest influencers in tech about their practical, day-to-day data challenges and solutions in a casual and fun setting.<div>The Data Engineering Show is brought to you by <a href="https://www.firebolt.io/">firebolt.io</a> and handcrafted by our friends over at: <a href="https://www.fame.so/?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=masters-of-community-with-david-spinks?utm_medium=podcast&amp;utm_source=bcast&amp;utm_campaign=confessions-of-a-b2b-marketer">fame.so</a><br><br>Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of <em>Fundamentals of Data Engineering</em>, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.<br><br>Check out our three most downloaded episodes:</div><ul><li><a href="https://www.dataengineeringshow.com/e/6nrrrlzn">Zach Wilson on What Makes a Great Data Engineer</a></li><li><a href="https://www.dataengineeringshow.com/e/v8w441mn">Joe Reis and Matt Housley on The Fundamentals of Data Engineering</a></li><li><a href="https://www.dataengineeringshow.com/e/p8lxxy18">Bill Inmon, The Godfather of Data Warehousing</a></li></ul>]]></content:encoded>
      <pubDate>Mon, 05 Apr 2021 06:57:51 +0000</pubDate>
      <author>The Firebolt Data Bros</author>
      <enclosure url="https://media.fame.so/84v47178.mp3" length="4673424" type="audio/mpeg"/>
      <itunes:author>The Firebolt Data Bros</itunes:author>
      <itunes:image href="https://content.fameapp.so/uploads/86qywn5q/caa7efb0-6c25-11ef-9489-d768d5a0a06d/caa7edf0-6c25-11ef-a69f-3da5bd01e0ea.jpg"/>
      <itunes:duration>115</itunes:duration>
      <itunes:summary>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory and learn from the biggest influencers in tech about their practical, day-to-day data challenges and solutions in a casual and fun setting.</itunes:summary>
      <itunes:subtitle>The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory and learn from the biggest influencers in tech about their practical, day-to-day data challenges and solutions in a casual and fun setting.</itunes:subtitle>
      <itunes:keywords>data engineering, analytics, data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <googleplay:explicit>No</googleplay:explicit>
    </item>
  </channel>
</rss>
