Berlin 2024
Recordings

Opening Session

Introduction

Watch the introduction from the mainstage at Flink Forward Berlin, with emcee Karin Landers and co-hosts Alex Walden and Dawn Leamon, as they dive into the Past, Present, and Future of Apache Flink!

The Past - History of Apache Flink

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! Hear from those who have been with the Flink project since its inception, including pioneers Stephan Ewen, co-creator of Flink and Apache Flink PMC member, and Feng Wang, Apache Flink Committer. Speakers: Stephan Ewen, Founder, Restate; Feng Wang, Senior Director, Alibaba; Karin Landers, Emcee, Ververica.

The Present - Current Ververica Customer Panel

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! Step into the present day to learn how industry leaders from energy giant Uniper and travel experts at Booking.com are tackling complex real-time streaming data use cases with Ververica, as they share real-world success stories. Distinguished Guests: Sudha Ramraj, Senior Technical Consultant, Uniper & Siddhartha Choudhury, Senior Product Manager, Booking.com. Moderators: Ben Gamble, Field CTO, Ververica & Chris Horsnell, Head of Strategic Partner Alliance, Ververica.

The Present - Partner Panel

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! In this session, meet a few Flink Partners as they share how they are helping to make real-time data projects accessible for all, and introduce the "Powered by" Partner Program. Distinguished Guests: Mark Pybus, Chief Data Officer, Evoura - Gold Sponsor; Erik Schmiegelow, CEO, Hivemind Technologies; Sijie Guo, Founder & CEO, StreamNative - Silver Sponsor; & Filip Yonov, Head of Streaming Data, Aiven. Moderators: Ben Gamble, Field CTO, Ververica & Chris Horsnell, Head of Strategic Partner Alliance, Ververica.

The Future - Announcing Apache Flink 2.0

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! In this surprise announcement, Feng Wang and Alex Walden took the stage to introduce the release of Apache Flink 2.0.

The Future - Introducing Ververica's Bring Your Own Cloud Deployment

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! Ben Gamble, Field CTO, Ververica and Igor Kersic, Head of Product, Ververica take the stage to introduce the Bring Your Own Cloud Deployment option for Ververica's Streaming Data Platform. This deployment provides a managed experience that runs on your existing cloud resources, so you get the flexibility and scalability of Ververica’s Streaming Data Platform while keeping full control over your cloud infrastructure. Full oversight and security of your data stay with you, which helps you meet zero-trust strategies.

The Future - Introducing Fluss

Take a journey through the Past, Present, and Future of Apache Flink in the not-to-be-missed Opening Session at Flink Forward Berlin 2024! In this session, Jark Wu, Head of Flink SQL at Alibaba Cloud (Ververica) and PMC member and Committer of Apache Flink, introduced Fluss. Fluss is engineered to offer a high-performance, scalable, and fully integrated solution for real-time data processing with Apache Flink, driving our vision towards a complete, unified batch and streaming data platform.

Wrap Up

Our journey through the Past, Present, and Future of Apache Flink ends with a wrap-up from the Mainstage, with emcee Karin Landers.

Expert Sessions

The Zeitgeist of AI Requires Streaming Data Platforms

Presented By: Mike Gualtieri, VP, Principal Analyst, Forrester. In this expert session, AI and streaming data expert Mike Gualtieri (Forrester) shares his research on enterprise AI and the real-time intelligence network needed to power it.

AI and Apache Flink: Expert Panel

Presented By: Mike Gualtieri, VP, Principal Analyst, Forrester; David Anderson, Apache Flink Committer & Software Practice Lead, Confluent; Ben Gamble, Field CTO, Ververica | Original creators of Apache Flink® - Flink Forward Organizer; Gunnar Morling, Software Engineer, Decodable - Gold Sponsor. Moderator: Karin Landers, Senior Product Marketing, Ververica | Original creators of Apache Flink® - Flink Forward Organizer. This informal, moderated panel session will explore how AI is shaping data streaming projects now and into the future. What innovations will redefine how we access and process information? How will AI transform the way we use data in real-time? Join us as our panel dives into these questions and more, offering insights that will guide tomorrow’s technological landscape. Be part of the conversation in this interactive session, where the future of AI meets the evolving world of data streaming.

Apache Flink PMC's Panel

Watch the informal, moderated panel session that features Apache Flink PMC experts as they share their experiences and discuss the past, present, and future of Apache Flink. Apache Flink PMC Members and Committers: Dr. Yuan Mei - Director of Engineering, Alibaba; Jark Wu - Head of Flink SQL, Alibaba Cloud; Xintong Song - Flink 2.0 Starter and Promoter, Alibaba Cloud; Leonard Xu - Flink CDC Lead, Alibaba; Jingsong Li - Streaming Computing R&D, Alibaba; Gyula Fora - Software Engineer, Apple; Maximilian Michels - Software Engineer, Apple; Jing Ge - Head of Engineering, Ververica. Moderator: Dawn Leamon, Head of TED, Ververica.

Technology Deep Dive

Spoiler Alert: Revealing the Secrets of Apache Flink 2.0

Xintong Song, Staff Software Engineer, Alibaba Cloud. This session offers an exclusive preview of Apache Flink 2.0, the most significant update since Flink 1.0, highlighting its new positioning, architectural advancements, and innovative features. Key topics will include disaggregated state management, unified stream-batch SQL, mixed execution, batch execution improvements, API evolutions, configuration changes, and streaming lakehouse capabilities.

Automate Apache Flink Tuning for Highly Elastic Scaling

Giannis Polyzos, Staff Streaming Product Architect & Ioannis Stavrakantonakis, Staff Software Engineer, Ververica | Original creators of Apache Flink® - Flink Forward Organizer. This presentation explores how different autoscaling mechanisms impact cost optimization and uptime in managing Apache Flink workloads. It focuses on strategies for automatic scaling, scheduled reactions to known patterns, and monitoring workload anomalies to achieve optimal elasticity in distributed systems.

‘State’ the obvious: Using Apache Flink’s State Processor API to deal with nutty issues

JinYun Soo, Software Engineer, Bloomberg, LP. This talk will explore Apache Flink's State Processor API, showcasing how it can be used to read, modify, and write states, enabling the handling of stateful streaming job changes and debugging. Through demos and code examples, practical challenges and solutions (including code stubs for ReaderFunctions and BootstrapFunctions) will be discussed to manage operator states effectively.

Visually Diagnosing Operator State Problems

Alex Balfur, VP Product & Colten Pilgreen, Advisor, Datorios - Platinum Sponsor. In this session, Alex Balfur and Colten Pilgreen introduce Datorios' new observability tool for Apache Flink, designed to simplify troubleshooting and enhance pipeline management. Dubbed "The X-Ray for Flink," the tool offers visual diagnosis, lineage tracing, and state evolution visualization, making it easier to analyze and troubleshoot issues with operator state, ensuring data accuracy and minimizing operational disruptions.

Zero Interference and Resource Congestion in Flink Clusters with Kafka Data Sources

Aratz Manterola Lasa, Software Engineer, Warpstream Labs - Gold Sponsor. This session explores resource congestion challenges in Apache Flink and Kafka, highlighting how Kafka congestion can impact Flink workloads. It introduces Warpstream as a Kafka replacement that separates compute and storage, enabling dedicated broker groups for Flink clusters to eliminate interference, reduce latency, and simplify deployment, and includes a demo of setting it up for Flink.

From Apache Flink to Restate - Event processing for analytics and Transactions

Stephan Ewen, Founder & Co-creator of Apache Flink, Restate. Restate is an event-driven system inspired by Apache Flink, designed for transactional workflows, and serves as a counterpart to Flink for event-driven applications. This talk explores the differences between durable execution and stream processing, their use cases, and the architectural distinctions between the systems supporting them.

VERA: The Engine Revolutionizing Apache Flink

Ben Gamble, Field CTO, "Technology Sommelier, AI Whisperer" & Karin Landers, Product Marketing Manager, Ververica | Original creators of Apache Flink® - Flink Forward Organizer. This talk introduces VERA, a cloud-native engine designed to modernize Apache Flink by addressing evolving workloads, deployments, and storage models. It will explore how VERA enhances stream processing, tackles Flink's limitations, and supports seamless integration of data ingestion, real-time stream processing, and lakehouse architecture, while also improving fault tolerance and handling large stateful applications.

Enabling Flink's Cloud-Native Future: Introducing ForSt DB in Flink 2.0

Yuan Mei, Director of Engineering, Flink PMC, Tech Leader of Flink State, Alibaba. Watch a comprehensive overview of the disaggregated state storage proposed for Apache Flink 2.0, empowering Flink to continue as a leading stateful stream processing engine in the cloud-native landscape.

Testing Flink SQL Scripts: Simplifying Development for Non-Developers

Robin Fehr, Software Architect & Co-Founder & Flaviu Cicio, Architect & Flink Specialist, Acosom. Developing Flink SQL scripts can be challenging, especially for non-developers like data scientists. Simplifying the development process is essential to ensure correctness and reliability, with testing playing a crucial role. During this session, you will be introduced to Flink SQL Test Runner as a solution to streamline this process.

Building Copilots with Flink SQL, LLMs and vector databases

Steffen Hoellinger, CEO, Airy. Generative AI apps such as Copilots can serve as a vital link between foundational models and enterprise data, enhancing developer and employee productivity. Watch a step-by-step example of how to build and deploy a Copilot in TypeScript/JavaScript with open-source tools, integrated with Flink for stream processing, and using the latest OpenAI/Mistral models for model inference and vector stores. Review how to select and integrate the best foundation model for the relevant use case, to optimize for cost, performance and latency at inference.

Materialized Table - Making Your Data Pipeline Easier

Lincoln Lee, Staff Engineer, Alibaba Cloud. In this talk, learn about potential user issues in the data warehouse and how the new Materialized Table is designed to solve these problems, including how you can quickly build a data pipeline with Apache Paimon.

Data Lineage for Apache Flink with OpenLineage

Zhenqiu Huang, Software Engineer, Uber & Pawel Leszczynski, Astronomer, Contractor, GetInData. Data lineage is crucial for auditing, data governance, regulatory compliance, troubleshooting and data discovery purposes. In this talk, we present an overview of Apache Flink’s built-in lineage graph and listener mechanism and its current features, highlighting the recent improvements added in FLIP-314. Then, we detail how Flink lineage is integrated with OpenLineage for lineage data management and visualization in modern data stacks.

Technology Deep Dive, Flink CDC

Transforming Your APIs Into Business Gold – Architecting a Real-Time API Usage Analytics Platform

Dunith Dhanushka, Senior Developer Advocate, Redpanda. This talk explores the architecture of a real-time API usage analytics system built with Redpanda, Apache Flink, and Apache Pinot. The system ingests high-volume API data via Redpanda, processes it through Flink for streaming ETL, and serves the analytics at scale with Apache Pinot, enabling businesses to make agile, data-driven decisions and improve user experiences.

Flink Autoscaling: A Year in Review - Performance, Challenges, and Innovations

Maximilian Michels, Software Engineer, Apple. This talk will evaluate the progress and challenges of Flink Autoscaling (FLIP-271), an open-source solution first introduced at Flink Forward Seattle 2023, which has helped users significantly reduce resource usage and operational burdens. It will cover improvements made over the past year, including enhancements to the algorithm’s robustness and metrics, and provide an in-depth look at its inner workings and unique integrations with Flink.

Is Kafka the Best Storage for Streaming Analytics?

Jark Wu, Head of Flink SQL, Apache Flink PMC member and Committer, Alibaba Cloud. This session explores the challenges faced when using Kafka and Flink SQL together for streaming analytics, highlighting Kafka’s inherent limitations. The talk introduces a next-generation streaming storage solution designed for streaming analytics, detailing its architecture, integration with Flink SQL and Lakehouses, and performance improvements from end-to-end optimization across computing, storage, and data lakes, with a glimpse into future developments.

Real-Time, Real Simple: Getting Started with Flink CDC

Dominik Žnidaršič, Developer, Paurus. This talk shares how our team used Apache Flink and Change Data Capture (CDC) on PostgreSQL to deliver real-time event updates to dashboards, meeting business leaders' needs for customized data. You'll learn about the challenges we faced, the insights we gained, and how quickly we implemented the solution for seamless, real-time reporting.

Exploring Scenarios of Flink CDC in Streaming Data Integration

Leonard Xu, Apache Flink PMC Member & Committer, Flink CDC Lead, Alibaba. This session explores the challenges of building real-time data synchronization pipelines and how Flink CDC, an end-to-end streaming ETL tool, addresses these issues. It covers key design aspects of Flink CDC, including schema evolution, full database synchronization, dynamic table addition, automatic merging of sharded tables, and column projection and filtering, demonstrating how it improves the efficiency of real-time business decision-making.

Apache Paimon + Flink: Build Streaming Pipeline on Lakehouse

Jingsong Li, Staff Engineer, PMC Chair of Apache Paimon, PMC Member of Apache Flink, Alibaba. This presentation shares how the Apache Paimon Data Lake format enables building a real-time lakehouse architecture with Flink and Spark for both streaming and batch operations. It highlights methods for real-time data ingestion, building streaming pipelines to enhance data timeliness, and migrating from formats like Hive, Hudi, and Iceberg, while maintaining cost-effective storage.

Cost-Effective Streaming Pipelines with Apache Paimon

Antón Rodríguez, Principal Software Engineer & Ramesh Motaparthy, Principal Software Engineer, New Relic. This talk shares New Relic's journey in building a reliable, low-latency, and cost-effective streaming platform using Apache Flink, Paimon, Kafka, and S3. Presenters Antón and Ramesh discuss their insights on deploying Flink on Kubernetes, job autoscaling, instance optimization, monitoring, and strategies for efficient checkpointing to ensure high availability and fault tolerance.

Low latency Change Data Capture (CDC) to your data lake, using Apache Flink and Apache Paimon

Ali Alemi, Streaming Specialist Solutions Architect & Subham Rakshit, Streaming Solution Architect, AWS. This session discusses building a Change Data Capture (CDC) pipeline using Apache Paimon and Apache Flink to address challenges like data compaction, schema evolution, and low-latency syncs in CDC data processing. Learn about partial-update merge engines, changelog tracking, and a comparison of Apache Paimon with Hudi and Iceberg, including guidance on when to choose each solution.

Unleashing Potential: Promoted.ai's Complete Adoption of Apache Flink

Xingcan Cui, Software Engineer, Promoted.ai. In this presentation, Promoted.ai will share how it uses Apache Flink for real-time feature computation through streaming jobs and streamlines offline data preparation with Flink batch jobs and Apache Paimon. The session will also cover challenges faced and lessons learned from fully integrating Flink into their data processing strategy.

Intesa Sanpaolo’s Journey to Cloud – is Batch processing still relevant?

Fulvio Pascotto, Data Architect Lead & Raffaele Saggino, Senior IT Infrastructure Architect, Intesa Sanpaolo. This session covers Intesa Sanpaolo's implementation of a new cloud-based Apache Flink solution as part of their migration to off-premises systems, showcased through the release of "IsyBank." Dive into the design choices made during the transition, the approach to workload migration between old and new infrastructures, and the evolving perspective on batch processing and Flink's role within it.

Flink at Intesa Sanpaolo: Streamlining Customer Data Integration with Change Data Capture

Rozarta Caka, Software Engineer, Intesa Sanpaolo & Andrea Fonti, Data Architect, Software Engineer, Agile Lab. This presentation from Intesa Sanpaolo introduces their custom Apache Flink-based framework aimed at standardizing flow development and enhancing developer productivity. It features a use case on customer data replication using Change Data Capture from mainframe to Oracle, showcasing Flink's role in modernizing their data architecture and the results of performance testing that demonstrate its efficiency in large-scale data processing.

Comparing Apache Flink and Spark for Modern Stream Data Processing

Eric Xiao, Senior Software Engineer - Data Platform & Sharon Xie, Founding Engineer and Product Lead, Decodable - Gold Sponsor. Explore Decodable's evaluation of Apache Flink and Spark Structured Streaming, detailing why Flink was chosen for their real-time data processing needs. Gain insights into the design philosophies, stateful streaming capabilities, and production readiness of both systems, including Flink's unique features like native support for tools such as Debezium and recent advancements like the Kubernetes operator and auto-scaler.

Leveraging ML Forecast Models for Predictive Resource Provisioning in Flink Applications

Duy Canh Tran, Senior Data Engineer, Ng Shi Kai, Lead Software Engineer, & Nhat Nguyen, Senior Software Engineer, Grab. This talk explores how Grab’s Stream Processing Platform team addressed the challenges of resource provisioning for Apache Flink applications by developing a predictive provisioning approach powered by Machine Learning. The presentation highlights the limitations of static and reactive provisioning, showcases how their solution automates resource management for large-scale Flink applications, and shares the benefits and challenges faced, ultimately improving efficiency for over 70% of their users and allowing teams to focus on application development.

Use Cases

Customize Flink for Pinterest Use Cases at Scale

Tucker Harvey, Software Engineer I, Pinterest. At Pinterest, the streaming data processing team built Flink as a service to support mission-critical use cases that impact revenue and user engagement. This talk highlights the customizations made to enhance reliability, development speed, and cost-efficiency.

Building a Streaming-First Platform: Tools and Lessons from Our Flink Migration Journey

Mohsin Niazi, Software Engineer & Robin Stephenson, Software Engineer, Marshall Wace. Watch Marshall Wace as they share their journey of transitioning from SQL Server to a streaming-first approach with hundreds of Flink pipelines in production. Learn how they accelerate data access with fast Kafka topic queries, improved sink compaction, and automated FlinkSQL table generation, along with tools for ensuring correctness like differential fuzzing and validation pipelines. Learn about the platform developed to support this migration and the valuable lessons learned throughout the process.

Scaling Flink in the Real World: Insights from Running Flink for five years at Stripe

Ben Augarten, Software Engineer & Pratyush Sharma, Software Engineer, Stripe. In this session, explore the challenges of running Apache Flink in production at Stripe, where it's become the backbone for hundreds of applications performing tasks like CDC, data ingestion, ML feature computation, and real-time ledgering. Dive into common Flink incidents such as application bugs, deployment failures, and Kafka issues, and hear how we’ve developed systematic resilience measures to minimize their impact. Learn about the strategies we've implemented to handle unexpected data outages, state changes, and other operational challenges, and gain insights into building more resilient Flink applications.

Streamlining Real-Time Data Processing at Uber with Protobuf Integration

Yang Yang, Senior Staff Engineer & Sai Sharath Dandi, Senior Software Engineer, Uber. At Uber, Apache Flink powers real-time analytics and decision-making, processing over 4 trillion messages daily across 2000+ Flink SQL jobs. Traditionally, Flink used Avro-formatted Kafka topics, while online applications used Protobuf, leading to inefficiencies and extra engineering effort for data conversion. In this talk, we’ll share how we elevated Protobuf to a first-class citizen in our managed Flink platform to optimize performance and cost efficiency.

Flink & Metered Billing - a reprocessing story

Pedro Mázala, Senior Staff Engineer, Evoura - Gold Sponsor. Learn how Gorgias used Apache Flink for accurate metered billing, covering key topics like session aggregation, idempotency, and data accuracy. Get insights on reprocessing 500M+ events and overcoming architectural and operational challenges to build reliable real-time systems.

Flink Is All You Need (for a Digital Bank)

Maciej Bryński, Data Architect, Cledar. This presentation showcases how to build a comprehensive streaming architecture for a Digital Bank using Apache Flink exclusively. It highlights the use of Flink CDC for real-time data synchronization, Flink SQL for complex financial data analysis, and demonstrates the design and implementation of a scalable, real-time banking data platform.

Driving the Future: Leveraging Apache Pulsar and Flink for Real-Time Vehicle Profile Management

Christos Anagnostakis, Enterprise Architect, Toyota Motor - Europe & Sijie Guo, Co-founder & CEO, StreamNative - Silver Sponsor. This talk explores the integration of Apache Pulsar and Apache Flink for real-time vehicle data processing in the automotive industry. Attendees will learn how this architecture enhances vehicle analytics, performance monitoring, and predictive maintenance, using a real-world fleet management use case to showcase its impact on vehicle health, route optimization, and driver safety.

Real-Time Incident Monitoring for Smart Access Devices: An IoT Use Case

Alireza Moosavi, Data Engineer, Alper Basaran, Data Lead, & Medi van Broekhoven, Sr. Product Owner, Salto Systems. In this session, Salto showcases how Apache Flink helped solve a challenging real-time incident monitoring use case for anomalies, processing data from 300K+ IoT devices. Key topics include Salto's IoT ecosystem, Flink’s role in the solution architecture, cloud deployment with Kubernetes, observability, and common patterns like handling out-of-order events and data enrichment with CDC connectors.

From the Pitch to the Pub - Lessons Learned from Distributing Streaming Data in the Sports Industry

Samuel von Baussnern, Consultant, Lead Developer & Tech Lead, D-ONE. In this session, we will present the tech stack and architecture that enabled D-ONE to distribute terabytes of tracking and statistical data from a sports event to TV stations and fans. We'll share insights on what worked, what didn’t, and the changes we're making to prepare for the future, including how we monitored feed quality and communicated in real time, both internally and externally, to ensure smooth operations.

Streaming Up the River: Leveraging Flink to Prevent Ship Groundings in Rivers and Ports

Roee Hasson, Engineering Lead, DockTech. In this presentation, Roee showcases how DockTech uses Apache Flink to enhance maritime safety by preventing groundings and improving navigation around ports and rivers. Learn how real-time data from multiple vessels is collected, cleaned, and processed to alert captains and fleet operators about potential risks, helping avoid costly accidents and ensuring safer operations in shallow waters.

Flink for Energy and Commodities trading business

Oleksii Pominovskyi, Information Technology Project Manager & Sudha Ramraj, Senior Technical Consultant, Uniper. Uniper is an international energy company with operations across over 40 countries. During this presentation, Uniper's Sales and Trading IT team discuss their implementation of Apache Flink, highlighting the technology's importance in safeguarding daily operations, sharing practical use cases, challenges faced, and future expectations for its role in the energy sector.

Implementing Event Tracing in Flink Applications

Ejas Khan, Senior Technical Lead, Mercedes-Benz. This talk focuses on integrating OpenTracing/OpenTelemetry and Datadog with Apache Flink to enable detailed event tracing for real-time data applications. Learn how to instrument a Flink application, configure Datadog for trace visualization, and use tracing data to identify bottlenecks and optimize performance. The session is aimed at developers, system architects, and DevOps engineers working with real-time streaming applications, offering practical implementation examples and configuration tips.

Closing Session

Full Length

Join the closing ceremonies of Flink Forward Berlin 2024. Hear from Alex Walden, Ververica CEO, as he shares final thoughts on the future of Apache Flink, and from Field CTO Ben Gamble as he presents a short closing session on Ververica’s Streaming Data Platform.

Ververica's Unified Streaming Data Platform

From the mainstage at Flink Forward Berlin 2024, watch Ben Gamble, Ververica Field CTO, as he presents Ververica's Unified Streaming Data Platform. Powered by VERA, the engine revolutionizing Apache Flink, the platform lets you derive insights, make decisions, and take action with data from any source, in the deployment method of your choice.
