A Performance Optimisation Strategy for Real-Time Data Pipelines Based on Kafka and Flink
  • ISSN: 3041-0843 (Online), 3041-0797 (Print)
  • DOI: 10.69979/3041-0843.25.04.082
  • Publication frequency: Quarterly
  • Language: English
  • Indexed in: ISSN Portal: https://portal.issn.org/; CNKI: https://scholar.cnki.net/journal/search

Xiaoxi Liu, Shuhao Dong

College of Information Engineering, Hainan Vocational University of Science and Technology, Haikou, Hainan, 570100

Abstract: With surging demand for real-time big data processing, real-time data pipelines built on Kafka and Flink have become core components of enterprise data architectures. In production environments, however, pipeline performance frequently hits bottlenecks caused by data skew, suboptimal resource allocation, and poorly managed backpressure. This paper systematically analyses the key factors affecting the performance of Kafka-Flink real-time data pipelines and proposes an end-to-end optimisation strategy covering data production, transmission, computation, and coordination. By optimising Kafka's topic partitioning and producer configuration, adjusting Flink's task scheduling and state management, and introducing dynamic backpressure awareness and elastic resource scaling, the proposed strategy significantly increases pipeline throughput, reduces processing latency, and improves stability. The effectiveness of the strategy is validated through simulation experiments.

Keywords: real-time data pipeline; Apache Kafka; Apache Flink; performance optimisation; backpressure mechanism
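
The abstract names data skew as one of the main bottlenecks but does not state which remedy the paper uses. One widely used mitigation in keyed stream processing is key salting with two-stage aggregation: a random suffix spreads a hot key over several sub-keys, each sub-key is aggregated separately, and the partial results are merged after the suffix is stripped. The following is a minimal plain-Python sketch of that idea, not the authors' method; the function names (`salted_key`, `two_stage_count`), the `#` separator, and the bucket count are hypothetical, and no Kafka or Flink API is involved.

```python
import random
from collections import Counter

SEPARATOR = "#"  # assumed not to occur in the original keys


def salted_key(key: str, salt_buckets: int, rng: random.Random) -> str:
    """Append a random bucket number so that one hot key becomes several sub-keys."""
    return f"{key}{SEPARATOR}{rng.randrange(salt_buckets)}"


def two_stage_count(events, salt_buckets: int, seed: int = 0):
    """Count events per key in two stages.

    Stage 1 counts per salted sub-key (in a real pipeline this work would be
    spread over parallel subtasks). Stage 2 strips the salt and merges the
    partial counts back into one total per original key.
    """
    rng = random.Random(seed)

    # Stage 1: partial aggregation on salted keys.
    partial = Counter(salted_key(key, salt_buckets, rng) for key in events)

    # Stage 2: remove the salt and combine the partial counts.
    final = Counter()
    for sub_key, count in partial.items():
        original_key, _, _ = sub_key.rpartition(SEPARATOR)
        final[original_key] += count

    return partial, final


if __name__ == "__main__":
    # A skewed workload: one key dominates the stream.
    events = ["hot"] * 1000 + ["cold"] * 10
    partial, final = two_stage_count(events, salt_buckets=8)
    print("largest partial group:", max(partial.values()))
    print("final totals:", dict(final))
```

The trade-off in this sketch is an extra aggregation stage in exchange for a smaller largest group: without salting, a single worker would handle every `hot` event, whereas with salting the largest group in stage 1 is only a fraction of that total.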
