Sakha

Apache Beam – Making Big Data Processing Portable

If you’re tired of using multiple technologies to accomplish various big data tasks, you may want to consider Apache Beam, a new distributed processing tool from Google that’s now incubating at the ASF (Apache Software Foundation).

One of the challenges of big data development is the need to use many different technologies, frameworks, APIs, languages, and software development kits. Depending on what you’re trying to do, and where you’re trying to do it, you may choose MapReduce for batch processing, Apache Spark SQL for interactive queries, Apache Flink for real-time streaming, or a machine learning framework running in the cloud.

This fragmentation puts pressure on developers to pick “the right” tool for each job. That can be overwhelming for those new to big data application development, and it could even slow adoption of open source tools.

Enter Google. The Web giant is hoping to eliminate some of this second-guessing and painful tool-jumping with Apache Beam, which it’s positioning as a single programming and runtime model that not only unifies development for batch, interactive, and streaming workflows, but also works the same way for both cloud and on-premises development. This SlideShare explores Apache Beam in brief.
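To make the “single model” idea a little more concrete, here is a toy sketch in plain Python. It does not use Apache Beam or its API; the names `run_batch`, `run_streaming`, and `live_feed` are hypothetical and exist only for illustration. The point it tries to show is that the processing logic is defined once and then handed to either a batch-style or a streaming-style runner.

```python
from typing import Callable, Iterable, Iterator, List

# Illustration only: this is NOT Apache Beam's API.
# A "pipeline" here is just an ordered list of per-element functions.
Transform = Callable[[str], str]


def run_batch(pipeline: List[Transform], data: List[str]) -> List[str]:
    """Toy batch runner: process a bounded collection and return all results."""
    results = []
    for item in data:
        for step in pipeline:
            item = step(item)
        results.append(item)
    return results


def run_streaming(pipeline: List[Transform], source: Iterable[str]) -> Iterator[str]:
    """Toy streaming runner: emit each result as soon as its input arrives."""
    for item in source:
        for step in pipeline:
            item = step(item)
        yield item


# The processing logic is written once...
pipeline = [str.strip, str.upper]

# ...and run over a bounded data set (batch)...
batch_out = run_batch(pipeline, [" beam ", " flink "])


# ...or over a source that produces elements over time (streaming).
def live_feed() -> Iterator[str]:
    yield " spark "
    yield " beam "


stream_out = list(run_streaming(pipeline, live_feed()))
```

In this sketch the two runners differ only in how they consume input and deliver output; the `pipeline` definition itself never changes. That is the portability argument in miniature, under the assumption that real runners can hide the same kind of difference at cluster scale.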