Word Count Example Using Pravega Flink Connectors

WordCountWriter and WordCountReader

Read Post

This example demonstrates how to use the Pravega Flink Connectors to write data collected from an external network stream into a Pravega Stream and read the data from the Pravega Stream.

Purpose

Flink provides a DataStream API to perform real-time operations like mapping, windowing, and filtering on continuous unbounded streams of data. Whenever we think about the streaming operations in big data, it has become custom to do the word count. In this example, we will read some lines of words from the Pravega data stream storage system and perform the word count on them.

After complete this word count code sample, you will have a better understanding of how to use the Pravega Flink Connectors to write data collected from an external network stream into a Pravega Stream and read the data from the Pravega Stream.

Note: This code sample is only designed to run on a local cluster with support from Pravega. It could be an advanced academic exercise to enhance it and run on SDP.

Design

This example consists of two applications, a WordCountWriter that reads data from a network stream, transforms the data, and writes the data to a Pravega stream; and a WordCountReader that reads from a Pravega stream and prints the word counts summary. You might want to run WordCountWriter in one window and WordCountReader in another.

Instructions

Before you start, you need to download the latest Pravega release on the github releases page. See here for the instructions to build and run Pravega in standalone mode.

Besides, you also need to get the latest Flink binary from Apache download page. Follow this tutorial to start a local Flink cluster.

2. Build pravega-samples Repository

pravega-samples repository provides code samples to connect analytics engines Flink with Pravega as a storage substrate for data streams. It has divided into sub-projects(pravega-client-examples, flink-connector-examples and hadoop-connector-examples), each one addressed to demonstrate a specific component. To build pravega-samples from source, follow the instructions to use the built-in gradle wrapper.

3. Start the Word Count program

There are multiple ways to run the program in Flink environment including submitting from terminal or Flink UI. Follow the latest instruction from the github README to learn more about IDE setup and running process.

Source

https://github.com/pravega/pravega-samples/tree/master/flink-connector-examples/doc/flink-wordcount

Post on 09 Mar 2020