This example demonstrates how to use the Pravega Flink Connectors to write data collected
from an external network stream into a Pravega Stream
and read the data from the Pravega Stream
.
Flink provides a DataStream API to perform real-time operations like mapping, windowing, and filtering on continuous unbounded streams of data. Whenever we think about the streaming operations in big data, it has become custom to do the word count. In this example, we will read some lines of words from the Pravega data stream storage system and perform the word count on them.
After complete this word count code sample, you will have a better understanding of how to use the Pravega Flink Connectors to write data collected from an external network stream into a Pravega Stream and read the data from the Pravega Stream.
Note: This code sample is only designed to run on a local cluster with support from Pravega. It could be an advanced academic exercise to enhance it and run on SDP.
This example consists of two applications, a WordCountWriter
that reads data from a network stream, transforms the data, and writes the data to a Pravega stream; and a
WordCountReader
that reads from a Pravega stream and prints the word counts summary. You might want to run WordCountWriter
in one window and WordCountReader
in another.
Before you start, you need to download the latest Pravega release on the github releases page. See here for the instructions to build and run Pravega in standalone mode.
Besides, you also need to get the latest Flink binary from Apache download page. Follow this tutorial to start a local Flink cluster.
pravega-samples
Repositorypravega-samples
repository provides code samples to connect analytics engines Flink with Pravega as a storage substrate for data streams. It has divided into sub-projects(pravega-client-examples
, flink-connector-examples
and hadoop-connector-examples
), each one addressed to demonstrate a specific component. To build pravega-samples
from source, follow the instructions to use the built-in gradle wrapper.
There are multiple ways to run the program in Flink environment including submitting from terminal or Flink UI. Follow the latest instruction from the github README to learn more about IDE setup and running process.
https://github.com/pravega/pravega-samples/tree/master/flink-connector-examples/doc/flink-wordcount