This post introduces connectors to read and write Pravega Streams with the Apache Hadoop data processing framework.
The connectors can be used to build end-to-end stream processing pipelines (see Samples) that use Pravega as the stream storage and message bus, and Apache Hadoop for computation over the streams.
The connector implements both the InputFormat and OutputFormat interfaces for Hadoop. It leverages the Pravega batch client to read existing events in parallel, and uses the write API to write events to a Pravega stream.
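As a rough illustration of what this wiring looks like in a MapReduce driver, the sketch below configures a job to read events from a Pravega stream through the connector's InputFormat. The builder method names and the io.pravega.connectors.hadoop package follow the connector README, and the scope, stream, and controller URI are placeholder values; consult the README for the authoritative API.

```java
import io.pravega.connectors.hadoop.PravegaInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PravegaReadDriver {
    public static void main(String[] args) throws Exception {
        // Tell the connector which Pravega scope/stream to read and how to
        // deserialize events (placeholder scope, stream, and controller URI).
        Configuration conf = PravegaInputFormat.builder()
                .withScope("myScope")
                .forStream("myStream")
                .withURI("tcp://127.0.0.1:9090")
                .withDeserializer(io.pravega.client.stream.impl.JavaSerializer.class.getName())
                .build();

        Job job = Job.getInstance(conf, "pravega-read");
        job.setInputFormatClass(PravegaInputFormat.class);
        // ... set mapper, reducer, output key/value classes, and output
        // format here, then submit:
        // job.waitForCompletion(true);
    }
}
```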
The build script handles Pravega as a source dependency, meaning that the connector is linked to a specific commit of Pravega (as opposed to a specific release version) in order to facilitate co-development. This is accomplished with a combination of a git submodule and Gradle's composite build feature.
More details about the build and usage are shown in the README of the Hadoop connector.
You can find the Pravega Hadoop connector examples in the pravega-samples repository. There are two code examples to give you some basic ideas on how to use the hadoop-connectors for Pravega:
1. wordcount: counts the words from a Pravega Stream filled with random text to demonstrate the usage of the Hadoop connector for Pravega (a mapper sketch follows the prerequisites below).
2. terasort: sorts events from an input Pravega Stream and then writes sorted events to one or more streams.
Before running the applications, you may also need to set up the following environment prerequisites:
1. Pravega running (see here for instructions)
2. Build the pravega-samples repository
3. Set up and start HDFS (refer to this document from Apache Hadoop)
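To make the wordcount example above concrete, here is a minimal mapper sketch. It assumes the connector hands each event's deserialized payload to the mapper as a String value (as it would with JavaSerializer over String events); the class name and the Object input key are illustrative, so check the sample code in pravega-samples for the real implementation.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count mapper over Pravega events. Each input value is assumed to be
// one event's payload as a String; the input key type from the connector is
// left generic as Object here.
public class EventWordCountMapper extends Mapper<Object, String, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, String event, Context ctx)
            throws IOException, InterruptedException {
        // Split the event text into tokens and emit (word, 1) per token;
        // a reducer then sums the counts per word.
        StringTokenizer tok = new StringTokenizer(event);
        while (tok.hasMoreTokens()) {
            word.set(tok.nextToken());
            ctx.write(word, ONE);
        }
    }
}
```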
To learn more about how to build and use the Hadoop Connector library, follow the connector documentation here.
More examples of how to use the connectors with Hadoop applications can be found in the Pravega Samples repository.
The Hadoop connectors for Pravega are 100% open source and community-driven. All components are available under the Apache 2.0 License on GitHub.