PredictionIO platform components
The PredictionIO platform consists of the following components:
- PredictionIO Framework: This provides a stack to build and deploy engines with machine learning algorithms. PredictionIO uses Apache Spark (http://spark.apache.org/) for data processing and MLlib (https://spark.apache.org/docs/latest/mllib-guide.html) to assist with predictive modeling. In this chapter, we will look into the installation of the engine in detail.
Apache Spark is a fast, in-memory data processing engine with development APIs that let data workers run streaming, machine learning, or SQL workloads. The Spark project has claimed speedups of up to 100 times over Hadoop MapReduce for workloads that fit in memory; the actual gain depends on the job.
- Event Server: This is a machine learning analytics layer that the PredictionIO platform uses to collect events from multiple systems. It can use either Apache HBase or a Java Database Connectivity (JDBC) backend as its data store. Apache HBase is a data store that runs on top of the Hadoop Distributed File System (HDFS), and Hadoop is a framework for handling large datasets in a distributed computing environment. The Event Server is exposed as a REST endpoint that listens for events. We will look at how to set up an Event Server later.
- Template Gallery and software development kits (SDKs): The Template Gallery (https://predictionio.incubator.apache.org/gallery/template-gallery/) gives developers a quick start for various machine learning tasks such as lead scoring, NLP, classification, and so on. It is a place to publish and download (free or proprietary) Engine templates for different types of machine learning applications. Some of these templates are maintained by third parties, but the projects are on GitHub to use and modify. There are also SDKs (https://predictionio.incubator.apache.org/sdk/), both officially built and community provided, to help integrate applications with the platform.
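Because the Event Server is a REST endpoint, sending it an event amounts to an HTTP POST with a JSON body. The following is only a minimal sketch of what such a request might look like, not the official client. The address `http://localhost:7070`, the `/events.json` path, the `accessKey` query parameter, and the field names `event`, `entityType`, `entityId`, and `properties` are assumptions not stated in this section, so check them against your installation. The helper `build_event_request` is a hypothetical name used for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url, access_key, event, entity_type, entity_id,
                        properties=None):
    """Build (but do not send) an HTTP POST that carries one event as JSON.

    The URL layout and field names are assumptions for illustration.
    """
    payload = {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
    }
    if properties:
        payload["properties"] = properties

    # The access key is passed as a query parameter (assumed convention).
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: user "u1" rates something with 4 stars (made-up sample values).
request = build_event_request(
    "http://localhost:7070", "YOUR_ACCESS_KEY", "rate", "user", "u1",
    properties={"rating": 4},
)
```

The request object is only constructed here; passing it to `urllib.request.urlopen` would send it, which requires a running Event Server and a valid access key. In practice, one of the SDKs mentioned above would normally wrap this step.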