Learning Salesforce Einstein

Getting started with PredictionIO

In this section, we will explore how to build and deploy an engine using the templates offered by PredictionIO.

To begin, let's pick the e-commerce Recommendation Template as the basis for our engine. We will add code snippets for both Java and Scala; familiarity with either language will help you follow the supporting content.

PredictionIO's core engine is implemented in Scala, but nothing requires you to use Scala when building on top of PredictionIO: you can write the engine and the algorithm in Java as well. Spark MLlib also supports both Java and Scala.

The Java version of the template that we will use is located at https://github.com/apache/incubator-predictionio-template-java-ecom-recommender.

The Scala version of the template engine that we will use is located at https://github.com/apache/incubator-predictionio-template-ecom-recommender.

We will explain the code in both Scala and Java; depending on your preferred language, you can skip the other. Scala follows a functional paradigm, and you will notice that its functional syntax requires less typing than the equivalent Java.

At this point, we can assume that you have PredictionIO set up as per the previous section.

The following are the steps to start a machine learning project using engine templates offered by PredictionIO:

  1. Add the path to PredictionIO's binary commands to your PATH. On macOS, you can simply open the .bash_profile file in the vi editor and add the path there:
      vi ~/.bash_profile

The following image shows the command you will add to your .bash_profile and save:

  1. Create a new engine from the template. For this demonstration, we will pick up the Java e-commerce Template engine. You can keep the engine in any directory. Let's assume our engine name is MyECommerceRecommendation. Type the following commands to get an Engine and cd into the Engine Folder:
 $ pio template get apache/incubator-predictionio-template-java-ecom-recommender MyECommerceRecommendation

$ cd MyECommerceRecommendation

At this point, your project directory will look like the following screenshot:

  1. Generate the app and the secret key.

Create a new PredictionIO app to store data. The data collected will be used for machine learning modeling. Let's create an app named Recommenderapp:

      pio app new Recommenderapp

Once created, the app is listed along with its access key (the app's secret key). Copy this key; it is needed to view the Event Server data payload and for other purposes that we will discuss later. Consider the following screenshot:

  1. Collect event data.

Event data is collected by making a REST API POST call to the Event Server, which listens on port 7070. The PredictionIO SDKs include code samples that make this RESTful call to insert event data.

A simple sample request JSON is written as follows:

    {
      "event": "view",
      "entityType": "user",
      "entityId": "x1",
      "targetEntityType": "item",
      "targetEntityId": "i2",
      "eventTime": "2015-02-17T02:11:21.934Z"
    }

The API endpoint is as follows:

      http://localhost:7070/events.json?accessKey=$accessKey

Replace $accessKey with the access key obtained during app creation.

The curl tool can be used to make an API call to insert data.

Consider the following code:

      curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \
        -H "Content-Type: application/json" \
        -d '{
          "event": "view",
          "entityType": "user",
          "entityId": "x1",
          "targetEntityType": "item",
          "targetEntityId": "i3",
          "eventTime": "2015-02-17T02:12:21.934Z"
        }'

When we dive deeper into the code later, we will see how simple Java code can be written to post events to the Event Server.
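Ahead of that walkthrough, the following is a minimal sketch of how a Java client might send the same view event using only the JDK. The class name EventPoster and its methods are hypothetical and are not part of the PredictionIO SDK or the template; the sketch also assumes that the Event Server runs on port 7070 as described above and that the IDs need no JSON escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PredictionIO SDK.
final class EventPoster {

    // Builds the JSON body for a "view" event in the shape shown above.
    // Assumes the IDs are plain letters and digits that need no escaping.
    static String viewEventJson(String userId, String itemId, String eventTime) {
        return "{"
            + "\"event\":\"view\","
            + "\"entityType\":\"user\","
            + "\"entityId\":\"" + userId + "\","
            + "\"targetEntityType\":\"item\","
            + "\"targetEntityId\":\"" + itemId + "\","
            + "\"eventTime\":\"" + eventTime + "\""
            + "}";
    }

    // Builds the Event Server endpoint for the given host and access key.
    static String endpoint(String host, String accessKey) {
        return "http://" + host + ":7070/events.json?accessKey=" + accessKey;
    }

    // POSTs the JSON body to the endpoint and returns the HTTP status code.
    static int post(String endpointUrl, String json) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(endpointUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java EventPoster <accessKey>");
            return;
        }
        String json = viewEventJson("x1", "i3", "2015-02-17T02:12:21.934Z");
        System.out.println("HTTP " + post(endpoint("localhost", args[0]), json));
    }
}
```

Keeping the payload and endpoint builders separate from the network call makes it possible to check the request shape without a running Event Server.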

  1. Deploy the engine. The first step is to verify that you are inside the right folder, which in this case is MyECommerceRecommendation; if you are not, cd into it. Also ensure that the engine.json file has the proper app name. The engine.json file will look something like the following piece of code:
    {
      "id": "default",
      "description": "Default settings",
      "engineFactory": "org.template.recommendation.RecommendationEngine",
      "datasource": {
        "params": {
          "appName": "Recommenderapp"
        }
      },
      "algorithms": [
        {
          "name": "algo",
          "params": {
            "seed": 1,
            "rank": 10,
            "iteration": 10,
            "lambda": 0.01,
            "appName": "Recommenderapp",
            "similarItemEvents": ["view"],
            "seenItemEvents": ["buy", "view"],
            "unseenOnly": true
          }
        }
      ]
    }

Note that appName is the name of the app that we created earlier, and engineFactory is the fully qualified class name of the engine factory defined in the RecommendationEngine.java file.
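Because appName appears in more than one place in engine.json, a quick sanity check before building can catch a mismatch. The following is a rough sketch under stated assumptions: the class EngineConfigCheck and its methods are hypothetical, and it uses a simple regular expression rather than a full JSON parser, so it only suits files shaped like the one above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PredictionIO.
final class EngineConfigCheck {

    // Matches "appName": "<value>" with optional whitespace around the colon.
    private static final Pattern APP_NAME =
        Pattern.compile("\"appName\"\\s*:\\s*\"([^\"]*)\"");

    // Returns every appName value found in the given engine.json text.
    static List<String> appNames(String engineJson) {
        List<String> names = new ArrayList<>();
        Matcher m = APP_NAME.matcher(engineJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True only if at least one appName exists and all of them equal expected.
    static boolean allMatch(String engineJson, String expected) {
        List<String> names = appNames(engineJson);
        return !names.isEmpty() && names.stream().allMatch(expected::equals);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "engine.json");
        String expected = args.length > 1 ? args[1] : "Recommenderapp";
        if (!Files.isRegularFile(path)) {
            System.out.println("File not found: " + path);
            return;
        }
        String json = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println(allMatch(json, expected)
            ? "appName OK"
            : "appName mismatch: " + appNames(json));
    }
}
```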

  1. Build, train, and deploy the engine as a web service. Start by building the engine with the following command:
      pio build --verbose

This will build the engine, and once the engine is ready, the screen will display the following output:

If you run into build issues, one likely cause is that the Java template still uses the io.prediction package instead of org.apache.predictionio. In that case, you will need to change all references in your project source code to use org.apache.predictionio.

Moreover, make sure your .sbt file looks like the following piece of code:

      import AssemblyKeys._

      assemblySettings

      name := "barebone-template"

      organization := "io.prediction"

      libraryDependencies ++= Seq(
        "org.apache.predictionio" %% "apache-predictionio-core" % "0.10.0-incubating" % "provided",
        "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
        "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided",
        "org.scalatest" % "scalatest_2.10" % "2.2.1" % "test",
        "com.google.guava" % "guava" % "12.0",
        "org.jblas" % "jblas" % "1.2.4"
      )

sbt is a build tool that manages dependencies and builds for Scala and Java projects. If you have used Maven or Ant before, sbt is similar, with the added advantage of supporting continuous compilation and testing for Scala. The engine is written in Scala.

Make sure your template.json file is written as follows:

      {"pio": {"version": { "min": "0.10.0-incubating" }}}

Note that these changes are needed because the project is being moved to Apache and the templates in the repository still use the old code; there are open pull requests to merge the right import statements. Until they are merged, you can manually change all imports as outlined at https://github.com/apache/incubator-predictionio-template-java-ecom-recommender/pull/5.
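If the project has many source files, the manual import change can be scripted. The following is a minimal sketch, assuming the only change needed is replacing the io.prediction package prefix with org.apache.predictionio in .java and .scala files; the class PackageMigrator and its methods are hypothetical, and you should review the resulting diff (and keep a backup) before building.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of PredictionIO.
final class PackageMigrator {

    // Rewrites package references such as io.prediction.controller to
    // org.apache.predictionio.controller. The word boundary keeps
    // already migrated references untouched.
    static String migrate(String source) {
        return source.replaceAll("\\bio\\.prediction\\.", "org.apache.predictionio.");
    }

    // Applies migrate() to every .java and .scala file under root and
    // returns the number of files that were changed.
    static int migrateTree(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java") || p.toString().endsWith(".scala"))
                .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path file : files) {
            String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String after = migrate(before);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java PackageMigrator <source-directory>");
            return;
        }
        Path root = Paths.get(args[0]);
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        System.out.println("Updated " + migrateTree(root) + " file(s)");
    }
}
```

The sketch only touches references followed by a dot, so a bare string such as the organization value in the .sbt file is left as it is.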

Once our build is successful, run the following train command:

      pio train

Here is what your console will display once the training is completed:

Deploy the engine using the following command:

      pio deploy

Once everything is successful, you should see the following output on your console:

Now, if you navigate to http://0.0.0.0:8000/ (port 8000 on your local host), you will see an engine screen like the following screenshot, detailing all the information about the project and configured parameters:

  1. Query the engine to get a predicted response. You can query the engine server to receive the predicted result for a new dataset. This is a simple REST API call, as follows. In this example, we ask the engine to recommend 4 items to the user whose ID is u1, based on the data collected during that user's visits to the website. Here, we assume that the Event Server receives the event data whenever the user visits the website:
 $ curl -H "Content-Type: application/json" \
     -d '{ "userEntityId": "u1", "number": 4 }' \
     http://localhost:8000/queries.json

The response will be JSON, as follows:

    {
      "itemScores": [
        {"item": "22", "score": 4.072304374729956},
        {"item": "62", "score": 4.058482414005789},
        {"item": "75", "score": 4.046063009943821},
        {"item": "68", "score": 3.8153661512945325}
      ]
    }
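To use such a response from Java, the item and score pairs need to be extracted from the JSON. The following is a minimal sketch that assumes the response has exactly the shape shown above; the class ItemScoreParser and its methods are hypothetical, and a regular expression stands in for a real JSON library to keep the example free of dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the PredictionIO SDK.
final class ItemScoreParser {

    // Matches one {"item":"<id>","score":<number>} entry, allowing whitespace.
    private static final Pattern ENTRY = Pattern.compile(
        "\\{\\s*\"item\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"score\"\\s*:\\s*"
        + "(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)\\s*\\}");

    // Builds the query body sent to queries.json.
    // Assumes the user ID needs no JSON escaping.
    static String queryJson(String userId, int number) {
        return "{\"userEntityId\":\"" + userId + "\",\"number\":" + number + "}";
    }

    // Returns item IDs mapped to scores, in the order the engine returned them.
    static Map<String, Double> parse(String responseJson) {
        Map<String, Double> scores = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(responseJson);
        while (m.find()) {
            scores.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return scores;
    }

    public static void main(String[] args) {
        String sample = "{\"itemScores\":["
            + "{\"item\":\"22\",\"score\":4.072304374729956},"
            + "{\"item\":\"62\",\"score\":4.058482414005789}]}";
        parse(sample).forEach((item, score) ->
            System.out.println("item " + item + " scored " + score));
    }
}
```

A LinkedHashMap is used so that the ranking order of the recommendations is preserved when iterating over the result.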