Getting started with PredictionIO
In this section, we will explore how to build and deploy an engine using the templates offered by PredictionIO.
To begin, let's pick an e-commerce Recommendation Template to build the Engine. We will add code snippets for both Java and Scala. If you are familiar with one of these languages, it will help you understand the supporting content.
The Java version of the template that we will use is located at https://github.com/apache/incubator-predictionio-template-java-ecom-recommender.
The Scala version of the template that we will use is located at https://github.com/apache/incubator-predictionio-template-ecom-recommender.
At this point, we can assume that you have PredictionIO set up as per the previous section.
The following are the steps to start a machine learning project using engine templates offered by PredictionIO:
- Add the path to PredictionIO's binary commands to your PATH. If you are using macOS, a simple way is to open the .bash_profile file in the vi editor and add the path there:
vi ~/.bash_profile
The following image shows the command you will add to your .bash_profile and save:
- Create a new engine from the template. For this demonstration, we will pick the Java e-commerce Template engine. You can keep the engine in any directory. Let's assume our engine name is MyECommerceRecommendation. Type the following commands to get the engine and cd into the engine folder:
$ pio template get apache/incubator-predictionio-template-java-ecom-recommender MyECommerceRecommendation
$ cd MyECommerceRecommendation
At this point, your project directory will look like the following screenshot:
- Generate the app and the secret key.
Create a new PredictionIO app to store data. The data collected will be used for machine learning modeling. Let's create an app named Recommenderapp:
pio app new Recommenderapp
The app will be listed. Copy the app's access key (its secret); it is needed for viewing the Event Server data payload and for other purposes that we will discuss later. Consider the following screenshot:
- Collect event data.
Event data can be collected by making a REST API POST call to the Event Server on port 7070. The SDKs include code samples that make a RESTful call to port 7070 to insert event data.
A simple sample request JSON is written as follows:
{
  "event": "view",
  "entityType": "user",
  "entityId": "x1",
  "targetEntityType": "item",
  "targetEntityId": "i2",
  "eventTime": "2015-02-17T02:11:21.934Z"
}
The API endpoint is as follows:
http://localhost:7070/events.json?accessKey=$accessKey
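As a minimal sketch of how a client might assemble this payload and endpoint in Java, consider the following code. The class EventPayload and its method names are hypothetical and are not part of any PredictionIO SDK. The JSON is built by hand only to keep the example free of dependencies, so it assumes the identifiers contain no characters that need JSON escaping. It also assumes Java 10 or later:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical helper for illustration; not part of any PredictionIO SDK.
final class EventPayload {

    // Builds a user-to-item event in the shape of the sample request above.
    // Assumes the values need no JSON escaping (no quotes or backslashes).
    static String toJson(String event, String userId, String itemId, Instant eventTime) {
        return "{"
            + "\"event\":\"" + event + "\","
            + "\"entityType\":\"user\","
            + "\"entityId\":\"" + userId + "\","
            + "\"targetEntityType\":\"item\","
            + "\"targetEntityId\":\"" + itemId + "\","
            + "\"eventTime\":\"" + eventTime + "\""   // Instant prints as ISO-8601 in UTC
            + "}";
    }

    // Builds the Event Server endpoint, URL-encoding the access key.
    static String endpoint(String host, String accessKey) {
        return "http://" + host + ":7070/events.json?accessKey="
            + URLEncoder.encode(accessKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2015-02-17T02:11:21.934Z");
        System.out.println(toJson("view", "x1", "i2", when));
        // YOUR_ACCESS_KEY is a placeholder for the key of your own app.
        System.out.println(endpoint("localhost", "YOUR_ACCESS_KEY"));
    }
}
```

Using java.time.Instant for the timestamp keeps eventTime in the same ISO-8601 form as the sample request.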
The curl tool can be used to make an API call to insert data.
Consider the following command:
curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \
-H "Content-Type: application/json" \
-d '{
  "event" : "view",
  "entityType" : "user",
  "entityId" : "x1",
  "targetEntityType" : "item",
  "targetEntityId" : "i3",
  "eventTime" : "2015-02-17T02:12:21.934Z"
}'
When we dive deeper into the code later, we will see how simple Java code can be written to post event data.
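As a preview, the same POST can be sketched with the HTTP client that ships with Java 11 and later. This is only an illustration under stated assumptions: the class EventPoster is hypothetical and is not the PredictionIO Java SDK, the access key is assumed to be URL-safe and exported in the accessKey environment variable, and the Event Server is assumed to be running locally on port 7070:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for illustration; not the PredictionIO Java SDK.
final class EventPoster {

    // Builds the same POST request that the curl command sends.
    // Assumes accessKey contains only URL-safe characters.
    static HttpRequest buildRequest(String accessKey, String eventJson) {
        return HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:7070/events.json?accessKey=" + accessKey))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(eventJson))
            .build();
    }

    // Sends the event and returns the HTTP status code of the response.
    static int post(String accessKey, String eventJson) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
            client.send(buildRequest(accessKey, eventJson), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String accessKey = System.getenv("accessKey"); // assumed to be exported in the shell
        if (accessKey == null) {
            System.err.println("Set the accessKey environment variable first.");
            return;
        }
        String eventJson = "{\"event\":\"view\",\"entityType\":\"user\",\"entityId\":\"x1\","
            + "\"targetEntityType\":\"item\",\"targetEntityId\":\"i3\","
            + "\"eventTime\":\"2015-02-17T02:12:21.934Z\"}";
        System.out.println("HTTP status: " + post(accessKey, eventJson));
    }
}
```

Keeping buildRequest separate from post makes it possible to inspect the request without a running Event Server.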
- Deploy the engine. The first step is to verify that you are inside the right folder, which in this case is MyECommerceRecommendation. If you are not, cd into that folder. Also ensure that the engine.json file has the proper app name. The engine.json file will look something like the following piece of code:
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.recommendation.RecommendationEngine",
  "datasource": {
    "params" : {
      "appName": "Recommenderapp"
    }
  },
  "algorithms": [
    {
      "name": "algo",
      "params": {
        "seed": 1,
        "rank": 10,
        "iteration": 10,
        "lambda": 0.01,
        "appName": "Recommenderapp",
        "similarItemEvents": ["view"],
        "seenItemEvents": ["buy", "view"],
        "unseenOnly": true
      }
    }
  ]
}
Note that appName is the name of the app that we created before, and engineFactory is the fully qualified class name of the engine factory defined in the RecommendationEngine.java file.
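Because appName appears in engine.json more than once, it is easy to update one occurrence and miss another. The following is a minimal sketch of a check for that, assuming Java 11 or later. The class EngineJsonCheck is hypothetical and is not part of PredictionIO, and it uses a regular expression rather than a real JSON parser only to stay free of dependencies:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PredictionIO.
final class EngineJsonCheck {

    // Matches "appName": "<value>" anywhere in the text.
    private static final Pattern APP_NAME =
        Pattern.compile("\"appName\"\\s*:\\s*\"([^\"]*)\"");

    // Returns every appName value found, in order of appearance.
    static List<String> appNames(String engineJson) {
        List<String> names = new ArrayList<>();
        Matcher m = APP_NAME.matcher(engineJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True only if at least one appName exists and all of them equal expected.
    static boolean allMatch(String engineJson, String expected) {
        List<String> names = appNames(engineJson);
        return !names.isEmpty() && names.stream().allMatch(expected::equals);
    }

    public static void main(String[] args) throws Exception {
        // Run from inside the engine folder so that engine.json is found.
        String json = Files.readString(Path.of("engine.json"));
        System.out.println(allMatch(json, "Recommenderapp")
            ? "appName OK"
            : "appName mismatch: " + appNames(json));
    }
}
```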
- Build, train, and deploy the engine as a web service. Start by building it with the following command:
pio build --verbose
This will build the engine, and once the engine is ready, the screen will display the following output:
Moreover, make sure your build.sbt file looks like the following piece of code:
import AssemblyKeys._

assemblySettings

name := "barebone-template"

organization := "io.prediction"

libraryDependencies ++= Seq(
  "org.apache.predictionio" %% "apache-predictionio-core" % "0.10.0-incubating" % "provided",
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided",
  "org.scalatest" % "scalatest_2.10" % "2.2.1" % "test",
  "com.google.guava" % "guava" % "12.0",
  "org.jblas" % "jblas" % "1.2.4"
)
Make sure your template.json file is written as follows:
{"pio": {"version": { "min": "0.10.0-incubating" }}}
Note that these changes are needed because the project is being moved to Apache and the templates in the repository still use the old code; the pull requests that merge the right import statements are still open. In the meantime, you can manually change all imports as outlined at https://github.com/apache/incubator-predictionio-template-java-ecom-recommender/pull/5.
Once our build is successful, run the following train command:
pio train
Here is what your console will display once the training is completed:
Deploy the engine using the following command:
pio deploy
Once everything is successful, you should see the following output on your console:
Now, if you navigate to http://0.0.0.0:8000/ on your local machine, you will see an engine screen like the following screenshot, detailing all the information about the project and its configured parameters:
- Query the engine to get a predicted response. You can query the engine server to receive the predicted result for new input. This is a simple REST API call, shown next. In this example, we ask the engine to recommend 4 items to the user whose ID is u1, based on the data collected during that user's visits to the website. We assume that our Event Server receives the event data whenever the user visits the website:
$ curl -H "Content-Type: application/json" \
-d '{ "userEntityId": "u1", "number": 4 }' \
http://localhost:8000/queries.json
The response will be a JSON object, as follows:
{
  "itemScores":[
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}
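On the client side, the query body and the itemScores in the response can be handled with a few lines of Java. The following is a minimal sketch: the class ItemScores is hypothetical and is not part of any PredictionIO SDK, and it uses a regular expression in place of a JSON library, so it assumes that the response has exactly the shape shown above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any PredictionIO SDK.
final class ItemScores {

    // Matches one {"item":"<id>","score":<number>} entry.
    private static final Pattern ENTRY = Pattern.compile(
        "\\{\\s*\"item\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"score\"\\s*:\\s*"
        + "(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)\\s*\\}");

    // Builds the query body; assumes userEntityId needs no JSON escaping.
    static String queryJson(String userEntityId, int number) {
        return "{\"userEntityId\":\"" + userEntityId + "\",\"number\":" + number + "}";
    }

    // Returns item -> score, preserving the order of the response.
    static Map<String, Double> parse(String responseJson) {
        Map<String, Double> scores = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(responseJson);
        while (m.find()) {
            scores.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(queryJson("u1", 4));
        String response = "{\"itemScores\":["
            + "{\"item\":\"22\",\"score\":4.072304374729956},"
            + "{\"item\":\"62\",\"score\":4.058482414005789}]}";
        parse(response).forEach((item, score) -> System.out.println(item + " -> " + score));
    }
}
```

A LinkedHashMap is used so that the items stay in the order in which the engine returned them.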