Python Machine Learning Cookbook (Second Edition)

How to do it...

Let's see how to implement a stacking method:

  1. We start by importing the libraries:
from heamy.dataset import Dataset
from heamy.estimator import Regressor
from heamy.pipeline import ModelsPipeline

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
  2. Load the Boston dataset, already used in Chapter 1, The Realm of Supervised Learning, for the Estimating housing prices recipe. Note that load_boston was removed in scikit-learn 1.2, so this call requires an earlier release:
data = load_boston()
  3. Separate the features from the target, then hold out 10% of the samples as a test set:
X, y = data['data'], data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)
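
As a quick check on the split sizes, the sketch below applies the same call to zero-filled stand-in arrays with the Boston dataset's shape (506 samples, 13 features). The placeholder arrays are an assumption, used only so that the snippet runs without load_boston.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the Boston dataset's shape (an assumption for
# illustration; the values do not matter for the split sizes).
X = np.zeros((506, 13))
y = np.zeros(506)

# Same arguments as in the recipe: 10% of the rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=2
)

print(X_train.shape, X_test.shape)
```

With test_size=0.1, scikit-learn rounds the test share up, so 51 of the 506 rows are held out and 455 remain for training.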
  4. Wrap the training and test arrays in a heamy Dataset:
Data = Dataset(X_train, y_train, X_test)
  5. Now we can build the two models that we will use in the stacking procedure:
RfModel = Regressor(dataset=Data, estimator=RandomForestRegressor, parameters={'n_estimators': 50}, name='rf')
# 'normalize' was removed from LinearRegression in scikit-learn 1.2; drop this parameter on newer releases
LRModel = Regressor(dataset=Data, estimator=LinearRegression, parameters={'normalize': True}, name='lr')
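
The normalize option of LinearRegression was removed in scikit-learn 1.2. For ordinary least squares with an intercept, rescaling the features does not change the fitted predictions, so dropping the option should be safe here. The sketch below illustrates this on synthetic data; make_regression and the StandardScaler pipeline are stand-ins chosen for illustration and are not part of the recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption; the recipe itself uses the Boston dataset).
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=2)

# Ordinary least squares on the raw features.
plain = LinearRegression().fit(X, y)

# The same model fitted on standardized features.
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Both fits should yield the same predictions up to floating-point error.
same_predictions = np.allclose(plain.predict(X), scaled.predict(X))
print(same_predictions)
```

Scaling does matter for regularized models such as Ridge or Lasso, where an explicit scaler in a pipeline is the usual replacement.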
  6. It's time to stack these models, using 10 folds:
Pipeline = ModelsPipeline(RfModel, LRModel)
StackModel = Pipeline.stack(k=10, seed=2)
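
To make the stacking step more concrete, here is a minimal sketch of the general idea using only scikit-learn: each base model contributes one column of out-of-fold predictions, and those columns become the features for the next level. It is an illustration on synthetic data, not heamy's implementation, which may differ in detail; train_meta and test_meta are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Synthetic data (an assumption; the recipe itself uses the Boston dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=2
)

base_models = [
    RandomForestRegressor(n_estimators=50, random_state=2),
    LinearRegression(),
]
folds = KFold(n_splits=10, shuffle=True, random_state=2)

# Training features for the next level: every row is predicted by a model
# that did not see that row during fitting (out-of-fold predictions).
train_meta = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=folds) for m in base_models]
)

# Test features: each base model is refitted on the full training set.
test_meta = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_models]
)

print(train_meta.shape, test_meta.shape)
```

Using out-of-fold predictions keeps the second-level model from learning on outputs that the base models produced for rows they had already memorized.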
  7. Now we will train a LinearRegression model on the stacked data:
Stacker = Regressor(dataset=StackModel, estimator=LinearRegression)
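  8. Finally, we will predict on the test set and validate the stacked model with 10-fold cross-validation, scored by mean absolute error:
# Predictions for the held-out X_test
Results = Stacker.predict()
# 10-fold cross-validation scored with mean_absolute_error (this rebinds Results)
Results = Stacker.validate(k=10, scorer=mean_absolute_error)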
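
For readers without heamy, or on a scikit-learn release that no longer ships load_boston, a comparable workflow can be sketched with scikit-learn's own StackingRegressor. This is a swapped-in alternative rather than the recipe's method, and it runs on synthetic data, so the score it prints is not comparable with the recipe's result.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption; the recipe itself uses the Boston dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=2
)

# The same two base models as in the recipe, with a linear model on top.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=2)),
        ("lr", LinearRegression()),
    ],
    final_estimator=LinearRegression(),
    cv=10,  # folds used to build the out-of-fold features
)
stack.fit(X_train, y_train)

# Score the stacked model on the held-out test set.
predictions = stack.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"Hold-out MAE: {mae:.3f}")
```

Here the mean absolute error is the average of |y_i - prediction_i| over the test samples, so lower values indicate a better fit.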