Python Machine Learning Cookbook (Second Edition)

How to do it...

Let's see how to implement a stacking method:

We start by importing the libraries:

from heamy.dataset import Dataset
from heamy.estimator import Regressor
from heamy.pipeline import ModelsPipeline

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

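Load the Boston housing dataset, already used in the Estimating housing prices recipe in Chapter 1, The Realm of Supervised Learning (note that load_boston was removed in scikit-learn 1.2, so this step requires an earlier release):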

data = load_boston()

Split the data into training and test sets, holding out 10% of the samples for testing:

X, y = data['data'], data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)
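
To make the split concrete, here is a minimal sketch of a shuffled hold-out split in plain Python. It is an illustration only, not scikit-learn's implementation: the helper name holdout_split and the toy data are hypothetical, and rounding the test size up is an assumption about how a fractional test_size is interpreted.

```python
import math
import random

def holdout_split(X, y, test_size=0.1, seed=2):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)   # assumption: the test part is rounded up
    idx = list(range(n))
    random.Random(seed).shuffle(idx)    # seeded so the split is reproducible
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_tr = [X[i] for i in train_idx]
    X_te = [X[i] for i in test_idx]
    y_tr = [y[i] for i in train_idx]
    y_te = [y[i] for i in test_idx]
    return X_tr, X_te, y_tr, y_te

# Toy data with the same number of rows as the Boston dataset (506)
X_toy = [[float(i)] for i in range(506)]
y_toy = [2.0 * i for i in range(506)]
X_tr, X_te, y_tr, y_te = holdout_split(X_toy, y_toy)
print(len(X_tr), len(X_te))
```

With 506 rows and test_size=0.1, this sketch leaves 455 rows for training and 51 for testing.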

Let's wrap the training and test data in a heamy Dataset object:

Data = Dataset(X_train, y_train, X_test)

Now we can build the two models that we will use in the stacking procedure:

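# First base model: a random forest with 50 trees
RfModel = Regressor(dataset=Data, estimator=RandomForestRegressor, parameters={'n_estimators': 50}, name='rf')
# Second base model: linear regression (normalize was removed in scikit-learn 1.2; drop it on newer releases)
LRModel = Regressor(dataset=Data, estimator=LinearRegression, parameters={'normalize': True}, name='lr')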

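It's time to stack these models. The pipeline combines the predictions of the two base models, computed with 10-fold cross-validation, into a new dataset:

Pipeline = ModelsPipeline(RfModel, LRModel)
StackModel = Pipeline.stack(k=10, seed=2)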
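
The idea behind the stacking step can be sketched in plain Python. This is a simplified illustration under stated assumptions, not heamy's actual code: the two toy base models (fit_mean and fit_line), the helper names, and the data are all hypothetical. Each base model is trained on k-1 folds and predicts only the held-out fold, so every row of the stacked data comes from a model that never saw that row.

```python
import random

def kfold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Toy base model 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy base model 2: least-squares line on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold(fitters, xs, ys, k=10, seed=2):
    """Return one column of out-of-fold predictions per base model."""
    n = len(xs)
    stacked = [[0.0] * len(fitters) for _ in range(n)]
    for fold in kfold_indices(n, k, seed):
        held = set(fold)
        tr_x = [xs[i] for i in range(n) if i not in held]
        tr_y = [ys[i] for i in range(n) if i not in held]
        for j, fit in enumerate(fitters):
            model = fit(tr_x, tr_y)           # trained without the held-out fold
            for i in fold:
                stacked[i][j] = model(xs[i])  # predicted only on the held-out fold
    return stacked

# Made-up data: a noiseless line, so the second base model fits it exactly
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 1.0 for x in xs]
stacked = out_of_fold([fit_mean, fit_line], xs, ys)
print(len(stacked), len(stacked[0]))
```

The resulting table has one row per training sample and one column per base model; it plays the role of the feature matrix on which the second-level model is trained.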

Now we will train a LinearRegression model on the stacked data:

Stacker = Regressor(dataset=StackModel, estimator=LinearRegression)

Finally, we predict on the test set and validate the stacked model with 10-fold cross-validation, using the mean absolute error as the scorer:

Results = Stacker.predict()
Scores = Stacker.validate(k=10, scorer=mean_absolute_error)
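
For reference, the scorer used above is simply the average absolute difference between the true and predicted values. The following plain-Python sketch shows the computation; the function name mae and the numbers are made up for illustration and are not results from this recipe.

```python
def mae(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions, for illustration only
y_true_toy = [10.0, 20.0, 30.0, 40.0]
y_pred_toy = [11.0, 19.0, 32.0, 40.0]
print(mae(y_true_toy, y_pred_toy))
```

Because the error is expressed in the same units as the target, a lower value means the predictions are, on average, closer to the true values.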