{ "cells": [ { "cell_type": "markdown", "id": "76dc69f3-2413-4686-bfe9-cafc08f20c27", "metadata": {}, "source": [ "# AutoML for wage prediction" ] }, { "cell_type": "markdown", "id": "e8342cb7", "metadata": { "papermill": { "duration": 0.013777, "end_time": "2021-03-24T11:24:18.450894", "exception": false, "start_time": "2021-03-24T11:24:18.437117", "status": "completed" }, "tags": [] }, "source": [ "## Automatic Machine Learning with H2O AutoML using Wage Data from 2015" ] }, { "cell_type": "markdown", "id": "1fbd05b7", "metadata": { "papermill": { "duration": 0.014076, "end_time": "2021-03-24T11:24:18.478815", "exception": false, "start_time": "2021-03-24T11:24:18.464739", "status": "completed" }, "tags": [] }, "source": [ "We illustrate how to predict an outcome variable Y in a high-dimensional setting using the AutoML package *H2O*, which covers the complete pipeline from the raw dataset to a deployable machine learning model. In the last few years, AutoML (automated machine learning) has become widely popular in the data science community. " ] }, { "cell_type": "markdown", "id": "5333433f", "metadata": { "papermill": { "duration": 0.013915, "end_time": "2021-03-24T11:24:18.508556", "exception": false, "start_time": "2021-03-24T11:24:18.494641", "status": "completed" }, "tags": [] }, "source": [ "We can use AutoML as a benchmark and compare it with the methods from the previous notebook, where we applied one machine learning method after another." 
] }, { "cell_type": "code", "execution_count": 10, "id": "dd30c6a6", "metadata": {}, "outputs": [], "source": [ "# Import relevant packages\n", "import pandas as pd\n", "import numpy as np\n", "import pyreadr\n", "import os\n", "from urllib.request import urlopen\n", "from sklearn import preprocessing\n", "import patsy\n", "from h2o.automl import H2OAutoML\n", "\n", "from numpy import loadtxt\n", "from keras.models import Sequential\n", "from keras.layers import Dense\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 2, "id": "fb1130e2-716b-49c7-b6f3-e7061fc1dd9e", "metadata": {}, "outputs": [], "source": [ "#pip install h2o" ] }, { "cell_type": "code", "execution_count": 11, "id": "0e6d7d98", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking whether there is an H2O instance running at http://localhost:54321 . connected.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
H2O_cluster_uptime:4 hours 11 mins
H2O_cluster_timezone:America/Bogota
H2O_data_parsing_timezone:UTC
H2O_cluster_version:3.36.1.3
H2O_cluster_version_age:26 days
H2O_cluster_name:H2O_from_python_User_fi8ht0
H2O_cluster_total_nodes:1
H2O_cluster_free_memory:2.467 Gb
H2O_cluster_total_cores:4
H2O_cluster_allowed_cores:4
H2O_cluster_status:locked, healthy
H2O_connection_url:http://localhost:54321
H2O_connection_proxy:{\"http\": null, \"https\": null}
H2O_internal_security:False
Python_version:3.9.12 final
" ], "text/plain": [ "-------------------------- -----------------------------\n", "H2O_cluster_uptime: 4 hours 11 mins\n", "H2O_cluster_timezone: America/Bogota\n", "H2O_data_parsing_timezone: UTC\n", "H2O_cluster_version: 3.36.1.3\n", "H2O_cluster_version_age: 26 days\n", "H2O_cluster_name: H2O_from_python_User_fi8ht0\n", "H2O_cluster_total_nodes: 1\n", "H2O_cluster_free_memory: 2.467 Gb\n", "H2O_cluster_total_cores: 4\n", "H2O_cluster_allowed_cores: 4\n", "H2O_cluster_status: locked, healthy\n", "H2O_connection_url: http://localhost:54321\n", "H2O_connection_proxy: {\"http\": null, \"https\": null}\n", "H2O_internal_security: False\n", "Python_version: 3.9.12 final\n", "-------------------------- -----------------------------" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the H2O package\n", "import h2o\n", "\n", "# start h2o cluster\n", "h2o.init()" ] }, { "cell_type": "code", "execution_count": 12, "id": "6de4a8bc-161e-4e0a-ae67-45bab3517933", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "link=\"https://raw.githubusercontent.com/d2cml-ai/14.388_py/main/data/wage2015_subsample_inference.Rdata\"\n", "response = urlopen(link)\n", "content = response.read()\n", "fhandle = open( 'wage2015_subsample_inference.Rdata', 'wb')\n", "fhandle.write(content)\n", "fhandle.close()\n", "result = pyreadr.read_r(\"wage2015_subsample_inference.Rdata\")\n", "os.remove(\"wage2015_subsample_inference.Rdata\")\n", "\n", "# Extract the data frame from the object returned by pyreadr.read_r\n", "data = result[ 'data' ]\n", "n = data.shape[0]\n", "type(data)" ] }, { "cell_type": "code", "execution_count": 13, "id": "305bb8e2", "metadata": {}, "outputs": [], "source": [ "# Import the package used for splitting the data\n", "import math\n", "\n", "# Set the seed to make the results replicable\n", 
"np.random.seed(0)\n", "# Draw a random integer key for each row and sort by it to shuffle the rows\n", "random_key = np.random.randint(0, data.shape[0], size=data.shape[0])\n", "data[\"random\"] = random_key\n", "data_2 = data.sort_values(by=['random'])" ] }, { "cell_type": "code", "execution_count": 14, "id": "52dd607c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3862, 21)\n", "(1288, 21)\n" ] } ], "source": [ "# Create training and testing sample \n", "train = data_2[ : math.floor(n*3/4)] # training sample\n", "test = data_2[ math.floor(n*3/4) : ] # testing sample\n", "print(train.shape)\n", "print(test.shape)" ] }, { "cell_type": "code", "execution_count": 15, "id": "bed9f791", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking whether there is an H2O instance running at http://localhost:54321 . connected.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
H2O_cluster_uptime:4 hours 12 mins
H2O_cluster_timezone:America/Bogota
H2O_data_parsing_timezone:UTC
H2O_cluster_version:3.36.1.3
H2O_cluster_version_age:26 days
H2O_cluster_name:H2O_from_python_User_fi8ht0
H2O_cluster_total_nodes:1
H2O_cluster_free_memory:2.467 Gb
H2O_cluster_total_cores:4
H2O_cluster_allowed_cores:4
H2O_cluster_status:locked, healthy
H2O_connection_url:http://localhost:54321
H2O_connection_proxy:{\"http\": null, \"https\": null}
H2O_internal_security:False
Python_version:3.9.12 final
" ], "text/plain": [ "-------------------------- -----------------------------\n", "H2O_cluster_uptime: 4 hours 12 mins\n", "H2O_cluster_timezone: America/Bogota\n", "H2O_data_parsing_timezone: UTC\n", "H2O_cluster_version: 3.36.1.3\n", "H2O_cluster_version_age: 26 days\n", "H2O_cluster_name: H2O_from_python_User_fi8ht0\n", "H2O_cluster_total_nodes: 1\n", "H2O_cluster_free_memory: 2.467 Gb\n", "H2O_cluster_total_cores: 4\n", "H2O_cluster_allowed_cores: 4\n", "H2O_cluster_status: locked, healthy\n", "H2O_connection_url: http://localhost:54321\n", "H2O_connection_proxy: {\"http\": null, \"https\": null}\n", "H2O_internal_security: False\n", "Python_version: 3.9.12 final\n", "-------------------------- -----------------------------" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# start h2o cluster\n", "h2o.init()" ] }, { "cell_type": "code", "execution_count": 16, "id": "bb269c76", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n", "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n", "Rows:3862\n", "Cols:21\n", "\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
wage lwage sex shs hsg scl clg ad mw so we ne exp1 exp2 exp3 exp4 occ occ2 ind ind2 random
type real real int int int int int int int int int int real real real real int int int int int
mins 3.021978021978022 1.1059115911497213 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0 1.0 370.0 2.0 0.0
mean 23.465417731467202 2.969427791239379 0.446918694976696 0.023562920766442258 0.24702226825479026 0.2780942516830658 0.3125323666494045 0.13878819264629724 0.2553081305023304 0.29829104091144487 0.21569135163127914 0.23070947695494562 13.672190574831728 2.9923032107716137 8.15417316157433 24.849760334671146 5243.418436043504 11.69135163127914 6667.996116002073 13.333764888658706 1914.2263076126374
maxs 528.845673076923 6.270696655981913 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 47.0 22.09 103.823 487.9681 100000.0 22.0 100000.0 22.0 3825.0
sigma 21.430085743506766 0.5750893708933925 0.4972387709704293 0.15170256601034887 0.4313356487428773 0.44811810407097313 0.46358551980855045 0.3457701367976389 0.436090737870262 0.45756716236312467 0.4113543571812672 0.4213414081729523 10.598613687032655 3.987480265404191 14.424744487903597 53.27996156215637 11579.91114662104 6.9783416585657125 5588.264282354911 5.691380293915183 1104.7025392074318
zeros 0 0 2136 3771 2908 2788 2655 3326 2876 2710 3029 2971 48 48 48 48 0 0 0 0 3
missing 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 26.442307692307693 3.274965291519244 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 29.0 8.41 24.389 70.7281 340.0 1.0 8660.0 20.0 0.0
1 19.23076923076923 2.9565115604007097 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 33.5 11.2225 37.595375 125.94450625 9620.0 22.0 1870.0 5.0 0.0
2 48.07692307692308 3.872802292274865 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 2.0 0.04 0.008 0.0016 3060.0 10.0 8190.0 18.0 0.0
3 12.01923076923077 2.486507931154974 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 29.0 8.41 24.389 70.7281 6440.0 19.0 770.0 4.0 2.0
4 39.90384615384615 3.6864727140833713 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 12.0 1.44 1.728 2.0736 1820.0 5.0 7860.0 17.0 2.0
5 13.157894736842104 2.5770219386958058 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 11.0 1.21 1.331 1.4641 8810.0 21.0 3895.0 6.0 3.0
6 20.192307692307693 3.005301724570142 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 17.0 2.89 4.913 8.3521 7200.0 20.0 8770.0 21.0 4.0
7 12.01923076923077 2.486507931154974 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 7.0 0.49 0.343 0.2401 5610.0 17.0 4265.0 7.0 5.0
8 28.846153846153847 3.361976668508874 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 8.0 0.64 0.512 0.4096 5240.0 17.0 6970.0 12.0 7.0
9 34.13461538461539 3.530311983328089 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 38.0 14.44 54.872 208.5136 5550.0 17.0 6370.0 10.0 7.0
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# convert the pandas data frames to H2OFrames\n", "train_h = h2o.H2OFrame(train)\n", "test_h = h2o.H2OFrame(test)\n", "\n", "# have a look at the data\n", "train_h.describe()\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "1f0bfa3f", "metadata": {}, "outputs": [], "source": [ "# define the outcome variable\n", "y = 'lwage'\n", "\n", "data_columns = list(data)\n", "# columns to exclude from the predictors\n", "no_relev_col = ['wage','occ2', 'ind2', 'random', 'lwage']\n", "\n", "# keep every remaining column as a predictor\n", "x = [col for col in data_columns if col not in no_relev_col]\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "57c48dce", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AutoML progress: |█\n", "05:14:21.267: AutoML: XGBoost is not available; skipping it.\n", "\n", "██████████████████████████████████████████████████████████████| (done) 100%\n", "Model Details\n", "=============\n", "H2OStackedEnsembleEstimator : Stacked Ensemble\n", "Model Key: StackedEnsemble_AllModels_1_AutoML_2_20220804_51421\n", "\n", "No model summary for this model\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on train data. **\n", "\n", "MSE: 0.14963411645818037\n", "RMSE: 0.38682569260350375\n", "MAE: 0.29493414661545647\n", "RMSLE: 0.09962964024727533\n", "R^2: 0.5474439172253015\n", "Mean Residual Deviance: 0.14963411645818037\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3855\n", "Null deviance: 1276.9399854672324\n", "Residual deviance: 577.8869577614926\n", "AIC: 3639.772059449712\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on cross-validation data. 
**\n", "\n", "MSE: 0.21847845950742845\n", "RMSE: 0.4674167942077268\n", "MAE: 0.35554195972164104\n", "RMSLE: 0.11935352477830528\n", "R^2: 0.3392298618411601\n", "Mean Residual Deviance: 0.21847845950742845\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3854\n", "Null deviance: 1277.1043650451534\n", "Residual deviance: 843.7638106176887\n", "AIC: 5103.517182973992\n", "\n", "Cross-Validation Metrics Summary: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 0.355537 0.011149 0.358856 0.365665 0.336852 0.354934 0.361377
1 mean_residual_deviance 0.218658 0.018164 0.232228 0.226281 0.191559 0.208852 0.234372
2 mse 0.218658 0.018164 0.232228 0.226281 0.191559 0.208852 0.234372
3 null_deviance 255.420870 18.944805 270.143950 279.337650 235.192600 252.053120 240.377060
4 r2 0.339152 0.024681 0.332046 0.342915 0.367003 0.352354 0.301443
5 residual_deviance 168.744060 13.997013 180.441440 183.513900 148.841520 163.113280 167.810170
6 rmse 0.467278 0.019686 0.481901 0.475690 0.437675 0.457003 0.484120
7 rmsle 0.119307 0.004619 0.122457 0.121297 0.111312 0.119422 0.122048
\n", "
" ], "text/plain": [ " mean sd cv_1_valid cv_2_valid \\\n", "0 mae 0.355537 0.011149 0.358856 0.365665 \n", "1 mean_residual_deviance 0.218658 0.018164 0.232228 0.226281 \n", "2 mse 0.218658 0.018164 0.232228 0.226281 \n", "3 null_deviance 255.420870 18.944805 270.143950 279.337650 \n", "4 r2 0.339152 0.024681 0.332046 0.342915 \n", "5 residual_deviance 168.744060 13.997013 180.441440 183.513900 \n", "6 rmse 0.467278 0.019686 0.481901 0.475690 \n", "7 rmsle 0.119307 0.004619 0.122457 0.121297 \n", "\n", " cv_3_valid cv_4_valid cv_5_valid \n", "0 0.336852 0.354934 0.361377 \n", "1 0.191559 0.208852 0.234372 \n", "2 0.191559 0.208852 0.234372 \n", "3 235.192600 252.053120 240.377060 \n", "4 0.367003 0.352354 0.301443 \n", "5 148.841520 163.113280 167.810170 \n", "6 0.437675 0.457003 0.484120 \n", "7 0.111312 0.119422 0.122048 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# run AutoML with at most 10 base models and a maximum runtime of 100 seconds\n", "aml = H2OAutoML(max_runtime_secs = 100, max_models = 10, seed = 1)\n", "aml.train(x = x, y = y, training_frame = train_h, leaderboard_frame = test_h)\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "88df97d7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
model_id rmse mse mae rmsle mean_residual_deviance
StackedEnsemble_AllModels_1_AutoML_1_20220722_170939 0.47064 0.221502 0.353943 0.120368 0.221502
GBM_2_AutoML_1_20220722_170939 0.471849 0.222642 0.355941 0.12074 0.222642
GBM_5_AutoML_1_20220722_170939 0.47191 0.222699 0.35582 0.120501 0.222699
StackedEnsemble_BestOfFamily_1_AutoML_1_20220722_170939 0.473798 0.224484 0.357598 0.121491 0.224484
GBM_3_AutoML_1_20220722_170939 0.474585 0.22523 0.359309 0.121264 0.22523
GBM_1_AutoML_1_20220722_170939 0.478149 0.228626 0.362211 0.122278 0.228626
GBM_4_AutoML_1_20220722_170939 0.479072 0.22951 0.362916 0.122456 0.22951
GBM_grid_1_AutoML_1_20220722_170939_model_1 0.48001 0.23041 0.36381 0.122671 0.23041
XRT_1_AutoML_1_20220722_170939 0.491218 0.241295 0.375089 0.125318 0.241295
DRF_1_AutoML_1_20220722_170939 0.50224 0.252245 0.382606 0.12847 0.252245
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# AutoML Leaderboard\n", "lb = aml.leaderboard\n", "print(lb)" ] }, { "cell_type": "markdown", "id": "d295cdc1", "metadata": {}, "source": [ "We see that a Stacked Ensemble is at the top of the leaderboard, while the second Stacked Ensemble ranks fourth, just behind two GBMs. Stacked Ensembles often outperform a single model. The out-of-sample (test) MSE of the leading model is given by" ] }, { "cell_type": "code", "execution_count": 23, "id": "800f6aba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.22150159010610537" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aml.leaderboard['mse'][0,0]" ] }, { "cell_type": "markdown", "id": "6c9c7ec9", "metadata": {}, "source": [ "The in-sample (training and cross-validation) performance of the leading model can be inspected with" ] }, { "cell_type": "code", "execution_count": 24, "id": "7e47ac17", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Details\n", "=============\n", "H2OStackedEnsembleEstimator : Stacked Ensemble\n", "Model Key: StackedEnsemble_AllModels_1_AutoML_1_20220722_170939\n", "\n", "No model summary for this model\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on train data. **\n", "\n", "MSE: 0.15027004489538795\n", "RMSE: 0.38764680431468534\n", "MAE: 0.2955371675604367\n", "RMSLE: 0.09990016533775842\n", "R^2: 0.5455206039510314\n", "Mean Residual Deviance: 0.15027004489538795\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3854\n", "Null deviance: 1276.9399854672324\n", "Residual deviance: 580.3429133859883\n", "AIC: 3658.1503537307485\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on cross-validation data. 
**\n", "\n", "MSE: 0.2185070080606413\n", "RMSE: 0.46744733185744175\n", "MAE: 0.35554501482828577\n", "RMSLE: 0.11935774800872184\n", "R^2: 0.3391435190891413\n", "Mean Residual Deviance: 0.2185070080606413\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3853\n", "Null deviance: 1277.1043650451534\n", "Residual deviance: 843.8740651301968\n", "AIC: 5106.021797066896\n", "\n", "Cross-Validation Metrics Summary: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 0.355575 0.011150 0.358694 0.365597 0.336775 0.355430 0.361377
1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
2 mse 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
3 null_deviance 255.420870 18.944805 270.143950 279.337650 235.192600 252.053120 240.377060
4 r2 0.339071 0.024654 0.332108 0.343113 0.367360 0.351331 0.301443
5 residual_deviance 168.764340 13.983558 180.424550 183.458680 148.757460 163.370830 167.810170
6 rmse 0.467306 0.019674 0.481878 0.475618 0.437551 0.457364 0.484120
7 rmsle 0.119310 0.004632 0.122437 0.121279 0.111271 0.119514 0.122048
\n", "
" ], "text/plain": [ " mean sd cv_1_valid cv_2_valid \\\n", "0 mae 0.355575 0.011150 0.358694 0.365597 \n", "1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 \n", "2 mse 0.218685 0.018149 0.232207 0.226213 \n", "3 null_deviance 255.420870 18.944805 270.143950 279.337650 \n", "4 r2 0.339071 0.024654 0.332108 0.343113 \n", "5 residual_deviance 168.764340 13.983558 180.424550 183.458680 \n", "6 rmse 0.467306 0.019674 0.481878 0.475618 \n", "7 rmsle 0.119310 0.004632 0.122437 0.121279 \n", "\n", " cv_3_valid cv_4_valid cv_5_valid \n", "0 0.336775 0.355430 0.361377 \n", "1 0.191451 0.209182 0.234372 \n", "2 0.191451 0.209182 0.234372 \n", "3 235.192600 252.053120 240.377060 \n", "4 0.367360 0.351331 0.301443 \n", "5 148.757460 163.370830 167.810170 \n", "6 0.437551 0.457364 0.484120 \n", "7 0.111271 0.119514 0.122048 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aml.leader" ] }, { "cell_type": "markdown", "id": "ca0b36af", "metadata": { "papermill": { "duration": 0.027663, "end_time": "2021-03-24T11:25:13.491063", "exception": false, "start_time": "2021-03-24T11:25:13.463400", "status": "completed" }, "tags": [] }, "source": [ "This is in line with our previous results. To understand how the ensemble works, let's take a peek inside the Stacked Ensemble \"All Models\" model. The \"All Models\" ensemble is an ensemble of all of the individual models in the AutoML run. This is often the top performing model on the leaderboard." 
] }, { "cell_type": "code", "execution_count": 25, "id": "95549783", "metadata": {}, "outputs": [], "source": [ "model_ids = h2o.as_list(aml.leaderboard['model_id'][0], use_pandas=True)\n" ] }, { "cell_type": "code", "execution_count": 26, "id": "c2236931", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'StackedEnsemble_AllModels_1_AutoML_1_20220722_170939'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = model_ids[model_ids['model_id'].str.contains(\"StackedEnsemble_AllModels\")].values.tolist()\n", "model_id = model[0][0]\n", "model_id" ] }, { "cell_type": "code", "execution_count": 30, "id": "615b33c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Details\n", "=============\n", "H2OStackedEnsembleEstimator : Stacked Ensemble\n", "Model Key: StackedEnsemble_AllModels_1_AutoML_1_20220722_170939\n", "\n", "No model summary for this model\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on train data. **\n", "\n", "MSE: 0.15027004489538795\n", "RMSE: 0.38764680431468534\n", "MAE: 0.2955371675604367\n", "RMSLE: 0.09990016533775842\n", "R^2: 0.5455206039510314\n", "Mean Residual Deviance: 0.15027004489538795\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3854\n", "Null deviance: 1276.9399854672324\n", "Residual deviance: 580.3429133859883\n", "AIC: 3658.1503537307485\n", "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on cross-validation data. 
**\n", "\n", "MSE: 0.2185070080606413\n", "RMSE: 0.46744733185744175\n", "MAE: 0.35554501482828577\n", "RMSLE: 0.11935774800872184\n", "R^2: 0.3391435190891413\n", "Mean Residual Deviance: 0.2185070080606413\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3853\n", "Null deviance: 1277.1043650451534\n", "Residual deviance: 843.8740651301968\n", "AIC: 5106.021797066896\n", "\n", "Cross-Validation Metrics Summary: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 0.355575 0.011150 0.358694 0.365597 0.336775 0.355430 0.361377
1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
2 mse 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
3 null_deviance 255.420870 18.944805 270.143950 279.337650 235.192600 252.053120 240.377060
4 r2 0.339071 0.024654 0.332108 0.343113 0.367360 0.351331 0.301443
5 residual_deviance 168.764340 13.983558 180.424550 183.458680 148.757460 163.370830 167.810170
6 rmse 0.467306 0.019674 0.481878 0.475618 0.437551 0.457364 0.484120
7 rmsle 0.119310 0.004632 0.122437 0.121279 0.111271 0.119514 0.122048
\n", "
" ], "text/plain": [ " mean sd cv_1_valid cv_2_valid \\\n", "0 mae 0.355575 0.011150 0.358694 0.365597 \n", "1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 \n", "2 mse 0.218685 0.018149 0.232207 0.226213 \n", "3 null_deviance 255.420870 18.944805 270.143950 279.337650 \n", "4 r2 0.339071 0.024654 0.332108 0.343113 \n", "5 residual_deviance 168.764340 13.983558 180.424550 183.458680 \n", "6 rmse 0.467306 0.019674 0.481878 0.475618 \n", "7 rmsle 0.119310 0.004632 0.122437 0.121279 \n", "\n", " cv_3_valid cv_4_valid cv_5_valid \n", "0 0.336775 0.355430 0.361377 \n", "1 0.191451 0.209182 0.234372 \n", "2 0.191451 0.209182 0.234372 \n", "3 235.192600 252.053120 240.377060 \n", "4 0.367360 0.351331 0.301443 \n", "5 148.757460 163.370830 167.810170 \n", "6 0.437551 0.457364 0.484120 \n", "7 0.111271 0.119514 0.122048 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "se = h2o.get_model(model_id)\n", "se" ] }, { "cell_type": "code", "execution_count": 31, "id": "439e8999", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Details\n", "=============\n", "H2OGeneralizedLinearEstimator : Generalized Linear Modeling\n", "Model Key: metalearner_AUTO_StackedEnsemble_AllModels_1_AutoML_1_20220722_170939\n", "\n", "\n", "GLM Model: summary\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
family  link  regularization  lambda_search  number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
0  gaussian  identity  Elastic Net (alpha = 0.5, lambda = 0.002551 )  nlambda = 100, lambda.max = 0.2219, lambda.min = 0.002551, lambda....  10  7  49  levelone_training_StackedEnsemble_AllModels_1_AutoML_1_20220722_17...
\n", "
" ], "text/plain": [ " family link regularization \\\n", "0 gaussian identity Elastic Net (alpha = 0.5, lambda = 0.002551 ) \n", "\n", " lambda_search \\\n", "0 nlambda = 100, lambda.max = 0.2219, lambda.min = 0.002551, lambda.... \n", "\n", " number_of_predictors_total number_of_active_predictors \\\n", "0 10 7 \n", "\n", " number_of_iterations \\\n", "0 49 \n", "\n", " training_frame \n", "0 levelone_training_StackedEnsemble_AllModels_1_AutoML_1_20220722_17... " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "ModelMetricsRegressionGLM: glm\n", "** Reported on train data. **\n", "\n", "MSE: 0.21727419974117612\n", "RMSE: 0.4661268065035266\n", "MAE: 0.35448759076481845\n", "RMSLE: 0.11902010738109643\n", "R^2: 0.3428720464937892\n", "Mean Residual Deviance: 0.21727419974117612\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3854\n", "Null deviance: 1276.9399854672324\n", "Residual deviance: 839.1129594004221\n", "AIC: 5082.170839083037\n", "\n", "ModelMetricsRegressionGLM: glm\n", "** Reported on cross-validation data. **\n", "\n", "MSE: 0.2185070080606413\n", "RMSE: 0.46744733185744175\n", "MAE: 0.35554501482828577\n", "RMSLE: 0.11935774800872184\n", "R^2: 0.3391435190891413\n", "Mean Residual Deviance: 0.2185070080606413\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3853\n", "Null deviance: 1277.1043650451534\n", "Residual deviance: 843.8740651301968\n", "AIC: 5106.021797066896\n", "\n", "Cross-Validation Metrics Summary: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 0.355575 0.011150 0.358694 0.365597 0.336775 0.355430 0.361377
1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
2 mse 0.218685 0.018149 0.232207 0.226213 0.191451 0.209182 0.234372
3 null_deviance 255.420870 18.944805 270.143950 279.337650 235.192600 252.053120 240.377060
4 r2 0.339071 0.024654 0.332108 0.343113 0.367360 0.351331 0.301443
5 residual_deviance 168.764340 13.983558 180.424550 183.458680 148.757460 163.370830 167.810170
6 rmse 0.467306 0.019674 0.481878 0.475618 0.437551 0.457364 0.484120
7 rmsle 0.119310 0.004632 0.122437 0.121279 0.111271 0.119514 0.122048
\n", "
" ], "text/plain": [ " mean sd cv_1_valid cv_2_valid \\\n", "0 mae 0.355575 0.011150 0.358694 0.365597 \n", "1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 \n", "2 mse 0.218685 0.018149 0.232207 0.226213 \n", "3 null_deviance 255.420870 18.944805 270.143950 279.337650 \n", "4 r2 0.339071 0.024654 0.332108 0.343113 \n", "5 residual_deviance 168.764340 13.983558 180.424550 183.458680 \n", "6 rmse 0.467306 0.019674 0.481878 0.475618 \n", "7 rmsle 0.119310 0.004632 0.122437 0.121279 \n", "\n", " cv_3_valid cv_4_valid cv_5_valid \n", "0 0.336775 0.355430 0.361377 \n", "1 0.191451 0.209182 0.234372 \n", "2 0.191451 0.209182 0.234372 \n", "3 235.192600 252.053120 240.377060 \n", "4 0.367360 0.351331 0.301443 \n", "5 148.757460 163.370830 167.810170 \n", "6 0.437551 0.457364 0.484120 \n", "7 0.111271 0.119514 0.122048 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Scoring History: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
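<table>
<thead>
<tr><th></th><th>timestamp</th><th>duration</th><th>iteration</th><th>lambda</th><th>predictors</th><th>deviance_train</th><th>deviance_xval</th><th>deviance_se</th><th>alpha</th><th>iterations</th><th>training_rmse</th><th>training_deviance</th><th>training_mae</th><th>training_r2</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>1</td><td>,22E0</td><td>1</td><td>0.330642</td><td>0.330652</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>1</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>2</td><td>,2E0</td><td>5</td><td>0.316066</td><td>0.330609</td><td>0.008220</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>2</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>3</td><td>,18E0</td><td>6</td><td>0.302049</td><td>0.330627</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>3</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>4</td><td>,17E0</td><td>6</td><td>0.289848</td><td>0.322157</td><td>0.008512</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>4</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>5</td><td>,15E0</td><td>7</td><td>0.279287</td><td>0.307694</td><td>0.008407</td><td>0.5</td><td>5.0</td><td>0.528476</td><td>0.279287</td><td>0.407732</td><td>0.155318</td></tr>
<tr><th>5</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>6</td><td>,14E0</td><td>7</td><td>0.270007</td><td>0.294801</td><td>0.008323</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>6</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>7</td><td>,13E0</td><td>7</td><td>0.262086</td><td>0.283621</td><td>0.008253</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>7</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>8</td><td>,12E0</td><td>7</td><td>0.255335</td><td>0.273875</td><td>0.008226</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>8</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>9</td><td>,11E0</td><td>8</td><td>0.249586</td><td>0.265471</td><td>0.008206</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>9</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>10</td><td>,96E-1</td><td>8</td><td>0.244598</td><td>0.258304</td><td>0.008188</td><td>0.5</td><td>10.0</td><td>0.494568</td><td>0.244598</td><td>0.377588</td><td>0.260234</td></tr>
<tr><th>10</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>11</td><td>,88E-1</td><td>8</td><td>0.240460</td><td>0.252204</td><td>0.008175</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>11</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>12</td><td>,8E-1</td><td>8</td><td>0.236926</td><td>0.246950</td><td>0.008189</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>12</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>13</td><td>,73E-1</td><td>8</td><td>0.233943</td><td>0.242530</td><td>0.008194</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>13</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>14</td><td>,66E-1</td><td>8</td><td>0.231423</td><td>0.238774</td><td>0.008196</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>14</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>15</td><td>,6E-1</td><td>8</td><td>0.229293</td><td>0.235608</td><td>0.008204</td><td>0.5</td><td>15.0</td><td>0.478846</td><td>0.229293</td><td>0.364087</td><td>0.306521</td></tr>
<tr><th>15</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>16</td><td>,55E-1</td><td>8</td><td>0.227495</td><td>0.232935</td><td>0.008210</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>16</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>17</td><td>,5E-1</td><td>8</td><td>0.226154</td><td>0.230682</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>17</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>18</td><td>,46E-1</td><td>8</td><td>0.224642</td><td>0.228783</td><td>0.008220</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>18</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>19</td><td>,42E-1</td><td>8</td><td>0.223737</td><td>0.227337</td><td>0.008211</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>19</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>20</td><td>,38E-1</td><td>8</td><td>0.222642</td><td>0.225825</td><td>0.008238</td><td>0.5</td><td>20.0</td><td>0.471849</td><td>0.222642</td><td>0.358377</td><td>0.326639</td></tr>
</tbody>
</table>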
\n", "
" ], "text/plain": [ " timestamp duration iteration lambda predictors \\\n", "0 2022-07-22 17:10:18 0.000 sec 1 ,22E0 1 \n", "1 2022-07-22 17:10:18 0.000 sec 2 ,2E0 5 \n", "2 2022-07-22 17:10:18 0.000 sec 3 ,18E0 6 \n", "3 2022-07-22 17:10:18 0.000 sec 4 ,17E0 6 \n", "4 2022-07-22 17:10:18 0.016 sec 5 ,15E0 7 \n", "5 2022-07-22 17:10:18 0.016 sec 6 ,14E0 7 \n", "6 2022-07-22 17:10:18 0.016 sec 7 ,13E0 7 \n", "7 2022-07-22 17:10:18 0.016 sec 8 ,12E0 7 \n", "8 2022-07-22 17:10:18 0.016 sec 9 ,11E0 8 \n", "9 2022-07-22 17:10:18 0.016 sec 10 ,96E-1 8 \n", "10 2022-07-22 17:10:18 0.032 sec 11 ,88E-1 8 \n", "11 2022-07-22 17:10:18 0.032 sec 12 ,8E-1 8 \n", "12 2022-07-22 17:10:18 0.032 sec 13 ,73E-1 8 \n", "13 2022-07-22 17:10:18 0.032 sec 14 ,66E-1 8 \n", "14 2022-07-22 17:10:18 0.032 sec 15 ,6E-1 8 \n", "15 2022-07-22 17:10:18 0.032 sec 16 ,55E-1 8 \n", "16 2022-07-22 17:10:18 0.032 sec 17 ,5E-1 8 \n", "17 2022-07-22 17:10:18 0.047 sec 18 ,46E-1 8 \n", "18 2022-07-22 17:10:18 0.047 sec 19 ,42E-1 8 \n", "19 2022-07-22 17:10:18 0.047 sec 20 ,38E-1 8 \n", "\n", " deviance_train deviance_xval deviance_se alpha iterations \\\n", "0 0.330642 0.330652 0.008216 0.5 NaN \n", "1 0.316066 0.330609 0.008220 0.5 NaN \n", "2 0.302049 0.330627 0.008216 0.5 NaN \n", "3 0.289848 0.322157 0.008512 0.5 NaN \n", "4 0.279287 0.307694 0.008407 0.5 5.0 \n", "5 0.270007 0.294801 0.008323 0.5 NaN \n", "6 0.262086 0.283621 0.008253 0.5 NaN \n", "7 0.255335 0.273875 0.008226 0.5 NaN \n", "8 0.249586 0.265471 0.008206 0.5 NaN \n", "9 0.244598 0.258304 0.008188 0.5 10.0 \n", "10 0.240460 0.252204 0.008175 0.5 NaN \n", "11 0.236926 0.246950 0.008189 0.5 NaN \n", "12 0.233943 0.242530 0.008194 0.5 NaN \n", "13 0.231423 0.238774 0.008196 0.5 NaN \n", "14 0.229293 0.235608 0.008204 0.5 15.0 \n", "15 0.227495 0.232935 0.008210 0.5 NaN \n", "16 0.226154 0.230682 0.008216 0.5 NaN \n", "17 0.224642 0.228783 0.008220 0.5 NaN \n", "18 0.223737 0.227337 0.008211 0.5 NaN \n", "19 0.222642 0.225825 0.008238 
0.5 20.0 \n", "\n", " training_rmse training_deviance training_mae training_r2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 0.528476 0.279287 0.407732 0.155318 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 0.494568 0.244598 0.377588 0.260234 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 0.478846 0.229293 0.364087 0.306521 \n", "15 \n", "16 \n", "17 \n", "18 \n", "19 0.471849 0.222642 0.358377 0.326639 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "See the whole table with table.as_data_frame()\n", "\n", "Variable Importances: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
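<table>
<thead>
<tr><th></th><th>variable</th><th>relative_importance</th><th>scaled_importance</th><th>percentage</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>GBM_5_AutoML_1_20220722_170939</td><td>0.156444</td><td>1.000000</td><td>0.459637</td></tr>
<tr><th>1</th><td>GBM_3_AutoML_1_20220722_170939</td><td>0.064177</td><td>0.410220</td><td>0.188552</td></tr>
<tr><th>2</th><td>GBM_1_AutoML_1_20220722_170939</td><td>0.045077</td><td>0.288132</td><td>0.132436</td></tr>
<tr><th>3</th><td>GBM_grid_1_AutoML_1_20220722_170939_model_1</td><td>0.042987</td><td>0.274777</td><td>0.126298</td></tr>
<tr><th>4</th><td>GBM_4_AutoML_1_20220722_170939</td><td>0.022251</td><td>0.142227</td><td>0.065373</td></tr>
<tr><th>5</th><td>DeepLearning_1_AutoML_1_20220722_170939</td><td>0.009429</td><td>0.060268</td><td>0.027701</td></tr>
<tr><th>6</th><td>DRF_1_AutoML_1_20220722_170939</td><td>0.000001</td><td>0.000007</td><td>0.000003</td></tr>
<tr><th>7</th><td>GBM_2_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
<tr><th>8</th><td>XRT_1_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
<tr><th>9</th><td>GLM_1_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
</tbody>
</table>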
\n", "
" ], "text/plain": [ " variable relative_importance \\\n", "0 GBM_5_AutoML_1_20220722_170939 0.156444 \n", "1 GBM_3_AutoML_1_20220722_170939 0.064177 \n", "2 GBM_1_AutoML_1_20220722_170939 0.045077 \n", "3 GBM_grid_1_AutoML_1_20220722_170939_model_1 0.042987 \n", "4 GBM_4_AutoML_1_20220722_170939 0.022251 \n", "5 DeepLearning_1_AutoML_1_20220722_170939 0.009429 \n", "6 DRF_1_AutoML_1_20220722_170939 0.000001 \n", "7 GBM_2_AutoML_1_20220722_170939 0.000000 \n", "8 XRT_1_AutoML_1_20220722_170939 0.000000 \n", "9 GLM_1_AutoML_1_20220722_170939 0.000000 \n", "\n", " scaled_importance percentage \n", "0 1.000000 0.459637 \n", "1 0.410220 0.188552 \n", "2 0.288132 0.132436 \n", "3 0.274777 0.126298 \n", "4 0.142227 0.065373 \n", "5 0.060268 0.027701 \n", "6 0.000007 0.000003 \n", "7 0.000000 0.000000 \n", "8 0.000000 0.000000 \n", "9 0.000000 0.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the Stacked Ensemble metalearner model\n", "metalearner = se.metalearner()\n", "metalearner" ] }, { "cell_type": "markdown", "id": "06402906", "metadata": {}, "source": [ "Examine the variable importance of the metalearner (combiner) algorithm in the ensemble. This shows how much each base learner contributes to the ensemble. The AutoML Stacked Ensembles use the default metalearner algorithm (GLM with non-negative weights), so the variable importance of the metalearner is simply the standardized coefficient magnitudes of the GLM." ] }, { "cell_type": "markdown", "id": "7425b332", "metadata": {}, "source": [ "In the variable importance table above, GBM_5 alone accounts for about 46% of the total importance, and the five GBM models with non-zero weight together account for about 97%. The deep learning model contributes about 3%, the DRF model is close to zero, and GBM_2, XRT_1, and GLM_1 receive zero weight. The same standardized coefficients can be read directly from the metalearner and plotted.
\n" ] }, { "cell_type": "code", "execution_count": 32, "id": "4d86b390", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Intercept': 2.969427791239395,\n", " 'GBM_2_AutoML_1_20220722_170939': 0.0,\n", " 'GBM_5_AutoML_1_20220722_170939': 0.1564442116943833,\n", " 'GBM_3_AutoML_1_20220722_170939': 0.06417658529927933,\n", " 'GBM_1_AutoML_1_20220722_170939': 0.04507655475235933,\n", " 'GBM_4_AutoML_1_20220722_170939': 0.02225052417703666,\n", " 'GBM_grid_1_AutoML_1_20220722_170939_model_1': 0.0429872925804153,\n", " 'XRT_1_AutoML_1_20220722_170939': 0.0,\n", " 'DRF_1_AutoML_1_20220722_170939': 1.1143189727895769e-06,\n", " 'GLM_1_AutoML_1_20220722_170939': 0.0,\n", " 'DeepLearning_1_AutoML_1_20220722_170939': 0.009428554928469175}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metalearner.coef_norm()" ] }, { "cell_type": "code", "execution_count": 33, "id": "b4160a06", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "metalearner.std_coef_plot()" ] }, { "cell_type": "code", "execution_count": 34, "id": "d08c1586", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Details\n", "=============\n", "H2OGeneralizedLinearEstimator : Generalized Linear Modeling\n", "Model Key: metalearner_AUTO_StackedEnsemble_AllModels_1_AutoML_1_20220722_170939\n", "\n", "\n", "GLM Model: summary\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
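<table>
<thead>
<tr><th></th><th>family</th><th>link</th><th>regularization</th><th>lambda_search</th><th>number_of_predictors_total</th><th>number_of_active_predictors</th><th>number_of_iterations</th><th>training_frame</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>gaussian</td><td>identity</td><td>Elastic Net (alpha = 0.5, lambda = 0.002551 )</td><td>nlambda = 100, lambda.max = 0.2219, lambda.min = 0.002551, lambda....</td><td>10</td><td>7</td><td>49</td><td>levelone_training_StackedEnsemble_AllModels_1_AutoML_1_20220722_17...</td></tr>
</tbody>
</table>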
\n", "
" ], "text/plain": [ " family link regularization \\\n", "0 gaussian identity Elastic Net (alpha = 0.5, lambda = 0.002551 ) \n", "\n", " lambda_search \\\n", "0 nlambda = 100, lambda.max = 0.2219, lambda.min = 0.002551, lambda.... \n", "\n", " number_of_predictors_total number_of_active_predictors \\\n", "0 10 7 \n", "\n", " number_of_iterations \\\n", "0 49 \n", "\n", " training_frame \n", "0 levelone_training_StackedEnsemble_AllModels_1_AutoML_1_20220722_17... " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "ModelMetricsRegressionGLM: glm\n", "** Reported on train data. **\n", "\n", "MSE: 0.21727419974117612\n", "RMSE: 0.4661268065035266\n", "MAE: 0.35448759076481845\n", "RMSLE: 0.11902010738109643\n", "R^2: 0.3428720464937892\n", "Mean Residual Deviance: 0.21727419974117612\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3854\n", "Null deviance: 1276.9399854672324\n", "Residual deviance: 839.1129594004221\n", "AIC: 5082.170839083037\n", "\n", "ModelMetricsRegressionGLM: glm\n", "** Reported on cross-validation data. **\n", "\n", "MSE: 0.2185070080606413\n", "RMSE: 0.46744733185744175\n", "MAE: 0.35554501482828577\n", "RMSLE: 0.11935774800872184\n", "R^2: 0.3391435190891413\n", "Mean Residual Deviance: 0.2185070080606413\n", "Null degrees of freedom: 3861\n", "Residual degrees of freedom: 3853\n", "Null deviance: 1277.1043650451534\n", "Residual deviance: 843.8740651301968\n", "AIC: 5106.021797066896\n", "\n", "Cross-Validation Metrics Summary: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
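<table>
<thead>
<tr><th></th><th></th><th>mean</th><th>sd</th><th>cv_1_valid</th><th>cv_2_valid</th><th>cv_3_valid</th><th>cv_4_valid</th><th>cv_5_valid</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>mae</td><td>0.355575</td><td>0.011150</td><td>0.358694</td><td>0.365597</td><td>0.336775</td><td>0.355430</td><td>0.361377</td></tr>
<tr><th>1</th><td>mean_residual_deviance</td><td>0.218685</td><td>0.018149</td><td>0.232207</td><td>0.226213</td><td>0.191451</td><td>0.209182</td><td>0.234372</td></tr>
<tr><th>2</th><td>mse</td><td>0.218685</td><td>0.018149</td><td>0.232207</td><td>0.226213</td><td>0.191451</td><td>0.209182</td><td>0.234372</td></tr>
<tr><th>3</th><td>null_deviance</td><td>255.420870</td><td>18.944805</td><td>270.143950</td><td>279.337650</td><td>235.192600</td><td>252.053120</td><td>240.377060</td></tr>
<tr><th>4</th><td>r2</td><td>0.339071</td><td>0.024654</td><td>0.332108</td><td>0.343113</td><td>0.367360</td><td>0.351331</td><td>0.301443</td></tr>
<tr><th>5</th><td>residual_deviance</td><td>168.764340</td><td>13.983558</td><td>180.424550</td><td>183.458680</td><td>148.757460</td><td>163.370830</td><td>167.810170</td></tr>
<tr><th>6</th><td>rmse</td><td>0.467306</td><td>0.019674</td><td>0.481878</td><td>0.475618</td><td>0.437551</td><td>0.457364</td><td>0.484120</td></tr>
<tr><th>7</th><td>rmsle</td><td>0.119310</td><td>0.004632</td><td>0.122437</td><td>0.121279</td><td>0.111271</td><td>0.119514</td><td>0.122048</td></tr>
</tbody>
</table>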
\n", "
" ], "text/plain": [ " mean sd cv_1_valid cv_2_valid \\\n", "0 mae 0.355575 0.011150 0.358694 0.365597 \n", "1 mean_residual_deviance 0.218685 0.018149 0.232207 0.226213 \n", "2 mse 0.218685 0.018149 0.232207 0.226213 \n", "3 null_deviance 255.420870 18.944805 270.143950 279.337650 \n", "4 r2 0.339071 0.024654 0.332108 0.343113 \n", "5 residual_deviance 168.764340 13.983558 180.424550 183.458680 \n", "6 rmse 0.467306 0.019674 0.481878 0.475618 \n", "7 rmsle 0.119310 0.004632 0.122437 0.121279 \n", "\n", " cv_3_valid cv_4_valid cv_5_valid \n", "0 0.336775 0.355430 0.361377 \n", "1 0.191451 0.209182 0.234372 \n", "2 0.191451 0.209182 0.234372 \n", "3 235.192600 252.053120 240.377060 \n", "4 0.367360 0.351331 0.301443 \n", "5 148.757460 163.370830 167.810170 \n", "6 0.437551 0.457364 0.484120 \n", "7 0.111271 0.119514 0.122048 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Scoring History: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
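<table>
<thead>
<tr><th></th><th>timestamp</th><th>duration</th><th>iteration</th><th>lambda</th><th>predictors</th><th>deviance_train</th><th>deviance_xval</th><th>deviance_se</th><th>alpha</th><th>iterations</th><th>training_rmse</th><th>training_deviance</th><th>training_mae</th><th>training_r2</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>1</td><td>,22E0</td><td>1</td><td>0.330642</td><td>0.330652</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>1</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>2</td><td>,2E0</td><td>5</td><td>0.316066</td><td>0.330609</td><td>0.008220</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>2</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>3</td><td>,18E0</td><td>6</td><td>0.302049</td><td>0.330627</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>3</th><td>2022-07-22 17:10:18</td><td>0.000 sec</td><td>4</td><td>,17E0</td><td>6</td><td>0.289848</td><td>0.322157</td><td>0.008512</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>4</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>5</td><td>,15E0</td><td>7</td><td>0.279287</td><td>0.307694</td><td>0.008407</td><td>0.5</td><td>5.0</td><td>0.528476</td><td>0.279287</td><td>0.407732</td><td>0.155318</td></tr>
<tr><th>5</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>6</td><td>,14E0</td><td>7</td><td>0.270007</td><td>0.294801</td><td>0.008323</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>6</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>7</td><td>,13E0</td><td>7</td><td>0.262086</td><td>0.283621</td><td>0.008253</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>7</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>8</td><td>,12E0</td><td>7</td><td>0.255335</td><td>0.273875</td><td>0.008226</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>8</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>9</td><td>,11E0</td><td>8</td><td>0.249586</td><td>0.265471</td><td>0.008206</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>9</th><td>2022-07-22 17:10:18</td><td>0.016 sec</td><td>10</td><td>,96E-1</td><td>8</td><td>0.244598</td><td>0.258304</td><td>0.008188</td><td>0.5</td><td>10.0</td><td>0.494568</td><td>0.244598</td><td>0.377588</td><td>0.260234</td></tr>
<tr><th>10</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>11</td><td>,88E-1</td><td>8</td><td>0.240460</td><td>0.252204</td><td>0.008175</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>11</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>12</td><td>,8E-1</td><td>8</td><td>0.236926</td><td>0.246950</td><td>0.008189</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>12</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>13</td><td>,73E-1</td><td>8</td><td>0.233943</td><td>0.242530</td><td>0.008194</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>13</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>14</td><td>,66E-1</td><td>8</td><td>0.231423</td><td>0.238774</td><td>0.008196</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>14</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>15</td><td>,6E-1</td><td>8</td><td>0.229293</td><td>0.235608</td><td>0.008204</td><td>0.5</td><td>15.0</td><td>0.478846</td><td>0.229293</td><td>0.364087</td><td>0.306521</td></tr>
<tr><th>15</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>16</td><td>,55E-1</td><td>8</td><td>0.227495</td><td>0.232935</td><td>0.008210</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>16</th><td>2022-07-22 17:10:18</td><td>0.032 sec</td><td>17</td><td>,5E-1</td><td>8</td><td>0.226154</td><td>0.230682</td><td>0.008216</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>17</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>18</td><td>,46E-1</td><td>8</td><td>0.224642</td><td>0.228783</td><td>0.008220</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>18</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>19</td><td>,42E-1</td><td>8</td><td>0.223737</td><td>0.227337</td><td>0.008211</td><td>0.5</td><td>NaN</td><td></td><td></td><td></td><td></td></tr>
<tr><th>19</th><td>2022-07-22 17:10:18</td><td>0.047 sec</td><td>20</td><td>,38E-1</td><td>8</td><td>0.222642</td><td>0.225825</td><td>0.008238</td><td>0.5</td><td>20.0</td><td>0.471849</td><td>0.222642</td><td>0.358377</td><td>0.326639</td></tr>
</tbody>
</table>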
\n", "
" ], "text/plain": [ " timestamp duration iteration lambda predictors \\\n", "0 2022-07-22 17:10:18 0.000 sec 1 ,22E0 1 \n", "1 2022-07-22 17:10:18 0.000 sec 2 ,2E0 5 \n", "2 2022-07-22 17:10:18 0.000 sec 3 ,18E0 6 \n", "3 2022-07-22 17:10:18 0.000 sec 4 ,17E0 6 \n", "4 2022-07-22 17:10:18 0.016 sec 5 ,15E0 7 \n", "5 2022-07-22 17:10:18 0.016 sec 6 ,14E0 7 \n", "6 2022-07-22 17:10:18 0.016 sec 7 ,13E0 7 \n", "7 2022-07-22 17:10:18 0.016 sec 8 ,12E0 7 \n", "8 2022-07-22 17:10:18 0.016 sec 9 ,11E0 8 \n", "9 2022-07-22 17:10:18 0.016 sec 10 ,96E-1 8 \n", "10 2022-07-22 17:10:18 0.032 sec 11 ,88E-1 8 \n", "11 2022-07-22 17:10:18 0.032 sec 12 ,8E-1 8 \n", "12 2022-07-22 17:10:18 0.032 sec 13 ,73E-1 8 \n", "13 2022-07-22 17:10:18 0.032 sec 14 ,66E-1 8 \n", "14 2022-07-22 17:10:18 0.032 sec 15 ,6E-1 8 \n", "15 2022-07-22 17:10:18 0.032 sec 16 ,55E-1 8 \n", "16 2022-07-22 17:10:18 0.032 sec 17 ,5E-1 8 \n", "17 2022-07-22 17:10:18 0.047 sec 18 ,46E-1 8 \n", "18 2022-07-22 17:10:18 0.047 sec 19 ,42E-1 8 \n", "19 2022-07-22 17:10:18 0.047 sec 20 ,38E-1 8 \n", "\n", " deviance_train deviance_xval deviance_se alpha iterations \\\n", "0 0.330642 0.330652 0.008216 0.5 NaN \n", "1 0.316066 0.330609 0.008220 0.5 NaN \n", "2 0.302049 0.330627 0.008216 0.5 NaN \n", "3 0.289848 0.322157 0.008512 0.5 NaN \n", "4 0.279287 0.307694 0.008407 0.5 5.0 \n", "5 0.270007 0.294801 0.008323 0.5 NaN \n", "6 0.262086 0.283621 0.008253 0.5 NaN \n", "7 0.255335 0.273875 0.008226 0.5 NaN \n", "8 0.249586 0.265471 0.008206 0.5 NaN \n", "9 0.244598 0.258304 0.008188 0.5 10.0 \n", "10 0.240460 0.252204 0.008175 0.5 NaN \n", "11 0.236926 0.246950 0.008189 0.5 NaN \n", "12 0.233943 0.242530 0.008194 0.5 NaN \n", "13 0.231423 0.238774 0.008196 0.5 NaN \n", "14 0.229293 0.235608 0.008204 0.5 15.0 \n", "15 0.227495 0.232935 0.008210 0.5 NaN \n", "16 0.226154 0.230682 0.008216 0.5 NaN \n", "17 0.224642 0.228783 0.008220 0.5 NaN \n", "18 0.223737 0.227337 0.008211 0.5 NaN \n", "19 0.222642 0.225825 0.008238 
0.5 20.0 \n", "\n", " training_rmse training_deviance training_mae training_r2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 0.528476 0.279287 0.407732 0.155318 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 0.494568 0.244598 0.377588 0.260234 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 0.478846 0.229293 0.364087 0.306521 \n", "15 \n", "16 \n", "17 \n", "18 \n", "19 0.471849 0.222642 0.358377 0.326639 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "See the whole table with table.as_data_frame()\n", "\n", "Variable Importances: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
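<table>
<thead>
<tr><th></th><th>variable</th><th>relative_importance</th><th>scaled_importance</th><th>percentage</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>GBM_5_AutoML_1_20220722_170939</td><td>0.156444</td><td>1.000000</td><td>0.459637</td></tr>
<tr><th>1</th><td>GBM_3_AutoML_1_20220722_170939</td><td>0.064177</td><td>0.410220</td><td>0.188552</td></tr>
<tr><th>2</th><td>GBM_1_AutoML_1_20220722_170939</td><td>0.045077</td><td>0.288132</td><td>0.132436</td></tr>
<tr><th>3</th><td>GBM_grid_1_AutoML_1_20220722_170939_model_1</td><td>0.042987</td><td>0.274777</td><td>0.126298</td></tr>
<tr><th>4</th><td>GBM_4_AutoML_1_20220722_170939</td><td>0.022251</td><td>0.142227</td><td>0.065373</td></tr>
<tr><th>5</th><td>DeepLearning_1_AutoML_1_20220722_170939</td><td>0.009429</td><td>0.060268</td><td>0.027701</td></tr>
<tr><th>6</th><td>DRF_1_AutoML_1_20220722_170939</td><td>0.000001</td><td>0.000007</td><td>0.000003</td></tr>
<tr><th>7</th><td>GBM_2_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
<tr><th>8</th><td>XRT_1_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
<tr><th>9</th><td>GLM_1_AutoML_1_20220722_170939</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>
</tbody>
</table>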
\n", "
" ], "text/plain": [ " variable relative_importance \\\n", "0 GBM_5_AutoML_1_20220722_170939 0.156444 \n", "1 GBM_3_AutoML_1_20220722_170939 0.064177 \n", "2 GBM_1_AutoML_1_20220722_170939 0.045077 \n", "3 GBM_grid_1_AutoML_1_20220722_170939_model_1 0.042987 \n", "4 GBM_4_AutoML_1_20220722_170939 0.022251 \n", "5 DeepLearning_1_AutoML_1_20220722_170939 0.009429 \n", "6 DRF_1_AutoML_1_20220722_170939 0.000001 \n", "7 GBM_2_AutoML_1_20220722_170939 0.000000 \n", "8 XRT_1_AutoML_1_20220722_170939 0.000000 \n", "9 GLM_1_AutoML_1_20220722_170939 0.000000 \n", "\n", " scaled_importance percentage \n", "0 1.000000 0.459637 \n", "1 0.410220 0.188552 \n", "2 0.288132 0.132436 \n", "3 0.274777 0.126298 \n", "4 0.142227 0.065373 \n", "5 0.060268 0.027701 \n", "6 0.000007 0.000003 \n", "7 0.000000 0.000000 \n", "8 0.000000 0.000000 \n", "9 0.000000 0.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h2o.get_model(model_id).metalearner()" ] }, { "cell_type": "markdown", "id": "a8f423ad", "metadata": { "papermill": { "duration": 0.030956, "end_time": "2021-03-24T11:25:14.345344", "exception": false, "start_time": "2021-03-24T11:25:14.314388", "status": "completed" }, "tags": [] }, "source": [ "## Generating Predictions Using Leader Model\n", "\n", "We can also generate predictions on a test sample using the leader model object." ] }, { "cell_type": "code", "execution_count": 35, "id": "90dd2625", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
predict
2.88073
3.32158
3.08133
2.73111
2.51044
3.13355
3.18375
3.74316
2.58496
3.3082
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pred = aml.predict(test_h)\n", "pred.head()" ] }, { "cell_type": "markdown", "id": "cbb85276", "metadata": {}, "source": [ "With these predictions we can estimate the out-of-sample (test) MSE and its standard error." ] }, { "cell_type": "code", "execution_count": 36, "id": "d38d7686", "metadata": {}, "outputs": [], "source": [ "# Convert the H2O prediction frame to a NumPy array\n", "pred_2 = pred.as_data_frame()\n", "pred_aml = pred_2.to_numpy()" ] }, { "cell_type": "code", "execution_count": 37, "id": "3b5cb41a", "metadata": {}, "outputs": [], "source": [ "# Observed outcome (lwage) in the test sample\n", "Y_test = test_h['lwage'].as_data_frame().to_numpy()" ] }, { "cell_type": "code", "execution_count": 38, "id": "3b070b47", "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "code", "execution_count": 39, "id": "7602a9e8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Coef. 0.221502\n", "Std.Err. 0.012942\n", "Name: const, dtype: float64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Squared prediction errors on the test sample\n", "resid_basic = (Y_test-pred_aml)**2\n", "\n", "# Regress the squared errors on a constant: the coefficient is the test MSE,\n", "# and the regression output also reports its standard error\n", "MSE_aml_basic = sm.OLS( resid_basic , np.ones( resid_basic.shape[0] ) ).fit().summary2().tables[1].iloc[0, 0:2]\n", "MSE_aml_basic" ] }, { "cell_type": "markdown", "id": "6725164f", "metadata": {}, "source": [ "We observe both a lower MSE and a lower standard error than in our previous results (see [here](https://www.kaggle.com/janniskueck/pm3-notebook-newdata))."
] }, { "cell_type": "markdown", "id": "e03b4c5b", "metadata": { "tags": [] }, "source": [ "### Using model_performance()\n", "Alternatively, the standard `model_performance()` method can be applied to the AutoML leader model and a test set to generate an H2O model performance object. The test MSE it reports agrees with the estimate computed above.\n", "\n" ] }, { "cell_type": "code", "execution_count": 40, "id": "caaedabd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "ModelMetricsRegressionGLM: stackedensemble\n", "** Reported on test data. **\n", "\n", "MSE: 0.22150159010610537\n", "RMSE: 0.4706395543365489\n", "MAE: 0.353942656628514\n", "RMSLE: 0.12036767274774818\n", "R^2: 0.2835426043371053\n", "Mean Residual Deviance: 0.22150159010610537\n", "Null degrees of freedom: 1287\n", "Residual degrees of freedom: 1280\n", "Null deviance: 398.23902107893576\n", "Residual deviance: 285.2940480566637\n", "AIC: 1731.7504037354604\n" ] }, { "data": { "text/plain": [] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perf = aml.leader.model_performance(test_h)\n", "perf" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }