{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# V0.1.6 - Presenting main functionality\n", "\n", "Example created by Wilson Rocha Lacerda Junior" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we import the NARMAX model, the metric for model evaluation, and the methods to generate sample data for tests. We also import pandas to display the model results as a table." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install sysidentpy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sysidentpy.polynomial_basis import PolynomialNarmax\n", "from sysidentpy.metrics import root_relative_squared_error\n", "from sysidentpy.utils.generate_data import get_miso_data, get_siso_data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating 1 input 1 output sample data\n", "\n", "The data is generated by simulating the following model:\n", "\n", "$y_k = 0.2y_{k-1} + 0.1y_{k-1}x_{k-1} + 0.9x_{k-2} + e_{k}$\n", "\n", "If *colored_noise* is set to True:\n", "\n", "$e_{k} = 0.8\\nu_{k-1} + \\nu_{k}$\n", "\n", "where $x$ is a uniformly distributed random variable and $\\nu$ is a Gaussian-distributed variable with $\\mu=0$ and $\\sigma=0.1$\n", "\n", "In the next example we generate 1000 samples with white noise and use 90% of the data to train the model. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "x_train, x_valid, y_train, y_valid = get_siso_data(n=1000,\n", " colored_noise=False,\n", " sigma=0.001,\n", " train_percentage=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain a NARMAX model we have to choose some values, *e.g.*, the nonlinearity degree (*non_degree*) and the maximum lags for the inputs and output (*xlag* and *ylag*). 
\n", "\n", "In addition, you can select the information criterion to be used with the Error Reduction Ratio to select the model order and the method to estimate the model parameters:\n", "\n", "- Information Criteria: aic, bic, lilc, fpe\n", "- Parameter Estimation: least_squares, total_least_squares, recursive_least_squares, least_mean_squares and many others (see the docs)\n", "\n", "The *n_terms* value is optional. It refers to the number of terms to include in the final model. You can set this value based on the information criteria (see below) or on prior information about the model structure. The default value is *n_terms=None*, so the algorithm will choose the number of terms at which the information criterion reaches its minimum.\n", "\n", "To use information criteria you have to set *order_selection=True*. You can also select *n_info_values* (default = 15)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [], "source": [ "model = PolynomialNarmax(non_degree=2,\n", " order_selection=True,\n", " n_info_values=10,\n", " extended_least_squares=False,\n", " ylag=2, xlag=2,\n", " info_criteria='aic',\n", " estimator='least_squares',\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Structure Selection\n", "\n", "The *fit* method executes the Error Reduction Ratio algorithm using Householder reflection to select the model structure. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Free run simulation\n", "\n", "The *predict* method is used to generate the predictions. For now we only support *free run simulation* (also known as *infinity steps ahead*). Soon we will let the user define a *one-step ahead* or *k-step ahead* prediction."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "yhat = model.predict(x_valid, y_valid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the model\n", "\n", "In this example we use the *root_relative_squared_error* metric because it is often used in System Identification. More metrics and information about them can be found in the documentation." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0018401326800931033\n" ] } ], "source": [ "rrse = root_relative_squared_error(y_valid, yhat)\n", "print(rrse)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*model_object.results* returns the selected model regressors, the estimated parameters and the ERR values. As shown below, the three regressors with the largest ERR values are exactly the terms of the model used to simulate the data, and their estimated parameters are close to the true values; the remaining selected terms have negligible parameters." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Regressors Parameters ERR\n", "0 x1(k-2) 0.8999 0.95739001\n", "1 y(k-1) 0.2000 0.03917632\n", "2 x1(k-1)y(k-1) 0.0999 0.00343057\n", "3 x1(k-2)^2 -0.0002 0.00000002\n", "4 x1(k-1) 0.0001 0.00000001\n", "5 x1(k-1)y(k-2) 0.0002 0.00000001\n" ] } ], "source": [ "results = pd.DataFrame(model.results(err_precision=8,\n", " dtype='dec'),\n", " columns=['Regressors', 'Parameters', 'ERR'])\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, you can use the *residuals* and *plot_result* methods to take a look at the prediction and two residual analyses. The *extras* and *lam* values below contain another residual analysis so you can plot it manually. This method will be improved soon. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": 