Presenting main functionality¶
Example created by Wilson Rocha Lacerda Junior
Here we import the NARMAX model, the metric for model evaluation, and the methods to generate sample data for tests. We also import pandas to display the results.
pip install sysidentpy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sysidentpy.model_structure_selection import FROLS
from sysidentpy.basis_function._basis_function import Polynomial
from sysidentpy.metrics import root_relative_squared_error
from sysidentpy.utils.generate_data import get_siso_data
from sysidentpy.utils.display_results import results
from sysidentpy.utils.plotting import plot_residues_correlation, plot_results
from sysidentpy.residues.residues_correlation import (
compute_residues_autocorrelation,
compute_cross_correlation,
)
Generating 1 input 1 output sample data¶
The data is generated by simulating the following model:
$y_k = 0.2y_{k-1} + 0.1y_{k-1}x_{k-1} + 0.9x_{k-1} + e_{k}$
If colored_noise is set to True:
$e_{k} = 0.8\nu_{k-1} + \nu_{k}$
where $x$ is a uniformly distributed random variable and $\nu$ is a Gaussian distributed variable with $\mu=0$ and $\sigma=0.1$.
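For intuition, here is a minimal sketch of that generating process in the white-noise case. This is only an illustration: get_siso_data handles the generation internally, and the seed and input range used here are assumptions.
rng = np.random.default_rng(seed=42)  # assumed seed, for reproducibility only
n = 1000
x = rng.uniform(-1, 1, n)  # uniformly distributed input (range assumed)
nu = rng.normal(0, 0.1, n)  # Gaussian noise with mu=0 and sigma=0.1
y = np.zeros(n)
for k in range(1, n):
    # difference equation above, with e_k = nu_k (white-noise case)
    y[k] = 0.2 * y[k - 1] + 0.1 * y[k - 1] * x[k - 1] + 0.9 * x[k - 1] + nu[k]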
In the next example we will generate a dataset with 1000 samples, with white noise, selecting 90% of the data to train the model.
x_train, x_valid, y_train, y_valid = get_siso_data(
n=1000, colored_noise=False, sigma=0.0001, train_percentage=90
)
To obtain a NARMAX model we have to choose some values, e.g., the degree of nonlinearity (set via the degree of the basis function) and the maximum lags for the input and output (xlag and ylag).
In addition, you can select the information criteria to be used with the Error Reduction Ratio to select the model order and the method to estimate the model parameters:
- Information Criteria: aic, bic, lilc, fpe
- Parameter Estimation: least_squares, total_least_squares, recursive_least_squares, least_mean_squares, and many others (see the docs)
The n_terms value is optional. It refers to the number of terms to include in the final model. You can set this value based on the information criteria (see below) or on prior knowledge about the model structure. The default is n_terms=None, in which case the algorithm chooses the number of terms at the minimum of the information criterion.
To use information criteria you have to set order_selection=True. You can also set n_info_values (default = 15).
basis_function = Polynomial(degree=2)
model = FROLS(
order_selection=True,
n_info_values=3,
extended_least_squares=False,
ylag=2,
xlag=2,
info_criteria="aic",
estimator="least_squares",
basis_function=basis_function,
)
FutureWarning: Passing a string to define the estimator will rise an error in v0.4.0. You'll have to use FROLS(estimator=LeastSquares()) instead. The only change is that you'll have to define the estimator first instead of passing a string like 'least_squares'. This change will make easier to implement new estimators and it'll improve code readability.
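Following the warning's suggestion, you can pass an estimator object instead of a string. Here is a sketch of the forward-compatible form (the import path is an assumption; check the docs of your installed version):
from sysidentpy.parameter_estimation import LeastSquares  # import path assumed

model = FROLS(
    order_selection=True,
    n_info_values=3,
    ylag=2,
    xlag=2,
    info_criteria="aic",
    estimator=LeastSquares(),  # estimator object instead of "least_squares"
    basis_function=basis_function,
)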
Model Structure Selection¶
The fit method executes the Error Reduction Ratio algorithm, using Householder reflections, to select the model structure.
The fit and predict methods enforce keyword-only arguments. This is an effort to promote clear and unambiguous use of the library.
model.fit(X=x_train, y=y_train)
<sysidentpy.model_structure_selection.forward_regression_orthogonal_least_squares.FROLS at 0x20eb3f8ced0>
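Because order_selection=True, the criterion values computed during the fit can be plotted to see where the minimum occurs and to justify a fixed n_terms. The info_values attribute name below is taken from the library's examples; verify it for your installed version.
xaxis = np.arange(1, model.n_info_values + 1)
plt.plot(xaxis, model.info_values)  # one criterion value per candidate model size
plt.xlabel("Number of terms")
plt.ylabel("Information criterion (aic)")
plt.show()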
Free run simulation¶
The predict method is used to generate the predictions. For now we only support free run simulation (also known as infinite-steps-ahead prediction). Soon we will let the user define a one-step-ahead or k-steps-ahead prediction.
yhat = model.predict(X=x_valid, y=y_valid)
Evaluate the model¶
In this example we use the root_relative_squared_error metric because it is often used in System Identification. More metrics, and information about them, can be found in the documentation.
rrse = root_relative_squared_error(y_valid, yhat)
print(rrse)
0.000236625360270206
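For reference, the RRSE is the root of the sum of squared errors normalized by the sum of squared deviations of the measured output from its mean. A quick manual check, assuming this standard definition:
# RRSE = sqrt( sum((y - yhat)^2) / sum((y - mean(y))^2) )
manual_rrse = np.sqrt(
    np.sum((y_valid - yhat) ** 2) / np.sum((y_valid - np.mean(y_valid)) ** 2)
)
print(manual_rrse)  # should match the value from root_relative_squared_error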
The results function returns the selected regressors, the estimated parameters, and the ERR values. As shown below, the algorithm detects the exact model that was used to simulate the data.
r = pd.DataFrame(
results(
model.final_model,
model.theta,
model.err,
model.n_terms,
err_precision=8,
dtype="sci",
),
columns=["Regressors", "Parameters", "ERR"],
)
print(r)
      Regressors  Parameters             ERR
0        x1(k-2)  9.0001E-01  9.56885108E-01
1         y(k-1)  2.0000E-01  3.96313039E-02
2  x1(k-1)y(k-1)  1.0001E-01  3.48355000E-03
In addition, you can use plot_results and the residues-correlation utilities to inspect the prediction and run two residual analyses: the autocorrelation of the residues and the cross-correlation between the input and the residues. These utilities will be improved soon. The plots accept a matplotlib style, so we first list the available ones:
plt.style.available
['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']
plot_results(
y=y_valid,
yhat=yhat,
n=1000,
title="test",
xlabel="Samples",
ylabel=r"y, $\hat{y}$",
data_color="#1f77b4",
model_color="#ff7f0e",
marker="o",
model_marker="*",
linewidth=1.5,
figsize=(10, 6),
style="seaborn-v0_8-notebook",
facecolor="white",
)
ee = compute_residues_autocorrelation(y_valid, yhat)
plot_residues_correlation(
data=ee, title="Residues", ylabel="$e^2$", style="seaborn-v0_8-notebook"
)
x1e = compute_cross_correlation(y_valid, yhat, x_valid)
plot_residues_correlation(
data=x1e, title="Residues", ylabel="$x_1e$", style="seaborn-v0_8-notebook"
)