NFIR Example
Example created by Wilson Rocha Lacerda Junior
This example shows how to use SysIdentPy to build NFIR models. NFIR models are models with no output feedback. In other words, there are no $y(k-n_y)$ regressors, only $x(k-n_x)$.
The NFIR model can be described as:
$$ y_k = \mathcal{F}^\ell[x_{k-d}, x_{k-d-1}, \dotsc, x_{k-d-n_x}, e_{k-1}, \dotsc, e_{k-n_e}] + e_k $$
where $n_x \in \mathbb{N}$ is the maximum lag for the system input; $x_k \in \mathbb{R}^{n_x}$ is the system input at discrete time $k \in \mathbb{N}$; $e_k \in \mathbb{R}^{n_e}$ stands for uncertainties and possible noise at discrete time $k$. In this case, $\mathcal{F}^\ell$ is some nonlinear function of the input regressors with nonlinearity degree $\ell \in \mathbb{N}$, and $d$ is a time delay typically set to $d=1$.
It is important to note that NFIR models are generally much larger than their NARMAX counterparts. This drawback is already evident for linear models (FIR versus ARX), and it leads to even more complex scenarios in the nonlinear case.
So, if you are looking for parsimonious and compact models, consider using NARMAX models. On the other hand, establishing stability, particularly in a control-oriented context, is generally more challenging for NARMAX models than for NFIR models.
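To get a feel for how quickly the polynomial candidate set grows with the maximum input lag, the sketch below counts candidate regressors for a degree-2 basis (the counting helper is illustrative, not part of SysIdentPy): a degree-$\ell$ polynomial basis over $n$ lagged signals, constant term included, has $\binom{n+\ell}{\ell}$ terms.

```python
from math import comb


def n_polynomial_terms(n_lagged_signals: int, degree: int) -> int:
    # Polynomial terms built from lagged signals, constant included,
    # are combinations with repetition: C(n + degree, degree).
    return comb(n_lagged_signals + degree, degree)


# Candidate set size for the degree-2 NFIR models used below.
for xlag in (2, 5, 35):
    print(f"xlag={xlag:>2}: {n_polynomial_terms(xlag, 2):>3} candidate regressors")
```

With xlag=2 there are only 6 candidates, while xlag=35 already yields 666.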
pip install sysidentpy
import pandas as pd
import numpy as np
from sysidentpy.utils.generate_data import get_siso_data
from sysidentpy.metrics import root_relative_squared_error
from sysidentpy.basis_function._basis_function import Polynomial, Fourier
from sysidentpy.utils.display_results import results
from sysidentpy.utils.plotting import plot_residues_correlation, plot_results
from sysidentpy.residues.residues_correlation import (
compute_residues_autocorrelation,
compute_cross_correlation,
)
from sysidentpy.model_structure_selection import AOLS, FROLS
from sysidentpy.parameter_estimation import LeastSquares, RecursiveLeastSquares
NFIR vs. NARMAX
We will reproduce the example provided in the "presenting main functionality" section. In that example, we used a NARX model with xlag and ylag equal to 2 and a nonlinearity degree equal to 2. That resulted in a model with 3 regressors and an RRSE (the validation metric) of $0.000184$:
Regressors | Parameters | ERR |
---|---|---|
x1(k-2) | 9.0001E-01 | 9.57505011E-01 |
y(k-1) | 2.0001E-01 | 3.89117583E-02 |
x1(k-1)y(k-1) | 9.9992E-02 | 3.58319976E-03 |
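For reference, the sketch below reproduces that NARX baseline with the same data generator and settings (the full walkthrough is in the referenced section; estimator objects are used instead of the deprecated strings):

```python
import numpy as np
from sysidentpy.basis_function._basis_function import Polynomial
from sysidentpy.metrics import root_relative_squared_error
from sysidentpy.model_structure_selection import FROLS
from sysidentpy.parameter_estimation import LeastSquares
from sysidentpy.utils.generate_data import get_siso_data

np.random.seed(seed=42)
x_train, x_test, y_train, y_test = get_siso_data(
    n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)
# NARX baseline: output feedback (ylag=2) in addition to input lags (xlag=2).
narx = FROLS(
    order_selection=True,
    ylag=2,
    xlag=2,
    info_criteria="aic",
    estimator=LeastSquares(),
    basis_function=Polynomial(degree=2),
)
narx.fit(X=x_train, y=y_train)
yhat = narx.predict(X=x_test, y=y_test)
print(root_relative_squared_error(y_test, yhat))
```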
So, what happens if we use an NFIR model with the same configuration?
np.random.seed(seed=42)
# generating simulated data
x_train, x_test, y_train, y_test = get_siso_data(
n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)
basis_function = Polynomial(degree=2)
model = FROLS(
order_selection=True,
xlag=2,
info_criteria="aic",
estimator="least_squares",
basis_function=basis_function,
model_type="NFIR",
)
model.fit(X=x_train, y=y_train)
yhat = model.predict(X=x_test, y=y_test)
rrse = root_relative_squared_error(y_test, yhat)
print(rrse)
r = pd.DataFrame(
results(
model.final_model,
model.theta,
model.err,
model.n_terms,
err_precision=8,
dtype="sci",
),
columns=["Regressors", "Parameters", "ERR"],
)
print(r)
plot_results(y=y_test, yhat=yhat, n=1000)
0.2129700627690414
  Regressors  Parameters             ERR
0    x1(k-2)  8.9017E-01  9.55432286E-01
In the NFIR case, we got a model with a single regressor, but with a significantly worse RRSE ($0.21$):
Regressors | Parameters | ERR |
---|---|---|
x1(k-2) | 8.9017E-01 | 9.55432286E-01 |
So, to get a better NFIR model, we have to increase the model order; in other words, we have to set a higher maximum input lag. This is needed because the output feedback available to the NARX model must be represented indirectly through additional lagged inputs.
Let's set xlag=3.
np.random.seed(seed=42)
# generating simulated data
x_train, x_test, y_train, y_test = get_siso_data(
n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)
basis_function = Polynomial(degree=2)
model = FROLS(
order_selection=True,
xlag=3,
info_criteria="aic",
estimator="least_squares",
basis_function=basis_function,
model_type="NFIR",
)
model.fit(X=x_train, y=y_train)
yhat = model.predict(X=x_test, y=y_test)
rrse = root_relative_squared_error(y_test, yhat)
print(rrse)
r = pd.DataFrame(
results(
model.final_model,
model.theta,
model.err,
model.n_terms,
err_precision=8,
dtype="sci",
),
columns=["Regressors", "Parameters", "ERR"],
)
print(r)
plot_results(y=y_test, yhat=yhat, n=1000)
0.04314951932710626
       Regressors  Parameters             ERR
0         x1(k-2)  8.9980E-01  9.55367779E-01
1         x1(k-3)  1.7832E-01  3.94348076E-02
2  x1(k-3)x1(k-1)  9.1104E-02  3.33315478E-03
Now the model has 3 regressors, but the RRSE ($0.04$) is still considerably worse:
Regressors | Parameters | ERR |
---|---|---|
x1(k-2) | 8.9980E-01 | 9.55367779E-01 |
x1(k-3) | 1.7832E-01 | 3.94348076E-02 |
x1(k-3)x1(k-1) | 9.1104E-02 | 3.33315478E-03 |
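These terms are exactly what a one-step expansion of the NARX baseline predicts. Writing that model as $y_k \approx 0.9x_{k-2} + 0.2y_{k-1} + 0.1x_{k-1}y_{k-1}$ and substituting the same expression for $y_{k-1}$ gives

$$ y_k \approx 0.9x_{k-2} + 0.18x_{k-3} + 0.09x_{k-1}x_{k-3} + \dotsb $$

which matches the estimated parameters above. The omitted terms involve $y_{k-2}$, and each further substitution pushes them onto longer input lags with coefficients shrinking by roughly a factor of $0.2$; that is why increasing xlag keeps improving the NFIR model.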
Let's set xlag=5.
np.random.seed(seed=42)
# generating simulated data
x_train, x_test, y_train, y_test = get_siso_data(
n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)
basis_function = Polynomial(degree=2)
model = FROLS(
order_selection=True,
xlag=5,
info_criteria="aic",
estimator="least_squares",
basis_function=basis_function,
model_type="NFIR",
)
model.fit(X=x_train, y=y_train)
yhat = model.predict(X=x_test, y=y_test)
rrse = root_relative_squared_error(y_test, yhat)
print(rrse)
r = pd.DataFrame(
results(
model.final_model,
model.theta,
model.err,
model.n_terms,
err_precision=8,
dtype="sci",
),
columns=["Regressors", "Parameters", "ERR"],
)
print(r)
plot_results(y=y_test, yhat=yhat, n=1000)
0.004209451216121233
       Regressors  Parameters             ERR
0         x1(k-2)  8.9978E-01  9.55485306E-01
1         x1(k-3)  1.7979E-01  3.93181813E-02
2  x1(k-3)x1(k-1)  8.9706E-02  3.33141271E-03
3         x1(k-4)  3.5772E-02  1.54789285E-03
4  x1(k-4)x1(k-2)  1.7615E-02  1.09675506E-04
5  x1(k-4)x1(k-1)  1.7871E-02  1.13215338E-04
6         x1(k-5)  6.9594E-03  6.23773643E-05
7  x1(k-5)x1(k-1)  4.1353E-03  6.10794551E-06
8  x1(k-5)x1(k-3)  3.4007E-03  3.98364615E-06
9  x1(k-5)x1(k-2)  2.9798E-03  3.42693984E-06
Now the RRSE ($0.0042$) is much closer to the NARMAX model's, but the NFIR model has 10 regressors. So, as mentioned before, NFIR models generally need a higher order than NARMAX models to reach comparable results.
Regressors | Parameters | ERR |
---|---|---|
x1(k-2) | 8.9978E-01 | 9.55485306E-01 |
x1(k-3) | 1.7979E-01 | 3.93181813E-02 |
x1(k-3)x1(k-1) | 8.9706E-02 | 3.33141271E-03 |
x1(k-4) | 3.5772E-02 | 1.54789285E-03 |
x1(k-4)x1(k-2) | 1.7615E-02 | 1.09675506E-04 |
x1(k-4)x1(k-1) | 1.7871E-02 | 1.13215338E-04 |
x1(k-5) | 6.9594E-03 | 6.23773643E-05 |
x1(k-5)x1(k-1) | 4.1353E-03 | 6.10794551E-06 |
x1(k-5)x1(k-3) | 3.4007E-03 | 3.98364615E-06 |
x1(k-5)x1(k-2) | 2.9798E-03 | 3.42693984E-06 |
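The residue-analysis utilities imported at the top can be used to check the fitted model as well. A minimal sketch, assuming the same call pattern used in other SysIdentPy examples:

```python
# Residue checks: the autocorrelation of the residues and their
# cross-correlation with the input should stay within the confidence bounds.
ee = compute_residues_autocorrelation(y_test, yhat)
plot_residues_correlation(data=ee, title="Residues", ylabel="$e^2$")
x1e = compute_cross_correlation(y_test, yhat, x_test)
plot_residues_correlation(data=x1e, title="Residues", ylabel="$x_1e$")
```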
Finally, let's set xlag=35. Since the candidate regressor space is now much larger, we also set n_info_values=200 and use the recursive least squares estimator.
np.random.seed(seed=42)
# generating simulated data
x_train, x_test, y_train, y_test = get_siso_data(
n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)
basis_function = Polynomial(degree=2)
model = FROLS(
order_selection=True,
xlag=35,
n_info_values=200,
info_criteria="aic",
estimator="recursive_least_squares",
basis_function=basis_function,
model_type="NFIR",
)
model.fit(X=x_train, y=y_train)
yhat = model.predict(X=x_test, y=y_test)
rrse = root_relative_squared_error(y_test, yhat)
print(rrse)
r = pd.DataFrame(
results(
model.final_model,
model.theta,
model.err,
model.n_terms,
err_precision=8,
dtype="sci",
),
columns=["Regressors", "Parameters", "ERR"],
)
print(r)
plot_results(y=y_test, yhat=yhat, n=1000)
0.0038421247000040536
          Regressors   Parameters             ERR
0            x1(k-2)   8.9996E-01  9.55386378E-01
1            x1(k-3)   1.8005E-01  3.94178379E-02
2     x1(k-3)x1(k-1)   9.0796E-02  3.32874170E-03
3            x1(k-4)   3.5468E-02  1.54540871E-03
4     x1(k-4)x1(k-2)   1.8637E-02  1.12751104E-04
5     x1(k-4)x1(k-1)   1.8337E-02  1.13878189E-04
6            x1(k-5)   7.0057E-03  6.27406151E-05
7     x1(k-5)x1(k-1)   4.3851E-03  6.32909200E-06
8     x1(k-5)x1(k-3)   3.8095E-03  3.95779051E-06
9     x1(k-5)x1(k-2)   4.8296E-03  3.39231220E-06
10           x1(k-6)   1.0914E-03  2.37202762E-06
11    x1(k-6)x1(k-2)   2.4944E-03  3.65196792E-07
12    x1(k-6)x1(k-3)   9.8303E-04  2.92529290E-07
13    x1(k-6)x1(k-1)   7.9133E-04  2.55676107E-07
14    x1(k-6)x1(k-4)   4.4939E-04  3.03449240E-07
15         x1(k-3)^2   1.1930E-03  2.23371798E-07
16   x1(k-33)x1(k-2)   1.0851E-03  1.83491782E-07
17   x1(k-11)x1(k-3)   2.0589E-03  1.65999771E-07
18   x1(k-26)x1(k-1)  -1.8883E-04  1.59339882E-07
19   x1(k-14)x1(k-8)  -9.4483E-04  1.36917637E-07
20   x1(k-18)x1(k-9)   5.2852E-04  1.43200990E-07
21  x1(k-29)x1(k-22)   2.9389E-04  1.32772541E-07
22   x1(k-35)x1(k-4)  -3.7559E-04  1.37651677E-07
23  x1(k-19)x1(k-17)  -2.9479E-04  1.40605871E-07
24  x1(k-33)x1(k-28)  -1.3863E-03  1.21220227E-07
25  x1(k-20)x1(k-12)   3.9743E-04  1.24584593E-07
Now the RRSE ($0.0038$) is essentially on par with the NARMAX model's, but the NFIR model has 26 regressors. So, as these examples show, NFIR models generally require a higher order than NARMAX models to reach comparable results.
Regressors | Parameters | ERR |
---|---|---|
x1(k-2) | 8.9996E-01 | 9.55386378E-01 |
x1(k-3) | 1.8005E-01 | 3.94178379E-02 |
x1(k-3)x1(k-1) | 9.0796E-02 | 3.32874170E-03 |
x1(k-4) | 3.5468E-02 | 1.54540871E-03 |
x1(k-4)x1(k-2) | 1.8637E-02 | 1.12751104E-04 |
x1(k-4)x1(k-1) | 1.8337E-02 | 1.13878189E-04 |
x1(k-5) | 7.0057E-03 | 6.27406151E-05 |
x1(k-5)x1(k-1) | 4.3851E-03 | 6.32909200E-06 |
x1(k-5)x1(k-3) | 3.8095E-03 | 3.95779051E-06 |
x1(k-5)x1(k-2) | 4.8296E-03 | 3.39231220E-06 |
x1(k-6) | 1.0914E-03 | 2.37202762E-06 |
x1(k-6)x1(k-2) | 2.4944E-03 | 3.65196792E-07 |
x1(k-6)x1(k-3) | 9.8303E-04 | 2.92529290E-07 |
x1(k-6)x1(k-1) | 7.9133E-04 | 2.55676107E-07 |
x1(k-6)x1(k-4) | 4.4939E-04 | 3.03449240E-07 |
x1(k-3)^2 | 1.1930E-03 | 2.23371798E-07 |
x1(k-33)x1(k-2) | 1.0851E-03 | 1.83491782E-07 |
x1(k-11)x1(k-3) | 2.0589E-03 | 1.65999771E-07 |
x1(k-26)x1(k-1) | -1.8883E-04 | 1.59339882E-07 |
x1(k-14)x1(k-8) | -9.4483E-04 | 1.36917637E-07 |
x1(k-18)x1(k-9) | 5.2852E-04 | 1.43200990E-07 |
x1(k-29)x1(k-22) | 2.9389E-04 | 1.32772541E-07 |
x1(k-35)x1(k-4) | -3.7559E-04 | 1.37651677E-07 |
x1(k-19)x1(k-17) | -2.9479E-04 | 1.40605871E-07 |
x1(k-33)x1(k-28) | -1.3863E-03 | 1.21220227E-07 |
x1(k-20)x1(k-12) | 3.9743E-04 | 1.24584593E-07 |
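To make the size-versus-accuracy trade-off explicit, a short sweep over xlag (a sketch reusing the pipeline above; the lag values and settings mirror the cells in this example) collects the number of selected terms and the validation RRSE:

```python
# Compare NFIR model size and validation RRSE across maximum input lags.
for lag in (2, 3, 5, 35):
    nfir = FROLS(
        order_selection=True,
        xlag=lag,
        n_info_values=200,
        info_criteria="aic",
        estimator=RecursiveLeastSquares(),
        basis_function=Polynomial(degree=2),
        model_type="NFIR",
    )
    nfir.fit(X=x_train, y=y_train)
    yhat = nfir.predict(X=x_test, y=y_test)
    rrse = root_relative_squared_error(y_test, yhat)
    print(f"xlag={lag:>2}: {nfir.n_terms:>2} terms, RRSE={rrse:.6f}")
```

The trend is clear: each increase in xlag buys accuracy at the cost of a larger model.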