PV forecasting benchmark¶

Example created by Wilson Rocha Lacerda Junior

Note¶

The following example is not intended to say that one library is better than another. The main focus of these examples is to show that SysIdentPy can be a good alternative for people looking to model time series.

We will compare the results obtained against neural prophet library.

For the sake of brevity, from SysIdentPy only the MetaMSS, AOLS and FROLS (with polynomial base function) methods will be used. See the SysIdentPy documentation to learn other ways of modeling with the library.

We will compare a 1-step ahead forecaster on solar irradiance data (that can be a proxy for solar PV production). The config of the neuralprophet model was taken from the neuralprophet documentation (https://neuralprophet.com/html/example_links/energy_data_example.html)

The training will occur on 80% of the data, reserving the last 20% for the validation.

Note: the data used in this example can be found in neuralprophet github.

In [1]:

  Copied!     
 
from warnings import simplefilter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sysidentpy.model_structure_selection import FROLS
from sysidentpy.model_structure_selection import AOLS
from sysidentpy.model_structure_selection import MetaMSS
from sysidentpy.basis_function import Polynomial
from sysidentpy.utils.plotting import plot_results
from sysidentpy.neural_network import NARXNN
from sysidentpy.metrics import mean_squared_error

from sktime.datasets import load_airline
from neuralprophet import NeuralProphet
from neuralprophet import set_random_seed


simplefilter("ignore", FutureWarning)
np.seterr(all="ignore")

%matplotlib inline

loss = mean_squared_error

data_location = r".\datasets"
from warnings import simplefilter import matplotlib.pyplot as plt import numpy as np import pandas as pd from sysidentpy.model_structure_selection import FROLS from sysidentpy.model_structure_selection import AOLS from sysidentpy.model_structure_selection import MetaMSS from sysidentpy.basis_function import Polynomial from sysidentpy.utils.plotting import plot_results from sysidentpy.neural_network import NARXNN from sysidentpy.metrics import mean_squared_error from sktime.datasets import load_airline from neuralprophet import NeuralProphet from neuralprophet import set_random_seed simplefilter("ignore", FutureWarning) np.seterr(all="ignore") %matplotlib inline loss = mean_squared_error data_location = r".\datasets"

c:\Users\wilso\miniconda3\envs\neural_prophet\lib\site-packages\sktime\datatypes\_series\_check.py:43: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  VALID_INDEX_TYPES = (pd.Int64Index, pd.RangeIndex, pd.PeriodIndex, pd.DatetimeIndex)
c:\Users\wilso\miniconda3\envs\neural_prophet\lib\site-packages\sktime\datatypes\_panel\_check.py:45: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  VALID_INDEX_TYPES = (pd.Int64Index, pd.RangeIndex, pd.PeriodIndex, pd.DatetimeIndex)
c:\Users\wilso\miniconda3\envs\neural_prophet\lib\site-packages\sktime\datatypes\_panel\_check.py:46: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  VALID_MULTIINDEX_TYPES = (pd.Int64Index, pd.RangeIndex)

FROLS¶

In [2]:

  Copied!     
 
files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"]
raw = pd.read_csv(data_location + files[0])
df = pd.DataFrame()
df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760))
df["y"] = raw.iloc[:, 0].values

df_train, df_val = df.iloc[:7008, :], df.iloc[7008:, :]

y = df["y"].values.reshape(-1, 1)
y_train = df_train["y"].values.reshape(-1, 1)
y_test = df_val["y"].values.reshape(-1, 1)

x_train = df_train["ds"].dt.hour.values.reshape(-1, 1)
x_test = df_val["ds"].dt.hour.values.reshape(-1, 1)

basis_function = Polynomial(degree=1)
sysidentpy = FROLS(
    order_selection=True,
    ylag=24,
    xlag=24,
    info_criteria="bic",
    estimator="recursive_least_squares",
    basis_function=basis_function,
    model_type="NARMAX",
)
sysidentpy.fit(X=x_train, y=y_train)
x_test = np.concatenate([x_train[-sysidentpy.max_lag :], x_test])
y_test = np.concatenate([y_train[-sysidentpy.max_lag :], y_test])

yhat = sysidentpy.predict(X=x_test, y=y_test, steps_ahead=1)
sysidentpy_loss = loss(
    pd.Series(y_test.flatten()[sysidentpy.max_lag :]),
    pd.Series(yhat.flatten()[sysidentpy.max_lag :]),
)
print(sysidentpy_loss)


plot_results(y=y_test[-104:], yhat=yhat[-104:])
files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"] raw = pd.read_csv(data_location + files[0]) df = pd.DataFrame() df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760)) df["y"] = raw.iloc[:, 0].values df_train, df_val = df.iloc[:7008, :], df.iloc[7008:, :] y = df["y"].values.reshape(-1, 1) y_train = df_train["y"].values.reshape(-1, 1) y_test = df_val["y"].values.reshape(-1, 1) x_train = df_train["ds"].dt.hour.values.reshape(-1, 1) x_test = df_val["ds"].dt.hour.values.reshape(-1, 1) basis_function = Polynomial(degree=1) sysidentpy = FROLS( order_selection=True, ylag=24, xlag=24, info_criteria="bic", estimator="recursive_least_squares", basis_function=basis_function, model_type="NARMAX", ) sysidentpy.fit(X=x_train, y=y_train) x_test = np.concatenate([x_train[-sysidentpy.max_lag :], x_test]) y_test = np.concatenate([y_train[-sysidentpy.max_lag :], y_test]) yhat = sysidentpy.predict(X=x_test, y=y_test, steps_ahead=1) sysidentpy_loss = loss( pd.Series(y_test.flatten()[sysidentpy.max_lag :]), pd.Series(yhat.flatten()[sysidentpy.max_lag :]), ) print(sysidentpy_loss) plot_results(y=y_test[-104:], yhat=yhat[-104:])

3755.242530079209

No description has been provided for this image

MetaMSS¶

In [3]:

  Copied!     
 
set_random_seed(42)
files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"]
raw = pd.read_csv(data_location + files[0])
df = pd.DataFrame()
df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760))
df["y"] = raw.iloc[:, 0].values

x_train, x_test, x_validation, _ = np.split(df["ds"].dt.hour.values, [5000, 7008, 8760])
y_train, y_test, y_validation, _ = np.split(df["y"].values, [5000, 7008, 8760])

x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
x_validation = x_validation.reshape(-1, 1)

y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
y_validation = y_validation.reshape(-1, 1)

basis_function = Polynomial(degree=1)
sysidentpy_metamss = MetaMSS(
    basis_function=basis_function,
    norm=-2,
    xlag=24,
    ylag=24,
    estimator="least_squares",
    k_agents_percent=2,
    estimate_parameter=True,
    maxiter=10,
    steps_ahead=1,
    n_agents=15,
    p_value=0.05,
    loss_func="metamss_loss",
    p_ones=0.5,
    p_zeros=0.5,
    model_type="NARMAX",
    random_state=42
)
sysidentpy_metamss.fit(X=x_train, X_test=x_test, y=y_train, y_test=y_test)
x_validation = np.concatenate([x_test[-sysidentpy_metamss.max_lag :], x_validation])
y_validation = np.concatenate([y_test[-sysidentpy_metamss.max_lag :], y_validation])

yhat = sysidentpy_metamss.predict(X=x_validation, y=y_validation, steps_ahead=1)
metamss_loss = loss(
    pd.Series(y_validation.flatten()[sysidentpy_metamss.max_lag :]),
    pd.Series(yhat.flatten()[sysidentpy_metamss.max_lag :]),
)
print(metamss_loss)


plot_results(y=y_validation[-104:], yhat=yhat[-104:])
set_random_seed(42) files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"] raw = pd.read_csv(data_location + files[0]) df = pd.DataFrame() df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760)) df["y"] = raw.iloc[:, 0].values x_train, x_test, x_validation, _ = np.split(df["ds"].dt.hour.values, [5000, 7008, 8760]) y_train, y_test, y_validation, _ = np.split(df["y"].values, [5000, 7008, 8760]) x_train = x_train.reshape(-1, 1) x_test = x_test.reshape(-1, 1) x_validation = x_validation.reshape(-1, 1) y_train = y_train.reshape(-1, 1) y_test = y_test.reshape(-1, 1) y_validation = y_validation.reshape(-1, 1) basis_function = Polynomial(degree=1) sysidentpy_metamss = MetaMSS( basis_function=basis_function, norm=-2, xlag=24, ylag=24, estimator="least_squares", k_agents_percent=2, estimate_parameter=True, maxiter=10, steps_ahead=1, n_agents=15, p_value=0.05, loss_func="metamss_loss", p_ones=0.5, p_zeros=0.5, model_type="NARMAX", random_state=42 ) sysidentpy_metamss.fit(X=x_train, X_test=x_test, y=y_train, y_test=y_test) x_validation = np.concatenate([x_test[-sysidentpy_metamss.max_lag :], x_validation]) y_validation = np.concatenate([y_test[-sysidentpy_metamss.max_lag :], y_validation]) yhat = sysidentpy_metamss.predict(X=x_validation, y=y_validation, steps_ahead=1) metamss_loss = loss( pd.Series(y_validation.flatten()[sysidentpy_metamss.max_lag :]), pd.Series(yhat.flatten()[sysidentpy_metamss.max_lag :]), ) print(metamss_loss) plot_results(y=y_validation[-104:], yhat=yhat[-104:])

2203.92644661679

AOLS¶

In [4]:

  Copied!     
 
set_random_seed(42)
files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"]
raw = pd.read_csv(data_location + files[0])
df = pd.DataFrame()
df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760))
df["y"] = raw.iloc[:, 0].values

df_train, df_val = df.iloc[:7008, :], df.iloc[7008:, :]

y = df["y"].values.reshape(-1, 1)
y_train = df_train["y"].values.reshape(-1, 1)
y_test = df_val["y"].values.reshape(-1, 1)

x_train = df_train["ds"].dt.hour.values.reshape(-1, 1)
x_test = df_val["ds"].dt.hour.values.reshape(-1, 1)
basis_function = Polynomial(degree=1)
sysidentpy_AOLS = AOLS(
    ylag=24, xlag=24, k=2, L=1, model_type="NARMAX", basis_function=basis_function
)
sysidentpy_AOLS.fit(X=x_train, y=y_train)
x_test = np.concatenate([x_train[-sysidentpy_AOLS.max_lag :], x_test])
y_test = np.concatenate([y_train[-sysidentpy_AOLS.max_lag :], y_test])

yhat = sysidentpy_AOLS.predict(X=x_test, y=y_test, steps_ahead=1)
aols_loss = loss(
    pd.Series(y_test.flatten()[sysidentpy_AOLS.max_lag :]),
    pd.Series(yhat.flatten()[sysidentpy_AOLS.max_lag :]),
)
print(aols_loss)


plot_results(y=y_test[-104:], yhat=yhat[-104:])
set_random_seed(42) files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"] raw = pd.read_csv(data_location + files[0]) df = pd.DataFrame() df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760)) df["y"] = raw.iloc[:, 0].values df_train, df_val = df.iloc[:7008, :], df.iloc[7008:, :] y = df["y"].values.reshape(-1, 1) y_train = df_train["y"].values.reshape(-1, 1) y_test = df_val["y"].values.reshape(-1, 1) x_train = df_train["ds"].dt.hour.values.reshape(-1, 1) x_test = df_val["ds"].dt.hour.values.reshape(-1, 1) basis_function = Polynomial(degree=1) sysidentpy_AOLS = AOLS( ylag=24, xlag=24, k=2, L=1, model_type="NARMAX", basis_function=basis_function ) sysidentpy_AOLS.fit(X=x_train, y=y_train) x_test = np.concatenate([x_train[-sysidentpy_AOLS.max_lag :], x_test]) y_test = np.concatenate([y_train[-sysidentpy_AOLS.max_lag :], y_test]) yhat = sysidentpy_AOLS.predict(X=x_test, y=y_test, steps_ahead=1) aols_loss = loss( pd.Series(y_test.flatten()[sysidentpy_AOLS.max_lag :]), pd.Series(yhat.flatten()[sysidentpy_AOLS.max_lag :]), ) print(aols_loss) plot_results(y=y_test[-104:], yhat=yhat[-104:])

2361.561682547365

Neural Prophet¶

In [6]:

  Copied!     
 
set_random_seed(42)

# set_log_level("ERROR")
files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"]
raw = pd.read_csv(data_location + files[0])
df = pd.DataFrame()
df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760))
df["y"] = raw.iloc[:, 0].values

m = NeuralProphet(
    n_lags=24,
    ar_sparsity=0.5,
    # num_hidden_layers = 2,
    # d_hidden=20,
)
metrics = m.fit(df, freq="H", valid_p=0.2)

df_train, df_val = m.split_df(df, valid_p=0.2)
m.test(df_val)

future = m.make_future_dataframe(df_val, n_historic_predictions=True)
forecast = m.predict(future)
# fig = m.plot(forecast)
print(loss(forecast["y"][24:-1], forecast["yhat1"][24:-1]))
set_random_seed(42) # set_log_level("ERROR") files = ["\SanFrancisco_PV_GHI.csv", "\SanFrancisco_Hospital.csv"] raw = pd.read_csv(data_location + files[0]) df = pd.DataFrame() df["ds"] = pd.date_range("1/1/2015 1:00:00", freq=str(60) + "Min", periods=(8760)) df["y"] = raw.iloc[:, 0].values m = NeuralProphet( n_lags=24, ar_sparsity=0.5, # num_hidden_layers = 2, # d_hidden=20, ) metrics = m.fit(df, freq="H", valid_p=0.2) df_train, df_val = m.split_df(df, valid_p=0.2) m.test(df_val) future = m.make_future_dataframe(df_val, n_historic_predictions=True) forecast = m.predict(future) # fig = m.plot(forecast) print(loss(forecast["y"][24:-1], forecast["yhat1"][24:-1]))

WARNING: nprophet - fit: Parts of code may break if using other than daily data.

04-26 20:16:52 - WARNING - Parts of code may break if using other than daily data.

INFO: nprophet.utils - set_auto_seasonalities: Disabling yearly seasonality. Run NeuralProphet with yearly_seasonality=True to override this.

04-26 20:16:52 - INFO - Disabling yearly seasonality. Run NeuralProphet with yearly_seasonality=True to override this.

INFO: nprophet.config - set_auto_batch_epoch: Auto-set batch_size to 32

04-26 20:16:52 - INFO - Auto-set batch_size to 32

INFO: nprophet.config - set_auto_batch_epoch: Auto-set epochs to 7

04-26 20:16:52 - INFO - Auto-set epochs to 7

 87%|████████▋ | 87/100 [00:00<00:00, 617.37it/s]
INFO: nprophet - _lr_range_test: learning rate range test found optimal lr: 1.23E-01

04-26 20:16:52 - INFO - learning rate range test found optimal lr: 1.23E-01

Epoch[7/7]: 100%|██████████| 7/7 [00:02<00:00,  2.61it/s, SmoothL1Loss=0.00415, MAE=58.8, RegLoss=0.0112]
INFO: nprophet - _evaluate: Validation metrics:    SmoothL1Loss    MAE
1         0.003 48.746

04-26 20:16:55 - INFO - Validation metrics:    SmoothL1Loss    MAE
1         0.003 48.746
4642.234763049609

In [7]:

  Copied!     
 
plt.plot(forecast["y"][-104:], "ro-")
plt.plot(forecast["yhat1"][-104:], "k*-")
plt.plot(forecast["y"][-104:], "ro-") plt.plot(forecast["yhat1"][-104:], "k*-")

Out[7]:

[<matplotlib.lines.Line2D at 0x2618e76ebe0>]