Commit dccc03da authored by GILSON Matthieu

classif rocket

parent b9755d15
%% Cell type:markdown id:d8d8bfdf-e92f-4cf1-9ef4-21531d69edb7 tags:
## Example of Rocket classifier
Adapted from http://www.sktime.net/en/latest/api_reference/auto_generated/sktime.transformations.panel.rocket.Rocket.html#sktime.transformations.panel.rocket.Rocketperformance
%% Cell type:code id:7f795133 tags:
``` python
import numpy as np
import pandas as pd
from scipy.io import arff
from sktime.datasets import load_from_arff_to_dataframe, load_basic_motions
#from sktime.transformations.panel.rocket import Rocket
#from sklearn.linear_model import RidgeClassifierCV
from sklearn.model_selection import StratifiedShuffleSplit
from sktime.classification.kernel_based import RocketClassifier
from sktime.classification.ensemble import WeightedEnsembleClassifier
import matplotlib.pyplot as plt
import seaborn as sb
```
%% Cell type:markdown id:cdf3fbae-fc89-40d1-b47c-38189a97a491 tags:
We first load the data.
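If the fMRI files are not at hand, a small synthetic dataset in the same layout can stand in for them. The sketch below assumes, as the plotting cells suggest, that `X` is a DataFrame whose cells each hold one time series as a `pd.Series`, and that `y` is a 1D array of labels. The function `make_nested_panel`, the column names and the sizes are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

def make_nested_panel(n_sample=20, n_feature=3, n_time=50, seed=0):
    """Toy dataset (hypothetical helper): one pd.Series per (sample, feature) cell."""
    rng = np.random.default_rng(seed)
    # two balanced classes, alternating 0 and 1
    y = np.arange(n_sample) % 2
    # Gaussian noise, with the mean shifted by 1 for class 1 so the task is learnable
    data = rng.normal(size=(n_sample, n_feature, n_time)) + y[:, None, None]
    X = pd.DataFrame(index=range(n_sample))
    for j in range(n_feature):
        # each column is a list of time series, one per sample
        X[f'dim_{j}'] = [pd.Series(data[i, j]) for i in range(n_sample)]
    return X, y

X_toy, y_toy = make_nested_panel()
```

With this layout, `X_toy.iloc[0,0]` is the time series of the first sample and first feature, as in the plotting cells of this notebook.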
%% Cell type:code id:04f19b56 tags:
``` python
if True:
    base_dir = '../data/data_fMRI_ARCHIsoc/'
    X = pd.read_hdf(base_dir+'df_ts_dataset0.hdf')
    y = np.load(base_dir+'y_task_dataset0.npy')
    n_sample = y.size
else:
    # adapt the code for those datasets taken from https://www.timeseriesclassification.com/dataset.php
    base_dir = '../data/time_series/'
    dataset = 'FordA'
    #dataset = 'Cricket'
    #dataset = 'Phoneme'
    # assumes each dataset sits in its own subfolder of base_dir
    X_train, y_train = load_from_arff_to_dataframe(base_dir+'{0}/{0}_TRAIN.arff'.format(dataset))
    X_test, y_test = load_from_arff_to_dataframe(base_dir+'{0}/{0}_TEST.arff'.format(dataset))
```
%% Cell type:code id:f9a0b4f5 tags:
``` python
# example of time series for a sample and input dimension
plt.plot(X.iloc[0,0])
plt.xlabel('time')
plt.show()
```
%% Output
%% Cell type:code id:6ecc573b-dd58-4dd7-a99b-313af68d4c27 tags:
``` python
plt.figure(figsize=[8,6])
plt.axes([0.2,0.2,0.5,0.7])
plt.imshow(X.map(np.mean), aspect=0.3, interpolation='nearest')
plt.xlabel('feature')
plt.ylabel('sample')
plt.axes([0.6,0.2,0.3,0.7])
plt.imshow(y.reshape([n_sample,1]), aspect=0.2, interpolation='nearest', cmap='bwr')
plt.xticks([0],[' '])
plt.xlabel('label')
plt.show()
```
%% Output
%% Cell type:markdown id:2e5c466d-56a0-4d24-8314-cce7f3f46197 tags:
Here we use `RocketClassifier` that combines `Rocket` as a transformer and `RidgeClassifierCV` for the classification based on the transformed features (http://www.sktime.net/en/latest/api_reference/auto_generated/sktime.classification.kernel_based.RocketClassifier.html).
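To give an intuition of what the `Rocket` transformer computes, here is a minimal NumPy sketch of the underlying idea: each time series is convolved with many random kernels, and two features are kept per kernel, the maximum of the output and the proportion of positive values (PPV). This is a simplified illustration and not sktime's implementation: it assumes univariate series stored in a 2D array, omits padding, and the function name `random_kernel_features` is hypothetical.

```python
import numpy as np

def random_kernel_features(X, num_kernels=200, seed=0):
    """Simplified Rocket-like transform (illustrative sketch).

    X: array of shape (n_samples, n_time), univariate series with n_time >= 12.
    Returns an array of shape (n_samples, 2 * num_kernels).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_time = X.shape
    features = np.empty((n_samples, 2 * num_kernels))
    for k in range(num_kernels):
        # random kernel: length, mean-centered Gaussian weights, uniform bias
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        weights -= weights.mean()
        bias = rng.uniform(-1.0, 1.0)
        # dilation drawn on a log scale so that the dilated kernel fits in the series
        max_exp = np.log2((n_time - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))
        # time offsets covered by the dilated kernel, and number of valid positions
        offsets = np.arange(length) * dilation
        n_out = n_time - offsets[-1]
        # sliding windows of shape (n_samples, n_out, length), then convolution
        idx = np.arange(n_out)[:, None] + offsets[None, :]
        conv = X[:, idx] @ weights + bias
        # two features per kernel: maximum and proportion of positive values
        features[:, 2 * k] = conv.max(axis=1)
        features[:, 2 * k + 1] = (conv > 0).mean(axis=1)
    return features
```

The resulting feature matrix is what a linear classifier such as `RidgeClassifierCV` is then trained on; `num_kernels` sets the dimension of this feature space.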
%% Cell type:code id:b23b1bbb tags:
``` python
# cv split scheme
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2)
# Rocket classifier
clf = RocketClassifier(num_kernels=200)
# collect one record per score and build the dataframe once at the end
# (concatenating onto an empty DataFrame triggers a pandas FutureWarning)
records = []
for train_ind, test_ind in cv.split(X, y):
    # split data
    X_train, y_train = X.iloc[train_ind], y[train_ind]
    X_test, y_test = X.iloc[test_ind], y[test_ind]
    # train classifier on data
    clf.fit(X_train, y_train)
    # accuracy on train set
    records.append({'type': 'train', 'score': clf.score(X_train, y_train)})
    # accuracy on test set
    records.append({'type': 'test', 'score': clf.score(X_test, y_test)})
    # shuffling: retrain on permuted labels to estimate the chance level
    y_train_shuf = np.random.permutation(y_train)
    clf.fit(X_train, y_train_shuf)
    # accuracy on test set after training with shuffled labels
    records.append({'type': 'shuf', 'score': clf.score(X_test, y_test)})
acc = pd.DataFrame(records, columns=['type', 'score'])
```
%% Cell type:code id:3872292c tags:
``` python
sb.violinplot(data=acc, x='type', y='score', hue='type', density_norm='width', palette=['brown','yellow','gray'])
sb.swarmplot(data=acc, x='type', y='score', hue='type', palette=['black','black','black'])
plt.yticks([0,1])
plt.ylabel('accuracy')
plt.axis(ymax=1.02)
plt.savefig('classif_rocket')
plt.show()
```
%% Output
%% Cell type:markdown id:387e04fb-86d0-49a4-8506-f44165557c35 tags:
Exercises:
- Evaluate the effect of the number of kernels (`num_kernels` in [10,100,1000,10000]) on the classification accuracy (train and test). More kernels give the classifier more resources, since the data are projected into a higher-dimensional feature space, but they also mean longer computation time.
- Try other datasets, cf. https://www.timeseriesclassification.com/dataset.php.
- Try other classifiers, cf. http://www.sktime.net/en/latest/api_reference/classification.html.
For example, the CNN relies on kernels that are trained to extract information from the time series. For the CNN, the number and size of kernels (xxx) are resources for the optimization, like `num_kernels` for `RocketClassifier`. You can thus compare the accuracy of `CNNClassifier` and `RocketClassifier`, as well as the speed for training.
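As a possible starting point for these exercises, the small helper below fits a classifier, measures the training time and returns the train and test accuracies. It is only a sketch: the name `evaluate_classifier` is hypothetical, and it assumes the classifier exposes `fit` and `score` as the classifiers used in this notebook do.

```python
import time

def evaluate_classifier(clf, X_train, y_train, X_test, y_test):
    """Fit `clf` and return train/test accuracy and training time (in seconds)."""
    t_start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_time = time.perf_counter() - t_start
    return {
        'train': clf.score(X_train, y_train),
        'test': clf.score(X_test, y_test),
        'fit_time': fit_time,
    }
```

For the first exercise, one could call `evaluate_classifier(RocketClassifier(num_kernels=n), X_train, y_train, X_test, y_test)` for each `n` in `[10,100,1000,10000]` on a split from `cv`, and collect the returned dictionaries in a DataFrame for plotting.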