%% Cell type:code id:866dca52 tags:
``` python
import numpy as np
import scipy.spatial.distance as ssd
import sklearn.discriminant_analysis as sda
import sklearn.metrics as sm
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:16450559 tags:
We generate samples that are divided into groups (classes), each following its own distribution.
%% Cell type:code id:a4614df7 tags:
``` python
# sample properties
d = 4 # number of dimensions
c = 3 # number of classes
n = 30 # number of samples per class
# properties of the groups (TO DO: play with the variance scaling)
# means
m_gp = np.zeros([c,d])
m_gp[:,0] = np.linspace(0,1,c) # groups differ in mean along the 1st dimension only
# standard deviations
std_gp = np.ones([c,d])
std_gp[:,0] *= 0.2 # small spread along dimension 0
std_gp[:,1:] *= 2.0 # larger spread along dimension 1 and higher (if any)
# samples in d dimensions and their class labels
x = np.zeros([n*c,d])
x_label = np.zeros([n*c])
for i_c in range(c):
    for i_n in range(n):
        x[i_c*n+i_n,:] = m_gp[i_c,:] + np.random.randn(d) * std_gp[i_c,:]
    x_label[i_c*n:(i_c+1)*n] = i_c
```
%% Cell type:markdown id:4c7d3b5d tags:
Let's visualize the data with a plot in a 2-dimensional plane (a projection of the d-dimensional samples).
%% Cell type:code id:bfb881fd tags:
``` python
# dimensions to plot
d0 = 0
d1 = 1
# colors and markers for plots
mks = ['x','o','v','s']
cols = ['r','g','b','m']
plt.figure()
for i_c in range(c):
    plt.scatter(x[i_c*n:(i_c+1)*n,d0], x[i_c*n:(i_c+1)*n,d1], marker=mks[i_c], color=cols[i_c])
plt.xlabel('dim {}'.format(d0))
plt.ylabel('dim {}'.format(d1))
plt.show()
```
%% Output
%% Cell type:code id:3c1eeb2c tags:
``` python
# dimensions to plot
d0 = 2
d1 = 3
plt.figure()
for i_c in range(c):
    plt.scatter(x[i_c*n:(i_c+1)*n,d0], x[i_c*n:(i_c+1)*n,d1], marker=mks[i_c], color=cols[i_c])
plt.xlabel('dim {}'.format(d0))
plt.ylabel('dim {}'.format(d1))
plt.show()
```
%% Output
%% Cell type:markdown id:f93e41a9 tags:
Depending on the dimensions chosen for the plot, the sample clouds appear either well separated or overlapping.
Only the first dimension is informative for discriminating the classes; the other dimensions are not.
Note that the variance was set larger for the non-informative dimensions, so they dominate the total variance of the data.
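%% Cell type:markdown tags:
As a quick sanity check (a minimal sketch, not part of the original notebook), we can quantify how informative each dimension is with the ratio of between-class to within-class variance, computed per dimension with plain NumPy from the variables defined above (`x`, `x_label`, `c`, `d`); only dimension 0 should stand out.
%% Cell type:code tags:
``` python
# between-class vs within-class variance, computed separately for each dimension
for i_d in range(d):
    gp_means = np.array([x[x_label==i_c, i_d].mean() for i_c in range(c)])
    var_between = gp_means.var() # variance of the class means
    var_within = np.mean([x[x_label==i_c, i_d].var() for i_c in range(c)]) # mean within-class variance
    print('dim {}: between/within variance ratio = {:.3f}'.format(i_d, var_between/var_within))
```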
%% Cell type:code id:d888443d tags:
``` python
# principal component analysis: first two components
D,U = np.linalg.eig(np.corrcoef(x,rowvar=False)) # D is the vector of eigenvalues, U is the change-of-basis matrix whose columns are the eigenvectors
ord_eig = np.argsort(np.real(D))[::-1] # sort eigenvalues in decreasing order
D = D[ord_eig]
U = U[:,ord_eig]
# new coordinates along the first 2 components
y_pca = np.real(np.dot(U.T,x.T))[:2,:].T
# linear discriminant analysis
lda = sda.LinearDiscriminantAnalysis(n_components=2)
y_lda = lda.fit(x, x_label).transform(x)
# PCA space (1st and 2nd principal components)
plt.figure()
for i_c in range(c):
    plt.scatter(y_pca[i_c*n:(i_c+1)*n,0], y_pca[i_c*n:(i_c+1)*n,1], marker=mks[i_c], color=cols[i_c])
plt.xticks(fontsize=8)
plt.yticks(fontsize=8)
plt.xlabel('PC1',fontsize=8)
plt.ylabel('PC2',fontsize=8)
plt.title('PCA space',fontsize=8)
# LDA space
plt.figure()
for i_c in range(c):
    plt.scatter(y_lda[i_c*n:(i_c+1)*n,0], y_lda[i_c*n:(i_c+1)*n,1], marker=mks[i_c], color=cols[i_c])
plt.xticks(fontsize=8)
plt.yticks(fontsize=8)
plt.xlabel('C1',fontsize=8)
plt.ylabel('C2',fontsize=8)
plt.title('LDA space',fontsize=8)
plt.show()
```
%% Output
%% Cell type:markdown id:d3d0a622 tags:
PCA projects onto the directions of largest variance, but ignores the class labels (unsupervised learning).
LDA finds directions that best separate the classes, using the labels (supervised learning).
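%% Cell type:markdown tags:
One way to put a number on this difference (a minimal sketch, not part of the original notebook) is the silhouette score from `sklearn.metrics`, already imported as `sm` above: it measures how well the labeled clouds are separated in each 2D projection, with larger values meaning cleaner separation, so LDA should score higher here.
%% Cell type:code tags:
``` python
# cluster separation of the labeled samples in both 2D projections
# (silhouette score lies in [-1,1]; larger means better-separated classes)
print('silhouette in PCA space: {:.3f}'.format(sm.silhouette_score(y_pca, x_label)))
print('silhouette in LDA space: {:.3f}'.format(sm.silhouette_score(y_lda, x_label)))
```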
%% Cell type:markdown id:e5c261d2 tags:
## Exercises
- Check other synthetic datasets for classification and clustering (a starting sketch follows this list): https://scikit-learn.org/stable/datasets/sample_generators.html#generators-for-classification-and-clustering
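%% Cell type:markdown tags:
As a starting point for this exercise, here is a minimal sketch (my addition, using the scikit-learn generators from the page above) that draws a comparable dataset with `make_blobs` and plots the first two dimensions, reusing the markers and colors defined earlier.
%% Cell type:code tags:
``` python
import sklearn.datasets as skd

# generate c Gaussian blobs in d dimensions, comparable to the manual construction above
x_gen, x_gen_label = skd.make_blobs(n_samples=n*c, n_features=d, centers=c, cluster_std=1.0)
plt.figure()
for i_c in range(c):
    mask = x_gen_label==i_c
    plt.scatter(x_gen[mask,0], x_gen[mask,1], marker=mks[i_c], color=cols[i_c])
plt.xlabel('dim 0')
plt.ylabel('dim 1')
plt.show()
```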