In this example I work with EEG data.
Data retrieved from: https://www.kaggle.com/nnair25/Alcoholics
import os, glob
import pandas as pd
import matplotlib.pyplot as plt
import mne
Reading in the EEG data shows thousands of rows, so it is worth checking the sensor positions to see how many distinct ones the file actually contains.
df = pd.read_csv('archive (1)/SMNI_CMI_TRAIN/Data99.csv')
df
| Unnamed: 0 | trial number | sensor position | sample num | sensor value | subject identifier | matching condition | channel | name | time |
---|---|---|---|---|---|---|---|---|---|---|
0 | 5 | 12 | FP1 | 0 | -3.174 | a | S1 obj | 0 | co2a0000369 | 0.000000 |
1 | 6 | 12 | FP1 | 1 | -0.732 | a | S1 obj | 0 | co2a0000369 | 0.003906 |
2 | 7 | 12 | FP1 | 2 | 3.174 | a | S1 obj | 0 | co2a0000369 | 0.007812 |
3 | 8 | 12 | FP1 | 3 | 7.080 | a | S1 obj | 0 | co2a0000369 | 0.011719 |
4 | 9 | 12 | FP1 | 4 | 10.010 | a | S1 obj | 0 | co2a0000369 | 0.015625 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16379 | 16447 | 12 | Y | 251 | 10.447 | a | S1 obj | 63 | co2a0000369 | 0.980469 |
16380 | 16448 | 12 | Y | 252 | 11.424 | a | S1 obj | 63 | co2a0000369 | 0.984375 |
16381 | 16449 | 12 | Y | 253 | 10.935 | a | S1 obj | 63 | co2a0000369 | 0.988281 |
16382 | 16450 | 12 | Y | 254 | 9.959 | a | S1 obj | 63 | co2a0000369 | 0.992188 |
16383 | 16451 | 12 | Y | 255 | 8.494 | a | S1 obj | 63 | co2a0000369 | 0.996094 |
16384 rows × 10 columns
We can list the sensor positions by calling the .unique() method on the dataframe column called 'sensor position'. This shows that the recording comes from many different electrode sites!
df['sensor position'].unique()
array(['FP1', 'FP2', 'F7', 'F8', 'AF1', 'AF2', 'FZ', 'F4', 'F3', 'FC6', 'FC5', 'FC2', 'FC1', 'T8', 'T7', 'CZ', 'C3', 'C4', 'CP5', 'CP6', 'CP1', 'CP2', 'P3', 'P4', 'PZ', 'P8', 'P7', 'PO2', 'PO1', 'O2', 'O1', 'X', 'AF7', 'AF8', 'F5', 'F6', 'FT7', 'FT8', 'FPZ', 'FC4', 'FC3', 'C6', 'C5', 'F2', 'F1', 'TP8', 'TP7', 'AFZ', 'CP3', 'CP4', 'P5', 'P6', 'C1', 'C2', 'PO7', 'PO8', 'FCZ', 'POZ', 'OZ', 'P2', 'P1', 'CPZ', 'nd', 'Y'], dtype=object)
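To count the positions rather than read them off the array, .nunique() and .value_counts() can be used. The following is a minimal sketch on a small made-up frame (the name toy and its values are assumptions for illustration, not the real CSV). On the real file one would expect 64 positions with 256 samples each, which is consistent with the 16384 rows shown above.

```python
import pandas as pd

# Made-up stand-in for the real CSV: three positions, two samples each.
toy = pd.DataFrame({
    "sensor position": ["FP1", "FP1", "FP2", "FP2", "X", "X"],
    "sample num": [0, 1, 0, 1, 0, 1],
})

# Number of distinct sensor positions.
n_positions = toy["sensor position"].nunique()

# Number of rows (samples) recorded for each position.
samples_per_position = toy["sensor position"].value_counts()

print(n_positions)
print(samples_per_position)
```

If every position reports the same count, the file holds a single complete trial per position, which is what the pivot later in this notebook relies on.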
df2 = df[df['sensor position'] == 'FP1']
Here we declare a variable, info, with mne.create_info(), which takes the channel names and the sampling frequency (sfreq) to describe the properties of the EEG data. The original dataset on Kaggle states that the electrodes were sampled at 256 Hz (3.9-msec epoch) for 1 second, so we set sfreq to 256. Because no channel types are passed, MNE labels all 64 channels as misc, which is visible in the printed summary.
info = mne.create_info([str(i) for i in df['channel'].unique()], sfreq=256)
print(info)
<Info | 7 non-empty values bads: [] ch_names: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ... chs: 64 MISC custom_ref_applied: False highpass: 0.0 Hz lowpass: 128.0 Hz meas_date: unspecified nchan: 64 projs: [] sfreq: 256.0 Hz >
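The 256 Hz figure can be cross-checked against the 'time' column of the dataframe. This is a small arithmetic sketch, not part of the original notebook: at 256 Hz each sample lasts 1/256 s, so sample n should sit at n/256 s. The variable names are my own.

```python
sfreq = 256  # sampling frequency in Hz, as stated on the Kaggle page

# Duration of one sample in milliseconds (about 3.9 ms).
sample_period_ms = 1000 / sfreq

# Expected time stamp of each of the 256 samples in a one-second trial.
times = [n / sfreq for n in range(256)]

print(sample_period_ms)       # 3.90625
print(round(times[1], 6))     # 0.003906
print(round(times[255], 6))   # 0.996094
```

These values agree with the 'time' entries for sample numbers 1 and 255 in the table shown earlier, which supports using sfreq=256.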
Here we pivot the table so that it is indexed by time, with one column per sensor position and the sensor values as the cell values. This lets us plot every sensor position against the same time axis!
df = df.pivot(index='time', columns='sensor position', values='sensor value')
df
time (rows) / sensor position (columns) | AF1 | AF2 | AF7 | AF8 | AFZ | C1 | C2 | C3 | C4 | C5 | ... | PO8 | POZ | PZ | T7 | T8 | TP7 | TP8 | X | Y | nd |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.000000 | -3.316 | -3.642 | -3.510 | -2.899 | -2.625 | 1.017 | 1.750 | 2.035 | 1.373 | 3.977 | ... | -0.682 | 7.426 | 8.230 | 6.226 | -1.567 | 8.779 | -2.960 | -1.383 | 1.658 | 9.176 |
0.003906 | -1.363 | -2.177 | -1.068 | -1.923 | -1.160 | 1.994 | 1.261 | 3.499 | 0.885 | 5.442 | ... | 1.272 | 7.914 | 6.765 | 8.667 | -1.078 | 12.197 | -2.472 | -1.383 | 4.588 | 11.617 |
0.007812 | 1.567 | 0.753 | 3.326 | 0.031 | 2.258 | 2.970 | 0.773 | 4.964 | 0.397 | 8.372 | ... | 2.248 | 8.403 | 5.788 | 12.085 | 0.387 | 15.615 | -0.519 | 1.058 | 8.982 | 12.594 |
0.011719 | 5.473 | 3.682 | 7.721 | 2.472 | 5.188 | 3.459 | 0.285 | 5.941 | 0.397 | 9.837 | ... | 3.225 | 8.403 | 4.812 | 15.015 | 3.316 | 17.568 | 2.411 | 5.452 | 12.400 | 12.105 |
0.015625 | 7.914 | 6.612 | 10.651 | 4.425 | 7.629 | 3.947 | -0.203 | 5.941 | -0.092 | 10.325 | ... | 6.154 | 7.426 | 3.347 | 15.991 | 5.758 | 17.080 | 4.852 | 9.847 | 14.353 | 9.664 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
0.980469 | 6.938 | 5.636 | 10.162 | 6.866 | 7.629 | 5.412 | 5.168 | 5.941 | 8.698 | 5.931 | ... | 3.713 | 0.102 | -1.048 | 9.155 | 16.988 | 8.779 | 14.618 | 15.706 | 10.447 | 0.387 |
0.984375 | 8.403 | 8.077 | 12.115 | 9.308 | 10.071 | 4.924 | 4.679 | 4.964 | 7.721 | 5.442 | ... | 0.295 | -2.340 | -3.001 | 9.155 | 16.988 | 7.802 | 13.641 | 20.589 | 11.424 | -0.590 |
0.988281 | 9.379 | 9.542 | 11.627 | 10.773 | 11.047 | 4.435 | 4.679 | 3.988 | 6.744 | 4.466 | ... | -0.682 | -2.340 | -3.489 | 8.667 | 16.012 | 7.314 | 12.665 | 23.031 | 10.935 | 1.851 |
0.992188 | 8.403 | 9.542 | 9.674 | 10.773 | 10.559 | 3.947 | 4.191 | 3.011 | 5.768 | 3.977 | ... | -0.682 | -0.875 | -2.513 | 8.667 | 13.570 | 8.291 | 10.223 | 21.566 | 9.959 | 5.269 |
0.996094 | 6.449 | 8.565 | 7.233 | 10.773 | 9.094 | 3.459 | 3.215 | 2.035 | 4.303 | 3.489 | ... | -0.682 | 0.590 | -1.536 | 8.179 | 10.640 | 8.779 | 6.805 | 17.660 | 8.494 | 6.246 |
256 rows × 64 columns
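To make the reshaping step easier to follow, here is a minimal sketch of the same pivot on a tiny made-up long-format frame (long_df and its values are assumptions for illustration only). Each unique 'time' becomes a row and each unique 'sensor position' becomes a column.

```python
import pandas as pd

# Made-up long-format data: two time points, two sensor positions.
long_df = pd.DataFrame({
    "time": [0.0, 0.0, 0.5, 0.5],
    "sensor position": ["FP1", "FP2", "FP1", "FP2"],
    "sensor value": [1.0, 2.0, 3.0, 4.0],
})

# Long to wide: rows are time stamps, columns are sensor positions.
wide_df = long_df.pivot(index="time", columns="sensor position", values="sensor value")

print(wide_df)
print(wide_df.shape)  # (2, 2)
```

Note that pivot raises a ValueError when the same (time, sensor position) pair occurs more than once, so a file containing several trials would need to be filtered to one trial first.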
Here we can see the number of columns. If we plotted all 64 columns side by side, the image would be very crowded and would not tell us much! Let's limit it to the first 4 instead for the purposes of demonstration.
df.columns.size
64
Here I create a figure and a counter variable. I then iterate through the column names, adding a subplot at the position given by the counter. In each subplot I plot the index against the current column of the dataframe, in orange. After setting the axis labels and the legend, I increase the counter and call plt.tight_layout() for improved aesthetics and readability. When the counter reaches 5 the loop breaks and the figure is shown. Together, these steps show one way to handle EEG data when plotting.
fig = plt.figure()
counter = 1  # subplot position in the 2x2 grid (1-based)
for i in df.columns:
    ax = fig.add_subplot(2, 2, counter)
    ax.plot(df.index, df[i], color='orange')
    ax.set(xlabel="time", ylabel='sensor value')
    ax.legend([i], loc='upper right')
    counter += 1
    plt.tight_layout()
    # stop once the four subplots have been filled
    if counter == 5:
        break
plt.show()
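As an alternative to the counter and break, the same 2x2 layout can be written with plt.subplots and a slice of the first four columns. This is only a sketch under assumptions: the frame wide below is synthetic (sine waves standing in for the pivoted EEG data), and the Agg backend line is there so the snippet runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the pivoted dataframe: 256 samples over one second.
times = np.arange(256) / 256
names = ["AF1", "AF2", "AF7", "AF8", "AFZ"]
wide = pd.DataFrame(
    {name: np.sin(2 * np.pi * (k + 1) * times) for k, name in enumerate(names)},
    index=times,
)

# One axis per column; zip stops after the four axes are used.
fig, axes = plt.subplots(2, 2, sharex=True)
for ax, name in zip(axes.flat, wide.columns[:4]):
    ax.plot(wide.index, wide[name], color="orange", label=name)
    ax.set(xlabel="time", ylabel="sensor value")
    ax.legend(loc="upper right")
fig.tight_layout()
```

Slicing the columns makes the number of plotted channels explicit, and tight_layout is called once after all subplots exist instead of on every iteration.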