Spampinato mice retina mea252ch pair recording - 2¶

Part 2) Ground-truth comparison¶

This set of notebooks the dataset is from paired juxtacellular/extracellular recordings from mice retina in vitro. The MEA has 252 channels.

The official publication of this open dataset can be found at the following address: https://zenodo.org/record/1205233#.W9mq1HWLTIF

These datasets were used by Pierre Yger et al in the following "spyking circus" paper: https://elifesciences.org/articles/34518

After inspecting the juxta-cellular data, we found that some recordings don't have a good enough quality to be considered as "ground truth". To be "ground truth", a unit is required to be stable in the detection, peak signal-to-noise ratio (SNR) and amplitude.

At the end of our quality assessment ("spampinato-mice-retina-mea252ch-pair-recording-part1"), some files are removed for this main study.

Author: Samuel Garcia, CRNL, Lyon

Requirements¶

For this need you will need the following Python packages:

numpy
pandas
matplotlib
seaborn
spikeinterface

To run the MATLAB-based sorters, you would also need a MATLAB license. For other sorters, please refer to the documentation on how to install sorters.

Installation and imports¶

to create a folder basedir.
download all files from the zenodo link
move them in a subfolder basedir/original_files (20160415_patch2.tar.gz, ...)
and then execute cells by cell this notebook

In [16]:

# import everything
import os, getpass
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path


import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
import spikeinterface.widgets as sw

from spikeinterface.comparison import GroundTruthStudy

In [ ]:

# clone and install MATLAB sorters

# kilosort2
!git clone https://github.com/MouseLand/Kilosort2.git
kilosort2_path = './Kilosort2'
ss.Kilosort2Sorter.set_kilosort2_path(kilosort2_path)

# kilosort
!git clone https://github.com/cortex-lab/KiloSort.git
kilosort_path = './KiloSort'
ss.KilosortSorter.set_kilosort_path(kilosort_path)

# ironclust
!git clone https://github.com/flatironinstitute/ironclust.git
ironclust_path = './ironclust'
ss.IronclustSorter.set_ironclust_path(ironclust_path)

In [17]:

si.print_spikeinterface_version()
ss.print_sorter_versions()

spikeinterface: 0.9.9
  * spikeextractor: 0.8.4
  * spiketoolkit: 0.6.3
  * spikesorters: 0.3.3
  * spikecomparison: 0.2.6
  * spikewidgets: 0.4.3

herdingspikes: 0.3.7+git.45665a2b6438
ironclust: 5.9.8
kilosort: git-cd040da1963d
kilosort2: git-e243c934339e
spykingcircus: 0.9.7
tridesclous: 1.6.1.dev

In [18]:

# my working path
basedir = '/media/samuel/dataspikesorting/DataSpikeSortingHD2/Pierre/zenodo/'

# input file
recording_folder = basedir + 'original_files/'

# ground truth information
ground_truth_folder = basedir + 'ground_truth/'

# where output will be
study_folder = basedir + 'study_gt252/'

# sorter list
sorter_list = ['tridesclous']

# selected recordings (8/19)
rec_names = [
    '20160415_patch2',
    '20170803_patch1',
    '20160426_patch3', 
    '20170725_patch1',
    '20170621_patch1',
    '20160426_patch2', 
    '20170728_patch2',
    '20170713_patch1',
]

In [19]:

%matplotlib inline

# some matplotlib hack to prettify figure
SMALL_SIZE = 12
MEDIUM_SIZE = 14
BIGGER_SIZE = 16

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

def clear_axes(ax):
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

Step 1 : Clean original dataset¶

Setup study¶

In this step:

we create a dict of (recordings+sorting)
and call the function setup_comparison_study(study_folder, gt_dict)

Internally, spikeinterface copies all recordings and ground-truth sorting into an organised folder.

Important note:

the file have 256 channels but only 252 are useful. The PRB file contains all channels needed so we need to explicitly use grouping_property='group' to be sure to only take in account the channels in the unique group.
This step has to run only once and takes some times because of copy.

In [ ]:

gt_dict = {}
for rec_name in rec_names:

    # find raw file
    dirname = recording_folder + rec_name + '/'
    for f in os.listdir(dirname):
        if f.endswith('.raw') and not f.endswith('juxta.raw'):
            mea_filename = dirname + f

    # raw files have an internal offset that depend on the channel count
    # a simple built header can be parsed to get it
    with open(mea_filename.replace('.raw', '.txt'), mode='r') as f:
        offset = int(re.findall(r'padding = (\d+)', f.read())[0])

    # recording
    rec = se.BinDatRecordingExtractor(mea_filename, 20000., 256, 'uint16', offset=offset, time_axis=0)

    # this reduce channel count to 252
    rec = se.load_probe_file(rec, basedir + 'mea_256.prb')

    # gt sorting
    gt_indexes = np.fromfile(ground_truth_folder + rec_name + '/juxta_peak_indexes.raw', dtype='int64')
    sorting_gt = se.NumpySortingExtractor()
    sorting_gt.set_times_labels(gt_indexes, np.zeros(gt_indexes.size, dtype='int64'))
    sorting_gt.set_sampling_frequency(20000.0)

    gt_dict[rec_name] = (rec, sorting_gt)


study = GroundTruthStudy.create(study_folder, gt_dict)

Get signal to noise ratio for all units¶

In [20]:

study = GroundTruthStudy(study_folder)
snr = study.concat_all_snr()
snr

Out[20]:

		snr
rec_name	gt_unit_id
20160415_patch2	0	7.647741
20170803_patch1	0	13.230232
20160426_patch3	0	7.270211
20170725_patch1	0	28.532387
20170621_patch1	0	15.133425
20160426_patch2	0	4.661795
20170728_patch2	0	23.652615
20170713_patch1	0	14.934497

In [21]:

fig, ax = plt.subplots()
ax.hist(snr['snr'].values, bins=np.arange(0, 40, 5))
ax.set_xlabel('GT units SNR')

Out[21]:

Text(0.5, 0, 'GT units SNR')

Run all sorters¶

In [22]:

sorter_list = ['herdingspikes', 'ironclust', 'kilosort2',  
               'spykingcircus', 'tridesclous']

In [ ]:

study = GroundTruthStudy(study_folder)

study.run_sorters(sorter_list, mode='keep', verbose=False)

Run comparison with ground truth and retreive result tables¶

In [23]:

# this copy sorting is necessary to copy results from sorter
# into a centralize folder with all results
study.copy_sortings()

In [24]:

# this run all comparison to GT
# exhaustive_gt=False because it is a pair recording so only one GT units
study.run_comparisons(exhaustive_gt=False, match_score=0.1, overmerged_score=0.2)

In [25]:

# this retrieve results
comparisons = study.comparisons
dataframes = study.aggregate_dataframes()

In [26]:

## Run times

In [27]:

dataframes['run_times'].set_index(['rec_name', 'sorter_name']).unstack('sorter_name')

Out[27]:

	run_time
sorter_name	herdingspikes	ironclust	kilosort2	spykingcircus	tridesclous
rec_name
20160415_patch2	142.888554	108.045776	229.796090	745.985181	391.462280
20160426_patch2	85.068679	78.040375	50.278513	315.495001	109.498236
20160426_patch3	72.955193	65.033961	46.270309	265.166210	96.272756
20170621_patch1	162.408083	118.569352	190.309174	603.793946	343.447162
20170713_patch1	169.336252	133.652825	246.374550	799.341534	464.072922
20170725_patch1	152.863248	101.050823	201.603548	710.756077	390.276237
20170728_patch2	150.883882	99.282168	150.312809	487.173840	309.182291
20170803_patch1	148.814701	114.053511	216.422934	753.743665	433.630640

In [28]:

sns.set_palette(sns.color_palette("Set1"))

#fig, ax = plt.subplots()
g = sns.catplot(data=dataframes['run_times'], x='sorter_name', y='run_time',
                hue="rec_name",
                order=sorter_list,
                kind='bar',
               legend=False)
g.fig.set_size_inches(12,5)

#ax.set_yscale('log')
#ax.set_ylabel('Run time (s)');
#ax.set_xlabel(None);
fig.axes[0].set_xticklabels(sorter_list, rotation=40, ha='right');

accuracy/precision/recall scores per sorters¶

In [ ]:

In [29]:

#fig, ax = plt.subplots(figsize=(12,4))
#fig.tight_layout()

sns.set_palette(sns.color_palette("Set1"))

df = pd.melt(dataframes['perf_by_units'], id_vars=['rec_name', 'sorter_name'],
            var_name='metric', value_name='score', 
            value_vars=('accuracy','precision', 'recall'))
display(df)

# df.assign(Generation=df.metric.map({'metric':'Metric','accuracy': 'Accuracy', 'precision': 'Precision', 'recall': 'Recall'}))
g = sns.catplot(data=df, x='sorter_name', y='score', hue='metric', kind='swarm', dodge=True, # ax=ax, 
                order=sorter_list, legend_out=True, s=4)
g.fig.set_size_inches(12,5)

#plt.close(p.fig)

#ax.legend(['Accuracy', 'Precision', 'Recall'], labelspacing=0.2,
#          bbox_to_anchor=(1, 0.5), loc=2, borderaxespad=0., frameon=False)
#ax.set_xticklabels(sorter_list, rotation=30, ha='center')
#ax.set_xlabel(None);
#ax.set_ylabel('Score');
#clear_axes(ax)

	rec_name	sorter_name	metric	score
0	20160426_patch3	herdingspikes	accuracy	0.694297
1	20170725_patch1	herdingspikes	accuracy	0.919708
2	20170713_patch1	spykingcircus	accuracy	0.933659
3	20160426_patch3	kilosort2	accuracy	0.703873
4	20160426_patch2	tridesclous	accuracy	0.426621
...	...	...	...	...
115	20170803_patch1	spykingcircus	recall	0.970153
116	20160415_patch2	spykingcircus	recall	0.937962
117	20170728_patch2	herdingspikes	recall	0.994103
118	20170728_patch2	kilosort2	recall	0.998736
119	20170713_patch1	herdingspikes	recall	0.973463

120 rows × 4 columns

In [30]:

sns.set_palette(sns.color_palette("deep"))
g = sns.relplot(data=dataframes['perf_by_units'], x='precision', y='recall', col='sorter_name', 
                col_wrap=3, col_order=sorter_list, s=60)

Accuracy vns SNR¶

In [31]:

df = dataframes['perf_by_units']
df

Out[31]:

	rec_name	sorter_name	accuracy	recall	precision	false_discovery_rate	miss_rate
0	20160426_patch3	herdingspikes	0.694297	0.698403	0.991604	0.00839631	0.301597
1	20170725_patch1	herdingspikes	0.919708	0.994737	0.924205	0.0757946	0.00526316
2	20170713_patch1	spykingcircus	0.933659	0.933659	1	0	0.0663413
3	20160426_patch3	kilosort2	0.703873	0.999409	0.704167	0.295833	0.000591366
4	20160426_patch2	tridesclous	0.426621	0.426621	1	0	0.573379
5	20160415_patch2	ironclust	0.683278	0.987194	0.689388	0.310612	0.0128059
6	20170621_patch1	kilosort2	0.411829	0.985783	0.414289	0.585711	0.0142171
7	20170803_patch1	kilosort2	0.998299	0.998822	0.999476	0.000523972	0.00117816
8	20170713_patch1	tridesclous	0.988047	0.995882	0.9921	0.00790033	0.00411774
9	20170728_patch2	ironclust	0.999789	1	0.999789	0.000210571	0
10	20170713_patch1	kilosort2	0.998627	0.998627	1	0	0.00137258
11	20170621_patch1	herdingspikes	0.897158	0.960753	0.931289	0.0687112	0.0392471
12	20160426_patch2	spykingcircus	0.993174	0.993174	1	0	0.00682594
13	20170725_patch1	kilosort2	1	1	1	0	0
14	20170621_patch1	spykingcircus	0.984591	0.985182	0.999391	0.000609385	0.0148178
15	20160426_patch3	ironclust	0.956397	0.985807	0.96975	0.0302501	0.0141928
16	20170725_patch1	tridesclous	0.994737	0.994737	1	0	0.00526316
17	20160415_patch2	herdingspikes	0.694369	0.803643	0.836245	0.163755	0.196357
18	20160426_patch2	herdingspikes	0	0	0	0	0
19	20160426_patch3	tridesclous	0.77836	0.791248	0.979502	0.0204978	0.208752
20	20160415_patch2	kilosort2	0.973838	0.995731	0.977921	0.0220794	0.00426864
21	20170621_patch1	tridesclous	0.975485	0.995995	0.979327	0.0206734	0.00400481
22	20160415_patch2	tridesclous	0.692271	0.818156	0.818156	0.181844	0.181844
23	20170725_patch1	ironclust	0.979058	0.984211	0.994681	0.00531915	0.0157895
24	20160426_patch3	spykingcircus	0.977568	0.979302	0.998192	0.00180832	0.0206978
25	20170803_patch1	ironclust	0.997775	0.998167	0.999607	0.000393288	0.0018327
26	20170803_patch1	tridesclous	0.989504	0.999607	0.989889	0.0101115	0.000392722
27	20160426_patch2	kilosort2	0.997725	0.997725	1	0	0.00227531
28	20160426_patch2	ironclust	0.993197	0.996587	0.996587	0.00341297	0.00341297
29	20170713_patch1	ironclust	0.94115	0.941437	0.999676	0.000323887	0.0585634
30	20170803_patch1	herdingspikes	0.939895	0.960073	0.978128	0.0218725	0.0399267
31	20170728_patch2	spykingcircus	0.852991	0.852991	1	0	0.147009
32	20170621_patch1	ironclust	0.914097	0.997197	0.916452	0.083548	0.00280336
33	20170728_patch2	tridesclous	1	1	1	0	0
34	20170725_patch1	spykingcircus	0.952632	0.952632	1	0	0.0473684
35	20170803_patch1	spykingcircus	0.970153	0.970153	1	0	0.0298468
36	20160415_patch2	spykingcircus	0.916064	0.937962	0.975148	0.0248521	0.0620376
37	20170728_patch2	herdingspikes	0.981085	0.994103	0.986828	0.0131716	0.00589722
38	20170728_patch2	kilosort2	0.998736	0.998736	1	0	0.00126369
39	20170713_patch1	herdingspikes	0.968295	0.973463	0.994547	0.00545341	0.0265365

In [32]:

df = dataframes['perf_by_units']

# add snr to the by-unit table
df['snr'] = None
for rec_name, gt_id in snr.index:
    df['snr'].loc[(df['gt_unit_id']==gt_id) & (df['rec_name']==rec_name)] = snr.at[(rec_name, gt_id), 'snr']

sns.set_palette(sns.color_palette("deep"))
g = sns.relplot(data=df, x='snr', y='accuracy', col='sorter_name',
        col_wrap=3, col_order=sorter_list, s=80)

/home/samuel/.virtualenvs/py37/lib/python3.7/site-packages/pandas-1.0.3-py3.7-linux-x86_64.egg/pandas/core/indexing.py:671: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)