Introduction tutorial


The following tutorial gives a basic introduction to the data formats used by the Neural Decoding Toolbox (NDT) and shows how to run a simple decoding analysis. The tutorial is based on a dataset collected by Ying Zhang in Bob Desimone’s lab at MIT, which can be downloaded here.

Overview of the NDT

Neural decoding is a process in which a pattern classifier learns the relationship between neural activity and experimental conditions using a training set of data. The reliability of the relationship between the neural activity and experimental conditions is evaluated by having the classifier predict what experimental conditions were present on a second test set of data.

The NDT is built around 4 different object classes that allow users to apply neural decoding in a flexible and robust way. The four types of objects are:

  1. Datasources (DS) which generate training and test splits of the data.
  2. Feature preprocessors (FP) which apply preprocessing to the training and test splits.
  3. Classifiers (CL) which learn the relationship between experimental conditions and data on the training set, and then predict experimental conditions on the test data.
  4. Cross-validators (CV) which take the DS, FP and CL objects and run a cross-validation decoding procedure.

The NDT comes with a few implementations of each of these objects, and defines interfaces that allow one to create new objects that extend the basic functionality of the four object classes. More information about the design of the NDT and these four object classes can be found here.

The following tutorial explains the data formats used by the Neural Decoding Toolbox, and how to run a decoding experiment using the basic versions of the four object classes.

About the data

The data used in this tutorial was collected by Ying Zhang in Bob Desimone’s lab at MIT and was used in the supplemental figures in the paper Object decoding with attention in inferior temporal cortex, PNAS, 2011. The data consists of single unit recordings from 132 neurons in inferior temporal cortex (IT). The recordings were made while a monkey viewed 7 different objects that were presented at three different locations. The monkey was also shown images that consisted of three objects shown simultaneously and had to perform an attention task; however, for the purposes of this tutorial we will only analyze data from trials in which single objects were shown. Each object was presented approximately 20 times at each of the three locations. The data can be downloaded here.

Adding the toolbox path

Before using any of the functions in the NDT, the path must be set so that Matlab knows where to find these functions. The function add_ndt_paths_and_init_rand_generator adds the directories that contain the different NDT functions to the path. Additionally, this function initializes the random number generator (to the current time on the CPU’s clock) so that a different sequence of random numbers is generated each time the toolbox is used. Without this step, Matlab initializes the random number generator with the same seed every time it starts, which leads to the same sequence of random numbers in every session. The following lines show how to use add_ndt_paths_and_init_rand_generator:

% add the path to the NDT so add_ndt_paths_and_init_rand_generator can be called
toolbox_basedir_name = 'ndt.1.0.4/';
addpath(toolbox_basedir_name);

% add the NDT paths using add_ndt_paths_and_init_rand_generator
add_ndt_paths_and_init_rand_generator
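
As a rough illustration of why the seed matters, the sketch below uses Matlab's built-in rng function rather than anything in the NDT. It is only an assumption that the toolbox does something equivalent internally. Resetting the generator to the same seed reproduces the same numbers, whereas seeding from the clock does not:

```matlab
% Sketch using base Matlab only; this is not the NDT's own initialization code.
rng(1);                          % fix the seed
a = rand(1, 3);
rng(1);                          % reset the generator to the same seed
b = rand(1, 3);
same_sequence = isequal(a, b);   % true: the same seed gives the same numbers

rng('shuffle');                  % seed the generator from the current time
c = rand(1, 3);                  % these values change from session to session
```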

Data formats

In order to use the NDT, the neural data must be in a usable format. Typically this involves putting the data in raster-format and then converting it to binned-format using the create_binned_data_from_raster_data function that is found in the tools directory. Information about these data formats is described here.

Raster format

The directory Zhang_Desimone_7objects_raster_data/ contains the Zhang-Desimone data in raster-format. Each file in this directory contains data from one neuron. To start, let us load one of these files and examine its contents by typing the command:

load bp1021spk_04B_raster_data.mat

Data that is in raster-format contains three variables: raster_site_info, raster_labels, and raster_data. The variable raster_data is a matrix where each row corresponds to the data from one trial, and each column corresponds to data from one time point (the rows are in order, so that the first trial is in the first row and the last trial is in the last row). Because we are dealing with neural spiking data in this tutorial, each 1 in the matrix that was just loaded marks a time point at which a spike occurred, and all other entries are 0. We can view the spike rasters from each trial and a peri-stimulus time histogram (PSTH) of the data by typing the following commands:

% view the rasters from one neuron
subplot(1, 2, 1)
imagesc(~raster_data); colormap gray
line([500 500], get(gca, 'YLim'), 'color', [1 0 0]);
ylabel('Trials')
xlabel('Time (ms)')
title('rasters')

% view the PSTH for one neuron
subplot(1, 2, 2)
bar(sum(raster_data));
line([500 500], get(gca, 'YLim'), 'color', [1 0 0]);
ylabel('Number of spikes')
xlabel('Time (ms)')
title('PSTH')

From looking at the PSTH, one can see that this cell increased its firing rate shortly after stimulus onset (the stimulus onset was at 500 ms).

The variable raster_labels is a structure that contains cell arrays that list the experimental conditions that were present on each trial. Each cell array has as many entries as there are rows in the raster_data matrix, so that there is an experimental condition label for each trial. For example, the variable raster_labels.stimulus_ID contains the labels for which of the 7 stimuli was shown on each trial, and the variable raster_labels.stimulus_position contains the position where the stimulus was shown.

The structure raster_site_info contains any additional information about the recording site that the experimenter wants to record. For example, one could use this structure to keep a record of the quality of the spike sorting isolation, or of the position where the neuron was recorded relative to a grid system, etc. For the purposes of this tutorial we will ignore this structure.
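
To make the layout concrete, here is a minimal sketch that builds toy raster-format variables by hand. The three variable names come from the description above, but the trial count, the label values, the site-info field, and the file name in the final comment are made up for illustration:

```matlab
% Toy raster-format data: 6 trials x 1000 time points (1 ms per column is assumed)
num_trials = 6;
num_times  = 1000;
raster_data = zeros(num_trials, num_times);
raster_data(1, [520 560 610]) = 1;     % three spikes on trial 1
raster_data(4, [505 700]) = 1;         % two spikes on trial 4

% one label per trial (hypothetical stimulus and position names)
raster_labels.stimulus_ID = {'car', 'face', 'kiwi', 'car', 'face', 'kiwi'};
raster_labels.stimulus_position = {'upper', 'upper', 'middle', 'middle', 'lower', 'lower'};

% free-form information about the recording site (hypothetical field)
raster_site_info.isolation_quality = 'good';

% each label cell array must have one entry per row of raster_data
labels_match_trials = numel(raster_labels.stimulus_ID) == size(raster_data, 1);

% the three variables could then be saved to one file per site, for example:
% save('toy_site_raster_data.mat', 'raster_data', 'raster_labels', 'raster_site_info');
```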

Binning the data

The NDT decoding objects operate on data that is in binned-format. To convert data in raster-format to binned-format, we can use the tool create_binned_data_from_raster_data, which calculates the average firing rate of each neuron over intervals of a specified length, sampled at a specified interval (i.e., a boxcar filter is used). create_binned_data_from_raster_data takes four arguments: 1) the name of the directory where the raster-format data is stored, 2) a prefix (potentially including a directory) for the name of the file that the binned data will be saved in, 3) a bin size that specifies how much time the firing rates should be calculated over, and 4) a sampling interval that specifies how frequently to calculate these firing rates. To calculate the average firing rates in 150 ms bins sampled every 50 ms, the following commands can be used:

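% directory that holds the raster-format files (one file per neuron)
raster_file_directory_name = 'Zhang_Desimone_7objects_raster_data/';
% prefix of the file that the binned data will be saved in
save_prefix_name = 'Binned_Zhang_Desimone_7object_data';
bin_width = 150;   % average the firing rates over 150 ms bins
step_size = 50;    % compute a new bin every 50 ms

create_binned_data_from_raster_data(raster_file_directory_name, save_prefix_name, bin_width, step_size);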

The output of this function will be a file called Binned_Zhang_Desimone_7object_data_150ms_bins_50ms_sampled.mat. Data in binned-format has similar fields to data in raster-format, except that data from all the neurons are now grouped together into single structures. The three variables for binned-format data are: 1) the_data{}, which is a cell array where each entry contains a [num_trials x num_bins] matrix of data that is a binned version of the raster_data for one neuron; 2) binned_labels, which is a structure that contains cell arrays of the labels for each neuron; and 3) binned_site_info, which contains all the extra info for each neuron.
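
The boxcar averaging described above can be sketched for a single toy neuron using base Matlab only. This is not the code of create_binned_data_from_raster_data. In particular, the bin alignment (first bin starting at column 1) and the output units (mean spikes per time point) are assumptions made for illustration:

```matlab
% Sketch of boxcar binning with base Matlab; not the NDT implementation.
raster_data = zeros(4, 1000);          % toy raster: 4 trials x 1000 time points
raster_data(:, 501:650) = 1;           % a spike at every time point from 501 to 650

bin_width = 150;                       % number of time points averaged in each bin
step_size = 50;                        % spacing between consecutive bin starts

% assumed alignment: the first bin starts at column 1
bin_starts = 1:step_size:(size(raster_data, 2) - bin_width + 1);
binned = zeros(size(raster_data, 1), numel(bin_starts));

for b = 1:numel(bin_starts)
    cols = bin_starts(b):(bin_starts(b) + bin_width - 1);
    binned(:, b) = mean(raster_data(:, cols), 2);   % mean rate over the bin, per trial
end
```

With these settings the result is a [num_trials x num_bins] matrix. Because the step size is smaller than the bin width, consecutive bins overlap by 100 time points, so neighboring bins share data.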

Determining number of condition repetitions

Before beginning the decoding analysis it is useful to know how many times each experimental condition (e.g., stimulus) was presented to each site (e.g., neuron). In particular, it is useful to know how many times the condition that has the fewest repetitions was presented. To do this we will use the tool find_sites_with_k_label_repetitions, which uses data in binned-format to find all sites that have at least k repetitions of each condition. Below we count the number of sites with at least k repetitions for different values of k, and store the counts in the variable num_sites_with_k_repeats.

% load the binned data
load Binned_Zhang_Desimone_7object_data_150ms_bins_50ms_sampled.mat

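% preallocate, then count the sites that have at least k repetitions of every stimulus
num_sites_with_k_repeats = zeros(1, 65);
for k = 1:65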
    inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, k);
    num_sites_with_k_repeats(k) = length(inds_of_sites_with_at_least_k_repeats);
end

Based on these results we see that all 132 sites have at least 59 repetitions of all 7 of the stimuli, and that 125 sites have at least 60 repetitions of all 7 stimuli. This information is useful when deciding how many cross-validation splits to use, as described below.
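
The idea behind this check can be sketched for a single site with base Matlab functions. This is not the code of find_sites_with_k_label_repetitions, and the toy labels are made up. A site has at least k repetitions of every condition exactly when its least-repeated condition occurs k or more times:

```matlab
% Toy labels for one site (hypothetical values)
labels = {'car', 'face', 'car', 'kiwi', 'face', 'car', 'kiwi', 'face', 'car', 'kiwi'};

% map each label to an index into the sorted list of unique labels
[unique_labels, ~, label_index] = unique(labels);

% count how many times each condition was shown
counts = accumarray(label_index(:), 1);

% the site qualifies for every k up to the count of its rarest condition
min_repeats = min(counts);
```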

Running the analysis

Performing a decoding analysis involves several steps:

  1. creating a datasource (DS) object that generates training and test splits of the data.
  2. optionally creating feature-preprocessor (FP) objects that learn parameters from the training data, and preprocess the training and test data.
  3. creating a classifier (CL) object that learns the relationship between the training data and training labels, and then evaluates the strength of this relationship on the test data.
  4. running a cross-validator (CV) object that uses the datasource (DS), the feature-preprocessor (FP) and the classifier (CL) objects to do a cross-validation procedure that estimates the decoding accuracy.

Below we describe how to create and run these objects on the Zhang-Desimone dataset.

Datasources (DS)

A datasource object is used by the cross-validator to generate training and test splits of the data. Below we create a basic_DS object that takes binned-format data, a cell array of labels, and a scalar that specifies how many cross-validation splits to use. The default behavior of this datasource is to create test splits that have one example of each object in them and num_cv_splits - 1 examples of each object in the training set.

As calculated above, all 132 neurons have at least 59 repetitions of each stimulus, and 125 neurons have at least 60 repetitions of each stimulus. Thus we can use up to 59 cross-validation splits using all neurons, or we could set the datasource to use only a subset of neurons and use 60 cross-validation splits. For the purpose of this tutorial, we will use all the neurons and only 20 cross-validation splits (to make the code run a little faster). The basic_DS datasource object also has many more properties that can be set, including specifying that only certain labels or neurons should be used. More information about this object can be found here.

% the name of the file that has the data in binned-format
binned_format_file_name = 'Binned_Zhang_Desimone_7object_data_150ms_bins_50ms_sampled.mat';

% will decode the identity of which object was shown (regardless of its position)
specific_label_name_to_use = 'stimulus_ID';

num_cv_splits = 20;

ds = basic_DS(binned_format_file_name, specific_label_name_to_use, num_cv_splits)
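
The default splitting behavior described above can be sketched for one site as follows. This is a simplified illustration with toy integer labels, not the basic_DS code, and the way trials are drawn here (randperm within each class) is an assumption:

```matlab
% Sketch of the split logic for one site; basic_DS itself may differ in detail.
num_classes   = 3;
num_repeats   = 6;
num_cv_splits = 4;
labels = repmat(1:num_classes, 1, num_repeats);   % toy labels: 1 2 3 1 2 3 ...

% for each class, randomly pick num_cv_splits trials, one for each split
split_trials = zeros(num_classes, num_cv_splits);
for c = 1:num_classes
    trials_c = find(labels == c);                 % row vector of trial indices
    split_trials(c, :) = trials_c(randperm(numel(trials_c), num_cv_splits));
end

% use split 1 for testing and the remaining splits for training
test_trials  = split_trials(:, 1);                     % one trial per class
train_trials = reshape(split_trials(:, 2:end), [], 1); % num_cv_splits - 1 trials per class
```

Repetitions beyond num_cv_splits are simply left out of a given set of splits, which is why the number of splits cannot exceed the number of repetitions of the rarest condition.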

Feature-preprocessors (FP)

Feature preprocessors use the training set to learn particular parameters about the data, and then apply some preprocessing to the training and test sets using these parameters. Below we will create a zscore_normalize_FP that z-score normalizes the data so that each neuron’s activity has approximately zero mean and a standard deviation of 1 over all trials. This feature-preprocessor is useful so that neurons with high firing rates do not end up contributing more to the decoding results than neurons with lower firing rates when a max_correlation_coefficient_CL is used.

% create a feature preprocessor that z-score normalizes each neuron

% note that the FP objects are stored in a cell array 
% which allows multiple FP objects to be used in one analysis

the_feature_preprocessors{1} = zscore_normalize_FP;
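
The following sketch shows the kind of computation a z-score preprocessor performs, written with base Matlab functions. It is not the zscore_normalize_FP code. The data layout (rows are trials, columns are neurons) and the guard for zero standard deviation are assumptions. The key point is that the mean and standard deviation are learned from the training set only and then applied to both sets:

```matlab
% Sketch of z-score normalization; rows are trials, columns are neurons (assumed layout).
X_train = [1 10; 3 30; 5 50];    % toy training data: 3 trials x 2 neurons
X_test  = [3 20];                % toy test data: 1 trial x 2 neurons

mu    = mean(X_train, 1);        % per-neuron mean, from the training set only
sigma = std(X_train, 0, 1);      % per-neuron standard deviation, training set only
sigma(sigma == 0) = 1;           % assumed guard so constant neurons do not divide by zero

% apply the training-set parameters to both the training and the test data
Z_train = bsxfun(@rdivide, bsxfun(@minus, X_train, mu), sigma);
Z_test  = bsxfun(@rdivide, bsxfun(@minus, X_test,  mu), sigma);
```

After normalization both toy neurons have the same scale on the training set, even though the second neuron's raw values are ten times larger.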

Classifiers (CL)

Classifiers take a “training set” of data and learn the relationship between the neural responses and the experimental conditions (labels) that were present on particular trials. The classifier is then used to make predictions about what experimental conditions are present on trials from a different “test set” of neural data. Below we create a max_correlation_coefficient_CL classifier, which learns a prototype for each class k that consists of the mean of all training data from class k. The predicted class for a new test point x is the class whose prototype has the maximum correlation coefficient with x (i.e., the smallest angle between the mean-subtracted vectors).

% create the CL object
the_classifier = max_correlation_coefficient_CL;
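
The prototype-and-correlation rule can be sketched in a few lines of base Matlab. This is an illustration of the idea under an assumed data layout (rows are trials, columns are neurons), not the max_correlation_coefficient_CL code, and the toy numbers are made up:

```matlab
% Sketch of a maximum correlation coefficient classifier on toy data.
X_train = [1 2 3 4; 2 3 4 5; 4 3 2 1; 5 4 3 2];   % 4 trials x 4 neurons
y_train = [1 1 2 2];                               % class label of each training trial
x_test  = [2 4 6 9];                               % one test trial

% training: the prototype of each class is the mean of its training trials
classes = unique(y_train);
prototypes = zeros(numel(classes), size(X_train, 2));
for k = 1:numel(classes)
    prototypes(k, :) = mean(X_train(y_train == classes(k), :), 1);
end

% testing: correlate the test trial with every prototype
r = zeros(1, numel(classes));
for k = 1:numel(classes)
    c = corrcoef(x_test, prototypes(k, :));        % 2 x 2 correlation matrix
    r(k) = c(1, 2);
end

% predict the class whose prototype is most correlated with the test trial
[~, best] = max(r);
predicted = classes(best);
```

Because the correlation coefficient ignores the overall mean and scale of a response vector, the prediction depends on the pattern of activity across neurons rather than on the total firing rate.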

Cross-validators (CV)

Cross-validator objects take a datasource, a classifier and optionally feature-preprocessor objects and run a decoding procedure by generating training and test data from the datasource, preprocessing this data with the feature-preprocessors and then training and testing the classifier on the resulting data. This procedure is run in two nested loops. The inner ‘cross-validation’ loop runs a cross-validation procedure where the classifier is trained and tested on different divisions of the data. The outer, ‘resample’ loop generates new splits (and also potentially pseudo-populations) of data, which are then run in a cross-validation procedure by the inner loop. Below we create a standard_resample_CV object that runs this decoding procedure.

% create the CV object
the_cross_validator = standard_resample_CV(ds, the_classifier, the_feature_preprocessors);

% set how many times the outer 'resample' loop is run
% generally we use more than 2 resample runs which will give more accurate results
% but to save time in this tutorial we are using a small number.

the_cross_validator.num_resample_runs = 2;
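
The two nested loops can be sketched on simulated data as shown below. This is a simplified illustration of the control flow only, not the standard_resample_CV code. The simulated data, the splitting scheme, and the classifier inside the loops are assumptions, and preprocessing and pseudo-populations are left out:

```matlab
% Sketch of the resample / cross-validation loop structure on simulated data.
rng(0);                                            % fixed seed for a repeatable toy example
num_classes = 2; num_repeats = 10; num_neurons = 5;
num_cv_splits = 5; num_resample_runs = 2;

patterns = 3 * [1 0 1 0 1; 0 1 0 1 0];             % mean response pattern of each class
labels = repmat((1:num_classes)', num_repeats, 1); % 20 x 1 vector of class labels
X = patterns(labels, :) + randn(numel(labels), num_neurons);   % trials x neurons

accuracy = zeros(num_resample_runs, num_cv_splits);
for run = 1:num_resample_runs
    % outer 'resample' loop: draw a new random assignment of trials to splits
    split_trials = zeros(num_classes, num_cv_splits);
    for c = 1:num_classes
        trials_c = find(labels == c)';
        split_trials(c, :) = trials_c(randperm(numel(trials_c), num_cv_splits));
    end
    for s = 1:num_cv_splits
        % inner 'cross-validation' loop: split s is the test set, the rest is training
        test_idx  = split_trials(:, s);
        train_idx = reshape(split_trials(:, [1:s-1, s+1:num_cv_splits]), [], 1);
        prototypes = zeros(num_classes, num_neurons);
        for c = 1:num_classes
            prototypes(c, :) = mean(X(train_idx(labels(train_idx) == c), :), 1);
        end
        num_correct = 0;
        for t = 1:numel(test_idx)
            rvals = zeros(1, num_classes);
            for c = 1:num_classes
                cc = corrcoef(X(test_idx(t), :), prototypes(c, :));
                rvals(c) = cc(1, 2);
            end
            [~, predicted] = max(rvals);
            num_correct = num_correct + (predicted == labels(test_idx(t)));
        end
        accuracy(run, s) = num_correct / numel(test_idx);
    end
end
mean_accuracy = mean(accuracy(:));                 % average over all runs and splits
```

Averaging over several resample runs reduces the dependence of the final accuracy on any one random assignment of trials to splits.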

Running the decoding

To run the decoding procedure we call the cross-validator’s run_cv_decoding method, and the results are saved to a structure DECODING_RESULTS.

% run the decoding analysis
DECODING_RESULTS = the_cross_validator.run_cv_decoding;

save_file_name = 'Zhang_Desimone_basic_7object_results';

save(save_file_name, 'DECODING_RESULTS');

Plotting the results

Below we show how to plot the decoding accuracies as a function of time using the plot_standard_results_object, which is useful when comparing decoding accuracies from different analyses. We also show how to plot the results when training the classifier at one time and testing the classifier at a second time (i.e., a temporal-cross-training plot) using the plot_standard_results_TCT_object, which is useful for testing whether information is contained in a dynamic population code.

Plot decoding accuracy

To plot basic decoding results as a function of time, we will use the plot_standard_results_object. This object takes the names of decoding result files that were created by the standard_resample_CV object and plots the results they contain. There are many properties that can be set for this object, so we recommend you read the documentation to see all the possibilities. Below we show how to plot the results we created above, setting only a few of the possible parameters.

result_names{1} = save_file_name;  

% create the plot results object
plot_obj = plot_standard_results_object(result_names);

% put a line at the time when the stimulus was shown
plot_obj.significant_event_times = 0;

% display the results
plot_obj.plot_results;

Other measures of decoding accuracy can be plotted by setting the property plot_obj.result_type_to_plot. For example, if this property is set to 6, then mutual information will be plotted, and if this property is set to 2, normalized rank results will be plotted.
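
For readers unfamiliar with mutual information as a decoding measure, the sketch below computes it from a confusion matrix of counts in the standard way. How the NDT computes this quantity internally is not described here, so the approach and the row/column convention are assumptions made for illustration:

```matlab
% Sketch: mutual information (in bits) from a confusion matrix of counts.
% Rows are true classes and columns are predicted classes (assumed convention).
confusion = [5 0; 0 5];                     % toy case: perfect 2-class decoding

joint    = confusion / sum(confusion(:));   % joint probability of (true, predicted)
p_true   = sum(joint, 2);                   % marginal probability of each true class
p_pred   = sum(joint, 1);                   % marginal probability of each prediction
expected = p_true * p_pred;                 % joint probability under independence

nonzero = joint > 0;                        % empty cells contribute nothing to the sum
mutual_info = sum(joint(nonzero) .* log2(joint(nonzero) ./ expected(nonzero)));
```

For this toy matrix the predictions identify the true class perfectly, so the mutual information is 1 bit, the maximum for two equally likely classes.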

Temporal-cross-decoding

To plot a matrix of decoding accuracies showing the results when the classifier was trained at time t1 and tested at time t2, we will use the plot_standard_results_TCT_object. The basic functions of this object are similar to the plot_standard_results_object, namely it takes the name of a decoding result file that was generated by running a standard_resample_CV object and plots the full temporal-cross-training (TCT) matrix. There are also many properties that can be set for this object, so we again recommend you read the documentation to see all the possibilities. Below we again show how to plot the results we created above, setting only a few of the possible parameters.

% create the plot results object
% note that this object takes a string in its constructor not a cell array
plot_obj = plot_standard_results_TCT_object(save_file_name);

% put a line at the time when the stimulus was shown
plot_obj.significant_event_times = 0;

% display the results
plot_obj.plot_results;

Conclusion

This concludes the introductory tutorial. You should now understand the design of the Neural Decoding Toolbox and how to do a basic decoding analysis. We recommend trying out this tutorial yourself in Matlab and experimenting with different datasource, feature-preprocessor, cross-validator and plotting parameters. Once you feel comfortable with this tutorial you can look at the generalization analysis tutorial, which shows how to test whether neural representations contain information in an abstract/invariant format, or you can look at the getting started with your own data tutorial, which shows the steps necessary to begin analyzing your own data.