find_sites_with_k_label_repetitions
This helper function takes labels in binned-format and an integer k, and returns the indices of all sites (e.g., neurons) that have at least k repetitions of each experimental condition. The function has the following form:
[inds_of_sites_with_at_least_k_repeats, min_num_repeats_all_sites, num_repeats_matrix, label_names_used] = find_sites_with_k_label_repetitions(the_labels, k, label_names_to_use)
The input arguments to this function are:
the_labels
The labels in binned-format that should be used (e.g., binned_labels.the_labels_to_use).
-
k
An integer specifying that each site returned should have at least k repetitions of each condition.
-
label_names_to_use
This specifies which label names (or numbers) to use. For example, if the_labels contains the strings 'red', 'green', and 'blue', but you only want to know which sites have k repeats of the 'red' and 'green' trials, set label_names_to_use = {'red', 'green'}. If this argument is not specified, every label that was presented to any site is used.
The outputs of this function are:
inds_of_sites_with_at_least_k_repeats
The indices of sites that have at least k repetitions of each condition.
-
min_num_repeats_all_sites
A vector that gives, for each site, the number of repetitions of that site's least-repeated label.
-
num_repeats_matrix
A [num_sites x num_labels] matrix that specifies, for each site, the number of repetitions of each condition. This matrix can help determine whether a particular condition should be excluded, for example because it was presented only a few times to many of the sites.
-
label_names_used
The label names that were used when counting repetitions. This is equal to label_names_to_use if label_names_to_use was passed as an input argument. [added in NDT version 1.4]
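To make these outputs concrete, here is a minimal Python sketch of the counting they describe. It is not the NDT's MATLAB implementation. It assumes binned-format labels can be represented as a Python list with one list of trial labels per site, it returns 0-based site indices (MATLAB returns 1-based indices), and the alphabetical ordering of the default label names is an assumption.

```python
from collections import Counter


def find_sites_with_k_label_repetitions(the_labels, k, label_names_to_use=None):
    """Python sketch of the repetition counting (not the NDT MATLAB source).

    the_labels: list with one entry per site; each entry is a sequence of
    trial labels (strings or numbers).
    Returns 0-based site indices, unlike MATLAB's 1-based indices.
    """
    if label_names_to_use is None:
        # Assumption: use every label seen at any site, sorted for a stable order
        label_names_used = sorted({lab for site in the_labels for lab in site})
    else:
        label_names_used = list(label_names_to_use)

    # num_repeats_matrix[s][j]: number of trials at site s with label label_names_used[j]
    num_repeats_matrix = []
    for site_labels in the_labels:
        counts = Counter(site_labels)  # labels never shown at this site count as 0
        num_repeats_matrix.append([counts[name] for name in label_names_used])

    # Repeat count of each site's least-repeated label
    min_num_repeats_all_sites = [min(row, default=0) for row in num_repeats_matrix]

    # A site qualifies only if even its least-repeated label reaches k
    inds_of_sites_with_at_least_k_repeats = [
        s for s, m in enumerate(min_num_repeats_all_sites) if m >= k
    ]
    return (inds_of_sites_with_at_least_k_repeats, min_num_repeats_all_sites,
            num_repeats_matrix, label_names_used)


# Three sites, using the red/green/blue labels from the description above
labels = [
    ["red", "red", "green", "green", "blue"],
    ["red", "green", "blue", "blue"],
    ["red", "red", "green", "green", "green"],
]
inds, mins, counts, used = find_sites_with_k_label_repetitions(labels, 2, ["red", "green"])
print(inds)  # [0, 2]
print(mins)  # [2, 1, 2]
```

Restricting the label names matters here: with all three colors, no site has 2 repetitions of every label (site 2 never saw 'blue'), but with only 'red' and 'green', sites 0 and 2 qualify.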
Example
Suppose we had an experiment in which several different stimuli were shown while recordings were made from a number of different sites, and the stimulus identity on each trial was stored in the variable binned_labels.stimulus_ID. The following command finds all sites at which each stimulus was presented at least 20 times:
inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, 20)
When first analyzing a new dataset, one can also use this function to assess how many times each condition was presented to each site, which helps determine how many cross-validation splits to use. Examining the output min_num_repeats_all_sites is useful for this purpose, or one can run the following loop:
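num_sites_with_k_repeats = zeros(1, 61);  % entry k + 1 stores the number of sites with at least k repeats
for k = 0:60
    inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, k);
    num_sites_with_k_repeats(k + 1) = length(inds_of_sites_with_at_least_k_repeats);
end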
The entry num_sites_with_k_repeats(i) indicates how many sites have at least i - 1 repetitions of each condition; i.e., num_sites_with_k_repeats(1) gives the total number of sites, num_sites_with_k_repeats(2) gives how many sites have at least one presentation of each stimulus, etc. Note that 2 repetitions of each condition is the minimum needed to run a decoding analysis (one repetition for training and one for testing), although reasonable results usually require at least 5 repetitions of each condition.
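Because a site has at least k repetitions of every condition exactly when its entry in min_num_repeats_all_sites is at least k, the same curve can be computed from that single vector without calling the function 61 times. The Python sketch below shows this; the helper names are hypothetical, and the second helper assumes each cross-validation split needs one repetition of every condition, so the largest usable k is read as an upper bound on the number of splits.

```python
def num_sites_with_k_repeats_curve(min_num_repeats_all_sites, max_k=60):
    """Entry k (0-based) counts sites whose least-repeated label has >= k repeats.

    Mirrors num_sites_with_k_repeats(k + 1) from the MATLAB loop.
    """
    return [sum(1 for m in min_num_repeats_all_sites if m >= k)
            for k in range(max_k + 1)]


def max_repeats_keeping_sites(min_num_repeats_all_sites, min_sites):
    """Hypothetical helper: largest k such that at least min_sites sites
    have >= k repetitions of every condition (0 if none qualify)."""
    max_k = max(min_num_repeats_all_sites, default=0)
    curve = num_sites_with_k_repeats_curve(min_num_repeats_all_sites, max_k)
    # The curve never increases with k, so the last k meeting the bar is the largest
    best = 0
    for k, n in enumerate(curve):
        if n >= min_sites:
            best = k
    return best


mins = [3, 5, 5, 8, 2, 10]  # made-up minimum repeat counts for six sites
print(num_sites_with_k_repeats_curve(mins, 10))  # [6, 6, 6, 5, 4, 4, 2, 2, 2, 1, 1]
print(max_repeats_keeping_sites(mins, 4))  # 5
```

The trade-off is visible in the curve: asking for more repetitions (and hence more cross-validation splits) keeps fewer sites, so a value of k of at least 2, and ideally 5 or more, is chosen where enough sites remain.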