This helper function takes labels in binned-format and an integer k, and returns the indices of all sites (e.g., neurons) that have at least k repetitions of each experimental condition. The function has the following form:

[inds_of_sites_with_at_least_k_repeats, min_num_repeats_all_sites, num_repeats_matrix, label_names_used] = find_sites_with_k_label_repetitions(the_labels, k, label_names_to_use)

The arguments to this function are:

  1. the_labels

    The labels in binned-format that should be used (e.g., binned_labels.the_labels_to_use).

  2. k

    An integer specifying that each site returned should have at least k repetitions of each condition.

Optional input arguments:

  1. label_names_to_use

    This specifies which label names (or numbers) to use. For example, if the_labels contains the strings ‘red’, ‘green’, and ‘blue’, but you only want to know which sites have k repeats of ‘red’ and ‘green’ trials, then setting label_names_to_use = {'red', 'green'} will accomplish this. If this argument is not specified, then any label that was presented to any site will be used.
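The effect of restricting the label set can be illustrated on toy data without the toolbox. The sketch below assumes that binned-format labels are stored as a cell array with one cell array of trial labels per site; that layout, the variable toy_labels, and its contents are assumptions made for illustration, and the code is not the toolbox's own implementation.

```matlab
% Toy labels: one cell per site, each holding that site's trial labels (hypothetical data).
toy_labels = {{'red', 'green', 'blue', 'red'}, {'green', 'red', 'green'}};

% Default behaviour described above: every label presented to any site is used.
all_label_names = unique([toy_labels{:}]);   % contains 'blue', 'green', 'red'

% Restricted label set, as when passing label_names_to_use = {'red', 'green'}.
label_names_to_use = {'red', 'green'};

% Count the repetitions of each requested label at site 1.
site_1_counts = cellfun(@(name) sum(strcmp(toy_labels{1}, name)), label_names_to_use);
```

Here site_1_counts is [2 1], so with the restricted label set site 1 would pass for k = 1 but not for k = 2, regardless of how often ‘blue’ was shown.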

Returned values:

  1. inds_of_sites_with_at_least_k_repeats

    The indices of sites that have at least k repetitions of each condition.

  2. min_num_repeats_all_sites

    This vector lists, for each site, the number of repetitions present for the label that has the minimum number of repetitions.

  3. num_repeats_matrix

    A [num_sites x num_labels] matrix that specifies, for each site, the number of repetitions of each condition. This variable can be useful for deciding whether particular conditions should be excluded, for example when a specific condition was presented only a few times to many of the sites.

  4. label_names_used

    This specifies which label names were used when counting repetitions. This variable is equal to label_names_to_use if label_names_to_use was passed as an input argument. [added in NDT version 1.4]
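The relationship between the first three returned values can be sketched on toy data. This is a minimal illustration, assuming one cell array of trial labels per site; toy_labels and label_names are hypothetical, and the orientation of the vectors returned by the real function may differ from what this sketch produces.

```matlab
% Toy labels: 3 sites, one cell array of trial labels per site (hypothetical data).
toy_labels = {{'a', 'b', 'a', 'b', 'a'}, {'a', 'b', 'b'}, {'a', 'a', 'b', 'b'}};
label_names = {'a', 'b'};
k = 2;

% Build the [num_sites x num_labels] matrix of repetition counts.
num_sites = numel(toy_labels);
num_repeats_matrix = zeros(num_sites, numel(label_names));
for iSite = 1:num_sites
    for iLabel = 1:numel(label_names)
        num_repeats_matrix(iSite, iLabel) = sum(strcmp(toy_labels{iSite}, label_names{iLabel}));
    end
end

% For each site, the count of its least-repeated label.
min_num_repeats_all_sites = min(num_repeats_matrix, [], 2);

% Sites whose least-repeated label still has at least k repetitions.
inds_of_sites_with_at_least_k_repeats = find(min_num_repeats_all_sites >= k);
```

In this toy example the counts are [3 2; 1 2; 2 2], so sites 1 and 3 are kept for k = 2, while site 2 is dropped because label ‘a’ was presented to it only once.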


Suppose we had an experiment in which a number of different stimuli were shown while recordings were made from a number of different sites, and this information was contained in the variable binned_labels.stimulus_ID. Then the following command would find all sites in which each stimulus condition was presented at least 20 times:

inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, 20)

When one is first starting to analyze a new dataset, one can also use this function to assess how many times each condition has been presented to each site in order to determine how many cross-validation splits to use. Examining the variable min_num_repeats_all_sites could be useful for this purpose, or one could run the following command:

for k = 0:60
    inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, k);
    num_sites_with_k_repeats(k + 1) = length(inds_of_sites_with_at_least_k_repeats);
end

The variable num_sites_with_k_repeats(i) indicates how many sites have at least i - 1 repetitions of each condition, i.e., num_sites_with_k_repeats(1) gives the total number of sites, num_sites_with_k_repeats(2) gives how many sites have at least one presentation of each stimulus, etc. Note that 2 repetitions is the minimum needed to do a decoding analysis, although one usually needs at least 5 repetitions of each condition to get reasonable results.
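Because a site has at least k repetitions of every condition exactly when its least-repeated condition has at least k repetitions, the same counts can also be derived from min_num_repeats_all_sites without calling the function repeatedly. The sketch below uses made-up values for that vector, and the threshold of 3 sites is purely illustrative.

```matlab
% Hypothetical minimum repetition counts for five sites.
min_num_repeats_all_sites = [12 0 5 30 5];

% Number of sites with at least k repetitions of each condition, for k = 0..60.
k_values = 0:60;
num_sites_with_k_repeats = arrayfun(@(k) sum(min_num_repeats_all_sites >= k), k_values);

% Largest k that still keeps at least 3 sites (hypothetical threshold).
largest_k_keeping_3_sites = k_values(find(num_sites_with_k_repeats >= 3, 1, 'last'));
```

With these toy values, four sites remain up to k = 5 and only two remain from k = 6 onward, so largest_k_keeping_3_sites is 5. Inspecting where this curve drops is one way to choose the number of cross-validation splits.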