find_sites_with_k_label_repetitions

This helper function takes labels in binned-format and an integer k, and returns the indices of all sites (e.g., neurons) that have at least k repetitions of each experimental condition. The function has the following form:

[inds_of_sites_with_at_least_k_repeats, min_num_repeats_all_sites, num_repeats_matrix, label_names_used] = find_sites_with_k_label_repetitions(the_labels, k, label_names_to_use)

The arguments to this function are:

  1. the_labels

    The labels in binned-format that should be used (e.g., binned_labels.the_labels_to_use).

  2. k

    An integer specifying that each site returned should have at least k repetitions of each condition.

Optional input arguments:

  1. label_names_to_use

    This specifies which label names (or numbers) to use. For example, if the_labels contains the strings ‘red’, ‘green’, and ‘blue’, but you only want to know which sites have k repeats of ‘red’ and ‘green’ trials, then setting label_names_to_use = {'red', 'green'} will accomplish this. If this argument is not specified, then any label that was presented to any site will be used.

Returned values:

  1. inds_of_sites_with_at_least_k_repeats

    The indices of sites that have at least k repetitions of each condition.

  2. min_num_repeats_all_sites

    This vector lists, for each site, the number of repetitions present for the label that has the minimum number of repetitions.

  3. num_repeats_matrix

    A [num_sites x num_labels] matrix that specifies, for each site, the number of repetitions of each condition. This variable can help determine whether particular conditions should be excluded, for example when a specific condition was presented only a few times to many of the sites.

  4. label_names_used

    This specifies which label names were used when counting repetitions. This variable is equal to label_names_to_use if label_names_to_use was passed as an input argument. [added in NDT version 1.4]
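
As a rough illustration of what these returned values mean, the sketch below recomputes them by hand on toy data. It is not the toolbox's implementation. The toy_ variable names and the assumed layout of the labels (one cell per site, each holding that site's trial labels as a cell array of strings) are assumptions made for this example only.

```matlab
% Toy labels in the assumed binned-format layout: one cell per site.
toy_labels = { ...
    {'red'; 'red'; 'green'; 'green'; 'blue'; 'blue'}, ...    % site 1
    {'red'; 'green'; 'green'; 'blue'; 'blue'; 'blue'}, ...   % site 2
    {'red'; 'red'; 'red'; 'green'; 'green'}};                % site 3 (no 'blue' trials)

toy_label_names = {'red', 'green', 'blue'};
k = 2;

% Count how many times each label occurs at each site.
num_sites = numel(toy_labels);
num_labels = numel(toy_label_names);
toy_num_repeats_matrix = zeros(num_sites, num_labels);
for iSite = 1:num_sites
    for iLabel = 1:num_labels
        toy_num_repeats_matrix(iSite, iLabel) = ...
            sum(strcmp(toy_labels{iSite}, toy_label_names{iLabel}));
    end
end

% For each site, the count of its least-repeated label.
toy_min_num_repeats = min(toy_num_repeats_matrix, [], 2);

% Sites with at least k repetitions of every label.
toy_inds_all_labels = find(toy_min_num_repeats >= k)';

% Same question restricted to 'red' and 'green' (columns 1 and 2),
% mirroring what label_names_to_use = {'red', 'green'} asks for.
toy_inds_red_green = find(min(toy_num_repeats_matrix(:, 1:2), [], 2) >= k)';
```

With these toy labels, only site 1 has two or more repetitions of all three colors, whereas sites 1 and 3 qualify once the count is restricted to ‘red’ and ‘green’.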

Example

Suppose we had an experiment in which a number of different stimuli were shown when recordings were made from a number of different sites, and this information was contained in the variable binned_labels.stimulus_ID. The following command would find all sites in which each stimulus condition was presented at least 20 times:

inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, 20)

When one is first starting to analyze a new dataset, one can also use this function to assess how many times each condition has been presented to each site in order to determine how many cross-validation splits to use. Examining the variable min_num_repeats_all_sites could be useful for this purpose, or one could run the following command:

for k = 0:60
    inds_of_sites_with_at_least_k_repeats = find_sites_with_k_label_repetitions(binned_labels.stimulus_ID, k);
    num_sites_with_k_repeats(k + 1) = length(inds_of_sites_with_at_least_k_repeats);
end

The variable num_sites_with_k_repeats(i) indicates how many sites have at least i - 1 repetitions of each condition; i.e., num_sites_with_k_repeats(1) gives the total number of sites, num_sites_with_k_repeats(2) gives how many sites have at least one presentation of each stimulus, etc. Note that 2 repetitions is the minimum needed to run a decoding analysis, although at least 5 repetitions of each condition are usually needed to get reasonable results.
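
The same counts can also be derived from the per-site minimum repetition counts, since a site has at least k repetitions of every condition exactly when its minimum count is at least k. The following is a minimal sketch of that idea. The vector toy_min_num_repeats_all_sites is made-up data standing in for the second output of find_sites_with_k_label_repetitions, and the other toy_ names are likewise hypothetical.

```matlab
% Made-up minimum repetition counts for 8 sites.
toy_min_num_repeats_all_sites = [0 3 5 5 12 20 20 61];

% For each k from 0 to max_k, count the sites whose least-repeated
% condition was still presented at least k times.
max_k = 60;
toy_num_sites_with_k_repeats = zeros(1, max_k + 1);
for k = 0:max_k
    toy_num_sites_with_k_repeats(k + 1) = sum(toy_min_num_repeats_all_sites >= k);
end

% Number of sites that would remain with 5 repetitions per condition.
toy_num_sites_at_5 = toy_num_sites_with_k_repeats(5 + 1);
```

Plotting toy_num_sites_with_k_repeats against 0:max_k shows how quickly sites drop out as k grows, which can help when choosing the number of cross-validation splits.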