generalization_DS

This datasource object (DS) allows one to train a classifier on a specific set of labels and then test the classifier on a different set of labels. This enables one to evaluate how similar neural representations are across different but related conditions (i.e., does training on one set of conditions generalize to a different but related set of conditions?). This datasource is a subclass of the handle class (i.e., it has a persistent state) and contains a basic_DS object, from which it gets most of its functionality.

The constructor for this datasource takes the same arguments as basic_DS, plus two additional arguments, the_training_label_numbers and the_test_label_numbers, i.e., the constructor has the form: ds = generalization_DS(the_data, the_labels, num_cv_splits, the_training_label_numbers, the_test_label_numbers). the_training_label_numbers and the_test_label_numbers are cell arrays that specify which labels should belong to which class: the first element of each cell array specifies the training/test labels that belong to the first class, the second element specifies the labels that belong to the second class, etc.

For example, suppose one was interested in testing position invariance, and had done an experiment in which data was recorded while 7 different objects were shown at three different locations. If the labels for the 7 objects at the first location were 'obj1_loc1', 'obj2_loc1', ..., 'obj7_loc1', at the second location were 'obj1_loc2', 'obj2_loc2', ..., 'obj7_loc2', and at the third location were 'obj1_loc3', 'obj2_loc3', ..., 'obj7_loc3', then one could do a test of position invariance by setting the_training_label_names{1} = {'obj1_loc1'}, the_training_label_names{2} = {'obj2_loc1'}, ..., the_training_label_names{7} = {'obj7_loc1'}, and setting the_test_label_names{1} = {'obj1_loc2', 'obj1_loc3'}, the_test_label_names{2} = {'obj2_loc2', 'obj2_loc3'}, ..., the_test_label_names{7} = {'obj7_loc2', 'obj7_loc3'}.

This DS object is able to test such generalization, from training on one set of labels to testing on a different set of labels, by remapping the training label numbers to their index number in the the_training_label_names cell array, and remapping the test label numbers to their index number in the the_test_label_names cell array.
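The label groupings for the position-invariance example can be built in a loop rather than typed out by hand. The following is a minimal sketch in plain MATLAB; the file name, the label name 'combined_ID_position', and the choice of 5 splits in the commented-out constructor call are hypothetical placeholders, not values taken from the toolbox.

```matlab
% Build the label groupings for the position-invariance example:
% train on the 7 objects at location 1, test on the same objects
% at locations 2 and 3.
num_objects = 7;
the_training_label_names = cell(1, num_objects);
the_test_label_names = cell(1, num_objects);

for iObj = 1:num_objects
    % class iObj is trained on object iObj shown at location 1 ...
    the_training_label_names{iObj} = {sprintf('obj%d_loc1', iObj)};
    % ... and tested on the same object shown at locations 2 and 3
    the_test_label_names{iObj} = {sprintf('obj%d_loc2', iObj), ...
                                  sprintf('obj%d_loc3', iObj)};
end

% Hypothetical constructor call (commented out because it needs the
% toolbox on the path and a binned-format data file; the file name,
% label name, and number of splits are placeholders):
% ds = generalization_DS('my_binned_data.mat', 'combined_ID_position', 5, ...
%         the_training_label_names, the_test_label_names);
```

Each element of the outer cell array defines one class, so both cell arrays must have the same number of elements (7 here).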

One additional property can be set for this object: use_unique_data_in_each_CV_split (default value is 0). When this property is set to 0, the get_data method returns the normal leave-one-split-out training and test data sets (i.e., the training set consists of (num_cv_splits - 1) splits of the data and the test set consists of 1 split of the data).

The data in the training set still comes from different splits than the data in the test set, so one can have some of the same labels in both the_training_label_names and the_test_label_names (in fact, if one sets the_test_label_names = the_training_label_names, then the get_data method behaves the same as the basic_DS get_data method). However, if use_unique_data_in_each_CV_split = 1, then each training and test set will consist of data from only one split, and thus each cross-validation run is essentially like running an independent decoding experiment. In this case the_training_label_names and the_test_label_names must not contain any of the same labels (otherwise, the training and test sets would contain copies of the same data, which would violate the requirement that the training and the test set must not share any data).
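Before setting use_unique_data_in_each_CV_split = 1, it can help to confirm that the two groupings share no labels. The sketch below does this in plain MATLAB with two hypothetical classes; the variable names all_training_labels, all_test_labels, shared_labels, and can_use_unique_data are illustrative only, and the dot syntax in the final comment assumes the property is publicly settable on the object.

```matlab
% Hypothetical label groupings (two classes), for illustration only.
the_training_label_names = {{'obj1_loc1'}, {'obj2_loc1'}};
the_test_label_names = {{'obj1_loc2', 'obj1_loc3'}, {'obj2_loc2', 'obj2_loc3'}};

% Flatten the nested cell arrays into flat lists of label strings.
all_training_labels = [the_training_label_names{:}];
all_test_labels = [the_test_label_names{:}];

% Labels that appear in both the training and the test groupings.
shared_labels = intersect(all_training_labels, all_test_labels);

% Unique data per CV split is only valid when no label is shared.
can_use_unique_data = isempty(shared_labels);

% With a datasource object ds, the property would then be set
% (assuming it is a publicly settable property) as:
% ds.use_unique_data_in_each_CV_split = 1;
```

If shared_labels is not empty, either remove the overlapping labels from one of the groupings or leave the property at its default value of 0.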

Methods

ds = generalization_DS(binned_data_name, specific_binned_label_names, num_cv_splits, the_training_label_numbers, the_test_label_numbers, load_data_as_spike_counts)

The constructor, which takes the following inputs:

  1. binned_data_name

    A string containing the name of a file that has data in binned-format, or alternatively, a cell array of data in binned-format

  2. specific_binned_label_names

    A string containing the name of specific binned labels, or alternatively, a cell array (or vector) containing the specific binned labels (e.g., binned_labels.specific_binned_labels)

  3. num_cv_splits

    A number indicating how many cross-validation splits there should be

  4. the_training_label_numbers

    A cell array specifying which training labels should belong to which class, with the first element of this cell array specifying the training labels for the first class, the second element specifying the training labels for the second class, etc.

  5. the_test_label_numbers

    A cell array specifying which test labels should belong to which class, with the first element of this cell array specifying the test labels for the first class, the second element specifying the test labels for the second class, etc.

  6. load_data_as_spike_counts

    If this optional argument is set to an integer greater than 0, the data is converted from firing rates (the default format saved by the create_binned_data_from_raster_data function) to spike counts. This is useful when using the Poisson Naive Bayes classifier, which only works on spike count data.

[XTr_all_time_cv YTr_all XTe_all_time_cv YTe_all] = get_data(ds)

The same arguments as in basic_DS, but now the data and labels are based on the grouping given by the_training_label_numbers and the_test_label_numbers that are set in the constructor.
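To illustrate how the labels relate to the grouping, the remapping from original labels to class indices can be sketched in plain MATLAB. This is not the toolbox's implementation; trial_labels is a hypothetical list of per-trial labels, and using 0 to mark labels that belong to no group is an assumption made only for this illustration.

```matlab
% Hypothetical grouping: class k is defined by the k-th cell.
the_test_label_names = {{'obj1_loc2', 'obj1_loc3'}, {'obj2_loc2', 'obj2_loc3'}};

% Hypothetical per-trial labels, standing in for the labels stored
% in a binned-format file.
trial_labels = {'obj2_loc3', 'obj1_loc2', 'obj1_loc1', 'obj2_loc2'};

% Remap each trial label to the index of the group that contains it.
% Labels that belong to no group keep the value 0 (an assumption of
% this sketch, meaning the trial would not be used).
remapped_labels = zeros(1, numel(trial_labels));
for iClass = 1:numel(the_test_label_names)
    in_class = ismember(trial_labels, the_test_label_names{iClass});
    remapped_labels(in_class) = iClass;
end

% remapped_labels is now [2 1 0 2]
```

Under this sketch, 'obj1_loc2' and 'obj1_loc3' both map to class 1, so a classifier trained on class 1 at one location is scored against the same object at the other locations.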

the_properties = get_DS_properties(ds)

Returns the same properties as basic_DS, and also returns the property values for the_training_label_numbers, the_test_label_numbers and use_unique_data_in_each_CV_split.

ds = set_specific_sites_to_use(ds, curr_bootstrap_sites_to_use)

Has exactly the same functionality as the basic_DS method of the same name.

Properties

In addition to the properties inherited from basic_DS, generalization_DS also has the following property that can be set:

use_unique_data_in_each_CV_split (default = 0).

When this property is set to 0, the get_data method returns the normal leave-one-split-out training and test data sets (i.e., the training set consists of (num_cv_splits - 1) splits of the data and the test set consists of 1 split of the data). The data in the training set still comes from different splits than the data in the test set, so one can have some of the same labels in both the_training_label_numbers and the_test_label_numbers (in fact, if one has the_test_label_numbers = the_training_label_numbers, then the get_data method behaves the same as the basic_DS get_data method). However, if use_unique_data_in_each_CV_split = 1, then each training and test set will consist of data from only one split, and each cross-validation split will consist of unique data. In this case the_training_label_numbers and the_test_label_numbers must not contain any of the same labels (otherwise, the training and test sets would contain copies of the same population vector, which would violate the requirement that the training and the test set must not share any data).

The following properties function the same way as in basic_DS (for more information see the basic_DS documentation):

  • create_simultaneously_recorded_populations (default = 0).
  • sample_sites_with_replacement (default = 0).
  • num_times_to_repeat_each_label_per_cv_split (default = 1).
  • num_resample_sites (default = -1, which means use all sites).
  • sites_to_use (default = -1).
  • sites_to_exclude (default = []).
  • time_periods_to_get_data_from (default = []).
  • randomly_shuffle_labels_before_running (default = 0).