Back to start
Link to full Jupyter notebook
Introduction¶
This is a tutorial for RyMVPA, explained here.
Note: this does not run on my GitHub, as I have yet to set it up with a public dataset.
To understand and use this, please be familiar with PyMVPA. In its current state, this wrapper is untested, though it largely calls and streamlines already-tested PyMVPA code, minus advanced functions. Therefore, please understand PyMVPA and look at the RyMVPA code yourself before publishing with it.
The best prerequisites are these PyMVPA tutorial chapters:
RyMVPA structure¶
RyMVPA wrapper functions are organized into several key script files:
- datamanage.py - utility functions for handling datasets (e.g., saving, masking, printing attributes to CSV, making ROI masks, and more)
- searchlight_wraps.py - wrapper functions performing searchlights with various methods (classification, RSA, and our additional RSA methods)
- roi_wraps.py - wrapper functions performing various methods on a specified ROI via an ROI mask (classification, RSA, our additional RSA methods)
We should likely make an easy directory in RyMVPA to print out functions and their purposes, but you'll find them relatively easy to navigate within the Python files. All searchlight functions are prefixed with 'sl' and all ROI functions with 'roi', so you can tab-complete from those prefixes to quickly see your options. As well, use Python's help function on any RyMVPA method to see its documentation.
Let's import RyMVPA and its dependencies, and take a look at a function's doc:
%matplotlib inline
from rymvpa import *
help(slRSA_m_nSs)
Various other unique functions called by these wrappers live in the other Python files, such as unique similarity-analysis algorithms (e.g., multiple regression RSA), partitioners (e.g., ones that let us train on certain targets and test on others; see the sketch below), and permutation-testing algorithms. Please browse those files to find these functions documented, or contact me with questions. I would recommend scanning the files to get a picture of what we have already created. Or, I'd appreciate help making better documentation (and testing ;).
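For flavor, here is a minimal sketch of the 'train on some targets, test on others' idea in plain PyMVPA, adapted from PyMVPA's Sifter documentation; the target labels here are placeholders, not from this tutorial's analyses.
# Keep only cross-validation folds whose testing partition (2) contains
# the listed targets; placeholder labels for illustration.
from mvpa2.base.node import ChainNode
from mvpa2.generators.base import Sifter
from mvpa2.generators.partition import NFoldPartitioner
par = ChainNode([NFoldPartitioner(cvtype=1),
                 Sifter([('partitions', 2),
                         ('targets', ['M6', 'P6'])])],
                space='partitions')
# 'par' can then be passed wherever a partitioner is accepted,
# e.g., CrossValidation(clf, par)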
Setting up¶
Next, we load the data in the form of a 'data dictionary' or 'datadict'. The datadict is currently the convenient way RyMVPA holds and works on multi-subject data: each key is a subject ID, and each value is that subject's PyMVPA dataset with attributes loaded in.
In this example, and in my work so far, once we create each subject's PyMVPA dataset (including loading its attributes), we save them all into an H5 object that can be readily loaded for analysis. An H5 file is simply a Python object saved to disk that can be loaded back into Python.
This is memory inefficient, as it holds every subject in memory at once, and should be reprogrammed to load one subject at a time.
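For reference, here is a hypothetical sketch of how such a datadict gets built and saved; the file names, subject IDs, volume counts, and attribute values below are all placeholders, not real paths from this dataset.
# Hypothetical sketch: build a datadict of PyMVPA datasets and save it to H5.
from mvpa2.suite import fmri_dataset, h5save
n_vols = 120                                     # placeholder volume count
attr_targets = ['S0'] * n_vols                   # placeholder condition labels
attr_chunks = [i // 40 for i in range(n_vols)]   # placeholder run labels
subjects = ['544', '559', '566']
data = {}
for s in subjects:
    data[s] = fmri_dataset(samples='s%s_bold.nii.gz' % s,  # placeholder path
                           targets=attr_targets,           # one label per volume
                           chunks=attr_chunks,             # one run label per volume
                           mask='s%s_mask.nii.gz' % s)     # placeholder mask
h5save('brc_allsubjs_tcm.gzipped.hdf5', data, compression='gzip')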
data = h5load('brc_allsubjs_tcm.gzipped.hdf5')
print("'Data' type:",type(data))
print('Subject IDs:',data.keys())
print('Subject 544',type(data['544']))
As we will see, each dataset is pre-prepared with sample attributes:
ds = data['559'].copy() # create temp dataset to look at; hard copy so it doesn't alter the original!
print(ds.summary())
Now, let's take a closer look at this dataset. What we see are samples where participants looked at faces varying on a morph continuum between Black and White faces, with S0 ('start') referring to the center point, M ('minus') referring to increasingly White faces, and P ('plus') to increasingly Black faces.
That is, the continuum varies from White (-6, or 'M6') to Ambiguous (0, or 'S0') to Black (+6, or 'P6').
Data is chunked by functional runs.
print('Targets:',ds.UT)
print('Chunks:',ds.UC)
Now, let's load in some dissimilarity matrices we can use for RSA analyses, and any other independent data we may want to combine with analyses. As you will see, the Python file holding these has some functions that print out QA about the DMs and what is being loaded.
import DMs_brc
print('Example DM:',DMs_brc.mlDMstandard['mlDMstandard'])
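If you are curious how such a model DM might be built, here is a hypothetical sketch of a simple a priori 'linear morph distance' model, in the spirit of the ones DMs_brc provides (the names here are made up for illustration):
# Hypothetical sketch: a linear morph-distance model DM for the 13 morph levels.
import numpy as np
from scipy.spatial.distance import pdist, squareform
morph_levels = np.arange(-6, 7).reshape(-1, 1)  # M6 ... S0 ... P6
linear_dm = squareform(pdist(morph_levels))     # 13x13 euclidean distance DM
print(linear_dm.shape)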
Lastly, a note. If all of your data is the same size, you should create a 'remap' dataset, which is used as a template to bring any PyMVPA dataset back to NIfTI format. These are essential because some functions alter featural aspects of the dataset. A remap dataset should always be in the original format the dataset had when it was first loaded into PyMVPA.
remap = ds.copy()
Since our datasets are different sizes (each masked by its own data), we will later use each subject's own dataset as their remap dataset.
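Under the hood, remapping typically amounts to PyMVPA's map2nifti: pushing a results array back through the remap dataset's mapper into NIfTI space. A minimal sketch (the result array here is fabricated for illustration):
# Minimal sketch of what a remap dataset is for; 'fake_result' is fabricated.
import numpy as np
from mvpa2.suite import map2nifti
fake_result = np.random.randn(remap.nfeatures)  # one fabricated value per voxel
nimg = map2nifti(remap, fake_result)            # nibabel image in original space
nimg.to_filename('fake_result.nii.gz')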
Data management tools¶
Now that our environment is set up, with our datadict, a remap dataset, and any additional data needed (DMs), we can get started.
First, let's take a look at some essential data management tools available in RyMVPA.
# Save PyMVPA datasets to nifti files
sl2nifti(data['559'],data['559'],'s559')
import glob
print('Nifti files saved in current directory:',glob.glob('./*nii'))
# also see datadict2nifti for quickly saving a datadict to niftis when shapes match in a dataset
# Select certain targets (remove targets you aren't interested to save memory / aid algorithms)
data_S0 = select_targets_data(data,['S0'])
print('Unique targets across subjects:', np.unique([data_S0[s].UT for s in data_S0]))
del(data_S0)
# also see omit functions, and select chunk functions
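For reference, per dataset these selection wrappers streamline what is, in plain PyMVPA, just boolean indexing over the samples axis by target label:
# Roughly the plain-PyMVPA idiom behind target selection (sketch, not
# necessarily select_targets_data's actual internals).
import numpy as np
keep = np.array([t in ['S0'] for t in ds.targets])
ds_S0 = ds[keep]                  # samples-axis selection
print(np.unique(ds_S0.targets))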
# Mask a dataset! ...for some reason, PyMVPA doesn't make code available to do this outside of loading a dataset with fmri_dataset().
mask = 'tc3_SVM_mask_rFG.nii.gz'
print('Masked ds shape:',mask_dset(ds,mask).shape) # mask of rFG from a group level classification analysis of 3 categories in this dataset
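One plain-PyMVPA workaround, roughly in the spirit of what mask_dset streamlines (I'm not asserting its actual internals), is to map the dataset back to NIfTI and reload it through fmri_dataset with the mask applied:
# Sketch of a plain-PyMVPA masking workaround: NIfTI round-trip with a mask.
from mvpa2.suite import fmri_dataset, map2nifti
nimg = map2nifti(ds)                            # back to a NIfTI image
ds_rFG = fmri_dataset(nimg, mask=mask,
                      targets=ds.targets, chunks=ds.chunks)
print(ds_rFG.shape)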
Make sure to check out other useful functions for saving PyMVPA datasets back to CSVs to analyze data elsewhere, such as clust2mask, sxs2dset, and sa2csv.
ROI basic analyses¶
To start with one useful function, what if we wanted to see the neural similarity of each condition? You could then use MDS or other methods to look at the representational geometry of any ROI you specify (see the sketch below).
# NOTE: this currently seems to alter the data permanently, so reload the data afterwards
print(roi2ndm_1Ss(ds, mask))
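For example, a minimal MDS visualization of that neural DM, assuming roi2ndm_1Ss returns a square dissimilarity matrix (NumPy array) ordered like ds.UT:
# Hedged sketch: 2-D MDS of the ROI's neural DM.
import matplotlib.pyplot as plt
from sklearn.manifold import MDS
ndm = roi2ndm_1Ss(ds, mask)       # per the note above, reload data afterwards
coords = MDS(n_components=2, dissimilarity='precomputed',
             random_state=0).fit_transform(ndm)
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, ds.UT):
    plt.annotate(label, (x, y))   # label each condition's position
plt.show()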
RSA¶
What if we wanted to perform RSA in this region, using a basic a priori visual model?
Quick note: you will notice '1Ss' vs. 'nSs' versions of each function in RyMVPA. '1Ss' applies the analysis to a single subject/PyMVPA dataset; 'nSs' applies it to a whole datadict.
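For example (illustrative only, not run in sequence here; I'm assuming the single-subject variant mirrors the datadict call, so verify with help(roiRSA_1Ss)):
# Assumed calling pattern for the '1Ss' vs. 'nSs' convention.
dm = DMs_brc.mlDMstandard['mlDMstandard']
res_one = roiRSA_1Ss(data['559'], mask, dm)   # one PyMVPA dataset
res_all = roiRSA_nSs(data, mask, dm)          # a whole datadict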
data = h5load('brc_allsubjs_tcm.gzipped.hdf5')
res = roiRSA_nSs(data,mask,DMs_brc.mlDMstandard['mlDMstandard'])
print(res[0])
Here we see condition similarity is related to an a priori visual model of the morph levels' similarities. Of course, this analysis was circular, being based on whole-brain SVM results. Note the documentation for the RSA function: you may also control for additional DMs via partial correlations or multiple regression, for instance to control for additional visual models (a standalone sketch of that idea follows). As well, check the documentation for different options, such as distance metrics.
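Here is a standalone sketch of the partial-correlation idea the wrapper offers: correlate a neural DM with a target model DM while regressing out a control DM. The function and its arguments are hypothetical stand-ins, not RyMVPA internals.
# Hypothetical sketch of partial-correlation RSA between square DMs.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.stats import pearsonr
def partial_dm_corr(neural_dm, model_dm, control_dm):
    # vectorize each square DM into its off-diagonal triangle
    v_n, v_m, v_c = (squareform(dm, checks=False)
                     for dm in (neural_dm, model_dm, control_dm))
    # residualize the neural and model vectors against the control DM
    def resid(y, x):
        slope, intercept = np.polyfit(x, y, 1)
        return y - (slope * x + intercept)
    return pearsonr(resid(v_n, v_c), resid(v_m, v_c))  # (r, p)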
Classification¶
Now let's try classification. First, we will relabel anything below the morph center as White ('w') and everything else as Black ('b').
data_wb = dict([ (s,data[s].copy()) for s in data ] )
for s in data_wb:
data_wb[s].sa['targets'] = ['w' if i[0] == 'M' else 'b' for i in data_wb[s].targets]
print('New vs. old target labels:')
print(data_wb['566'].targets)
print(data['566'].targets)
# to run the classifier
res = roiClass_nSs(data_wb,mask)
print(res[0])
del(data_wb)
Again, circular, but it works! See the documentation for altering parameters, such as cross-validation partitioners and classifiers (a sketch of such objects follows).
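For instance, these are the kinds of PyMVPA objects you might swap in; I'm not asserting the wrapper's actual argument names, so check help(roiClass_nSs) for those.
# Hedged sketch: alternative PyMVPA classifier/partitioner objects.
from mvpa2.suite import LinearCSVMC, NFoldPartitioner, kNN
clf_svm = LinearCSVMC()            # the common default linear SVM
clf_knn = kNN(k=5)                 # an alternative classifier
par = NFoldPartitioner(cvtype=2)   # leave-2-chunks-out cross-validation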
Searchlights¶
Now, to demonstrate these same analyses as searchlights, let's first reduce the dataset size substantially so that we can test this code quickly (searchlights can take substantial time; I always recommend using an HPC to divide subjects up and run them in parallel). We'll also only look at a few subjects.
data = h5load('brc_allsubjs_tcm.gzipped.hdf5')
data = dict( [ (s,mask_dset(data[s],mask)) for s in list(data.keys())[:2] ] ) # using our mask, so datasets will have only a few hundred searchlight centers
RSA¶
First, let's run our RSA model through a searchlight. See function documentation for other options, such as saving options, metrics, multiple regression RSA, etc.
slres = slRSA_m_nSs( data, DMs_brc.mlDMstandard['mlDMstandard'])['r'] # see the status print option to track progress; it makes total searchlight time easy to estimate
print('shape',slres['559'].shape)
print('max',np.max(slres['559']))
print('min',np.min(slres['559']))
print('mean',np.mean(slres['559']))
print('sd',np.std(slres['559']))
We now have a NumPy array per datadict key (subject), where each value is the result of the analysis within the searchlight sphere centered on that voxel. We may now remap and save these as NIfTI files, and complete whole-brain analyses in other software such as AFNI. Make sure to see the documentation for additional save options, such as datadict saving.
sl2nifti(slres['559'],data['559'],'slRSA_rFG_559.nii')
print('Nifti files saved in current directory:',glob.glob('./*nii'))
plotteRy_Brain('./slRSA_rFG_559.nii')
Classification¶
Lastly, let's do classification. Again, check the documentation for more options. First, let's relabel again, then run our datadict searchlight.
data_wb = dict([ (s,data[s].copy()) for s in data ] )
for s in data_wb:
data_wb[s].sa['targets'] = ['w' if i[0] == 'M' else 'b' for i in data_wb[s].targets]
print('New vs. old target labels:')
print(data_wb['559'].targets)
print(data['559'].targets)
slres = slClass_nSs( data_wb )
print('shape',slres['559'].shape)
print('max',np.max(slres['559']))
print('min',np.min(slres['559']))
print('mean',np.mean(slres['559']))
print('sd',np.std(slres['559']))
We again have an array of results per searchlight voxel center. This time, however, the values are accuracies minus chance level, indicating how far classification exceeded chance (e.g., with two classes, chance is 0.5, so a stored value of 0.25 means 75% correct). Again, we may save these in various ways for group-level significance testing.
sl2nifti(slres['559'],data['559'],'slClass_rFG_559.nii')
print('Nifti files saved in current directory:',glob.glob('./*nii'))
plotteRy_Brain('./slClass_rFG_559.nii')