After running TBM, we want to associate the SNP dosage with every voxel in the brain to see where significant associations emerge.

Questions? Comments? Contact me.

    STEPS

  1. First, we need to make a spreadsheet to store the paths to all the relevant files and the covariates.
    • If your sample is unrelated, include the following columns: "subjectID", "Age", "Sex", "JacFiles"
    • JacFiles is a column of file paths pointing to the required NIfTI outputs of the Jacobians from the TBM analysis. In this example, and in our first run, we will use the files ending in "_log_geo_jacobian_sm2.nii.gz", which contain the slightly smoothed logarithm of the geometric Jacobians from the warps.
    • If your sample has related individuals, then in addition to the columns required for unrelated individuals, please include "familyID" and "zygosity"; in the zygosity column, mark MZ twins as 1, DZ twins as 2, and everyone else as 0 (no quotes!). A minimal example spreadsheet is sketched below.
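    Here is a minimal sketch of such a spreadsheet for an unrelated sample (shown space-aligned here; use whatever delimiter your ENIGMA R scripts expect, and note that the subject IDs, paths, and Sex coding below are hypothetical placeholders):
    subjectID   Age   Sex   JacFiles
    SUBJ001     24    1     /enigma/vTBMassoc/jacobians/SUBJ001_log_geo_jacobian_sm2.nii.gz
    SUBJ002     31    2     /enigma/vTBMassoc/jacobians/SUBJ002_log_geo_jacobian_sm2.nii.gz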

  2. Download and unzip the ENIGMA R scripts from here. The bash scripts below will call them!
  3. The following R packages are also required, so please make sure to install them. Clicking on a package name will take you to its webpage.
  4. Now, let's define the variables in the first script and create a common set of files.
    In subsequent steps, if you can parallelize, we will split the voxelwise regressions up and recombine them later,
    but this first initialization step is common to them all.
  5. Save the following script as run_step1_vTBM_all.sh
  6. #!/bin/bash
    #$ -S /bin/bash
    
    ####### Let's define these variables!! ###########
    #####################################################
    ####### Start user defined variables ###########
    
    ## directory where you've downloaded and stored all these regression scripts
    runDirectory=/enigma/vTBMassoc/enigma_Scripts/     
    ## R binary
    Rbin=/usr/local/R/bin/R
    ## one RS number to test voxelwise
    SNP=rs945270
    ## output directory to write into                                       
    outputD=/enigma/vTBMassoc/vSNPassoc_rs945270/      
    ## path to your spreadsheet above
    table=/enigma/vTBMassoc/vRegression_infofile.txt  
    ## the directory to the imputed output from Mach (after imputation scripts) 
    geno_dir=/ENIGMA/Study_Genotypes/1KGPref/Mach/ 
    ## column header to the Jacobian file paths being tested for TBM
    imgpaths=JacFiles  
    ## Are your subjects related?        
    related=no                 
    ## The number of covariates to include
    Ncov=2      
    ## A semicolon delimited list of covariates (these should be in single quotes)             
    covariates='Age;Sex'         
    ## Would you like to filter by any columns in the spreadsheet? If yes how many? 
         ### (likely 0, therefore this can remain unchanged)
    Nfilters=0  
    ## If you would like to filter by any columns in the spreadsheet put the column header here 
         ### and mark unwanted subjects in that column with an NA, otherwise, leave unchanged. 
         ### if there are more than 1, please make sure to use a semicolon delimited list 
         ### surrounded by single quotes     
    filters=none
    
    ####### End of user defined variables ###########
    #####################################################
    ####### no need to edit after this line (well, I hope not) ###########
    
    mkdir -p ${outputD}
    
    ## assumes all imputed files are in chunks named the following way:
    #               chunk*-ready4mach.${chr}.imputed.dose.gz
    cd ${geno_dir}
    
    infolog=${outputD}/SNPS_info_1kg.log
    echo "SNP:chr:position Al1 Al2 Freq1 MAF AvgCall Rsq Genotyped LooRsq EmpR EmpRsq Dose1 Dose2" >> $infolog
    
    ### for SNP in SNPS; do  ###if you have multiple SNPs, uncomment and start your loop here
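    ## look up the SNP's chr:position (second column of the ENIGMA 1KGP key file)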
    chr_pos=`curl -s https://enigma.ini.usc.edu/wp-content/uploads/2014/01/1kgp.key | grep -w ${SNP} | awk -F ' ' '{print $2}'`
    chr=`echo $chr_pos | awk -F ':' '{print $1}'`
    
    chunk_file=`zgrep -l ${chr_pos} chunk*-ready4mach.${chr}.imputed.info.gz`
    line=`zgrep -wn ${chr_pos} ${chunk_file} | awk -F ':' '{print $1}'`
    
    info=$SNP:`zgrep -w ${chr_pos} ${chunk_file} | awk -F '\t' '{print $1}'`
    printf '%s:' "${info}" >> $infolog; zgrep -w ${chr_pos} ${chunk_file} | awk '{$1="";print}' >> $infolog
    
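    ## note: the info file has a one-line header, and standard MaCH dose files carry two
    ##    leading columns (subject ID as "FAM->ID", then the dose type), so the SNP found
    ##    on line N of the info file sits in column N+1 of the dose file; the final awk
    ##    below strips the "FAM->" prefix, leaving one "subjectID dose" pair per line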
    SNPline=`expr $line + 1`
    chunk_dose=`echo ${chunk_file%.info.gz}.dose.gz`
    zcat ${chunk_dose} | awk -v temp=$SNPline '{print $1, $temp}' | awk -F '>' '{print $2}' > ${outputD}/${SNP}_1kg.txt
    
    ### done ####  if you have multiple SNPs, uncomment and end the loop
    ########################
    ${Rbin} --no-save --slave --args ${table} ${imgpaths} ${outputD} \
           ${related} ${SNP} ${Ncov} ${covariates} ${Nfilters} ${filters} <  \
           ${runDirectory}/run_1KG_RFX_nlmeKin_voxelwise_any.R 
    
    

  7. Now, make sure you have execution permissions (chmod 755 run_step1_vTBM_all.sh) and simply run the script in the terminal from the directory where it is located:
    ./run_step1_vTBM_all.sh
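    If the bash portion ran cleanly, the output directory should now contain at least the SNP info log and the extracted dosage file (plus whatever run_1KG_RFX_nlmeKin_voxelwise_any.R writes); a quick sanity check, using the example output directory from the script:
    ls /enigma/vTBMassoc/vSNPassoc_rs945270/
    ## expect at least: SNPS_info_1kg.log  rs945270_1kg.txt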
  8. Alright!! Now you have some basic files... let's see, can you parallelize? Do you have an SGE or a PBS cluster?
    • If neither, set MaxNode=1 and node=1 in the script below; unfortunately, it will then have to run for a long time (or see the serial-loop sketch after this list).
    • If SGE or PBS, set the values to something reasonable for your cluster. MaxNode is the number of pieces the image will be split into (for example, MaxNode=100), and we will run the script so that the node values become the SGE or PBS task numbers from 1 to MaxNode.
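    Alternatively, if you have no cluster but still want to work through the splits in smaller pieces, you can run the step-2 script below serially, setting the task ID by hand. This is a minimal sketch; it assumes you keep MaxNode=100 and node=${SGE_TASK_ID} exactly as written in the script below:
    ## run each of the 100 image chunks one after another
    for i in $(seq 1 100); do
        SGE_TASK_ID=$i ./run_step2_vTBM_split.sh
    done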
  9. Save the following script as run_step2_vTBM_split.sh
  10. #!/bin/bash
    #$ -S /bin/bash
    
    ####### Let's define these variables!! ###########
    #####################################################
    ####### Start user defined variables ###########
    
    ## any image indicating which voxels to include; you can use the "outputtemplate0.nii.gz"
    #        image provided as an MDT, just make sure voxels outside the brain have values == 0
    maskFile=/enigma/vTBMassoc/outputtemplate0.nii.gz
    node=${SGE_TASK_ID} ## (${SGE_TASK_ID} if SGE cluster or ${PBS_ARRAYID} if PBS cluster)
    MaxNode=100      ## or however more or less you can run at once
    
    ####
    # the remainder of the user input variables should be the same as the step above
    
    ## directory where you've downloaded and stored all these regression scripts
    runDirectory=/enigma/vTBMassoc/enigma_Scripts/     
    ## R binary
    Rbin=/usr/local/R/bin/R
    ## output directory to write into                                       
    outputD=/enigma/vTBMassoc/vSNPassoc_rs945270/ 
    ## Are your subjects related? 
    related=no 
    
    ####### End of user defined variables ###########
    #####################################################
    
    ${Rbin} --no-save --slave --args ${related} ${outputD} ${maskFile} ${node} ${MaxNode} < \
          ${runDirectory}/run_1KG_RFX_nlmeKin_vSplit_nifti.R
    
    
  11. Assuming MaxNode is set to 100 as in the example above, submit this script as an SGE array job:
    qsub -t 1-100 run_step2_vTBM_split.sh
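    If you are on a PBS cluster instead, the submission is typically one of the following, depending on your PBS flavor (and remember to set node=${PBS_ARRAYID} in the script, as noted in its comment):
    qsub -t 1-100 run_step2_vTBM_split.sh    ## Torque-style PBS (task ID in ${PBS_ARRAYID})
    qsub -J 1-100 run_step2_vTBM_split.sh    ## PBS Pro (task ID in ${PBS_ARRAY_INDEX})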
  12. Once all the nodes have finished running, you can combine them all and create sets of NIfTI images.
  13. Save the following script as run_step3_vTBM_combine.sh
  14. #!/bin/bash
    #$ -S /bin/bash
    
    ####### Let's define these variables!! ###########
    #####################################################
    ####### Start user defined variables ###########
    
    ## FDR correction level (q) for the output text file
    q=0.05
    
    ## the remainder of the user input variables should be the same as the steps above
    ####
    runDirectory=/enigma/vTBMassoc/enigma_Scripts/     
    Rbin=/usr/local/R/bin/R                                      
    outputD=/enigma/vTBMassoc/vSNPassoc_rs945270/
    MaxNode=100
    maskFile=/enigma/vTBMassoc/outputtemplate0.nii.gz
    SNP=rs945270
    Ncov=2
    covariates='Age;Sex'
    
    ####### End of user defined variables ###########
    #####################################################
    
    ${Rbin} --no-save --slave --args ${MaxNode} ${maskFile} ${outputD} ${SNP} ${Ncov} ${covariates} ${q} <  \
             ${runDirectory}/combine_1KG_node_data_RFX.R  
    

  15. Now, make sure you have execution permissions (chmod 755 run_step3_vTBM_combine.sh) and simply run the script in the terminal from the directory where it is located:
    ./run_step3_vTBM_combine.sh