Aligning deep RNAseq and Digital Gene Expression (DGE) reads using bcbio
Last Updated: 2024-02-14
Quick Start
- Ensure your O2 environment is properly set up to run bcbio.
- Put your data in permanent storage and create a copy on /n/scratch/.
- Do you have multiple fastq files that need to be merged for each sample?
  - If yes, run bcbio_prepare_samples.py to merge the files. Rename the resulting *-merged.csv to alignment.csv. (This is often the case for deep RNAseq experiments.)
  - If not, compose alignment.csv that maps filenames to corresponding sample descriptions. (This is often the case for DGE.)
- (Optional) Download the latest human (or mouse) genome reference.
- Compose a settings YAML file.
- Instantiate the bcbio workspace. Descend into the alignment/work subdirectory. Kick off a bcbio run.
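For the case where no merging is needed, composing alignment.csv by hand can be tedious when there are many files. The following is a minimal Python sketch of one way to generate it. It assumes the two-column samplename,description header used by bcbio's template mode, and it derives each description from the filename, which is only a placeholder convention. The function name write_alignment_csv and the .fastq.gz suffix are hypothetical choices for illustration, so check the result against the bcbio documentation before using it.

```python
import csv
from pathlib import Path


def write_alignment_csv(fastq_dir, out_csv, suffix=".fastq.gz"):
    """Write a CSV that maps each fastq filename to a sample description.

    Assumption: the header is "samplename,description" and the description
    is simply the filename with the suffix removed. Edit the descriptions
    afterwards if your samples need more meaningful names.
    """
    fastq_dir = Path(fastq_dir)
    # Collect matching filenames in a stable, sorted order.
    files = sorted(p.name for p in fastq_dir.iterdir() if p.name.endswith(suffix))

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["samplename", "description"])
        for name in files:
            writer.writerow([name, name[: -len(suffix)]])
    return files


# Example call (paths follow the directory layout recommended in this guide):
# write_alignment_csv("/n/scratch/abc123/myProject/fastq",
#                     "/n/scratch/abc123/myProject/alignment.csv")
```

Files that do not end in the given suffix are ignored, so stray notes or checksum files in the fastq directory do not end up in the CSV.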
Recommended Directory Structure
To keep things organized, we recommend maintaining the following directory structure. Let /n/scratch/abc123/myProject/ be the root directory of your analysis. Replace abc123 with your eCommons ID, and myProject with your project name. See the section on data storage to learn how to create a copy of your data on /n/scratch/.
Under /n/scratch/abc123/myProject/ create the following subdirectories:
- fastq - place your raw fastq files here.
- merged - automatically created by bcbio_prepare_samples.py if you have multiple files per sample.
- reference - download your reference genome to this subdirectory.
- alignment - automatically created by bcbio together with its subdirectories:
  - config - bcbio will derive configuration files from your settings YAML file and place them here.
  - work - the bulk of the work files will reside here.
  - final - the resulting counts matrices will appear here when bcbio finishes.
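The layout above can be set up with a few lines of Python. This is a minimal sketch, not part of bcbio. The helper name create_project_dirs is hypothetical, and it creates only fastq and reference, on the assumption that merged and alignment are best left to bcbio_prepare_samples.py and bcbio, which create them automatically.

```python
from pathlib import Path


def create_project_dirs(root):
    """Create the subdirectories that are populated by hand.

    Assumption: merged/ and alignment/ are intentionally skipped because
    they are described as being created automatically by the tools.
    """
    root = Path(root)
    for name in ("fastq", "reference"):
        # parents=True also creates the project root if it does not exist;
        # exist_ok=True makes the call safe to repeat.
        (root / name).mkdir(parents=True, exist_ok=True)
    # Report which subdirectories now exist under the project root.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Example call (replace abc123 and myProject as described above):
# create_project_dirs("/n/scratch/abc123/myProject")
```

Running the helper a second time is harmless, since existing directories are left untouched.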