Aligning deep RNAseq and Digital Gene Expression (DGE) reads using bcbio
Last Updated: 2024-02-14
Quick Start
- Ensure your O2 environment is properly set up to run bcbio.
- Put your data in permanent storage and create a copy on /n/scratch/.
- Do you have multiple fastq files that need to be merged for each sample?
  - If yes, run bcbio_prepare_samples.py to merge the files. Rename the resulting *-merged.csv to alignment.csv. (This is often the case for deep RNAseq experiments.)
  - If not, compose an alignment.csv that maps filenames to corresponding sample descriptions. (This is often the case for DGE.)
- (Optional) Download the latest human (or mouse) genome reference.
- Compose a settings YAML file.
- Instantiate the bcbio workspace. Descend into the alignment/work subdirectory. Kick off a bcbio run.
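When no merging is needed, alignment.csv can be composed by hand or with a short script. The following is a minimal sketch, not part of bcbio. It assumes one fastq file per sample, a samplename,description header (the column convention commonly used for bcbio template CSVs; verify it against your bcbio version), and that a usable description is the file name with its fastq extension removed. The function names and the suffix list are hypothetical.

```python
import csv
from pathlib import Path

# Assumed set of fastq extensions; adjust to match your files.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_description(filename):
    """Derive a description by stripping a known fastq suffix (assumption)."""
    for suffix in FASTQ_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return filename


def write_alignment_csv(fastq_dir, out_csv):
    """Write one row per fastq file: the file name, then its description."""
    files = sorted(
        p.name
        for p in Path(fastq_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed header; check the bcbio documentation for your version.
        writer.writerow(["samplename", "description"])
        for name in files:
            writer.writerow([name, sample_description(name)])
    return files


# Example call (paths are placeholders):
# write_alignment_csv("fastq", "alignment.csv")
```

Descriptions derived from file names are only a starting point. Edit the resulting file if your samples need more meaningful labels.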
Recommended Directory Structure
To keep things organized, we recommend maintaining the following directory structure. Let /n/scratch/abc123/myProject/ be the root directory of your analysis. Replace abc123 with your eCommons ID, and myProject with your project name. See the section on data storage to learn how to create a copy of your data on /n/scratch/.
Under /n/scratch/abc123/myProject/, create the following subdirectories:
- fastq - place your raw fastq files here.
- merged - automatically created by bcbio_prepare_samples.py if you have multiple files per sample.
- reference - download your reference genome to this subdirectory.
- alignment - automatically created by bcbio together with its subdirectories:
  - config - bcbio will derive configuration files from your settings YAML file and place them here.
  - work - the bulk of the work files will reside here.
  - final - the resulting counts matrices will appear here when bcbio finishes.
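Only fastq and reference need to be created by hand, since merged and alignment are produced by the tools. As a minimal sketch, the two manual subdirectories could be set up as follows. The function name is hypothetical, and the root path in the example comment is the placeholder used above.

```python
from pathlib import Path

# Subdirectories the user creates; merged/ and alignment/ are made by the tools.
MANUAL_SUBDIRS = ("fastq", "reference")


def create_project_skeleton(root):
    """Create the manually managed subdirectories under the project root."""
    root = Path(root)
    created = []
    for name in MANUAL_SUBDIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example call (replace abc123 and myProject with your own values):
# create_project_skeleton("/n/scratch/abc123/myProject")
```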