Quick Start

Ensure your O2 environment is properly set up to run bcbio.
Put your data in permanent storage and create a copy on /n/scratch/.
Do you have multiple fastq files that need to be merged for each sample?
- If yes, run bcbio_prepare_samples.py to merge the files. Rename the resulting *-merged.csv to alignment.csv. (This is often the case for deep RNAseq experiments.)
- If not, compose alignment.csv that maps filenames to corresponding sample descriptions. (This is often the case for DGE.)
(Optional) Download the latest human (or mouse) genome reference.
Compose a settings YAML file.
Instantiate the bcbio workspace, change into the alignment/work subdirectory, and kick off a bcbio run.
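The Quick Start steps above can be sketched in Python. This is a minimal, hypothetical helper and is not part of bcbio. It writes an alignment.csv sample sheet and a settings YAML, then runs bcbio_nextgen.py twice: first the -w template step that instantiates the alignment/ workspace, then the actual run from alignment/work. Several details are assumptions based on common bcbio usage, so check them against the bcbio documentation for your installed version: the sample names, the samplename,description header, the YAML keys (analysis, genome_build, aligner, strandedness, upload dir) and the -n core count. The sketch runs bcbio locally. On O2 you would normally submit the run step through SLURM, which is not shown here.

```python
import csv
import subprocess
from pathlib import Path

# Hypothetical inputs: replace with your own fastq base names and descriptions.
# Assumption: bcbio matches the first column against the input filename with
# its directory, extension and read suffix (_R1/_R2) removed.
SAMPLES = {
    "ctrl_rep1": "control_1",
    "treated_rep1": "treated_1",
}

# Minimal bulk RNA-seq settings template (assumed keys; check the bcbio docs).
# upload/dir is relative to alignment/work, so results land in alignment/final.
SETTINGS_TEMPLATE = """\
details:
  - analysis: RNA-seq
    genome_build: {genome}
    algorithm:
      aligner: {aligner}
      strandedness: {strandedness}
upload:
  dir: ../final
"""


def write_alignment_csv(samples: dict[str, str], out_path: Path) -> None:
    """Write a 'samplename,description' sheet, one row per input file."""
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["samplename", "description"])
        for name, description in sorted(samples.items()):
            writer.writerow([name, description])


def write_settings_yaml(out_path: Path, genome: str = "hg38",
                        aligner: str = "star",
                        strandedness: str = "unstranded") -> str:
    """Render the settings template, write it to out_path, return the text."""
    text = SETTINGS_TEMPLATE.format(genome=genome, aligner=aligner,
                                    strandedness=strandedness)
    out_path.write_text(text)
    return text


def bcbio_commands(settings_yaml: str, sample_csv: str,
                   fastq_files: list[str], cores: int = 8):
    """Return (setup_cmd, run_cmd, work_dir) for the two bcbio steps.

    The template step names the project after the CSV stem
    (alignment.csv -> alignment/) and writes config/alignment.yaml.
    """
    project = Path(sample_csv).stem
    setup_cmd = ["bcbio_nextgen.py", "-w", "template", settings_yaml,
                 sample_csv, *fastq_files]
    run_cmd = ["bcbio_nextgen.py", f"../config/{project}.yaml",
               "-n", str(cores)]
    return setup_cmd, run_cmd, Path(project) / "work"


def run_bcbio(settings_yaml: str, sample_csv: str,
              fastq_files: list[str], cores: int = 8) -> None:
    """Instantiate the workspace, then start the run from <project>/work."""
    setup_cmd, run_cmd, work_dir = bcbio_commands(
        settings_yaml, sample_csv, fastq_files, cores)
    subprocess.run(setup_cmd, check=True)
    # Defensive: make sure work/ exists before running from inside it.
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(run_cmd, check=True, cwd=work_dir)
```

From the project root, a run might then look like write_alignment_csv(SAMPLES, Path("alignment.csv")), write_settings_yaml(Path("settings.yaml")), and finally run_bcbio("settings.yaml", "alignment.csv", [str(p) for p in sorted(Path("fastq").glob("*.fastq.gz"))]). Building the command lists separately from running them makes it easy to print and inspect them before launching a long job.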

Recommended Directory Structure

To keep things organized, we recommend maintaining the following directory structure. Let /n/scratch/abc123/myProject/ be the root directory of your analysis. Replace abc123 with your eCommons ID, and myProject with your project name. See the section on data storage to learn how to create a copy of your data on /n/scratch/.

Under /n/scratch/abc123/myProject/ create the following subdirectories:

fastq - place your raw fastq files here.
merged - automatically created by bcbio_prepare_samples.py if you have multiple files per sample.
reference - download your reference genome to this subdirectory.
alignment - automatically created by bcbio together with its subdirectories:
- config - bcbio will derive configuration files from your settings YAML file and place them here.
- work - the bulk of the work files will reside here.
- final - the resulting count matrices will appear here when bcbio finishes.
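A minimal sketch for creating this layout, assuming the /n/scratch/<eCommons ID>/<project> convention described above. make_project_dirs is a hypothetical helper. It only creates fastq and reference, because merged and alignment are created for you by bcbio_prepare_samples.py and bcbio respectively.

```python
from pathlib import Path

# Subdirectories you create by hand; "merged" and "alignment" are created
# by bcbio_prepare_samples.py and bcbio, so they are deliberately omitted.
MANUAL_SUBDIRS = ("fastq", "reference")


def make_project_dirs(user_id: str, project: str,
                      scratch_root: str = "/n/scratch") -> Path:
    """Create <scratch_root>/<user_id>/<project>/{fastq,reference}.

    Existing directories are left untouched (exist_ok=True), so the
    function is safe to re-run. Returns the project root path.
    """
    root = Path(scratch_root) / user_id / project
    for name in MANUAL_SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

For example, make_project_dirs("abc123", "myProject") would create /n/scratch/abc123/myProject/fastq and /n/scratch/abc123/myProject/reference, with abc123 and myProject replaced by your own ID and project name.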

Aligning deep RNAseq and Digital Gene Expression (DGE) reads using bcbio
