5 YAML Settings
`bcbio` offers a rich suite of analysis tools with highly customizable configurations, and the number of options can be overwhelming. Here, we provide exemplar files that should work for most standard analyses at the LSP. Treat them as a starting point: users are expected to tweak the files to suit their particular needs.
- For consistency with the rest of the guide, please name your configuration file `O2.yaml`.
- If you are using the standard reference genome (i.e., not downloading your own), remove the lines that begin with `transcriptome_fasta` and `transcriptome_gtf`.
- If you are using a custom reference, remove the line that begins with `genome_build`.
- If your reference contains non-coding regions as discussed previously, replace `GRCh38.cdna.all.fa.gz` with `GRCh38.111.fa.gz`.
- In the YAML format, white space matters! When copying and pasting the examples below, please ensure that each line is properly indented.
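The edits described in the list above can also be scripted. The following is a minimal shell sketch, not part of bcbio: it writes a shortened copy of the template to a temporary directory, produces one variant for each reference choice, and flags tab characters, which YAML does not allow for indentation. The `workdir` variable and the `standard.yaml` and `custom.yaml` file names are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work on a throwaway copy so that a real O2.yaml is never modified.
workdir=$(mktemp -d)
cat > "$workdir/O2.yaml" <<'EOF'
details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      aligner: star
EOF

# Standard reference: drop the transcriptome_fasta and transcriptome_gtf lines.
grep -vE '^[[:space:]]*transcriptome_(fasta|gtf):' "$workdir/O2.yaml" > "$workdir/standard.yaml"

# Custom reference: drop the genome_build line instead.
grep -vE '^[[:space:]]*genome_build:' "$workdir/O2.yaml" > "$workdir/custom.yaml"

# YAML indentation must use spaces; list any line that contains a tab.
if grep -n "$(printf '\t')" "$workdir/standard.yaml" "$workdir/custom.yaml"; then
    echo "Tabs found: replace them with spaces." >&2
fi
```

Because `grep -v` copies the surviving lines unchanged, the indentation of the remaining keys is preserved in both variants.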
5.1 Digital Gene Expression
For DGE, we recommend using the following configuration file as a template. Copy and paste the lines below into your `O2.yaml`, or download the file directly with the command `wget https://labsyspharm.github.io/rnaseq/example_settings/dge/O2.yaml`.
If using a custom reference, replace `abc123` with your eCommons ID and `myProject` with your project name.
details:
  - analysis: scRNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      umi_type: harvard-scrb
      minimum_barcode_depth: 0
      cellular_barcode_correction: 1
      positional_umi: False
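The placeholder substitution described above can be done by hand in a text editor, or scripted. Below is a minimal sketch using `sed` on a temporary copy of a shortened template. The values `jd123` and `tnbc_dge` are hypothetical stand-ins for an eCommons ID and a project name, and `O2.edited.yaml` is an illustrative output name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: substitute your own eCommons ID and project name.
ecommons_id="jd123"
project="tnbc_dge"

# Shortened copy of the template, written to a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/O2.yaml" <<'EOF'
details:
  - analysis: scRNA-seq
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      umi_type: harvard-scrb
EOF

# Use "|" as the sed delimiter because the patterns contain "/".
sed -e "s|/n/scratch/abc123/|/n/scratch/${ecommons_id}/|" \
    -e "s|/myProject/|/${project}/|" \
    "$workdir/O2.yaml" > "$workdir/O2.edited.yaml"

# Show the rewritten paths for a quick visual check.
grep 'transcriptome_' "$workdir/O2.edited.yaml"
```

Writing to a new file rather than editing in place keeps the original template available for comparison.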
5.2 Deep RNAseq
For deep RNAseq, we recommend using the following configuration file as a template. Copy and paste the lines below into your `O2.yaml`, or download the file directly with the command `wget https://labsyspharm.github.io/rnaseq/example_settings/rna_seq/O2.yaml`.
If using a custom reference, replace `abc123` with your eCommons ID and `myProject` with your project name. Ensure that the spike-in FASTA file matches what you downloaded previously.
details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      aligner: star
      expression_caller: salmon
      strandedness: auto
      trim_reads: false
      spikein_fasta: /n/scratch/abc123/myProject/reference/ERCC92.fa
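One way to confirm that the `spikein_fasta` entry points at the file you actually downloaded is to read the path back out of the configuration and test that it exists. The sketch below is an assumption-laden illustration: it builds a tiny stand-in FASTA and a shortened `O2.yaml` in a temporary directory, and it assumes the path contains no spaces.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Stand-in for the downloaded spike-in FASTA (placeholder content only).
printf '>demo_spike\nACGT\n' > "$workdir/ERCC92.fa"

# Shortened configuration whose spikein_fasta points at the stand-in file.
cat > "$workdir/O2.yaml" <<EOF
details:
  - analysis: RNA-seq
    algorithm:
      spikein_fasta: $workdir/ERCC92.fa
EOF

# Extract the value that follows the "spikein_fasta:" key.
spikein=$(awk '$1 == "spikein_fasta:" {print $2}' "$workdir/O2.yaml")

# -s is true only if the file exists and is not empty.
if [ -s "$spikein" ]; then
    status="found"
else
    status="missing"
fi
echo "spikein_fasta -> $spikein ($status)"
```

To run the same check on a real configuration, point `awk` at your own `O2.yaml` and skip the lines that create the stand-in files.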