5 YAML Settings

bcbio offers a rich suite of analysis tools with highly customizable configurations, and the number of options can be overwhelming. Here, we provide exemplar files that should work for most standard analyses at the LSP. These should serve as a starting point; users are expected to tweak the files to suit their particular needs.

  • For consistency with the rest of the guide, please name your configuration file O2.yaml.
  • If you are using the standard reference genome (i.e., not downloading your own), remove lines that begin with transcriptome_fasta and transcriptome_gtf.
  • If you are using a custom reference, remove the line that begins with genome_build.
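  • If your reference contains non-coding regions, as discussed previously, replace GRCh38.cdna.all.fa.gz with GRCh38.111.fa.gz (drop the .gz suffix from both names if you decompressed the files, as the templates below assume).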
  • In the YAML format, whitespace matters! When copying and pasting the examples below, ensure that each line keeps its original indentation, and indent with spaces only, because YAML does not allow tabs for indentation.
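The snippet below annotates the indentation levels used by the templates in this chapter, using lines taken from the deep RNAseq example. It is a trimmed illustration of nesting, not a complete configuration; a line moved to a different indentation level can end up under a different parent key or make the file fail to parse.

```yaml
# Nesting levels shared by both templates (spaces only, never tabs).
details:                 # column 0: top-level key
  - analysis: RNA-seq    # 2 spaces, then "- " starts a list item under details
    genome_build: hg38   # 4 spaces: keys belonging to that list item
    algorithm:           # 4 spaces: opens the nested algorithm mapping
      aligner: star      # 6 spaces: options inside algorithm
```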

5.1 Digital Gene Expression

For DGE, we recommend using the following configuration file as a template. Copy and paste the lines below into your O2.yaml, or download it directly using the command wget https://labsyspharm.github.io/rnaseq/example_settings/dge/O2.yaml.

If using a custom reference, replace abc123 with your eCommons ID and myProject with your project name.

details:
  - analysis: scRNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      umi_type: harvard-scrb
      minimum_barcode_depth: 0
      cellular_barcode_correction: 1
      positional_umi: False
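As an illustration of the custom-reference case, here is a sketch of the DGE template after removing genome_build (following the custom-reference rule at the top of this chapter) and filling in the placeholders. The eCommons ID jd123 and project name dge_pilot are hypothetical values used only for this example; substitute your own.

```yaml
# DGE template for a custom reference.
# Hypothetical values: eCommons ID "jd123", project name "dge_pilot".
# genome_build has been removed; the custom transcriptome files are listed instead.
details:
  - analysis: scRNA-seq
    algorithm:
      transcriptome_fasta: /n/scratch/jd123/dge_pilot/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/jd123/dge_pilot/reference/Homo_sapiens.GRCh38.111.gtf
      umi_type: harvard-scrb
      minimum_barcode_depth: 0
      cellular_barcode_correction: 1
      positional_umi: False
```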

5.2 Deep RNAseq

For deep RNAseq, we recommend using the following configuration file as a template. Copy and paste the lines below into your O2.yaml, or download it directly using the command wget https://labsyspharm.github.io/rnaseq/example_settings/rna_seq/O2.yaml.

If using a custom reference, replace abc123 with your eCommons ID and myProject with your project name. The spikein_fasta path also points to your scratch directory, so update it in either case, and ensure that the spike-in FASTA file matches what you downloaded previously.

details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch/abc123/myProject/reference/Homo_sapiens.GRCh38.111.gtf
      aligner: star
      expression_caller: salmon
      strandedness: auto
      trim_reads: false
      spikein_fasta: /n/scratch/abc123/myProject/reference/ERCC92.fa
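For the standard reference, the deep RNAseq template reduces to the sketch below once the transcriptome_fasta and transcriptome_gtf lines are removed and genome_build is kept. Note that spikein_fasta still points to your scratch directory, so abc123 and myProject must be replaced there even though no custom reference is used.

```yaml
# Deep RNAseq template for the standard hg38 reference:
# transcriptome_fasta and transcriptome_gtf removed, genome_build kept.
details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      aligner: star
      expression_caller: salmon
      strandedness: auto
      trim_reads: false
      # Replace abc123 and myProject with your eCommons ID and project name.
      spikein_fasta: /n/scratch/abc123/myProject/reference/ERCC92.fa
```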