Bioinformatics Notes

This page is mainly a place to keep my notes, but perhaps someone else will find them useful. I am using OS X and Ubuntu 18.04.

How to install conda

I use conda/bioconda to easily install and manage bioinformatics packages that require different versions of dependencies that might conflict, etc.

  1. Download the Miniconda installer (Miniconda installs only the packages conda needs to run) from https://conda.io/en/latest/miniconda.html - I am working with the Python 3.7 bash (.sh) installer

  2. In terminal $ cd Downloads/ (or wherever miniconda was downloaded)

  3. Make the .sh file executable $ chmod +x Miniconda3-latest-MacOSX-x86_64.sh

  4. Execute the installation $ ./Miniconda3-latest-MacOSX-x86_64.sh

That should be it.
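
The installer file name in the steps above is the OS X one. Since I also work on Ubuntu, here is a minimal sketch that picks the file name by operating system. It assumes an x86_64 machine, and pick_installer is just a helper name I made up for these notes.

```shell
# Pick the Miniconda installer file name for a given OS name (x86_64 assumed).
pick_installer() {
  case "$1" in
    Darwin) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;
    Linux)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
    *)      echo "unsupported" ;;
  esac
}

# uname -s prints Darwin on OS X and Linux on Ubuntu.
installer="$(pick_installer "$(uname -s)")"
echo "Installer for this machine: $installer"
# After downloading it: chmod +x "$installer" && ./"$installer"
```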

If you are running OS X Catalina (where zsh is the default shell), conda might not initialize automatically in your next terminal session. When conda is initialized, you will see (base) next to your username in the prompt.

To fix this you need to manually initialize conda by specifying the path:

  1. cd to the miniconda installation bin $ cd ~/miniconda3/bin/

  2. Initialize conda $ ./conda init zsh (then open a new terminal session)
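
A quick way to check whether the shell can see conda at all is sketched here. It only assumes the default ~/miniconda3 install location used above; conda_status is a made-up helper name.

```shell
# Report whether the conda command is reachable from this shell.
conda_status() {
  if command -v conda >/dev/null 2>&1; then
    echo "conda found"
  else
    echo "conda missing"
  fi
}

conda_status
# If it prints "conda missing", initialize with the full path,
# e.g. ~/miniconda3/bin/conda init zsh, and open a new terminal.
```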

How to set up bioconda channels

“Channels” are where conda downloads bioinformatics packages from.

$ conda config --add channels defaults

$ conda config --add channels bioconda

$ conda config --add channels conda-forge
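
The order of these commands matters. As far as I understand, --add puts each channel at the top of the list, so the last one added (conda-forge) should end up with the highest priority. A small sketch to confirm what conda actually stored; show_channels is a helper name I made up, and it skips the check when conda is not installed.

```shell
# Print the configured channels, highest priority first.
show_channels() {
  if command -v conda >/dev/null 2>&1; then
    conda config --show channels
  else
    echo "conda not found; skipping"
  fi
}

show_channels
```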

How to set up bioconda environments

These are the environments I use. Feel free to customize your own, but always check for conflicting dependencies; conda usually warns you before anything is installed.

$ conda create -n alignment samtools bwa tablet seqtk

$ conda create -n binning maxbin2 openjdk

$ conda create -n taxonomy metaphlan2 kraken mash

$ conda create -n qc fastqc cutadapt seqtk fqtrim seqkit

$ conda create -n mcheck checkm-genome

$ conda create -n contigs megahit

In the first command, “conda” runs conda, “create -n” creates a new environment, “alignment” is the name of the environment, and “samtools bwa tablet seqtk” are the packages installed into it.
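
To make the name-then-packages pattern explicit, here is a dry-run sketch that builds the create commands from a small list and only prints them. The list holds three of the environments from above; print_create_cmds is a made-up helper, and -y just answers yes to the confirmation prompt.

```shell
# One environment per line: name first, then its packages.
envs="alignment samtools bwa tablet seqtk
binning maxbin2 openjdk
contigs megahit"

# Dry run: print each conda command instead of executing it.
print_create_cmds() {
  echo "$envs" | while read -r name pkgs; do
    echo "conda create -y -n $name $pkgs"
  done
}

print_create_cmds
```

Piping the output to sh would run the commands for real.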

To activate (enter) an environment you just created, type:

$ conda activate alignment

You will see (base) change to (alignment).

To switch environments, activate the next one:

$ conda activate binning
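
Running a tool from the wrong environment is an easy mistake, so a script can check first. This sketch relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; require_env is a made-up helper name.

```shell
# Succeed only if the named conda environment is the active one.
require_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
    echo "OK: $1 is active"
  else
    echo "Run: conda activate $1" >&2
    return 1
  fi
}

# Example: warn (but keep going) if alignment is not active.
require_env alignment || true
```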

Resources: https://bioconda.github.io/user/install.html#install-conda

BWA Notes

$ conda activate alignment

Index reference genome

$ bwa index reference.fasta

Map reads to the reference (-t 4 sets the number of threads; more threads make the mapping faster)

$ bwa mem -t 4 reference.fasta metagenome_reads.fasta > metagenome_reads.sam

Extract unmapped reads using samtools

$ samtools view -S -f4 metagenome_reads.sam > unmapped_metagenome_reads.sam

Extract mapped reads

$ samtools view -S -F4 metagenome_reads.sam > mapped_metagenome_reads.sam
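
The 4 in -f4 and -F4 refers to a bit in the SAM FLAG column: bit 4 means "read unmapped". -f keeps reads that have the bit set and -F keeps reads that do not. A minimal sketch of the same bit test in plain shell; the flag values in the loop are only examples, and classify_flag is a made-up helper.

```shell
# Classify a SAM FLAG value by testing bit 4 (read unmapped).
classify_flag() {
  if [ $(( $1 & 4 )) -ne 0 ]; then
    echo "unmapped"
  else
    echo "mapped"
  fi
}

# 0 = mapped forward, 4 = unmapped, 16 = mapped reverse,
# 77 = paired, unmapped, mate unmapped, first in pair (1+4+8+64).
for flag in 0 4 16 77; do
  echo "$flag $(classify_flag "$flag")"
done
```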

List the unique names of the mapped reads

$ cut -f1 mapped_metagenome_reads.sam | sort | uniq > mapped_metagenome_reads.lst

Extract mapped reads from metagenome reads

$ seqtk subseq metagenome_reads.fasta mapped_metagenome_reads.lst > mapped_metagenome_reads.fasta

Convert the unmapped reads from SAM to FASTA

$ samtools fasta unmapped_metagenome_reads.sam > unmapped_metagenome_reads.fasta
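
A sanity check I find useful after splitting: the mapped and unmapped FASTA files together should contain as many reads as the input. This sketch assumes single-end reads with unique names and uses tiny made-up files in a temporary directory in place of the real outputs; count_reads is a made-up helper.

```shell
# Count FASTA records by counting header lines.
count_reads() { grep -c '^>' "$1"; }

# Tiny made-up files standing in for the real input and outputs.
tmp="$(mktemp -d)"
printf '>r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n' > "$tmp/all.fasta"
printf '>r1\nACGT\n>r3\nTTAA\n' > "$tmp/mapped.fasta"
printf '>r2\nGGCC\n' > "$tmp/unmapped.fasta"

total=$(count_reads "$tmp/all.fasta")
mapped=$(count_reads "$tmp/mapped.fasta")
unmapped=$(count_reads "$tmp/unmapped.fasta")
echo "total=$total mapped=$mapped unmapped=$unmapped"

# The two subsets should add up to the input.
[ $(( mapped + unmapped )) -eq "$total" ] && echo "counts agree"
rm -r "$tmp"
```

With the real data, point count_reads at metagenome_reads.fasta, mapped_metagenome_reads.fasta and unmapped_metagenome_reads.fasta.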

Resources: https://github.com/alvaralmstedt/Tutorials/wiki/Separating-mapped-and-unmapped-reads-from-libraries

MASH Notes

$ conda activate taxonomy

Make a sketch of your reads (-m 2 ignores k-mers seen fewer than 2 times, which filters out most sequencing errors)

$ mash sketch -m 2 reads.fastq

Calculate distances between the reference sketches and your reads

$ mash dist refseq.genomes.k21.s1000.msh reads.fastq.msh > distances.tab

Sort results by distance (column 3) and show the closest matches

$ sort -gk3 distances.tab | head
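
The columns of distances.tab are reference, query, distance, p-value and shared hashes. Here is a sketch of the same sort on a made-up three-row table that also pulls out the name of the closest reference. The genome names and numbers are invented for illustration.

```shell
tmp="$(mktemp -d)"
# Made-up rows: reference, query, distance, p-value, shared hashes.
printf 'genomeA.fna\treads.fastq\t0.05\t1e-50\t200/1000\n'  > "$tmp/distances.tab"
printf 'genomeB.fna\treads.fastq\t0.01\t0\t700/1000\n'     >> "$tmp/distances.tab"
printf 'genomeC.fna\treads.fastq\t0.2\t1e-5\t10/1000\n'    >> "$tmp/distances.tab"

# Smallest distance first; column 1 of the top row is the closest reference.
best=$(sort -gk3 "$tmp/distances.tab" | head -n 1 | cut -f1)
echo "Closest reference: $best"
rm -r "$tmp"
```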

Resources: https://mash.readthedocs.io/en/latest/

MetaPhlAn2 Notes

$ conda activate taxonomy

Here is a basic example that profiles a single metagenome from raw reads (--nproc 4 uses 4 threads)

$ metaphlan2.py reads.fasta --input_type fasta --nproc 4 > profile.txt
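
profile.txt lists one clade per line with its relative abundance, from kingdom (k__) down to strain (t__). To keep only the species-level rows, grep for s__ and drop the t__ rows. The sketch runs on a made-up profile with shortened lineages and invented abundances.

```shell
tmp="$(mktemp -d)"
# Made-up profile: clade name, tab, relative abundance (lineages shortened).
printf '#SampleID\tMetaphlan2_Analysis\n'                                  > "$tmp/profile.txt"
printf 'k__Bacteria\t100.0\n'                                             >> "$tmp/profile.txt"
printf 'k__Bacteria|g__Bacillus\t100.0\n'                                 >> "$tmp/profile.txt"
printf 'k__Bacteria|g__Bacillus|s__Bacillus_subtilis\t100.0\n'            >> "$tmp/profile.txt"
printf 'k__Bacteria|g__Bacillus|s__Bacillus_subtilis|t__GCF_0001\t100.0\n' >> "$tmp/profile.txt"

# Keep rows that reach species level but not strain level.
species=$(grep 's__' "$tmp/profile.txt" | grep -v 't__')
echo "$species"
rm -r "$tmp"
```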

Resources: https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2

CheckM Notes

$ conda activate mcheck

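Assess bin quality with the lineage workflow (-t 4 uses 4 threads, -x fasta is the file extension of the bins, --reduced_tree uses a smaller reference tree that needs less memory, -f checkm.txt is the results file, bins/ is the input directory and checkm/ is the output directory)

$ checkm lineage_wf -t 4 -x fasta --reduced_tree -f checkm.txt bins/ checkm/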
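
CheckM only picks up files whose extension matches -x, so it is worth checking that bins/ really contains .fasta files before starting a long run. A minimal sketch using a temporary directory with made-up file names; count_bins is a made-up helper.

```shell
# Count files with a given extension directly inside a directory.
count_bins() {
  find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Made-up directory: two bins plus an unrelated file.
tmp="$(mktemp -d)"
touch "$tmp/bin.001.fasta" "$tmp/bin.002.fasta" "$tmp/notes.txt"
echo "fasta bins found: $(count_bins "$tmp" fasta)"
rm -r "$tmp"
```

For the real run this would be count_bins bins fasta.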

Resources: https://github.com/Ecogenomics/CheckM/wiki

Cutadapt Notes

$ conda activate qc

Remove the Illumina Universal Adapter (-j 0 lets cutadapt detect and use all available CPU cores)

$ cutadapt -a AGATCGGAAGAG -o cleaned.fastq original.fastq -j 0
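
To see how common the adapter is before trimming, and to confirm it is mostly gone afterwards, you can count the reads that contain it. The sketch uses a made-up two-read FASTQ file; count_fastq and count_adapter are made-up helper names.

```shell
tmp="$(mktemp -d)"
# Two made-up reads; only the first one contains the adapter sequence.
printf '@r1\nACGTACGTAGATCGGAAGAGCACA\n+\nIIIIIIIIIIIIIIIIIIIIIIII\n'  > "$tmp/original.fastq"
printf '@r2\nACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIII\n' >> "$tmp/original.fastq"

# Number of reads in a FASTQ file (4 lines per record).
count_fastq() { echo $(( $(wc -l < "$1") / 4 )); }
# Number of lines containing the given adapter sequence.
count_adapter() { grep -c "$2" "$1"; }

reads=$(count_fastq "$tmp/original.fastq")
hits=$(count_adapter "$tmp/original.fastq" AGATCGGAAGAG)
echo "$hits of $reads reads contain the adapter"
rm -r "$tmp"
```

Running count_adapter on cleaned.fastq after cutadapt should give a much smaller number.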

Metagenome-Atlas Notes

Looking for a straightforward metagenome pipeline? I used Metagenome-Atlas for a recent study.

$ conda create -n atlas

$ conda activate atlas

$ conda install -y -c bioconda -c conda-forge metagenome-atlas=2.8

$ atlas init --db-dir databases path/to/fastq/files

$ atlas run all

Resource: https://github.com/metagenome-atlas/atlas