Bioinformatics Notes
This page is mainly a place to keep my own notes, but perhaps someone else will find them useful. I am using OS X and Ubuntu 18.04.
How to install conda
I use conda/bioconda to install and manage bioinformatics packages whose dependencies might otherwise conflict (for example, tools that need different versions of the same library).
Download the Miniconda installer from https://conda.io/en/latest/miniconda.html (Miniconda installs only the packages conda needs to run). I am working with the Python 3.7 bash (.sh) installer.
In terminal $ cd Downloads/ (or wherever the installer was downloaded)
Make the .sh file executable $ chmod +x Miniconda3-latest-MacOSX-x86_64.sh (on Ubuntu the file is Miniconda3-latest-Linux-x86_64.sh)
Execute the installation $ ./Miniconda3-latest-MacOSX-x86_64.sh
That should be it.
If you are running OS X Catalina, conda might not initialize automatically in your next terminal session. When it has initialized, you will see (base) next to your username.
To fix this, initialize conda manually for zsh (the default shell on Catalina) by running it from its installation path:
cd to the miniconda installation bin $ cd ~/miniconda3/bin/
Initialize conda $ ./conda init zsh
Then open a new terminal window so the change takes effect.
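A quick way to check whether initialization worked is to ask the shell directly. The sketch below is only an illustration: conda_status is a made-up helper name, and it relies on CONDA_DEFAULT_ENV, the variable conda sets while an environment is active.

```shell
# Hypothetical helper (not part of conda): report the state of conda in this shell.
conda_status() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda is not on PATH; run ~/miniconda3/bin/conda init zsh and open a new terminal"
    elif [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "conda is on PATH but no environment is active"
    else
        echo "active environment: ${CONDA_DEFAULT_ENV}"
    fi
}

conda_status
```

If the last message appears with "base", the (base) prefix should also be visible in the prompt.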
How to set up bioconda channels
“Channels” are the locations conda downloads packages from. Add them in exactly this order:
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
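The order of the three commands matters because conda config --add puts each new channel at the top of the list, so the channel added last gets the highest priority. The snippet below is a toy model of that behaviour in plain shell, not conda code; add_channel and channels are names invented for the illustration.

```shell
# Toy model: each added channel is prepended to the list, as "conda config --add" does.
channels=""
add_channel() {
    channels="$1${channels:+ $channels}"
}

add_channel defaults
add_channel bioconda
add_channel conda-forge

# Highest priority first.
echo "$channels"   # conda-forge bioconda defaults
```

On a real installation, $ conda config --show channels prints the configured list so the order can be confirmed.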
How to set up bioconda environments
These are the environments I use. Feel free to customize your own, but always check for conflicting dependencies; conda will usually warn you before anything is installed.
$ conda create -n alignment samtools bwa tablet seqtk
$ conda create -n binning maxbin2 openjdk
$ conda create -n taxonomy metaphlan2 kraken mash
$ conda create -n qc fastqc cutadapt seqtk fqtrim seqkit
$ conda create -n mcheck checkm-genome
$ conda create -n contigs megahit
In the first command, “conda” runs conda, “create -n” creates a new environment, “alignment” is the name of the environment, and “samtools bwa tablet seqtk” are the packages installed into it.
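Since every command follows the same name-then-packages pattern, the whole set can be generated from a small list. This is a sketch rather than something from the original notes: envs and print_create_commands are hypothetical names, and -y (answer yes to prompts) is added so the commands can run unattended. It only prints the commands and does not call conda.

```shell
# One environment per line: the name first, then its packages.
envs='alignment samtools bwa tablet seqtk
binning maxbin2 openjdk
taxonomy metaphlan2 kraken mash
qc fastqc cutadapt seqtk fqtrim seqkit
mcheck checkm-genome
contigs megahit'

# Print (do not run) one "conda create" command per environment.
print_create_commands() {
    echo "$envs" | while read -r name pkgs; do
        echo "conda create -y -n $name $pkgs"
    done
}

print_create_commands
```

After reviewing the printed commands, they can be executed with $ print_create_commands | sh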
To activate (enter) an environment you just created, type:
$ conda activate alignment
You will see (base) change to (alignment).
To switch to another environment, activate it by name:
$ conda activate binning
Resources: https://bioconda.github.io/user/install.html#install-conda
BWA Notes
$ conda activate alignment
Index reference genome
$ bwa index reference.fasta
Map reads to the reference (-t 4 sets the number of threads; more threads make it run faster)
$ bwa mem -t 4 reference.fasta metagenome_reads.fasta > metagenome_reads.sam
Extract unmapped reads using samtools (-f4 keeps only reads that have flag 4, "read unmapped", set)
$ samtools view -S -f4 metagenome_reads.sam > unmapped_metagenome_reads.sam
Extract mapped reads (-F4 excludes reads that have flag 4 set)
$ samtools view -S -F4 metagenome_reads.sam > mapped_metagenome_reads.sam
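The -f4 and -F4 filters work on the FLAG column (column 2) of the SAM file, which is a sum of bit values; the bit with value 4 means "read unmapped". The sketch below applies the same bit test to a few example flag values in plain shell. The helper name is_unmapped and the sample flags are made up for illustration.

```shell
# Succeeds when bit 4 ("read unmapped") is set in the given SAM flag.
is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# 0 = mapped forward, 4 = unmapped, 16 = mapped reverse, 77 = 1 + 4 + 8 + 64
for flag in 0 4 16 77; do
    if is_unmapped "$flag"; then
        echo "$flag unmapped"   # kept by -f4
    else
        echo "$flag mapped"     # kept by -F4
    fi
done
# Output:
# 0 mapped
# 4 unmapped
# 16 mapped
# 77 unmapped
```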
List the unique names of the mapped reads
$ cut -f1 mapped_metagenome_reads.sam | sort | uniq > mapped_metagenome_reads.lst
Extract the mapped reads from the original metagenome reads by name
$ seqtk subseq metagenome_reads.fasta mapped_metagenome_reads.lst > mapped_metagenome_reads.fasta
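The sort | uniq step is needed because one read can occupy several lines of the SAM file (for example, when it has a supplementary alignment), while seqtk subseq only needs each name once. The example below runs the same cut | sort | uniq pipeline on a tiny hand-made SAM file; the three records are invented for the demonstration.

```shell
tmp=$(mktemp -d)

# Three alignment records for two reads; read2 appears twice.
printf 'read2\t0\tref\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'   >  "$tmp/mapped.sam"
printf 'read1\t16\tref\t30\t60\t5M\t*\t0\t0\tTTGCA\tIIIII\n'  >> "$tmp/mapped.sam"
printf 'read2\t2048\tref\t90\t60\t2H3M\t*\t0\t0\tGTA\tIII\n'  >> "$tmp/mapped.sam"

# Column 1 is the read name; sort and uniq leave one line per read.
cut -f1 "$tmp/mapped.sam" | sort | uniq > "$tmp/mapped.lst"

names=$(cat "$tmp/mapped.lst")
echo "$names"
# Output:
# read1
# read2

rm -rf "$tmp"
```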
Convert the unmapped reads from SAM to FASTA
$ samtools fasta unmapped_metagenome_reads.sam > unmapped_metagenome_reads.fasta
Resources: https://github.com/alvaralmstedt/Tutorials/wiki/Separating-mapped-and-unmapped-reads-from-libraries
MASH Notes
$ conda activate taxonomy
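Sketch your reads (-m 2 ignores k-mers seen fewer than 2 times, which filters out most sequencing errors; the output is reads.fastq.msh)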
$ mash sketch -m 2 reads.fastq
Determine the distances between the reads and the RefSeq sketch
$ mash dist refseq.genomes.k21.s1000.msh reads.fastq.msh > distances.tab
Sort results by distance (column 3) and show the closest matches
$ sort -gk3 distances.tab | head
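The columns of distances.tab are reference ID, query ID, Mash distance, p-value and matching hashes, so sorting on column 3 puts the closest reference first. The sketch below shows the same sort on a made-up three-line table; the genome names and values are invented, and LC_ALL=C is an extra precaution so that the decimal point is parsed the same way on every system.

```shell
tmp=$(mktemp -d)

# reference-ID, query-ID, distance, p-value, matching-hashes (made-up values)
printf 'genomeA.fna\treads.fastq\t0.25\t1e-10\t12/1000\n' >  "$tmp/distances.tab"
printf 'genomeB.fna\treads.fastq\t0.02\t0\t456/1000\n'    >> "$tmp/distances.tab"
printf 'genomeC.fna\treads.fastq\t1\t1\t0/1000\n'         >> "$tmp/distances.tab"

# -g sorts numerically (including scientific notation); -k3 starts the key at column 3.
best=$(LC_ALL=C sort -gk3 "$tmp/distances.tab" | head -n 1 | cut -f1)
echo "$best"   # genomeB.fna

rm -rf "$tmp"
```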
Resources: https://mash.readthedocs.io/en/latest/
MetaPhlAn2 Notes
$ conda activate taxonomy
Basic example: profile a single metagenome from raw reads (--nproc 4 uses 4 threads)
$ metaphlan2.py reads.fasta --input_type fasta --nproc 4 > profile.txt
Resources: https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2
CheckM Notes
$ conda activate mcheck
Assess the quality of the bins (-t 4 uses 4 threads, -x fasta is the file extension of the bins, --reduced_tree lowers memory use, -f checkm.txt writes the results to a file, bins/ is the input directory, checkm/ is the output directory)
$ checkm lineage_wf -t 4 -x fasta --reduced_tree -f checkm.txt bins/ checkm/
Resources: https://github.com/Ecogenomics/CheckM/wiki
Cutadapt Notes
$ conda activate qc
Remove the Illumina Universal Adapter from the 3' end of the reads (-j 0 uses all available CPU cores)
$ cutadapt -a AGATCGGAAGAG -o cleaned.fastq original.fastq -j 0
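With -a, cutadapt treats the sequence as a 3' adapter: the adapter and everything after it are removed from the read. The snippet below imitates that on a single made-up read using shell parameter expansion. It is a deliberate simplification, since it only handles an exact, complete adapter match, whereas cutadapt also tolerates mismatches and partial adapters at the end of a read.

```shell
adapter=AGATCGGAAGAG

# Made-up read: 14 bases of insert, then the adapter, then read-through sequence.
read_seq=ACGTACGTTTGACCAGATCGGAAGAGCACACGTCTG

# Drop the adapter and everything that follows it (exact match only).
trimmed=${read_seq%%"$adapter"*}
echo "$trimmed"   # ACGTACGTTTGACC
```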
Looking for a straightforward metagenome pipeline? I used Metagenome-Atlas for a recent study.
$ conda create -n atlas
$ conda activate atlas
$ conda install -y -c bioconda -c conda-forge metagenome-atlas=2.8
$ atlas init --db-dir databases path/to/fastq/files
$ atlas run all