The workflow of all steps is summarised in the figure below, and the template is hosted on the LCSB Gitlab.

Book resources on iris

si is a shortcut for booking a short interactive session (30 minutes, 1 core).

We can alter the time with -t and ask for more cores with -c. In the example below, we ask for 1 hour and 6 cores:

si -t 1:00:00 -c 6

Interactive sessions are limited to a maximum of 2 hours.

(Optional) Reservations

Two nodes were reserved for this course. This will work if you have a student account or if you are part of the Department of Life Sciences. To access them, add the following parameter on Thursday:

si -t 1:0:0 -c 6 --reservation=aginolhac-teaching-20210902

and for Friday:

si -t 1:0:0 -c 6 --reservation=aginolhac-teaching-20210903
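To check that a reservation exists and when it is active, you can ask Slurm directly (standard scontrol command, using the reservation name given above):

scontrol show reservation aginolhac-teaching-20210902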

Install the snakemake template

We will work in your scratch directory.

  • the alias to go there is:
cds

Type pwd to ensure you are in /scratch/users/username
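For example:

pwd
# expected: /scratch/users/username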

  • create the directory and go inside:
mkdir snakemake-atac-seq
cd snakemake-atac-seq

and run the following commands:

VERSION="v0.0.2"
wget -qO- https://gitlab.lcsb.uni.lu/aurelien.ginolhac/snakemake-atac-seq/-/archive/${VERSION}/snakemake-atac-seq-${VERSION}.tar.gz | tar xfz - --strip-components=1

This command downloads and extracts (stripping the root folder) the following files:

config/
Dockerfile
README.md
workflow/

You may delete the Dockerfile and README.md if you wish; they are not used by snakemake at runtime.
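For example:

rm Dockerfile README.md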

Load necessary tools

  • load Singularity

Once on a node, the command is:

(base) user@access $ module load tools/Singularity
  • load the snakemake environment
(base) user@access $ conda activate snakemake

Of note, the two steps above can be replaced by the alias smk, if you added that alias to your .bashrc.
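If you have not defined it yet, a minimal sketch of such an alias, assuming it simply chains the two commands above (the actual course alias may differ):

# in ~/.bashrc: load Singularity and activate the snakemake conda environment
alias smk='module load tools/Singularity && conda activate snakemake'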

From logging in to the access machine to getting the resources and activating the environment, it should look like this:

(base) aginolhac@access1.iris-cluster.uni.lux(14:05:02)-> 20:56): ~ $ si -c 6 -t 1:00:00
# salloc -p interactive --qos debug -C batch 
salloc: Pending job allocation 2424900
salloc: job 2424900 queued and waiting for resources
salloc: job 2424900 has been allocated resources
salloc: Granted job allocation 2424900
salloc: Waiting for resource configuration
salloc: Nodes iris-139 are ready for job
(base) aginolhac@iris-139(14:17:21)-> 29:51)(2424900 1N/T/1CN): ~ $ smk
(snakemake) aginolhac@iris-139(14:17:23)-> 29:49)(2424900 1N/T/1CN): ~ $

Run on human data

Disclaimer: those sequence files are of human origin. You must not copy them except for this teaching exercise.

Fetch data

Make sure you are in /scratch/users/username/snakemake-atac-seq/, then copy the 12 fastq files:

mkdir data
rsync -v /scratch/users/aginolhac/snakemake-atac-seq/data/*gz data/
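A quick sanity check that all 12 files arrived (assuming data/ contains no other files ending in gz):

ls data/*gz | wc -l
# should print 12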

Set up the cache

To avoid everyone spending time fetching the same reference genome and indexing it for mapping, we will share a common cache for those steps.

export SNAKEMAKE_OUTPUT_CACHE=/scratch/users/aginolhac/snakecache
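If you want this variable set automatically in every new session, you can optionally append that line to your ~/.bashrc:

# optional: persist the cache location across sessions
echo 'export SNAKEMAKE_OUTPUT_CACHE=/scratch/users/aginolhac/snakecache' >> ~/.bashrc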

Dry-run

First, do a dry-run as we did before:

snakemake -j 6 -n

Run

If all is correct, run the workflow with the cache activated. Of note, my scratch is bind-mounted into the container so that the fastq_screen databases can be fetched from it.

snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6

Fetch the data back to your computer

The needed files are:

report.html
results/qc/multiqc/multiqc.html
results/big_wig/mDAN_D30_1.bigWig
results/big_wig/mDAN_D30_2.bigWig
results/big_wig/mDAN_D30_3.bigWig
results/big_wig/smNPC_1.bigWig
results/big_wig/smNPC_2.bigWig
results/big_wig/smNPC_3.bigWig
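From your local machine, an example transfer with rsync; the SSH alias iris-cluster and the username in the paths are placeholders for your own setup:

# run from your laptop, not on the cluster
SRC=iris-cluster:/scratch/users/username/snakemake-atac-seq
rsync -av "${SRC}/report.html" "${SRC}/results/qc/multiqc/multiqc.html" .
rsync -av "${SRC}/results/big_wig/" big_wig/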