Eddie Configuration

The nf-core pipelines sarek, rnaseq, atacseq, and viralrecon have all been tested on the University of Edinburgh Eddie HPC. All of these except atacseq also have pipeline-specific config files; atacseq does not yet support them.

Getting help

There is a Slack channel dedicated to Eddie users on the MRC IGC Slack: https://igmm.slack.com/channels/eddie3

Using the Eddie config profile

To use it, run the pipeline with -profile eddie (one hyphen). This downloads and loads eddie.config, which is pre-configured with settings suitable for the University of Edinburgh Eddie HPC.

The configuration file supports running nf-core pipelines with Docker containers running under Singularity by default. Conda is not currently supported.

nextflow run nf-core/PIPELINE -profile eddie  # ...rest of pipeline flags

Before running the pipeline, you will need to install Nextflow or load it from the module system. Generally the most recent version is the one you want: the config below uses the process.resourceLimits directive, which requires Nextflow 24.04.0 or later. Current stable Nextflow releases run DSL2 pipelines, so you no longer need a version ending in ‘-edge’ for DSL2.

To list versions:

module avail igmm/apps/nextflow

To load the most recent version:

module load igmm/apps/nextflow
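The profile's config uses the process.resourceLimits directive, which to my knowledge needs Nextflow 24.04.0 or later. The sketch below checks the version that is currently loaded. It assumes GNU sort (for -V) and that nextflow -v prints a line like "nextflow version 24.04.4.5917". The function name version_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU `sort -V` to order version strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="24.04.0"   # assumed minimum for process.resourceLimits

if command -v nextflow >/dev/null 2>&1; then
    # Third field of e.g. "nextflow version 24.04.4.5917"
    current=$(nextflow -v | awk '{print $3}')
    if version_at_least "$current" "$required"; then
        echo "Nextflow $current is new enough"
    else
        echo "Nextflow $current is older than $required; load a newer module" >&2
    fi
else
    echo "nextflow not found; run: module load igmm/apps/nextflow" >&2
fi
```

If the check fails, list the available versions with module avail igmm/apps/nextflow and load a newer one explicitly.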

This config lets Nextflow submit the pipeline jobs through the SGE job scheduler and use Singularity for software management.

Singularity set-up

Load Singularity from the module system. You can add this line to your $HOME/.bashrc, or run it before you run an nf-core pipeline.

module load singularity

The eddie profile uses /exports/igmm/eddie/BioinformaticsResources/nfcore/singularity-images as the Singularity cache directory. If some containers needed for your pipeline run are not present, please contact the IGC Data Manager to have them added.

If you do not have access to /exports/igmm/eddie/BioinformaticsResources, set the Singularity cache directory to somewhere sensible outside your $HOME area, which has limited space. The first run will take time to download all the Singularity containers, but later runs can reuse the cached images.
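One way to point the cache elsewhere is to write a small override config and pass it with -c. This sketch assumes that a config given with -c is applied on top of the eddie profile, which is how Nextflow normally layers custom configs. The helper name write_cache_config, the file name, and the target path are hypothetical; choose a location you can write to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Singularity cache directory outside $HOME and
# write a custom Nextflow config that overrides the eddie profile's cacheDir.
write_cache_config() {
    local cache_dir="$1"     # e.g. /exports/eddie/path/to/my/area/singularity-cache
    local config_file="$2"   # e.g. singularity_cache.config
    mkdir -p "$cache_dir" || return 1
    # Unquoted heredoc so that $cache_dir is expanded into the config
    cat > "$config_file" <<EOF
singularity {
    cacheDir = '$cache_dir'
}
EOF
}
```

Run it once, for example write_cache_config /exports/eddie/path/to/my/area/singularity-cache singularity_cache.config. Then add -c singularity_cache.config to your nextflow run command.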

By default, Singularity creates a .singularity directory in your $HOME directory on Eddie. Space on $HOME is very limited, so it is a good idea to create a directory somewhere else with more room and symlink $HOME/.singularity to it.

cd $HOME
mkdir -p /exports/eddie/path/to/my/area/.singularity
ln -s /exports/eddie/path/to/my/area/.singularity .singularity
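The commands above assume $HOME/.singularity does not exist yet. If it is already a real directory, ln -s creates the link inside it (as .singularity/.singularity) instead of replacing it. The minimal sketch below handles that case. It assumes bash and that you are happy to copy the existing cache contents across. relocate_singularity_dir is a hypothetical name, and its optional second argument exists only to make testing easy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move ~/.singularity to a roomier location and leave a
# symlink behind. Safe to re-run once the link is in place.
relocate_singularity_dir() {
    local target="$1"              # e.g. /exports/eddie/path/to/my/area/.singularity
    local home_dir="${2:-$HOME}"   # overridable for testing
    local link="$home_dir/.singularity"

    mkdir -p "$target" || return 1

    if [ -L "$link" ]; then
        # Already a symlink: done if it points at the target, otherwise refuse
        [ "$(readlink "$link")" = "$target" ] && return 0
        echo "$link already links to $(readlink "$link"); leaving it alone" >&2
        return 1
    fi

    if [ -d "$link" ]; then
        # Copy existing contents (including dotfiles), then remove the original
        cp -a "$link/." "$target/" && rm -rf "$link" || return 1
    fi

    ln -s "$target" "$link"
}
```

Typical use is relocate_singularity_dir /exports/eddie/path/to/my/area/.singularity, run from any directory.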

Running Nextflow

On a login node

You can run Nextflow from an interactive qlogin session, provided you request more than the default 2 GB of memory. Unfortunately, you can’t submit the initial Nextflow run process as a job, because you can’t qsub from within a qsub job.

qlogin -l h_vmem=8G

If your Eddie terminal disconnects, your Nextflow run will stop. To prevent this, put your nextflow run command in a bash script and launch it with nohup:

nohup ./nextflow_run.sh &
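The contents of nextflow_run.sh are up to you. The minimal sketch below generates one. It assumes the module names used elsewhere on this page and uses nf-core/rnaseq purely as an example. samplesheet.csv, results, and GRCh38 are placeholder choices, not required values.

```shell
#!/usr/bin/env bash
# Write a hypothetical nextflow_run.sh wrapper and make it executable.
# The quoted 'EOF' keeps the script text literal (no expansion here).
cat > nextflow_run.sh <<'EOF'
#!/bin/bash
# Make the `module` command available in a non-interactive shell
. /etc/profile.d/modules.sh
module load igmm/apps/nextflow
module load singularity

# Example only: replace pipeline, inputs and genome with your own
nextflow run nf-core/rnaseq \
    -profile eddie \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    -resume
EOF
chmod +x nextflow_run.sh
```

Launching it as nohup ./nextflow_run.sh > nextflow_run.log 2>&1 & writes Nextflow's console output to nextflow_run.log instead of the default nohup.out. The -resume flag lets a re-run reuse cached results from an earlier attempt.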

On a wild west node (IGC only)

The wild west nodes on Eddie (node2c15, node2c16, node3g22) can be accessed via ssh. Run Nextflow on these nodes inside a screen session, so that it keeps running after you disconnect.

Start a new screen session:

screen -S <session_name>

Detach from the session, leaving Nextflow running, by pressing Ctrl-a and then d.

List existing screen sessions:

screen -ls

Reconnect to an existing screen session:

screen -r <session_name>

Using iGenomes references

A local copy of the iGenomes resource is available on Eddie to users with access to /exports/igmm/eddie/BioinformaticsResources. You should therefore be able to run the pipeline against any reference listed in igenomes.config by passing --genome <GENOME_ID>.
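The sketch below shows which references are actually present locally. It lists the species/source/build directories under the iGenomes base set in the config. It assumes the standard iGenomes directory layout and GNU find (for -printf); list_igenomes is a hypothetical helper name. The build directory name (e.g. GRCh38) often matches the --genome ID, but check igenomes.config for the exact key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list <species>/<source>/<build> directories under an
# iGenomes base (defaults to the igenomes_base path from eddie.config).
list_igenomes() {
    local base="${1:-/exports/igmm/eddie/BioinformaticsResources/igenomes}"
    if [ ! -d "$base" ]; then
        echo "Cannot read $base (no access?)" >&2
        return 1
    fi
    # Depth 3 = species/source/build, e.g. Homo_sapiens/NCBI/GRCh38;
    # %P prints each path relative to $base
    find "$base" -mindepth 3 -maxdepth 3 -type d -printf '%P\n' | sort
}
```

Running list_igenomes with no arguments on Eddie should print one line per locally available build.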

Adjusting maximum resources

This config is set up for the IGC standard nodes, which have 32 cores and 384 GB of memory. If you are not an IGC user, check the ECDF specification for the nodes available to you and adjust the --clusterOptions flag and resource limits accordingly, e.g.

--clusterOptions "-C mem256GB" --max_memory "256GB"
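If you prefer a custom config file to command-line flags, the sketch below writes one for a different node type. It takes the -C mem256GB constraint from the example above and assumes you look up the core count for your nodes yourself. The helper name and output file are hypothetical. The 240 h time limit is copied from the IGC settings and may not apply to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a custom config that caps resources for a
# non-IGC node type and adds an SGE node constraint to every job.
write_node_limits_config() {
    local mem_gb="$1"       # e.g. 256, from the ECDF node specification
    local cpus="$2"         # cores per node on that node type (look this up)
    local constraint="$3"   # e.g. mem256GB, as in the example above
    local config_file="$4"  # e.g. node_limits.config
    # Unquoted heredoc: shell variables expand, \$ keeps Nextflow's ${...} literal
    cat > "$config_file" <<EOF
process {
    resourceLimits = [ memory: ${mem_gb}.GB, cpus: ${cpus}, time: 240.h ]
    clusterOptions = { task.memory ? "-C ${constraint} -l h_vmem=\${task.memory.bytes / task.cpus}" : "-C ${constraint}" }
}
params {
    max_memory = ${mem_gb}.GB
    max_cpus = ${cpus}
}
EOF
}
```

After writing the file, add -c node_limits.config (or whichever name you chose) to the nextflow run command. Because this clusterOptions replaces the profile's, it repeats the per-slot h_vmem request alongside the constraint.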

Config file

The config file is maintained in the nf-core/configs repository on GitHub (conf/eddie.config):

// Profile config names for nf-core/configs
params {
    config_profile_description = 'University of Edinburgh (eddie) cluster profile provided by nf-core/configs.'
    config_profile_contact     = 'Graeme Grimes (@ggrimes)'
    config_profile_url         = 'https://www.ed.ac.uk/information-services/research-support/research-computing/ecdf/high-performance-computing'
}

// Submit every task to the SGE scheduler
executor {
    name = "sge"
}

process {
    // Upper bounds for any single task (IGC standard node: 32 cores, 384 GB)
    resourceLimits = [
        memory: 384.GB,
        cpus: 32,
        time: 240.h
    ]
    // SGE applies h_vmem per slot, so divide the task's total memory by its CPU count
    clusterOptions = { task.memory ? "-l h_vmem=${task.memory.bytes / task.cpus}" : null }
    stageInMode    = 'symlink'
    scratch        = false
    // Request the sharedmem parallel environment only for multi-core tasks
    penv           = { task.cpus > 1 ? "sharedmem" : null }

    // common SGE error statuses
    errorStrategy  = { task.exitStatus in [143, 137, 104, 134, 139, 140] ? 'retry' : 'finish' }
    maxErrors      = -1
    maxRetries     = 3

    // Make `module` available on compute nodes, load Singularity,
    // and keep Singularity's temporary files in the job's $TMPDIR
    beforeScript   = '''
    . /etc/profile.d/modules.sh
    module load singularity
    export SINGULARITY_TMPDIR="$TMPDIR"
    '''
}

params {
    // iGenomes reference base
    igenomes_base = '/exports/igmm/eddie/BioinformaticsResources/igenomes'
    // Resource caps for pipelines that predate process.resourceLimits
    max_memory    = 384.GB
    max_cpus      = 32
    max_time      = 240.h
}

env {
    // Use a single glibc malloc arena to keep virtual memory (checked by h_vmem) down
    MALLOC_ARENA_MAX = 1
}

singularity {
    envWhitelist = "SINGULARITY_TMPDIR,TMPDIR"
    runOptions   = '-p --scratch /dev/shm -B "$TMPDIR"'
    enabled      = true
    autoMounts   = true
    cacheDir     = "/exports/igmm/eddie/BioinformaticsResources/nfcore/singularity-images"
}