How to Run nf-core Analysis on the Cloud Using nf-core/rnaseq Pipeline

Lifebit


nf-core is a community-driven effort that makes bioinformatics pipelines standardised and simple to run (Ewels et al., 2019). (Check out our previous blog post, which delves into what nf-core is really about.) You can run these pipelines with ease, and rest assured that you are following community best practices.

However, for any given project, you still have to install all the required software (Nextflow & Docker/Singularity), manage all of your data, provide the necessary compute resources & sit through long queue times if submitting to a computing cluster…

How to run any nf-core analysis over the Cloud: an example using the nf-core/rnaseq pipeline

What if you don’t have the resources or are tired of waiting? In this blog post, we will show you how to run any of the stable release nf-core pipelines over the Cloud using the CloudOS platform. We use the RNA-seq pipeline as an example because it is the most popular of all the nf-core pipelines, but the same steps apply to any other nf-core pipeline.

The RNA-seq workflow processes raw FastQ inputs, aligns the reads and generates gene counts before performing extensive quality control on the results. (See the output documentation for more details).

How to import a pipeline

Before starting, make sure you have already created your free CloudOS account. You can then navigate to the pipelines page on CloudOS:


Once on the pipelines page, you are able to create a new pipeline. To do this follow the steps below:

  1. Click the green “New” button
  2. “Select” the GitHub logo to import the RNA-seq pipeline, which is hosted on GitHub
  3. Paste the URL of the repository from GitHub: https://github.com/nf-core/rnaseq
  4. Name the pipeline, e.g. “rnaseq”
  5. Optionally: give the pipeline a description
  6. Finally, click “Next”


(Optional) Select a pipeline

This step is optional because at the end of the last step you will be taken to the page to select data & parameters for the newly imported pipeline. If this is the case, you don’t need to do anything for this step.

Your imported pipelines can be found on the pipelines page under the “MY PIPELINES & TOOLS” tab:


Selecting data & parameters

We have provided example data within the S3 bucket s3://lifebit-featured-datasets/pipelines/rnaseq-data. Alternatively, you can select your own input S3 bucket/data, provided you have the correct input files.

To select input data & parameters:

Import the dataset

  1. Click the blue “add data” button
  2. Click the green plus to add a new dataset
  3. Optional: enter a name for your new dataset, e.g. “rnaseq_test” & hit enter
  4. Click “Add files & folders” & “Import”
  5. Double click the lifebit-featured S3 bucket & navigate to the folder “lifebit-featured-datasets/pipelines/rnaseq-data”
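
If you are pointing the platform at your own data instead, it helps to know which part of an S3 address is the bucket and which part is the folder you navigate to in step 5. The snippet below is a small illustrative sketch, not part of CloudOS: `split_s3_uri` is a hypothetical helper name, and it only uses Python's standard library to split the example address given above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// address into (bucket, folder prefix).

    Hypothetical helper for illustration only; it does not contact AWS.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; the path (minus its leading slash)
    # is the folder to navigate to inside that bucket.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri(
    "s3://lifebit-featured-datasets/pipelines/rnaseq-data"
)
print(bucket)  # the bucket to open
print(prefix)  # the folder to navigate to
```

For the example data this gives the bucket “lifebit-featured-datasets” and the folder “pipelines/rnaseq-data”.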


Add & set the following parameters/data:

For any of the nf-core pipelines, you can see a well-documented list of all available parameters. For the RNA-seq pipeline, we will add the following:

  1. reads – Select the folder “rnaseq_test/rnaseq-data/reads” & add the pattern “*” to select all FastQ files within the folder
  2. singleEnd – Add this to tell the pipeline the reads are single-end
  3. fasta – Select the file “rnaseq_test/rnaseq-data/reference/genome.fa”
  4. gtf – Select the file “rnaseq_test/rnaseq-data/reference/genes.gtf”
  5. max_memory – Type “60.GB” to prevent the pipeline from using too much memory
  6. Click “Next”
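
For readers who are used to the command line, the parameters above map roughly onto a `nextflow run` call. The sketch below only assembles and prints such a command; it does not run anything. It rests on a few assumptions: the parameter names are the ones listed in this post (other releases of the pipeline may name them differently), the S3 paths assume the example bucket mirrors the folder layout shown above, and `build_command` is a hypothetical helper written for this illustration.

```python
import shlex

# Assumption: the example bucket mirrors the folder layout used in the steps above.
DATA = "s3://lifebit-featured-datasets/pipelines/rnaseq-data"

# Parameter names and values as entered in the walkthrough.
params = {
    "reads": f"{DATA}/reads/*",        # "*" selects every file in the folder
    "singleEnd": True,                 # boolean flag, takes no value
    "fasta": f"{DATA}/reference/genome.fa",
    "gtf": f"{DATA}/reference/genes.gtf",
    "max_memory": "60.GB",
}


def build_command(pipeline, params):
    """Turn a parameter dict into a `nextflow run` argument list (illustrative helper)."""
    cmd = ["nextflow", "run", pipeline]
    for name, value in params.items():
        if value is True:
            cmd.append(f"--{name}")            # flags are passed without a value
        elif value is not False and value is not None:
            cmd += [f"--{name}", str(value)]   # everything else as "--name value"
    return cmd


command = build_command("nf-core/rnaseq", params)
# shlex.join quotes the "*" pattern so a shell would pass it through unexpanded.
print(shlex.join(command))
```

Quoting the reads pattern matters on a real command line: without quotes the shell would try to expand the “*” itself before Nextflow ever sees it.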


Running an analysis

You’re almost done! Follow the last 3 steps below and you’ll have successfully scheduled and deployed your first job on the CloudOS platform!

  1. Select a project
    • This is to group analyses together
    • For example, you can select the existing “Demo” project
  2. Select an instance
    • This is to set the compute resources available for running the analysis
    • For example, you can select the instance “m2.2xlarge”
  3. Finally, click “Run job”


Monitoring an analysis

After clicking “Run job”, the job takes about 5 minutes to initialise while the AWS instance is scheduled. In the meantime, you can navigate to the jobs page to view all jobs (both completed & running). Once the job has finished initialising, you can click on it to view the Job Analysis page. Here, you can view the resource consumption, results & MultiQC HTML quality control report.


View an example completed job

This tutorial has shown how to import and run the nf-core/rnaseq pipeline on CloudOS. We’re pleased to say that the released, stable nf-core pipelines are already available on the CloudOS platform with example data and parameters. This means that they are even easier to run!

Thanks for reading & hope you enjoyed the blog post. Now that you’ve learned how you can run any of the nf-core pipelines over CloudOS be sure to check out all of the nf-core pipelines so that you can go out and…


We would like to know what you think! Please fill out the following form or contact us at hello@lifebit.ai. We welcome your comments and suggestions!
