Step-by-Step Animated Guide: Running Your First Pipeline on CloudOS

Lifebit


Introduction

In this post you will:

  • Be introduced to the main CloudOS web app functionalities
  • Learn how to run the Google AI DeepVariant pipeline on the CloudOS platform, as shown in our latest webinar
  • Delve into CloudOS’ interactive graphics for job monitoring analytics (there will be gifs!)

This tutorial is an animated, step-by-step guide on how to run your first pipeline on the CloudOS bioinformatics analysis and deployment platform. With CloudOS, running an analysis is as easy as 1-2-3:

  1. Select a pipeline
  2. Select input data and parameters, and
  3. Run analysis

CloudOS takes care of everything in between, from job scheduling all the way to results. You can, well, deploy :) your analysis on CloudOS either through the web UI or from the command line through our RESTful API. We will cover how to access CloudOS programmatically in an upcoming post, so stay tuned. For now, let’s take a closer look at what the steps actually look like on the CloudOS web platform.

First things first: The CloudOS platform on wordpress-614922-2173205.cloudwaysapps.com

You can find a link to try the CloudOS platform as soon as you land on our website. You can register for a free account by clicking on the blue Sign up - free button.

Complete the registration form and welcome aboard! We provide you with a Lifebit cloud account with pre-loaded credits to start your analyses. You can always switch to your own cloud provider later from the ⚙️ Settings menu.

Pipelines on the CloudOS platform

After registration, you are redirected to the CloudOS PIPELINES catalogue, under the PUBLIC PIPELINES & TOOLS menu. This is the library where all the curated, public pipelines live. You can use the pipelines with your own data and modify the parameters as you wish. Every pipeline comes with pre-selected example input data and parameters, so you can quickly get started and run your first analysis on the CloudOS platform.

Exploring the CloudOS platform: Navigation bar icons

You can browse the CloudOS app menu from the navigation bar on the left anytime to access the following:

  • Home: Homepage with an overview of your jobs
  • Jobs: Executed projects
  • Projects: Pipelines with configured parameters
  • Pipelines: The collection of pipelines available on CloudOS
  • Data: All the datasets you have uploaded
  • Docs: Documentation on how to use CloudOS, on our GitBook.io
  • Settings: Profile settings (user profile details, linked accounts e.g. Cloud, GitHub, Docker, Lifebit API key)

Selecting DeepVariantTrain from the CloudOS pipelines catalogue

As mentioned above, CloudOS is already populated with community-curated, containerised pipelines. You can access and browse the available CloudOS pipelines at any time by clicking on the pipelines icon in the navigation bar on the left.

The DeepVariantTrain pipeline is an nf-core based implementation we have developed at Lifebit for running the Google DeepVariant variant caller. You can find everything in more detail on our GitHub repository.

Briefly, the DeepVariantTrain model takes a deep learning, image recognition approach to variant calling: the reads in the BAM files are first encoded into images, and the variants (SNPs and indels) are then called from those images. The output of the prediction is in the standard VCF format that other commonly used variant callers also return.

Parameters and input data

After selecting the DeepVariantTrain pipeline, you will land on the Select data & parameters page. Here, you can select the input data to upload and define your parameters. For a detailed tutorial on how to create your own customised nf-core based pipeline, you can check Phil’s post and get coding.

For now, let’s go with the example parameters and input data that are already available on CloudOS. Just click on Try with example data & parameters and your pipeline will be ready to go.

After loading the example, you will notice that you can also preview the resulting command in the dark grey Final command field. If you change your mind and want to modify your input data and parameters, you can click on the - (minus) button to clear the arguments. You can always reset your pipeline this way and add different parameters and data before running the analysis.

Cloud configuration and job scheduling

By now you have successfully selected a pipeline and loaded the example data and parameters, so it’s time to deploy the analysis! After clicking Run at the top right of your screen, you will be prompted to select your preferred execution platform (e.g. AWS Cloud or Azure Cloud) and to choose a configuration, which boils down to how quickly you want your job to finish.

This means that you can go low (cost) and slow(er), or give your analysis an extra boost if you have increased demands for a specific project and/or need your results as soon as possible.

As shown above, once you click ‘Run job’ you will be notified that your job has been successfully scheduled ✅. You are now free to go have a coffee and return to find your analysis results ready!
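To make the cost versus speed trade-off described above a bit more concrete, here is a minimal Python sketch. It assumes that the total cost of a run is simply the hourly price multiplied by the runtime. The RunOption class, the option names and all the numbers are made up for illustration; they are not real CloudOS configurations or cloud prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunOption:
    """One hypothetical execution configuration (not a real CloudOS option)."""
    name: str
    price_per_hour: float  # illustrative price, arbitrary currency units
    expected_hours: float  # illustrative runtime for the same job


def total_cost(option: RunOption) -> float:
    """Assumed cost model: hourly price multiplied by expected runtime."""
    return option.price_per_hour * option.expected_hours


def cheapest_within_deadline(options, deadline_hours):
    """Return the cheapest option that finishes within the deadline, or None."""
    feasible = [o for o in options if o.expected_hours <= deadline_hours]
    return min(feasible, key=total_cost) if feasible else None


if __name__ == "__main__":
    # Made-up numbers, chosen only to illustrate the trade-off.
    options = [
        RunOption("low-cost", price_per_hour=0.5, expected_hours=4.0),
        RunOption("boosted", price_per_hour=2.0, expected_hours=1.5),
    ]
    for option in options:
        print(f"{option.name}: {option.expected_hours} h, cost {total_cost(option):.2f}")
    choice = cheapest_within_deadline(options, deadline_hours=2.0)
    print("Chosen for a 2 h deadline:", choice.name if choice else "none")
```

With a relaxed deadline the slower option wins on cost, while a tight deadline leaves only the boosted option, which is the same choice the configuration step asks you to make.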

Job completion, abort mission and troubleshooting

After clicking Run job, you will be redirected to the Jobs page (look for the rocket icon on the left of your screen). There you can monitor your job status, follow a real-time update of the analysis cost (charged by your cloud provider) and keep an eye on the results button, which is initially grey. You will know your results are ready when the button has turned green and a checkmark ✅ has appeared. Check it out below:

Having second thoughts about running the analysis? Well, you can always abort mission by clicking on the black icon on the far right:

And you know, bad things happen even to the best pipelines out there, and jobs sometimes fail.

But fear not! You can always raise an issue on the pipeline repo on our GitHub and we’ll have a look. Everything is saved in the log file, which makes troubleshooting and debugging easier. You can also go into detective mode yourself and access the log file from the Job monitor page.

Raising GitHub issues is something we encourage a lot as a team, since it helps us keep track of everything and improve the pipeline as we go. As an added bonus, the issues themselves serve as a Q&A that is useful for other users, so GitHub has a special place in our hearts. Of course, you can also reach out in real time while on the CloudOS platform by clicking on the blue icon in the bottom right corner.

Explore and Download Results

After your job has successfully finished, you can access your results by clicking on the green button. This time, we tried the DeepVariantTrain pipeline, which is a variant caller, so we expect to find a .vcf file in the results folder, as shown below. Click on the ••• (three dots) to download your results and keep a copy on your machine.

You can view the VCF file after downloading it. You can also run further analysis on the VCF file, for example with the vcftools pipeline, which is also available on CloudOS.
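If you would like a quick look at the downloaded file before running any further analysis, the following is a minimal Python sketch that counts SNVs and indels in a VCF file. It relies only on the standard tab-separated VCF columns (REF is the fourth column, ALT the fifth). The function names are our own for this example, and the length-based classification is a deliberate simplification rather than what DeepVariant or vcftools do internally.

```python
import gzip
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'snv', 'indel' or 'other' (simplified)."""
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"  # missing, spanning-deletion, symbolic or breakend allele
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarise_vcf_lines(lines) -> Counter:
    """Count variant classes over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # ALT may list several alleles
            counts[classify_variant(ref, alt)] += 1
    return counts


def summarise_vcf_file(path: str) -> Counter:
    """Open a plain or gzip-compressed VCF file and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarise_vcf_lines(handle)
```

Calling summarise_vcf_file on the path of your downloaded file returns a Counter with the number of alleles in each class.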

Job monitoring analytics with interactive graphics

After job completion, you can access the job monitoring analytics by clicking on the information ℹ️ button. The overview reports the CPU and RAM usage of each process in the pipeline.
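As a rough illustration of what per-process analytics summarise, the Python sketch below groups resource samples by process and reports the mean CPU usage and the peak RAM for each one. The record layout (process, cpu_percent, ram_gb), the process names and the values are all hypothetical; CloudOS does not necessarily expose its monitoring data in this form.

```python
from collections import defaultdict
from statistics import mean


def summarise_usage(samples):
    """Group samples by process; return mean CPU % and peak RAM per process."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["process"]].append(sample)
    return {
        process: {
            "mean_cpu_percent": mean(row["cpu_percent"] for row in rows),
            "peak_ram_gb": max(row["ram_gb"] for row in rows),
        }
        for process, rows in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical samples; process names and values are invented.
    samples = [
        {"process": "step_a", "cpu_percent": 50.0, "ram_gb": 1.0},
        {"process": "step_a", "cpu_percent": 70.0, "ram_gb": 1.5},
        {"process": "step_b", "cpu_percent": 90.0, "ram_gb": 4.0},
    ]
    for process, stats in summarise_usage(samples).items():
        print(process, stats)
```

A summary like this is handy for spotting which process dominates memory or CPU, which in turn helps when picking a configuration for the next run.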

Now you have an overview of the CloudOS platform and you are ready to run your first analysis. You can experiment with the available examples, which bundle example input data, preset parameters and curated pipelines. This means that you don’t have to bring anything to the platform (e.g. code, data, cloud) the first time you try it, and you can see straight away how running a pipeline on CloudOS looks and feels.

Already have analysed data (+ code) sitting somewhere from an old project? Wondering how the experience of running the analysis yourself compares to the CloudOS way? Then try with your own data and compare with your previous implementation (for example, ease of cloud configuration, or programmer time vs machine time in each case).

Something missing from CloudOS? Bring on your questions or suggestions for new features you would like to see included in the platform and we are happy to get coding.

As mentioned, we particularly encourage raising GitHub issues or contacting us via Twitter, email or in real-time when you run your pipelines from the conversation menu on the CloudOS app, so that we can help.

Not a GUI fan? You can access CloudOS programmatically and use it as your execution engine with our RESTful API. Stay tuned for an upcoming post on exactly that!

Happy CloudOS-ing!


We would like to know what you think! Please fill out the following form or contact us at hello@lifebit.ai. We welcome your comments and suggestions!

 
