Step-by-Step Animated Guide: Running Your First Pipeline on CloudOS
Lifebit
Introduction
In this post you will:
- Be introduced to the main CloudOS web app functionalities
- Learn how to run the Google AI DeepVariant pipeline on the CloudOS platform, as shown in our latest webinar
- Delve into CloudOS’ interactive graphics for job monitoring analytics (there will be gifs!)
This tutorial is an animated, step-by-step guide on how to run your first pipeline on the CloudOS bioinformatics analysis and deployment platform. With CloudOS, running an analysis is as easy as 1-2-3:
- Select a pipeline
- Select input data and parameters, and
- Run analysis
CloudOS takes care of everything else in between, from job scheduling all the way to results. You can, well... deploy :) your analysis on the CloudOS app either by using the web UI or by CLI through our RESTful API. More on how to access CloudOS programmatically in an upcoming post, so stay tuned. For now, let’s take a closer look at what the steps actually look like on the CloudOS web platform.
First things first: The CloudOS platform on wordpress-614922-2173205.cloudwaysapps.com
You can find a link to try the CloudOS platform as soon as you land on our website. You can register for a free account by clicking on the blue Sign up - free button.
Complete your registration form and welcome aboard! We provide you with a Lifebit cloud account with pre-loaded credits to start your analyses. You can always switch to your own cloud provider later from the Settings menu.
Pipelines on the CloudOS platform
After registration, you are redirected to the CloudOS PIPELINES catalogue, under the PUBLIC PIPELINES & TOOLS menu. This is the library where all the curated, public pipelines live. You can use the pipelines with your own data and modify the parameters as you wish. Default pre-selected example input data and parameters are available for every pipeline, so you can quickly get started and run your first analysis on the CloudOS platform.
Exploring the CloudOS platform: Navigation bar icons
You can browse the CloudOS app menu from the navigation bar on the left anytime to access the following:
- Home: Homepage with an overview of your jobs
- Jobs: Executed projects
- Projects: Pipelines with configured parameters
- Pipelines: The collection of the pipelines available on CloudOS
- Data: All the datasets you have uploaded
- Docs: Documentation on how to use CloudOS at our GitBook.io
- Settings: Profile settings (user profile details, linked accounts e.g. Cloud, GitHub, Docker, Lifebit API key)
Selecting DeepVariantTrain from the CloudOS pipelines catalogue
As mentioned above, CloudOS is already populated with community-curated, containerised pipelines. You can always access and browse the available CloudOS pipelines by clicking on the pipelines icon in the navigation bar on the left.
The DeepVariantTrain pipeline is an nf-core based implementation we have developed at Lifebit for running the Google DeepVariant variant caller. You can find everything in more detail on our GitHub repository.
Briefly, the DeepVariantTrain model takes a deep learning, image-recognition approach to variant calling, which requires encoding the BAM files as images in order to subsequently obtain the variants and indels. The output of the prediction is in the standard VCF format that other commonly used variant callers also return.
Parameters and input data
After selecting the DeepVariantTrain pipeline, you will land on the Select data & parameters page. Here, you can select your input data to upload and define your parameters. For an elaborate tutorial on how to create your own customised nf-core based pipeline, you can check Phil’s post and get coding.
For now, let’s go with the example parameters and input data that are already available on CloudOS. Just click on Try with example data & parameters and your pipeline will be ready to go.
After loading the example, you will notice that you can also preview the respective code snippet in the Final command dark grey field, as shown above. If you change your mind and want to modify your input data and parameters, you can click on the - minus button and clear the arguments. You can always reset your pipelines this way and add different parameters and data before running the analysis.
Cloud configuration and job scheduling
By now you have successfully selected a pipeline and loaded the example data and parameters. Now it’s time to deploy the analysis! After clicking ‘Run’ at the top right-hand corner of your screen, you will be prompted to select your preferred execution platform (e.g. AWS Cloud or Azure Cloud) and choose a configuration, which boils down to how quickly you want your job to finish.
This means that you can go low (cost) and slow(er), or give your analysis an extra boost if you have increased demands for a specific project and/or need your results as soon as possible.
As shown above, once you click ‘Run job’ you will be notified that your job has been successfully scheduled. You are now free to go have a coffee and return to find your analysis results ready!
Job completion, abort mission and troubleshooting
After clicking Run job you will be redirected to the Jobs page (check the rocket icon on the left of your screen), where you can monitor your job status, follow a real-time update of the analysis cost (charged by your cloud provider), and have a look at the results button, which is initially grey. You will know your results are ready when it has turned green and a checkmark has appeared. Check it out below:
Having second thoughts about running the analysis? Well, you can always just abort mission by clicking on the black icon on the far right:
And you know, bad things happen even to the best pipelines out there and jobs sometimes fail.
But fear not! You can always raise an issue on the pipeline repo on our GitHub and we’ll have a look. Everything is saved in the log file, which facilitates troubleshooting and debugging. You can also go detective mode yourself and access the log file from the Job monitor page.
Raising GitHub issues is actually something we encourage a lot as a team, since it helps us keep track of everything and improve the pipeline as we go. As an added bonus, the issues themselves serve as a Q&A for other users, so GitHub has a special place in our heart. Of course, you can also always reach out in real time while on the CloudOS platform by clicking the blue icon in the bottom right corner.
Explore and Download Results
After your job has successfully finished, you can access your results by clicking on the green button. This time, we tried the DeepVariantTrain pipeline, which is a variant caller, so we expect to have a .vcf file in the results folder, as shown below. Click on the ••• three dots to Download your results and keep a copy on your machine.
You can view the vcf file after downloading it. Further analysis can also be run on the vcf file, for example with the vcftools pipeline, which is also available on CloudOS.
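If you would like a quick look at the downloaded file before any further analysis, here is a minimal Python sketch that counts SNPs and indels in a VCF. It is not part of CloudOS or the DeepVariantTrain pipeline; it only relies on the standard VCF column layout (REF is the fourth tab-separated column, ALT the fifth). The example records, the function names and the simplified classification rule (symbolic and breakend alleles are not handled properly) are assumptions made for illustration.

```python
import io


def classify_variant(ref, alt):
    """Label one REF/ALT pair as 'snp', 'indel' or 'other' (simplified rule)."""
    if alt in {".", "*"} or alt.startswith("<"):
        return "other"  # missing, spanning-deletion or symbolic alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarise_vcf(lines):
    """Count variant types over an iterable of VCF lines."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.split("\t")
        ref, alts = fields[3], fields[4]
        # A record can list several comma-separated ALT alleles
        for alt in alts.split(","):
            counts[classify_variant(ref, alt)] += 1
    return counts


# Made-up records, only to show the expected input shape
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr20\t10001\t.\tA\tG\t50\tPASS\t.",
    "chr20\t10050\t.\tCT\tC\t42\tPASS\t.",
    "chr20\t10100\t.\tG\tGA,T\t37\tPASS\t.",
])

if __name__ == "__main__":
    print(summarise_vcf(io.StringIO(EXAMPLE_VCF)))
```

To run it on your own download, pass an open file handle instead of the in-memory example, for instance `summarise_vcf(open("results.vcf"))`, where `results.vcf` is a placeholder for whatever your file is called.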
Job monitoring analytics with interactive graphics
After job completion, you can access job monitor analytics, i.e. CPU and RAM usage per process in the pipeline, by clicking the information button as shown below.
An overview of the job monitoring analytics is shown below, which includes information about the CPU and RAM usage per process in the pipeline:
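Separately from the interactive graphics, the same kind of per-process summary can be sketched in a few lines of Python, assuming you have the per-task figures in a tab-separated table. This post does not describe any such export from CloudOS, so the column names (`process`, `cpu_percent`, `ram_mb`) and all the values below are hypothetical and only illustrate the idea of aggregating usage per pipeline process.

```python
import csv
import io
from collections import defaultdict

# Hypothetical per-task records: column names and numbers are made up
EXAMPLE_TSV = (
    "process\tcpu_percent\tram_mb\n"
    "make_examples\t180.0\t2048\n"
    "make_examples\t220.0\t3072\n"
    "call_variants\t95.0\t4096\n"
)


def summarise_usage(handle):
    """Return mean CPU and peak RAM per process from a tab-separated table."""
    per_process = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        per_process[row["process"]].append(
            (float(row["cpu_percent"]), float(row["ram_mb"]))
        )
    summary = {}
    for name, samples in per_process.items():
        cpus = [cpu for cpu, _ in samples]
        rams = [ram for _, ram in samples]
        summary[name] = {
            "tasks": len(samples),
            "mean_cpu_percent": sum(cpus) / len(cpus),
            "peak_ram_mb": max(rams),
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarise_usage(io.StringIO(EXAMPLE_TSV)).items():
        print(name, stats)
```

Mean CPU and peak RAM are just one reasonable choice here: peak memory is usually what decides whether a process fits on a given machine, while average CPU hints at how well the allocated cores are used.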
Now you have an overview of the CloudOS platform and you are ready to run your first analysis. You can experiment with the available examples, which include a comprehensive bundle of example input data, preset parameters and curated pipelines. This means that you don’t have to bring anything to the platform (e.g. code, data, cloud) the first time you try it, and you can see straight away what running a pipeline on CloudOS actually looks and feels like.
Already have analysed data (+ code) sitting somewhere from an old project? Wondering how the experience of running the analysis yourself compares to the CloudOS way? You can then try with your own data and compare with your previous implementations (for example, ease of cloud configuration, or programmer time vs machine time in each case).
Something missing from CloudOS? Bring on your questions or suggestions for new features you would like to see included in the platform and we are happy to get coding.
As mentioned, we particularly encourage raising GitHub issues or contacting us via Twitter, email or in real time from the conversation menu on the CloudOS app while you run your pipelines, so that we can help.
Not a GUI fan? You can access CloudOS programmatically and use it as your execution engine with our RESTful API. Stay tuned for an upcoming post on exactly that!
Happy CloudOS-ing!
We would like to know what you think! Please fill out the following form or contact us at hello@lifebit.ai. We welcome your comments and suggestions!