NGI Stockholm detects variants on genomes

The analysis pipeline Sarek is designed to assist Swedish scientists in germline or somatic sequencing

Published on: 11/08/2021
News

The National Genomics Infrastructure (NGI) in Sweden has developed a pipeline to analyse genome sequencing data. Its umbrella organisation, SciLifeLab, has an open source policy: it intends to contribute to existing open source projects and to develop and share its own.

NGI Stockholm is a research facility in the field of genomics. As part of SciLifeLab (Science for Life Laboratory, Sweden), it follows a strict open source policy. One of its highlighted open source projects is Sarek, a workflow to detect genome variations in any species. The identified genomes are also available as public data.

Genomics

SciLifeLab is a national resource of expertise for researchers in areas such as biomedicine, ecology, and evolution. NGI deals primarily with technology for massively parallel sequencing and genotyping, and provides researchers in Sweden with bioinformatics support.

Genomics is the study of all of a person's genes (the genome), including interactions of those genes with each other and with the person's environment.

- National Human Genome Research Institute, Bethesda, Maryland, US

Under SciLifeLab’s open source policy, servers run on open source software, and software must be released under a licence that fosters transparency and cooperation within the academic community. One of the software tools that NGI has developed is Sarek, previously known as ‘Cancer Analysis Workflow’ (CAW).

Detection of genome variants

Sarek is built on Nextflow, which enables scientific workflows by using software containers. Sarek detects variants in whole genomes or other sequencing data, both in germline cells (the cells that form eggs, sperm, and the fertilised egg) and in somatic cells. It pre-processes the data following the Genome Analysis Toolkit (GATK) best practices, then identifies variants, and finally summarises the information in a MultiQC report. MultiQC is a visualisation tool that summarises all data into an HTML report (see the detailed workflow).
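The three stages described above (pre-processing, variant identification, summary report) can be pictured with a deliberately simplified Python sketch. This is not Sarek's code and does not use GATK, Nextflow, or MultiQC; the function names, the toy reference sequence, and the reads are all hypothetical and serve only to show how each stage feeds the next.

```python
from collections import Counter


def preprocess(reads):
    """Toy pre-processing: drop exact duplicate reads, keeping first-seen order."""
    return list(dict.fromkeys(reads))


def call_variants(reference, reads):
    """Toy variant calling: report positions where an aligned read differs
    from the reference. Each read is a (start_position, sequence) pair."""
    variants = set()
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if position < len(reference) and reference[position] != base:
                variants.add((position, reference[position], base))
    return sorted(variants)


def summarise(variants):
    """Toy report: count variants by substitution type, e.g. 'T>A'."""
    return Counter(f"{ref}>{alt}" for _, ref, alt in variants)


def run_pipeline(reference, reads):
    """Chain the three stages in the order the article describes."""
    unique_reads = preprocess(reads)
    variants = call_variants(reference, unique_reads)
    return variants, summarise(variants)


# Hypothetical input: an 8-base reference and three aligned reads,
# one of which is a duplicate.
reference = "ACGTACGT"
reads = [(0, "ACGA"), (0, "ACGA"), (4, "ACTT")]

variants, summary = run_pipeline(reference, reads)
print(variants)
print(dict(summary))
```

In a real pipeline each stage would be a separate containerised tool operating on large alignment files; the sketch only mirrors the order of the steps.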

Similar to the open source policy on software, research data processed by the Sarek pipeline should be made publicly available:

In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public, for example by submitting your data to a public repository.

- nf-core, Data Management

Final take-aways

  • Sarek is produced and used under SciLifeLab’s open source policy. SciLifeLab wishes to contribute to, and help build, an open and collaborative academic community.
  • Formerly known as Cancer Analysis Workflow, Sarek detects variants in genomes, regardless of species.