De novo genome sequencing¶
The data life cycle is typically divided into design, generation, analysis, storage & archiving, and sharing. Below you will find information about infrastructure resources available during these phases.
Data design¶
During this phase you plan which data are needed to answer your research question. High-quality science is often only possible if the resource facilities you intend to use get involved as early as the planning phase of a project. Consultation and advice regarding data management planning, data generation and data analysis are offered by NBIS and SciLifeLab.
It is wise to write a data management plan, using either a tool provided by your university or DS wizard.
Also, some resources have specific application periods and thus need to be contacted well in advance. If your project includes sensitive human data, note that there are ethical and legal issues that you have to consider, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive data for more information.
Data generation¶
Consider uploading the raw data to a repository, under an embargo, as soon as you receive them. This way you always have an off-site backup, with the added benefit of making the Data sharing phase more efficient.
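Before uploading, it can help to record a checksum for every raw data file, so that the off-site copy can later be verified against the original. The following is a minimal sketch in Python, not a prescribed procedure: the function names `md5_of_file` and `write_manifest` and the manifest file name `checksums.md5` are illustrative assumptions, and MD5 is used here only as an example of a commonly accepted checksum.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large sequencing files
        # do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="checksums.md5"):
    """Write one '<md5>  <relative path>' line per file under data_dir.

    The manifest name is a hypothetical choice; the manifest itself
    is excluded from the listing.
    """
    data_dir = Path(data_dir)
    manifest_path = data_dir / manifest_name
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path != manifest_path:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{md5_of_file(path)}  {relative}")
    manifest_path.write_text("\n".join(lines) + "\n")
    return manifest_path
```

Calling `write_manifest("raw_data")` on a hypothetical `raw_data` directory would write the manifest next to the files. Recomputing the checksums on the downloaded copy and comparing them with the manifest is one way to confirm that the backup is intact.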
Facilities which offer data generation services for de novo genome sequencing:
- NGI (National Genomics Infrastructure) offers an infrastructure equipped with a comprehensive range of technology platforms for next generation sequencing (NGS) and genotyping.
Data analysis¶
Facilities which offer data analysis services for de novo genome sequencing:
- NBIS (National Bioinformatics Infrastructure Sweden) is a national research infrastructure that offers bioinformatics support in various forms for a wide range of areas, including NGS, proteomics, metabolomics and biostatistics.
Data storage and archiving¶
After the project is finished, the data need to be stored with backup for at least 10 years, and for as long as the data are of scientific value. After this time, some of the data should be archived and some can be disposed of. It is best to contact your university for information about the procedures for this.
SNIC offers storage for small and medium-sized datasets. Storage for large datasets will also be offered in the future.
Data sharing¶
In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public.
Repositories for de novo genome sequencing data (non-human)¶
ENA¶
The ENA hosts an instance of the Sequence Read Archive (SRA), the same archive that exists at NCBI. SRA accepts raw sequence data from any sequencing platform, generated in any research project.
There are several ways to submit data to ENA, and extensive documentation is available on programmatic submissions.
Repositories for de novo genome sequencing data (human)¶
NBIS is building a local federated version of the European Genome-phenome Archive (EGA) in Sweden (EGA-SE), allowing for the publication of sensitive personal data within a legal framework. Until the local EGA is available, the dataset should remain in the secure analysis environment (e.g. Bianca at UPPMAX). We suggest making a metadata-only record in the SciLifeLab Data Repository, with contact details on how to get access, for which a DOI (i.e. a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset. Once the Swedish EGA is operational and the dataset has been deposited there, the access information can be changed to point to the EGA ID. See https://doi.org/10.17044/scilifelab.12292778 for an example.
Other repositories¶
For other domain-specific repositories, see e.g. ELIXIR Deposition databases, Scientific Data recommended repositories, EBI archive wizard (which helps to find the right repository depending on data type), or FAIRsharing (which can also assist in finding metadata standards suitable for describing your datasets). For datasets that do not fit into domain-specific repositories, use an institutional repository when available (e.g. SciLifeLab Data Repository) or a general repository such as Figshare or Zenodo.
Feedback¶
Any comments or questions? Please don’t hesitate to send an email to data-management@scilifelab.se.