Welcome to SciLifeLab Data Guidelines!¶
SciLifeLab is committed to the principles of FAIR (Findable, Accessible, Interoperable and Reusable) research data, i.e. that data should be easy to access, understand, exchange and reuse. We work actively to ensure that the investments made by society in research infrastructure resources achieve the highest possible impact.
Research data management concerns the organization, storage, preservation, and sharing of data that is collected or analysed during a research project. Proper planning and management of research data will make project management easier and more efficient while projects are being performed. It also facilitates sharing and allows others to validate as well as reuse the data.
The purpose of these guidelines is to serve as an information resource to researchers regarding research data management. Click on any of the data types for guidance on good data management practices during the data life cycle, including available infrastructures for data generation and analysis and appropriate data repositories for sharing. There is also overarching guidance, applicable to all data types, on e.g. metadata standards and managing sensitive data under General information.
COVID-19¶
General information¶
Please see the Swedish COVID-19 Data Portal for the latest information regarding Swedish efforts in COVID-19 research, including data-generating facilities. Also see the European COVID-19 Data Portal and the Horizon 2020 guidelines regarding COVID-19 for useful information at the European level.

Data Life Cycle¶
The data life cycle is typically divided into design, generation, analysis, storage & archiving, and sharing. Below you will find information about standards and infrastructure resources available during these phases.

Data design¶
During this phase you plan which data is needed to answer your research question. High-quality science is often only possible if the resource facilities you intend to use are involved already in the planning phase of a project. Consultation and advice regarding data management planning, data generation and data analysis are offered by NBIS and SciLifeLab. It is wise to write a data management plan, using either a tool provided by your university or DS wizard. Also, some resources have specific application periods and thus need to be contacted well in advance. If your project includes sensitive human data, note that there are ethical and legal issues that you have to consider, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive data for more information.

Data generation¶
The SciLifeLab National Genomics Infrastructure (NGI) provides a wide range of sequencing technologies and can offer state-of-the-art solutions for many different types of COVID-19 sequencing projects. Chemical proteomics & proteogenomics and BioMS offer mass spectrometry support. For a complete list please visit the Swedish COVID-19 Data Portal.

Data analysis¶
Data storage and archiving¶
After the project is finished, the data needs to be stored in a backed-up fashion for at least 10 years, and for as long as the data is of scientific value. After this time, some of the data should be archived and some can be disposed of. It is best to contact your university Research Data Office for information about the procedures for this. SNIC offers storage for small and medium-sized datasets. Storage for large datasets will also be offered in the future.

Data sharing¶
The guidelines in all subsections regarding COVID-19 have been adapted from the Research Data Alliance 5th release of the COVID-19 Data Sharing Recommendations & Guidelines. In general:
The following subsections contain guidelines addressing specific COVID-19 data types and resources:

Recommendations for Virus Genomics Data¶
Repositories¶
We suggest that raw virus sequence data as well as assembled and annotated genomes are submitted to ENA.
Data and metadata standards¶
A list of relevant data and metadata standards can be found in FAIRsharing; some specific examples are given below. We suggest that data is preferentially stored in the following formats, to maximize interoperability with other datasets and with standard analysis pipelines:
Consider annotating virus genomes using the ENA virus pathogen reporting standard checklist, a minimal information standard currently under development, and the more general Viral Genome Annotation System (VGAS) (Zhang et al. 2019). For submitting data and metadata relating to phylogenetic relationships (including topology, branch lengths, and support values), consider using widely accepted formats such as Newick, NEXUS and PhyloXML. The Minimum Information About a Phylogenetic Analysis checklist provides a reference list of useful tree annotations.

Recommendations for Host Genomics Data¶
Host genomics data is often coupled to human subjects. This comes with many ethical and legal obligations, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive personal data for more information.

General Recommendations¶
Repositories¶Several different types of host genomics data are being collected for COVID-19 research. Some suitable repositories for these are:
Data and metadata standards¶
Recommendations for Structural data¶
Repositories¶
Several different types of structural data are being collected for COVID-19 research. Some suitable repositories for these are:
Locating existing data¶
The COVID-19 Molecular Structure and Therapeutics Hub, a community data repository and curation service for structures, models, therapeutics, simulations and related computations for research into the COVID-19 pandemic, is maintained by The Molecular Sciences Software Institute (MolSSI) and BioExcel.

Data and metadata standards¶
X-ray diffraction
Electron microscopy
NMR
Neutron scattering
Molecular Dynamics (MD) simulations
Computer-aided drug design data
Recommendations for Proteomics¶
Proteomics studies are used to find biomarkers for disease and susceptibility.

Repositories¶
For a curated list of relevant repositories see FAIRsharing using the query ’proteomics’. The ProteomeXchange Consortium enables searches across the following deposition databases, which follow common standards.
Data and metadata standards¶
For a curated list of relevant standards see FAIRsharing using the query ’proteomics’.

Recommendations for Metabolomics¶
Metabolomics studies are used to find biomarkers for disease and susceptibility. Lipidomics is a special form of metabolomics, but is also described in more detail in a separate section because of its special relevance to COVID-19 research.

Repositories¶
For a curated list of relevant repositories see FAIRsharing using the query ‘metabolomics’.
Data and metadata standards¶
For a curated list of relevant standards see FAIRsharing using the query ‘metabolomics’.
Recommendations for Lipidomics¶
Lipidomics has revealed an altered lipid composition in infected cells and altered serum lipid levels in patients with preexisting conditions. Lipid rafts (lipid microdomains) play a critical role in viral infections, facilitating virus entry, replication, assembly and budding. Lipid rafts are enriched in glycosphingolipids, sphingomyelin and cholesterol. It is likely that SARS-CoV-2 enters the cell via angiotensin-converting enzyme-2 (ACE2), which depends on the integrity of lipid rafts in the infected cell membrane.

General Recommendations for Researchers¶
Lipidomics analysis should follow the guidelines of the Lipidomic Standards Initiative.

Repositories¶
The largest repository for lipidomics data is MetaboLights.

Data and metadata standards¶
Genomics¶The following sections contain guidelines for different genomics data types. Click on any of them for guidance on good data management practices during the data life cycle, including available infrastructures for data generation and analysis and appropriate data repositories for sharing. Data types:
Imaging¶
The data life cycle is typically divided into design, generation, analysis, storage & archiving, and sharing. Below you will find information about infrastructure resources available during these phases.

Data design¶
During this phase you plan which data is needed to answer your research question. High-quality science is often only possible if the resource facilities you intend to use are involved already in the planning phase of a project. Consultation and advice regarding data management planning, data generation and data analysis are offered by NBIS and SciLifeLab. It is wise to write a data management plan, using either a tool provided by your university or DS wizard. Also, some resources have specific application periods and thus need to be contacted well in advance. If your project includes sensitive human data, note that there are ethical and legal issues that you have to consider, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive data for more information.

Data generation¶
Consider uploading the raw data to a repository, under an embargo, as soon as you receive it. This way you always have an off-site backup, with the added benefit of making the Data sharing phase more efficient. Facilities which offer data generation services for Imaging:
Data analysis¶
Facilities which offer data analysis services for Imaging:
Data storage and archiving¶
After the project is finished, the data needs to be stored in a backed-up fashion for at least 10 years, and for as long as the data is of scientific value. After this time, some of the data should be archived and some can be disposed of. It is best to contact your university for information about the procedures for this. SNIC offers storage for small and medium-sized datasets. Storage for large datasets will also be offered in the future.

Data sharing¶
In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public.

Repositories for Imaging data:¶
Depending on the type of image data you have, different public repositories are available; please see the table at BioImage Archive. If you have data that requires controlled access because of personal privacy issues, informed consents, and/or ethical approvals, we suggest storing the data locally in a secure environment and making a metadata-only record in the SciLifeLab Data Repository with contact details on how to get access, and for which a DOI (i.e. a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset.

Other repositories¶
For other domain-specific repositories, see e.g. ELIXIR Deposition databases, Scientific Data recommended repositories, the EBI archive wizard (which helps you find the right repository depending on data type), or FAIRsharing (the latter can also assist in finding metadata standards suitable for describing your datasets). For datasets that do not fit into domain-specific repositories, use an institutional repository when available (e.g. the SciLifeLab Data Repository) or a general repository such as Figshare or Zenodo.

Feedback¶
Any comments or questions? Please don’t hesitate to send an email to data-management@scilifelab.se

Metabolomics¶
The data life cycle is typically divided into design, generation, analysis, storage & archiving, and sharing.
Below you will find information about infrastructure resources available during these phases.

Data design¶
During this phase you plan which data is needed to answer your research question. High-quality science is often only possible if the resource facilities you intend to use are involved already in the planning phase of a project. Consultation and advice regarding data management planning, data generation and data analysis are offered by NBIS and SciLifeLab. It is wise to write a data management plan, using either a tool provided by your university or DS wizard. Also, some resources have specific application periods and thus need to be contacted well in advance. If your project includes sensitive human data, note that there are ethical and legal issues that you have to consider, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive data for more information.

Data generation¶
Consider uploading the raw data to a repository, under an embargo, as soon as you receive it. This way you always have an off-site backup, with the added benefit of making the Data sharing phase more efficient. Facilities which offer data generation services for Metabolomics:
Data analysis¶
Facilities which offer data analysis services for Metabolomics:
Data storage and archiving¶
After the project is finished, the data needs to be stored in a backed-up fashion for at least 10 years, and for as long as the data is of scientific value. After this time, some of the data should be archived and some can be disposed of. It is best to contact your university for information about the procedures for this. SNIC offers storage for small and medium-sized datasets. Storage for large datasets will also be offered in the future.

Data sharing¶
In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public.

Repositories for Metabolomics data:¶
MetaboLights is a database for metabolomics experiments and derived information. The database is cross-species and cross-technique, and covers metabolite structures and their reference spectra as well as their biological roles, locations and concentrations, together with experimental data from metabolic experiments. If you have data that requires controlled access because of personal privacy issues, informed consents, and/or ethical approvals, we suggest storing the data locally in a secure environment and making a metadata-only record in the SciLifeLab Data Repository with contact details on how to get access, and for which a DOI (i.e. a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset.

Other repositories¶
For other domain-specific repositories, see e.g. ELIXIR Deposition databases, Scientific Data recommended repositories, the EBI archive wizard (which helps you find the right repository depending on data type), or FAIRsharing (the latter can also assist in finding metadata standards suitable for describing your datasets). For datasets that do not fit into domain-specific repositories, use an institutional repository when available (e.g. the SciLifeLab Data Repository) or a general repository such as Figshare or Zenodo.

Feedback¶
Any comments or questions?
Please don’t hesitate to send an email to data-management@scilifelab.se

Proteomics¶
The data life cycle is typically divided into design, generation, analysis, storage & archiving, and sharing. Below you will find information about infrastructure resources available during these phases.

Data design¶
During this phase you plan which data is needed to answer your research question. High-quality science is often only possible if the resource facilities you intend to use are involved already in the planning phase of a project. Consultation and advice regarding data management planning, data generation and data analysis are offered by NBIS and SciLifeLab. It is wise to write a data management plan, using either a tool provided by your university or DS wizard. Also, some resources have specific application periods and thus need to be contacted well in advance. If your project includes sensitive human data, note that there are ethical and legal issues that you have to consider, such as applying for ethics approval and reporting the data processing to your Data Protection Officer. See the page on Sensitive data for more information.

Data generation¶
Consider uploading the raw data to a repository, under an embargo, as soon as you receive it. This way you always have an off-site backup, with the added benefit of making the Data sharing phase more efficient. Facilities which offer data generation services for Proteomics:
Data analysis¶
Facilities which offer data analysis services for Proteomics:
Data storage and archiving¶
After the project is finished, the data needs to be stored in a backed-up fashion for at least 10 years, and for as long as the data is of scientific value. After this time, some of the data should be archived and some can be disposed of. It is best to contact your university for information about the procedures for this. SNIC offers storage for small and medium-sized datasets. Storage for large datasets will also be offered in the future.

Data sharing¶
In the era of FAIR (Findable, Accessible, Interoperable and Reusable) and Open science, datasets should be made available to the public.

Repositories for Proteomics data:¶
The ProteomeXchange Consortium provides globally coordinated standard data submission and dissemination pipelines involving the main proteomics repositories:
If you have data that requires controlled access because of personal privacy issues, informed consents, and/or ethical approvals, we suggest storing the data locally in a secure environment and making a metadata-only record in the SciLifeLab Data Repository with contact details on how to get access, and for which a DOI (i.e. a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset.

Other repositories¶
For other domain-specific repositories, see e.g. ELIXIR Deposition databases, Scientific Data recommended repositories, the EBI archive wizard (which helps you find the right repository depending on data type), or FAIRsharing (the latter can also assist in finding metadata standards suitable for describing your datasets). For datasets that do not fit into domain-specific repositories, use an institutional repository when available (e.g. the SciLifeLab Data Repository) or a general repository such as Figshare or Zenodo.

Feedback¶
Any comments or questions? Please don’t hesitate to send an email to data-management@scilifelab.se
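For data under controlled access, the metadata-only record described above can be very simple. A minimal sketch in JSON; every name, identifier and address below is a hypothetical placeholder, not a real record or a required schema:

```python
import json

# Sketch of a metadata-only record for a controlled-access dataset.
# All values are hypothetical placeholders.
record = {
    "title": "WGS of COVID-19 patient cohort (example)",
    "identifier": "https://doi.org/10.1000/example-doi",  # DOI issued by the repository
    "access": "Controlled: data stored locally in a secure environment",
    "access_contact": "pi@example.org",                   # how to request access
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

print(json.dumps(record, indent=2))
```

The record itself is public and citable via the DOI, while the underlying data stays in the secure environment.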
General information¶
The following sections contain general guidelines, independent of data type. Metadata contains information about appropriate standards for (meta)data formats. If sensitive data is part of your project, we recommend reading the Sensitive data page. There is also a collection of Data Protection Officers (for sensitive data processing) and Research Data Offices (for data management guidance) at the different universities who can assist you further.

FAIR principles¶
FAIR stands for Findable, Accessible, Interoperable and Reusable:
In Wilkinson et al. 2016, a set of principles was defined for each of these properties. Below, each principle is explained further, as adapted from the FAIR principles translation.

F1. (meta)data are assigned a globally unique and persistent identifier¶
Explanation: Each data set is assigned a globally unique and persistent identifier (PID), for example a DOI. These identifiers make it possible to find, cite and track (meta)data.
Action: Ensure that each data set is assigned a globally unique and persistent identifier. Certain repositories automatically assign identifiers to data sets as a service. If not, researchers must obtain a PID via a PID registration service.

F2. data are described with rich metadata (defined by R1 below)¶
Explanation: Each data set is thoroughly described (see R1 below): these metadata document how the data was generated, under what terms (license) and how it can be (re)used, and provide the necessary context for proper interpretation. This information needs to be machine-readable.
Action: Fully document each data set in the metadata, which may include descriptive information about the context, quality and condition, or characteristics of the data. Another researcher in any field, or their computer, should be able to properly understand the nature of your dataset. Be as generous as possible with your metadata (see R1).

F3. metadata clearly and explicitly include the identifier of the data it describes¶
Explanation: The metadata and the data set they describe are separate files. The association between a metadata file and the data set is made obvious by including the data set’s PID in the metadata.
Action: Make sure that the metadata contains the data set’s PID.

F4. (meta)data are registered or indexed in a searchable resource¶
Explanation: Metadata are used to build easily searchable indexes of data sets. These resources make it possible to search for existing data sets, similarly to searching for a book in a library.
Action: Provide detailed and complete metadata for each data set (see F2).

A1. (meta)data are retrievable by their identifier using a standardized communications protocol¶
Explanation: If one knows a data set’s identifier and the location where it is archived, one can access at least the metadata. Furthermore, the user knows how to proceed to get access to the data.
Action: Clearly define who can access the actual data, and specify how. It is possible that data will not actually be downloaded, but rather reused in situ. If so, the metadata must specify the conditions under which this is allowed (as opposed to the conditions for external usage/“download”).

A1.1 the protocol is open, free, and universally implementable¶
Explanation: Anyone with a computer and an internet connection can access at least the metadata.

A1.2 the protocol allows for an authentication and authorization procedure, where necessary¶
Explanation: It often makes sense to ask users to create a user account on a repository. This makes it possible to authenticate the owner (or contributor) of each data set, and to potentially set user-specific rights.

A2. metadata are accessible, even when the data are no longer available¶
Explanation: Maintaining all data sets in a readily usable state forever would require an enormous amount of curation work (adapting to new standards for formats, converting to a different format if the required software is discontinued, etc.). Keeping the metadata describing each data set accessible, however, can be done with far fewer resources. This makes it possible to build comprehensive data indexes including all current, past and potentially arising data sets.
Action: Provide detailed and complete metadata for each data set (see R1 below).

I2. (meta)data use vocabularies that follow FAIR principles¶
Explanation: The controlled vocabulary used to describe data sets needs to be documented.
This documentation needs to be easily findable and accessible by anyone who uses the data set.
Action: The vocabularies/ontologies/thesauri are themselves findable, accessible, interoperable and thoroughly documented, hence FAIR. Researchers can refer to metrics assessing the FAIRness of a digital resource (if available).

I3. (meta)data include qualified references to other (meta)data¶
Explanation: If the data set builds on another data set, if additional data sets are needed to complete the data, or if complementary information is stored in a different data set, this needs to be specified. In particular, the scientific link between the data sets needs to be described. Furthermore, all data sets need to be properly cited (i.e. including their persistent identifiers).
Action: Properly cite relevant/associated data sets, in particular by providing their persistent identifiers in the metadata, and describe the scientific link/relation to your data set.

R1. meta(data) are richly described with a plurality of accurate and relevant attributes¶
Explanation: Description of a data set is required at two different levels:
Action: Provide complete metadata for each data file. Some points to take into consideration (non-exhaustive list):
R1.1. (meta)data are released with a clear and accessible data usage license¶
Explanation: The conditions under which the data can be used should be clear to machines and humans. This has to be specified in the metadata describing a data set.
Action: Include information about the license in the metadata. If a particular license is needed, you have to provide it along with the data set. Where possible, it is suggested to use common licenses, such as CC0, CC BY, etc., which can be referred to by URL.

R1.2. (meta)data are associated with detailed provenance¶
Explanation: Detailed information about the provenance of data is necessary for reuse: this will, for example, allow researchers to understand how the data was generated, in which context it can be reused, and how reliable it is. Provenance is a central issue in scientific databases when validating data.
Action: Use the metadata to thoroughly describe the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else, potentially transformed or completed? Ideally the workflow is described in a machine-readable format. Criterion I3 is closely linked to this issue when reusing published data sets.

R1.3. (meta)data meet domain-relevant community standards¶
Explanation: It is easier to reuse data sets if they are similar: same type of data, data organized in a standardized way, well-established and sustainable file formats, documentation (metadata) following a common template and using common vocabulary. If community standards or best practices for data archiving and sharing exist, they should be followed. Note that quality issues are not addressed by the FAIR principles. How reliable data is lies in the eye of the beholder and depends on the intended application.
Action: Prepare your (meta)data according to community standards and best practices for data archiving and sharing in your research field.
There might be situations where good practices exist for the type of data to be submitted, but the submitter has valid and specified reasons to deviate from the standard practice. This needs to be addressed in the metadata.

Metadata¶
Good documentation in research projects, describing how the datasets were created, how they are structured, and what they mean, is essential for making your data understandable. Metadata provides such ‘data about data’, and may include information on the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, any assumptions made, the format and file type of the data, and the software used to collect and/or process the data. Researchers are strongly encouraged to use community metadata standards where these are in place (see further down). Data repositories may also provide guidance about appropriate metadata standards and requirements, e.g. ENA sample checklists. It is highly recommended to structure e.g. sample metadata, already from the beginning of the project, in a way that enables sequence data submission without having to reformat the metadata.

Ontologies¶
Ontologies, controlled vocabularies and data dictionaries are used to standardize the language used to describe metadata. Consider the many ways to write that the organism is human (human, Human, homo sapiens, H. sapiens, Homo Sapiens, man, etc.); using an ontology such as NCBI Taxonomy unifies the language and makes it easier for both humans and machines to interpret and work with the data. While an ontology has a hierarchical structure, a controlled vocabulary is an unstructured set of terms. A data dictionary is a user-defined way of describing what all the variable names and values in your data really mean. For a suggested list of ontologies appropriate for the Life Science community, please see FAIRsharing.org and filter on e.g. Domain.
Below are ontology resources, adapted from Table 2 in Griffin PC, Khadake J, LeMay KS et al. Best practice data life cycle approaches for the life sciences. F1000Research 2018, 6:1618. doi: 10.12688/f1000research.12344.2
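As a toy illustration of the ontology point above: the many free-text spellings of “human” can all be normalised to a single NCBI Taxonomy identifier (9606, Homo sapiens). The hard-coded lookup table here is a hand-made sketch; a real project would query an ontology service instead:

```python
# Hypothetical synonym table; in practice, query an ontology service
# such as NCBI Taxonomy rather than hard-coding terms.
SYNONYMS = {
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
    "h. sapiens": "NCBITaxon:9606",
    "man": "NCBITaxon:9606",
}

def organism_to_term(free_text):
    """Map a free-text organism label to an ontology term, or None."""
    return SYNONYMS.get(free_text.strip().lower())

labels = ["Human", "homo sapiens", "Homo Sapiens", "H. sapiens"]
print({organism_to_term(label) for label in labels})  # one unified term
```

Because every variant maps to the same identifier, both humans and machines can group, filter and integrate the records reliably.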
Data and metadata standards Genomics data¶
A list of relevant data and metadata standards can be found in FAIRsharing; some specific examples are given below.

Gene expression¶
Transcriptomics:¶
Microarray-based gene expression data:¶
Genome-wide association studies (GWAS):¶Metagenomics¶
Functional Annotation of Animal Genomes Consortium (FAANG) standards¶
Data and metadata standards Proteomics¶
For a curated list of relevant standards see FAIRsharing using the query ’proteomics’.

Data and metadata standards Metabolomics:¶
For a curated list of relevant standards see FAIRsharing using the query ‘metabolomics’.
Data and metadata standards Lipidomics:¶
Data and metadata standards Structural data / Imaging¶X-ray diffraction
Electron microscopy
NMR
Neutron scattering
Sensitive personal data¶
The following is a list of Ethical, Legal and Social Implications (ELSI) that should be considered when working with human data. The content on this page is based on a checklist that has been developed in the Tryggve project. It is intended to be used as a tool to document these considerations, and is available as:
Note that the checklist was created with cross-border collaborative projects in mind, but it should be useful for other research projects as well. Before the collection of personal data begins, you should always consult the Data Protection Officer of your organisation.

Ethical reviews and informed consent (more info)¶
GDPR (more info)¶
Other considerations (more info)¶
Clarifications and comments¶
Ethical reviews and informed consents¶
The purpose of these questions is to spell out what uses the subjects have consented to, and/or for what uses ethical approvals have been given. Then, given the stated research purpose of this project, assess whether the consents and ethical approvals for the datasets are compatible with it.

GDPR¶
State the purpose of processing the personal data¶
The GDPR stipulates that a controller may only process personal data for stated purposes, and may not further process the data in a manner that is incompatible with those purposes (Article 5 - Principles relating to processing of personal data).

Who is the data controller of the personal data processed in the project?¶
Article 4 (7): “‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; […].” The controller is typically the university employing the PI; the PI should act as a representative of her university employer and is responsible for ensuring that personal data is handled correctly in her projects. If the project involves more than one legal entity and joint controllership is considered, make sure that all parties understand their obligations; it is probably wise to define the terms for this in an agreement between the parties.

What is the legal basis for processing the personal data?¶
Article 6 (1) lists under what conditions processing is considered lawful. Of these, consent or public interest are relevant when it comes to research. You should determine what legal basis (or bases) you have for processing the personal data in your project. Traditionally, consent has been the basis for processing personal data for research, but under the GDPR there cannot be an imbalance between the processor and the data subject for consent to be considered freely given.
In some countries the use of consent as the legal basis for processing by universities for research purposes is therefore not recommended. In those cases, public interest should probably be your legal basis. Note that if your legal basis for processing is consent, a number of requirements exist for the consent to be considered valid under the GDPR; consents given before the GDPR might not live up to this. Also note that even if public interest is the legal basis, other laws and research ethics standards might still require you to have consent from the subjects for performing the research. Please consult the Data Protection Officer of your organisation on which legal basis to apply to your data.

Which exemptions under Art. 9 GDPR to the prohibition on processing special categories of data (such as health and genetic data) are used?¶
Processing of certain categories of personal data is not allowed unless there are exemptions in law to allow it. Among these categories (“sensitive data”) are “‘[…] data revealing racial or ethnic origin, […] genetic data, […] data concerning health’”. Most types of personal data collected in biomedical research will fall under these categories. Article 9 (2) lists a number of exemptions, of which consent and scientific research are most likely to be relevant for research. Please consult the Data Protection Officer of your organisation.

Have data processing agreements been established between the data controller(s) and any data processors?¶
Article 4 (8): “‘processor’ means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller.” Examples of this include using a secure computing environment provided by another organisation to analyse or store the data, along with several other scenarios.
If you do, a legal agreement needs to be established between the controller(s) and processor(s), as defined in Article 28 (3): “Processing by a processor shall be governed by a contract or other legal act under Union or Member State law, that is binding on the processor with regard to the controller and that sets out the subject-matter and duration of the processing, the nature and purpose of the processing, the type of personal data and categories of data subjects and the obligations and rights of the controller. […]” Article 28 also lists the required contents of such an agreement. Your organisation and/or the processor organisation will probably have agreement templates that you can use.

Have Data Protection Impact Assessments (DPIA) been performed for the personal data?¶
Where a type of processing is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data, a so-called Data Protection Impact Assessment (DPIA) - Article 35. To clarify when this is necessary, the Swedish Data Protection Authority (DPA) “Datainspektionen” has issued guidance on when an impact assessment is required. Large-scale processing of sensitive data, such as genetic or other health-related data, is listed as requiring a DPIA. The French DPA has made a PIA tool (endorsed by several other DPAs) available that can help in performing these impact assessments. Please also consult the Data Protection Officer of your organisation.

What technical and procedural safeguards have been established for processing the data?¶
To ensure that the personal data you process in the project is protected at an appropriate level, you should apply technical and procedural safeguards so that the rights of the data subjects are not violated.
Examples of such measures include, but are not limited to, pseudonymisation and encryption of data, the use of computing and storage environments with heightened security, and clear, documented procedures for project members to follow.

What happens with the data after project completion?¶
The GDPR states that the processing (including storage) of personal data should stop when the intended purpose of the processing has been fulfilled. There are, however, exemptions to this, e.g. when the processing is done for research purposes. Also, from a research ethics point of view, research data should be kept to make it possible for others to validate published research findings and to reuse the data for new discoveries. This is also governed by what the data subjects have been informed about regarding how you will treat the data after project completion. The recommendation is to deposit sensitive data in appropriate controlled-access repositories if such are available, but this requires that the data subjects have been informed of and agreed to this.

Other considerations¶
There may also be other national legal or procedural considerations for cross-border research collaborations. Other laws might affect how and whether data can be made available outside the country of origin, and the operating procedures of government authorities or other organisations might create obstacles for sharing data across borders. To make it clear how original and derived data, as well as results, can be used by the parties after project completion, consider establishing legal agreements that define this, covering e.g. reuse of data for other projects or intellectual property rights derived from the research project.

Data Protection Officer (dataskyddsombud)¶
This is the person responsible for ensuring that the processing of sensitive personal data adheres to the GDPR. You should report personal data processing to this person.
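As an illustration only (not part of the guidance itself), the pseudonymisation safeguard mentioned under technical safeguards above is commonly implemented as a keyed hash of direct identifiers, so that records can still be linked per subject without exposing the identifier. A minimal Python sketch, where the identifier format and key handling are assumptions:

```python
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (a pseudonym).

    The secret key must be stored separately from the pseudonymised
    data, under the controller's authority, so that re-identification
    is only possible for those with access to the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Hypothetical personal number; the same subject always maps to the
# same pseudonym, so datasets can be linked without the identifier.
key = b"keep-this-key-separate-and-secure"
p1 = pseudonymise("19750101-1234", key)
p2 = pseudonymise("19750101-1234", key)
assert p1 == p2
```

Note that pseudonymised data is still personal data under the GDPR, since re-identification remains possible via the key; pseudonymisation reduces risk but does not remove the data from the regulation's scope.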
Research Data Office (RDO)¶
Some universities have established a Research Data Office (RDO) or Data Access Unit (DAU) to help with data management questions. The university libraries can also often give advice or redirect you to local resources.
For other sites, the Swedish National Data Service (SND) network is listed here.
These pages are provided to you by the NBIS data management team and the SciLifeLab Data Centre. You can reach us by sending an email to data-management@scilifelab.se.