Recommendations for Structural data

Repositories

Several types of structural data are being collected for COVID-19 research. Suitable repositories include:

  • Structural data on proteins acquired using any experimental technique (for example, X-ray crystallography or nuclear magnetic resonance) should be deposited in the wwPDB: Worldwide Protein Data Bank via EBI PDBe.

Locating existing data

The COVID-19 Molecular Structure and Therapeutics Hub is a community data repository and curation service for structures, models, therapeutics, simulations and related computations for research into the COVID-19 pandemic. It is maintained by the Molecular Sciences Software Institute (MolSSI) and BioExcel.

Data and metadata standards

X-ray diffraction

Electron microscopy

  • Data archiving and validation standards for cryo-EM maps and models are coordinated internationally by EMDataResource (EMDR).
  • Cryo-EM structures (map, experimental metadata, and optionally coordinate model) are deposited and processed through the wwPDB OneDep system, following the same annotation and validation workflow used for X-ray crystallography and nuclear magnetic resonance (NMR) structures. EMDB holds all workflow metadata, while PDB holds a subset of the metadata.
  • Most electron microscopy data are stored either in raw data formats (binary, bitmap images, TIFF, etc.) or in proprietary formats developed by vendors (dm3, emispec, etc.).
  • Processed structural information is submitted to structural resources as PDBx/mmCIF.
  • Experimental metadata are described in EMDR; see also Lawson et al. (2020).
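
To illustrate the PDBx/mmCIF format mentioned above, the following is a minimal Python sketch that collects simple single-line `_category.item value` pairs from mmCIF text. It is not a full parser: `loop_` tables and multi-line `;` text fields are deliberately skipped, and the function name `read_simple_items` and the sample entry are hypothetical. For real depositions a dedicated mmCIF library would normally be used.

```python
import shlex


def read_simple_items(text):
    """Collect single-line `_category.item value` pairs from mmCIF text.

    Assumption: only items whose value sits on the same line are read.
    Loop tables and multi-line `;` text fields are ignored in this sketch.
    """
    items = {}
    in_text_field = False
    for line in text.splitlines():
        # A line starting with ';' opens or closes a multi-line text field.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        stripped = line.strip()
        if not stripped.startswith("_"):
            continue  # data_ headers, loop_ keywords, loop rows, '#' separators
        try:
            parts = shlex.split(stripped)  # honours '...' and "..." quoting
        except ValueError:
            continue  # unbalanced quote; outside the scope of this sketch
        if len(parts) == 2:  # loop column tags have no value and are skipped
            items[parts[0]] = parts[1]
    return items


# Hypothetical, heavily shortened entry used only for illustration.
SAMPLE = """data_EXAMPLE
#
_entry.id   EXAMPLE
_exptl.method   'ELECTRON MICROSCOPY'
#
loop_
_atom_site.id
_atom_site.label_atom_id
1 N
2 CA
#
"""

if __name__ == "__main__":
    for key, value in read_simple_items(SAMPLE).items():
        print(key, "=", value)
```

Skipping loops keeps the sketch short; the coordinate model itself lives in the `_atom_site` loop, which a complete reader would need to handle.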

NMR

Neutron scattering

  • ENDF/B-VI of the Cross-Section Evaluation Working Group (CSEWG) and JEFF of the OECD/NEA have been widely used in the nuclear community. The latest versions of the two nuclear reaction data libraries are JEFF-3.3 and ENDF/B-VIII.0 (Brown et al., 2018), which include significant upgrades in data for a number of nuclides (Carlson et al., 2018).
  • Neutron scattering data are stored in the internationally-adopted ENDF-6 format maintained by CSEWG.
  • Processed structural information is submitted in the PDBx/mmCIF format.

Molecular Dynamics (MD) simulations

  • Raw trajectory files containing the coordinates and, depending on the format, the velocities, forces and energies of the simulation are stored as binary files: .trr, .dcd, .xtc and .netCDF.
  • Structural models refined from experimental structural data using MD simulations are stored in .pdb format.
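
As an illustration of the .pdb format mentioned above, the sketch below reads atom coordinates from ATOM/HETATM records using the fixed column layout of the PDB format (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-54). The function names `format_atom_line`, `parse_atom_line` and `read_atoms` are hypothetical, and only a subset of the fields is handled; it is a minimal sketch rather than a complete reader.

```python
def format_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Build a simplified ATOM record (no occupancy, B-factor or element)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain_id}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Parse one ATOM/HETATM record by slicing its fixed-width columns."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def read_atoms(text):
    """Return parsed atoms for every ATOM or HETATM line in the text."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


if __name__ == "__main__":
    # Illustrative values only; not taken from any deposited structure.
    model = "\n".join([
        "REMARK example model",
        format_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
        format_atom_line(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
        "END",
    ])
    for atom in read_atoms(model):
        print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Building the sample lines with a formatter rather than typing them by hand keeps the columns aligned, which matters because the format is positional rather than delimiter-based.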

Computer-aided drug design data