🔖 Datasource

  • awesome-public-datasets :: A collection of large-scale public datasets on the Internet.
  • common-workflow-language :: Repository for CWL Specifications.
  • datasets :: Original data, or aggregated / cleaned / restructured existing datasets. Released under the Creative Commons Attribution-ShareAlike 4.0 International License.
  • Wikidata :: A free linked database that acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others, that can be read and edited by both humans and machines.
  • World Bank Open Data :: Free and open access to data about development in countries around the globe.
  • Registry of Research Data Repositories :: Provides researchers, funding organisations, libraries and publishers with over 1,000 listed research data repositories from all over the world, making it the largest and most comprehensive online catalog of research data repositories on the web.
  • Scientific Databases list on Wikipedia.


Systems biology models

Pharmacology database


Genomic / Genetic data

Pathogen data

Econometrics data

Finance data


Astronomy aka Astrophysics data

Medical Imaging data

Molecular Biology data

  • SASBDB :: Small Angle Scattering Biological Data Bank.


  • Codeneuro-Datasets :: Shared data sets for collaborating, testing, and benchmarking.
  • MindResearchRepository.jl :: Access data sets from the Mind Research Repository.
  • :: A project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data.
  • Neuroscience Databases list.
  • Neurovault :: A place where researchers can publicly store and share unthresholded statistical maps produced by MRI and PET studies.

Earth Science

Machine learning datasets

  • Machine learning datasets :: A list of the biggest machine learning datasets from across the web.
  • Celeb-DF :: A dataset for DeepFake forensics containing real and DeepFake-synthesized videos with visual quality on par with those circulated online. The Celeb-DF dataset includes 408 original videos collected from YouTube with subjects of different ages, ethnic groups and genders, and 795 DeepFake videos synthesized from these real videos.
  • UCI Machine Learning Repository

Video data

  • Databrary :: A video data library for developmental science. Share videos, audio files, and related metadata. The source code is on GitHub.