Catalogues / lists / repositories of open data sources

Moderator · February 17, 2026, 5:26am

Open data — meta-directories (lists of lists)

This wiki post presents a curated overview of directories, catalogues, registries and curated lists whose primary content is other open data sources: portals, repositories, databases, datasets, or lists of creative assets. It covers meta-directories only, that is, resources whose value lies in pointing to other open data sources. Individual datasets, portals, repositories and asset collections are described in a second wiki post below. Because this is a wiki post, anyone can edit it; please feel free to enrich or improve it where you can!

1. Cross-domain meta-directories

General-purpose directories that span many types of open data source. The broadest starting points.

DataPortals.org — ~520 open data portals worldwide. Long-standing curated registry maintained by an international group of open data experts; covers local, regional and national levels.
OpenDataSoft – Open Data Sources catalogue — 2,900+ portals worldwide, organised by geography. Generally regarded as the most comprehensive list.
Open Data Inception — Geotagged map view of ~1,600+ open data portals worldwide; built on the OpenDataSoft list but useful for browsing by location.
DataCatalogs.org — Long-running curated catalogue of open data catalogues (now redirects to / merged with DataPortals.org).
CKAN Portals listing — Index of CKAN-powered open data portals; CKAN underpins a large share of government data portals globally.
PortalJS Data Portals listing — Modern catalogue of open data portals maintained by the PortalJS project (Datopian/CKAN ecosystem).
List of open government data sites (Wikipedia) — Wikipedia-maintained country-by-country index of national, regional and municipal OGD portals.
EasyData Open Data Portals Catalogus — Dutch-language catalogue of open data portals (NL-based, global in scope).
Awesome Public Datasets — Topic-organised GitHub list of high-value public datasets; hundreds of entries, community-maintained.
sindresorhus/awesome — The hub-of-hubs: 1,000+ topical “awesome” lists, several of which (datasets, transit, citizen science, ML) are themselves directories of open data sources.
brandonhimpfen/awesome-open-data — Curated list of open data resources, tools and platforms across domains.
CoolDatasets — Curated, lightly categorised collection of public datasets across topics.
Open Data Impact Map — Global database of organisations that use open data (Center for Open Data Enterprise); useful for finding sectoral data users and sources.

2. Directories of government & intergovernmental portals

Meta-lists specifically of official government / IGO open data portals.

List of open government data sites (Wikipedia) — (Also in §1.) The most complete country-by-country index of government portals.
US City Open Data Census — Ranks US cities by their data-sharing policies; doubles as a navigation index to city portals.
Open Data Monitor (EU, legacy) — Index/benchmark of European national open data portals (now largely dormant).
NAPCORE — Coordination body for the EU’s 30+ mobility National Access Points; the authoritative directory of national transport-data portals (see also §7).

3. Registries of research data repositories (generalist)

Cross-disciplinary registries of repositories that hold research data.

re3data.org – Registry of Research Data Repositories — 3,300+ research data repositories across all disciplines, with rich metadata (subjects, certifications, policies, APIs). Run by DataCite + KIT + Purdue + partners. The canonical registry for scientific repositories.
FAIRsharing.org — Curated registry of data standards, databases and policies; ~2,000+ databases catalogued with FAIR-compliance metadata.
OpenDOAR — Global directory of ~6,000 academic open-access repositories (including data); operated by Jisc.
Open Access Directory: Data Repositories — Wiki-maintained directory of data repositories, hosted by Simmons University.

4. Catalogues of databases within a domain

Meta-lists of the major databases in a specific field — the canonical “where are all the databases for X” references.

Life sciences & biomedical

NAR Online Molecular Biology Database Collection — Curated catalogue of ~1,650 molecular-biology and bioinformatics databases, classified into 15 categories and 41 sub-categories; maintained alongside the annual Nucleic Acids Research Database Issue. The canonical meta-list for the life sciences.
FAIRsharing (life sciences view) — (Also in §3.) Especially deep for biomedical databases and standards.

Astronomy

VizieR (CDS) — The most complete library of published astronomical catalogues; ~24,000 catalogues and tables gathered by the Centre de Données astronomiques de Strasbourg. The reference meta-catalogue for astronomy.
NASA Astrophysics Data System (ADS) — 15M+ records; indexes external data catalogues and archives alongside the literature.

Linguistics & language

OLAC – Open Language Archives Community — International federation/meta-catalogue of dozens of language-resource archives (LDC, ELRA, AILLA, ELAR, etc.) searchable through one interface.
CLARIN Virtual Language Observatory — Cross-archive discovery over the CLARIN network’s language resources.

Linked / semantic data

Linked Open Data Cloud — Diagram and dataset of ~1,300 interlinked Linked Open Data datasets across nine domains (geography, government, life sciences, linguistics, media, etc.). Maintained by the Insight Centre for Data Analytics; CC BY.

Cultural heritage

Europeana and DPLA — Each aggregates thousands of institutions, so each effectively functions as a meta-directory of cultural-heritage collections (both are also listed as sources in the companion post below).

5. Dataset search engines & aggregators (that index many sources)

Tools that don’t host data themselves but catalogue/index it across many sources.

Google Dataset Search — Indexes tens of millions of datasets by reading schema.org metadata across the web.
Google Public Data Explorer — Directory + visualiser over public-interest datasets (World Bank, OECD, IMF, US BLS, etc.).
DataCite Commons — Search across ~60 million DOI-registered research outputs.
OpenAIRE Explore — Cross-repository aggregator for European Open Science (~150M items).
BASE (Bielefeld Academic Search Engine) — ~400 million documents across academic repositories, including datasets.
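As a rough illustration of what "reading schema.org metadata" means in practice, here is a minimal sketch that pulls schema.org Dataset descriptions out of a page's JSON-LD blocks. It is not how any particular search engine is implemented; the sample page, its dataset name and the helper names (JsonLdCollector, find_datasets) are made up for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_datasets(html):
    """Return every top-level JSON-LD object whose @type is exactly "Dataset"."""
    collector = JsonLdCollector()
    collector.feed(html)
    datasets = []
    for block in collector.blocks:
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = parsed if isinstance(parsed, list) else [parsed]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Dataset":
                datasets.append(item)
    return datasets


# Hypothetical landing page for a made-up dataset.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example bike counts",
 "description": "Hypothetical hourly bike counts.",
 "license": "https://creativecommons.org/publicdomain/zero/1.0/"}
</script>
</head><body></body></html>
"""

found = find_datasets(SAMPLE_PAGE)
print(found[0]["name"])
```

Real pages often nest datasets inside an @graph array or give @type as a list; this sketch deliberately ignores those cases to stay short.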

7. Directories of map, transport & traffic data

Meta-lists specific to geospatial and mobility data sources.

NAPCORE — (Also in §2.) Directory of the EU’s 30+ mobility National Access Points.
Mobility Database (MobilityData) — Catalogue of 6,000+ GTFS / GTFS-RT / GBFS public-transport feeds across 99+ countries.
Transitland Atlas — Open feed registry of GTFS / GTFS-RT / GBFS / MDS feeds from 2,500+ operators across 55+ countries.
OpenAddresses — Aggregates 2,600+ open government address sources worldwide (a directory of address datasets as much as a dataset).

9. Curated dataset lists for data journalism

Curated, regularly updated lists aimed at journalists and storytellers — strong for finding interesting rather than merely official datasets.

Data Is Plural — Jeremy Singer-Vine’s weekly newsletter of useful/curious datasets, running since 2015; 1,750+ datasets, with a browsable archive as a “dataset of datasets.”
Data Liberation Project — Initiative (now run by MuckRock + Big Local News) that obtains, documents and publishes hard-to-get government datasets of public interest.
FiveThirtyEight Data — Index of the datasets behind FiveThirtyEight’s data journalism (politics, sports, science, economics), released as plain CSVs.
BuzzFeed News GitHub — Data and analysis behind BuzzFeed News investigations.
ProPublica Data Store — Datasets compiled and cleaned by ProPublica’s investigative team (many free, some priced).
The Pudding — Datasets underlying The Pudding’s visual essays.
Awesome Public Datasets — (Also in §1.) Widely used by data journalists as a starting point.

10. Directories of open design & creative assets

Meta-lists of openly licensed creative assets (icons, fonts, images, CC media).

Open Source Design – Resources — Curated directory of openly licensed icons, fonts, images, CC media and design tools. The best single meta-list for creative open assets.
Openverse — Search engine indexing 800 million+ openly licensed and public-domain images and audio files across hundreds of sources; WordPress Foundation successor to CC Search.
Creative Commons Search — Meta-search across CC-licensed works.

11. Directories for data preservation & “data rescue”

Meta-lists / clearinghouses of preservation efforts (especially the 2025 US federal-data rescue).

Data Rescue Project Portal — Clearinghouse and tracker indexing 1,000+ rescued US public datasets; co-run by IASSIST, RDAP and the Data Curation Network.
Public Environmental Data Partners — Coalition directory of archived/mirrored environmental datasets.

12. Standards, metrics & community references

Not data sources, but the infrastructure and benchmarks that catalogue or rank them.

Open Data Charter — International principles for open data publication.
Open Knowledge Foundation — Organisation behind CKAN, the Open Data Handbook and the Global Open Data Index.
Global Open Data Index — Country-level open-data openness ranking (now mostly historical).
Open Data Maturity Report (EU) — Annual EU benchmarking of national open data maturity.
Open Data Watch / ODIN — Global benchmark of national statistical-office data openness.

shenol · April 20, 2026, 10:32am

Useful. I need to find a tool which analyzes all this data.

mona · April 29, 2026, 11:16am

Wow this is a nice collection! Thanks for compiling these, makes it easier to explore.

Moderator · May 28, 2026, 9:35am

Open data — individual sources

In this post we present a curated overview of individual open data sources: portals, repositories, databases, datasets and openly licensed asset collections. For directories and curated lists of these sources (meta-directories), see the wiki post above. Because this is a wiki post, anyone can edit it; please feel free to enrich or improve it where you can!

Note: Links were live as of the date this list was produced. Items that are not strictly open-licensed (controlled access, partially open, or commercial with a free tier) are flagged inline.

1. Cross-domain (general)

Large general-purpose open datasets and knowledge bases that don’t fit a single domain.

Wikidata — Wikimedia’s free, collaborative, multilingual structured knowledge base; ~115 million items, CC0. The data backbone behind Wikipedia.
DBpedia — Structured knowledge extracted from Wikipedia, queryable via SPARQL; millions of entities across 125+ languages, interlinked to other Linked Open Data datasets.
Kaggle Datasets — ~400,000+ datasets, community-curated and tied to ML competitions/notebooks.
Hugging Face Datasets — ~500,000+ datasets, ML-focused, with built-in tooling.
Registry of Open Data on AWS — Open datasets hosted on AWS (Common Crawl, Sentinel imagery, genomics, climate, transport); free to access.
freeCodeCamp Open Data — Open datasets, analyses and demos published monthly by the freeCodeCamp community.
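For the SPARQL-queryable knowledge bases above, a query is typically sent as an HTTP GET and the answer comes back in the standard SPARQL JSON results format. The sketch below builds such a request URL for the public Wikidata endpoint and parses a response; no network call is made, and the sample response is hand-written with placeholder entities rather than real Wikidata output.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# P31 is "instance of" and Q146 is "house cat" on Wikidata.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request_url(query, endpoint=WIKIDATA_SPARQL_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def labels_from_results(payload):
    """Pull the itemLabel values out of a SPARQL JSON results document."""
    return [
        binding["itemLabel"]["value"]
        for binding in payload["results"]["bindings"]
        if "itemLabel" in binding
    ]


# Hand-written stand-in for a response; the entity URIs and labels are placeholders.
SAMPLE_RESPONSE = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://example.org/entity/A"},
             "itemLabel": {"type": "literal", "value": "Example cat A"}},
            {"item": {"type": "uri", "value": "http://example.org/entity/B"},
             "itemLabel": {"type": "literal", "value": "Example cat B"}},
        ]
    },
}

url = build_request_url(QUERY)
labels = labels_from_results(SAMPLE_RESPONSE)
print(labels)
```

When sending the request for real, it is good practice to set a descriptive User-Agent header and keep LIMIT small while experimenting.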

2. Government & intergovernmental portals

Official open data portals run by governments and intergovernmental organisations — typically the largest single sources.

Pan-European / EU institutions

data.europa.eu — The official portal for European data. ~1.5 million datasets aggregated from 36+ European countries, EU institutions, agencies and bodies. Built on CKAN.
Eurostat — The EU’s statistical office; thousands of official statistical datasets.
European Environment Agency (EEA) data hub — Environment-focused data hub of the EEA.
INSPIRE Geoportal — Pan-European geospatial open data under the INSPIRE Directive.

National (selected, by data volume / relevance)

Data.gov (USA) — ~300,000+ federal, state, local and tribal datasets. Built on CKAN. Note: availability has been politically volatile in 2025–2026; see the archives in §11.
data.gov.uk — ~70,000+ UK government datasets.
Government of Canada – Open Government — ~40,000+ federal datasets.
data.gouv.fr — French government open data; ~50,000+ datasets including local authorities.
GovData.de — Open data portal of the German Federation, Länder and municipalities.
data.overheid.nl — Dutch government open data portal; ~25,000+ datasets.
data.gov.in (India) — Indian Open Government Data platform.
data.gov.au (Australia) — National open data portal.
data.gov.sg (Singapore) — Often cited as a quality benchmark.

City-level (selected examples)

NYC Open Data — ~3,000+ datasets; one of the largest, best-maintained city portals.
London Datastore — Greater London Authority open data.
Paris Open Data — City of Paris open data.
Data Amsterdam — City of Amsterdam open data.

Intergovernmental / international organisations

World Bank Open Data — Thousands of global development, economy and poverty indicators. Free, APIs.
UN Data — UN statistical databases aggregating dozens of agencies.
OECD Data — Comparable statistics across OECD members.
IMF Data — Economic and financial data from the IMF.
WHO – Global Health Observatory — World Health Organization’s data portal.
FAOSTAT — Food and agriculture data from 245+ countries.

3. Scientific & research data repositories (generalist)

Multidisciplinary repositories for research data — usually with DOI assignment, versioning and metadata standards.

Zenodo — CERN/OpenAIRE generalist research repository; millions of records, 50 GB per record, DOIs included.
Figshare — Generalist repository; millions of items. Free for public deposits.
Dryad — Curated research-data repository with editorial review; partners with many journals.
Harvard Dataverse — Large Dataverse instance; 150,000+ datasets across disciplines.
Mendeley Data — Elsevier-operated generalist research data repository.
Open Science Framework (OSF) — Free, open-source project hosting + repository; Center for Open Science.
UCI Machine Learning Repository — UC Irvine; one of the oldest and most-cited collections of ML benchmark datasets.
Yelp Open Dataset — Large subset of Yelp businesses/reviews/users as JSON, for academic and educational use.
LODUM (University of Münster) — University open-data initiative publishing institutional data as Linked Open Data.

4. Scientific & research data repositories (domain-specific)

Major discipline-specific repositories — usually the canonical source for their field.

Life sciences & biomedical

NCBI / GenBank — Genetic sequence database; hundreds of millions of records.
European Nucleotide Archive (ENA) — EMBL-EBI sequence archive, fully open.
European Bioinformatics Institute (EBI) — Umbrella for dozens of databases (UniProt, Ensembl, ChEMBL, etc.).
UniProt — Protein sequence/function; ~250M sequences in TrEMBL, ~570k reviewed in Swiss-Prot.
Ensembl — Genome annotation; EMBL-EBI / Wellcome Sanger.
Protein Data Bank (PDB) — 3D macromolecular structures; ~220,000+.
AlphaFold Protein Structure Database — EMBL-EBI + DeepMind; 200M+ predicted protein structures.
Sequence Read Archive (SRA) — NCBI raw sequencing data.
GEO – Gene Expression Omnibus — NCBI gene-expression series.
GBIF — 2+ billion species occurrence records.
UK Biobank — 500,000-participant cohort; access-controlled.

Earth observation, climate & environmental science

Copernicus Data Space — Sentinel satellite data from the EU’s Copernicus programme.
NASA Open Data — ~40,000+ datasets from NASA missions.
NASA Earthdata — NASA’s Earth science data across all DAAC archives.
USGS Earth Explorer — Landsat, aerial and elevation data.
Pangaea — Earth & environmental science data publisher; ~400,000+ datasets.
Climate Data Store (Copernicus CDS) — Climate data, reanalyses and projections.
ECMWF Open Data — Open weather forecast data.
NOAA Open Data Dissemination (NODD) — NOAA Earth observation data on AWS/Azure/GCP.
OBIS — IOC-UNESCO; >100M marine biodiversity occurrences.
EMODnet — European Marine Observation and Data Network.
Global Forest Watch — WRI; near-real-time deforestation/land-use data.

Astronomy & physics

CERN Open Data Portal — Real LHC collision data + simulations.
NASA/IPAC Extragalactic Database (NED) — Objects beyond the Milky Way.
ESA Science Archives — Mission data from ESA spacecraft.
HEPData — High-energy physics data tables; Durham University + CERN.

Social sciences & economics

ICPSR — ~500,000+ files of social science research.
FRED — ~800,000+ economic time series from 100+ sources.
UK Data Service — UK’s largest collection of economic, population and social data.
CESSDA Data Catalogue — Consortium of European Social Science Data Archives.
IPUMS — Harmonised census and survey microdata, global.

Linguistics & language

CLARIN — Pan-European language-resources research infrastructure (ERIC).
Mozilla Common Voice — Crowdsourced speech; 30,000+ validated hours, 100+ languages, CC0.
LDC — Linguistic Data Consortium; open catalogue, mostly licensed resources.
OPUS — Open parallel corpora.

Humanities & cultural heritage

Europeana — ~50 million+ digitised cultural-heritage items from 3,000+ institutions.
DPLA — ~50 million+ items from US libraries, archives and museums.
Open Context — 1M+ CC-licensed archaeological resources.
Smithsonian Open Access — 4.5M+ CC0 records.

5. Search engines & cross-portal aggregators

Tools that don’t host data themselves but index it across many sources.

Google Dataset Search — Indexes tens of millions of datasets via schema.org metadata.
Google Public Data Explorer — Visualiser over World Bank / OECD / IMF / US BLS data.
DataCite Commons — Search across ~60M DOI-registered research outputs.
OpenAIRE Explore — EU aggregator of open research products (~150M items).
BASE — ~400M documents across academic repositories, including datasets.

6. Industry / specialised

Open data for specific industries or use cases.

Humanitarian Data Exchange (HDX) — UN OCHA; ~25,000+ datasets from 1,800+ organisations across 250+ locations.
OpenCorporates — ~225 million+ company records from 130+ jurisdictions.
GLEIF (Legal Entity Identifier) — ~2.7M global legal entity identifiers.
OpenSanctions — Aggregated sanctions and PEP lists; ~1M+ entities.
OFAC Sanctions Lists — US Treasury sanctions data.
OECD Tax Database — Comparative tax data across OECD countries.
CORE — ~290 million open-access research papers + metadata.
Patents: Google Patents Public Data, EPO Open Patent Services, USPTO Open Data, Lens.org — global open patent data.
ENTSO-E Transparency Platform — Pan-European electricity market data from 42 TSOs across 35 countries.
EIA Open Data (US) — US Energy Information Administration.
ECB Data Portal — European Central Bank.
SEC EDGAR — US securities filings.
GDELT Project — Quarter-billion+ global news event records since 1979; via Google BigQuery.
CourtListener — Free Law Project; 10M+ legal opinions, ~17M PACER docs, 16,191 judges.
EUR-Lex Open Data — EU legal documents, 24 languages.

7. Map, geospatial, transport & traffic data

Openly licensed map data, public-transport feeds, real-time traffic, speed limits, addresses, boundaries and related infrastructure.

Foundational map data

OpenStreetMap — World’s largest crowdsourced open geographic database (OSMF, since 2004); billions of features under ODbL. Planet dump (~85 GB PBF) updated minutely. Speed-limit coverage is partial (~12% of roads tagged).
Overture Maps Foundation — Open map data from AWS, Meta, Microsoft, TomTom + 30 members (Linux Foundation). Quarterly GeoParquet releases: Places, Buildings, Transportation, Base, Addresses, Divisions.
Natural Earth — Public-domain vector + raster map data at 1:10m/1:50m/1:110m scales.
GADM — Administrative boundaries to 5 levels; v4.1 has 400,276 areas, v5 released Jan 2026. Note: licensed for non-commercial use only.
geoBoundaries — CC BY administrative boundaries for every country; commercial use allowed.
Who’s on First — Gazetteer of administrative places with structured identifiers.
GeoNames — 25M+ geographic names, CC BY.
OurAirports — ~85,000 airports worldwide, CC0.

OSM extracts, tooling & derivatives

Geofabrik Downloads — Daily OSM extracts by country/subdivision; PBF + Shapefile. The de facto standard server.
BBBike Extracts — Free user-defined OSM extracts of any polygon; many formats.
Protomaps — Subscription-free map tiles; slice arbitrary OSM regions.
Mapillary — Crowdsourced street-level imagery (CC BY-SA); 2B+ images.
KartaView — Open street-level imagery alternative to Mapillary.

Addresses

OpenAddresses — Aggregates 2,600+ open government address sources; ~600M addresses worldwide.
Overture Addresses theme — Growing global open address dataset in Overture’s releases.

Public transport — feeds

Mobility Database (MobilityData) — 6,000+ GTFS / GTFS-RT / GBFS feeds across 99+ countries; daily freshness checks. Maintained by MobilityData.
Transitland — Aggregator of feeds from 2,500+ operators across 55+ countries; APIs + Transitland Atlas (CC BY).
GTFS.org / GBFS spec — Canonical specification sites (MobilityData).
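A GTFS Schedule feed is a zip archive of CSV text files, so the feeds listed in these catalogues can be inspected with nothing but the standard library. Here is a minimal sketch that reads stops.txt from a feed held in memory. The sample feed is built on the fly with two invented stops, and the helper names (read_stops, make_sample_feed) are just for illustration.

```python
import csv
import io
import zipfile


def read_stops(feed_bytes):
    """Return the stops in a GTFS zip as dicts with id, name, lat and lon."""
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as feed:
        with feed.open("stops.txt") as raw:
            # utf-8-sig tolerates the byte-order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {
                    "id": row["stop_id"],
                    "name": row["stop_name"],
                    "lat": float(row["stop_lat"]),
                    "lon": float(row["stop_lon"]),
                }
                for row in csv.DictReader(text)
            ]


def make_sample_feed():
    """Build a tiny in-memory feed with invented stops, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as feed:
        feed.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,52.3791,4.9003\n"
            "S2,Museum Square,52.3580,4.8811\n",
        )
    return buffer.getvalue()


stops = read_stops(make_sample_feed())
print(len(stops), stops[0]["name"])
```

With a real feed you would pass the downloaded zip's bytes to read_stops instead. Some stops legitimately have empty coordinates (for example certain location types), which this sketch does not handle.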

National Access Points for mobility (EU)

Under EU ITS Directive 2010/40/EU and its Delegated Regulations, every member state must run a National Access Point (NAP) for mobility data (real-time traffic, multimodal travel, truck parking, EV charging). 30+ operational.

NDW – Nationaal Dataportaal Wegverkeer (NL) — Dutch NAP; real-time + historical traffic, work-zones (“Melvin”), bike counts via Dexter. Also hosts the NWB base road map.
Mobilithèque (FR) — French NAP.
Mobilithek (DE) — German NAP (BMDV).
Punto de Acceso Nacional (ES) — Spanish NAP.
Trafiklab (SE) / Entur (NO) — Nordic transport-data hubs.

Real-time traffic — open feeds

Truly open live traffic is rare (TomTom, HERE, Mapbox, INRIX, Google sell it). The open exceptions are mainly EU NAPs and national road authorities.

NDW Real-time data — Dutch real-time speeds/flows/incidents (DATEX II).
transport.data.gouv.fr (FR) — French DATEX II real-time road feeds.
National Highways DATEX II (UK) — Real-time motorway flow/incidents.
511 systems (US, by state) — US state-level real-time traffic feeds.
Waze for Cities — Reciprocal city partnership data (not strictly open).

Speed limits

OpenStreetMap maxspeed — Tagged speed limits; ~12% explicit coverage, combine with country defaults.
OSM Default Speed Limits library — Legal default speed limits per country/road type; used by GraphHopper, Valhalla, StreetComplete.
NDW maximum speeds (NL) — Authoritative Dutch speed-limit feeds.
National Highways speed-limit data (UK) — Strategic Road Network speed limits.
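Since only a minority of OSM roads carry an explicit maxspeed tag, a common pattern is to use the tag when present and otherwise fall back to a legal default for the country and road type. The sketch below shows that fallback logic only. The defaults table is a made-up placeholder for a fictional country "XX" and is not taken from the OSM Default Speed Limits library; special values such as "none", "walk" or "DE:urban" are treated as untagged.

```python
MPH_TO_KMH = 1.609344

# Placeholder defaults for a fictional country code "XX"; real values would
# come from an authoritative source such as a default-speed-limits dataset.
HYPOTHETICAL_DEFAULTS_KMH = {
    ("XX", "residential"): 50,
    ("XX", "motorway"): 120,
}


def parse_maxspeed(value):
    """Return km/h for values like '50' or '30 mph'; None for anything else."""
    if value is None:
        return None
    text = value.strip().lower()
    factor = 1.0  # a bare number in the maxspeed tag means km/h
    if text.endswith("mph"):
        text = text[:-3].strip()
        factor = MPH_TO_KMH
    try:
        return float(text) * factor
    except ValueError:
        return None


def effective_speed_kmh(tags, country):
    """Return (speed in km/h or None, source) for one road's tag dict."""
    explicit = parse_maxspeed(tags.get("maxspeed"))
    if explicit is not None:
        return explicit, "tagged"
    default = HYPOTHETICAL_DEFAULTS_KMH.get((country, tags.get("highway")))
    if default is not None:
        return float(default), "default"
    return None, "unknown"


tagged = effective_speed_kmh({"highway": "residential", "maxspeed": "30 mph"}, "XX")
fallback = effective_speed_kmh({"highway": "residential"}, "XX")
unknown = effective_speed_kmh({"highway": "track"}, "XX")
print(tagged, fallback, unknown)
```

Returning the source alongside the value makes it easy to report how much of a network relies on defaults rather than explicit tags.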

Aviation, maritime & rail

OpenSky Network — Community ADS-B + Mode S aircraft tracking; >30 trillion messages.
ADS-B Exchange — Unfiltered live ADS-B; free historical data.
Global Fishing Watch — Open AIS-derived global fishing activity.
AISHub — Community AIS vessel-tracking exchange.

Cycling, walking, micromobility & EV charging

Open Charge Map — Global open EV-charging registry; ~800,000+ points.
CycleStreets — Open cycle-routing data and tools.
Strava Metro — Aggregated cycling/walking data; free for agencies, not open-licensed.

Routing & isochrone engines (open backends)

OpenRouteService — Free OSM-based routing/isochrones/matrix API (HeiGIT).
OSRM, Valhalla, GraphHopper — Open-source OSM routing engines.

8. AI training data corpora

Large openly licensed datasets used to train ML / generative-AI models. These carry distinctive legal and ethical caveats, noted inline.

Common Crawl — Open web-crawl repository; 10+ petabytes since 2008, refreshed ~monthly (2B+ pages each). The dominant text source behind most LLMs. Note: a Nov 2025 investigation alleged it under-honoured publisher opt-outs.
LAION — German non-profit; Re-LAION-5B (Aug 2024) is the safety-rescreened replacement for the withdrawn LAION-5B (~5.5B text-image pairs). Backbone of Stable Diffusion.
Common Pile v0.1 — EleutherAI’s ~8 TB copyright-clean text corpus (June 2025); successor to The Pile.
The Stack v2 — BigCode/Hugging Face; permissively licensed source-code corpus.
Mozilla Common Voice — Open speech corpus; 30,000+ hours, 100+ languages, CC0.
Pile of Law — Open ~256 GB legal-text corpus.
Have I Been Trained? (Spawning.ai) — Tool to search LAION-style datasets and opt images out of training.

Caveat: AI corpora sit at the contested edge of “open data.” Copyright (The Pile/Books3), consent (LAION CSAM removal) and opt-out compliance (Common Crawl) are live issues. Verify licences before reuse.

9. Data-journalism datasets

The individual data collections published by data-journalism teams (the curated lists of these are in §9 of the meta-directories post above).

FiveThirtyEight Data — Datasets behind FiveThirtyEight’s stories, as plain CSVs.
ProPublica Data Store — ProPublica investigative datasets (many free, some priced).
The Pudding — Datasets underlying The Pudding’s visual essays.
BuzzFeed News GitHub — Data and analysis behind BuzzFeed News investigations.

10. Open design & creative assets

Openly licensed creative assets: icons, fonts, images, audio/video, colour systems.

Icons

The Noun Project — Vast library of CC and purchasable icons.
Tabler Icons, Remix Icon, Iconoir, Font Awesome (Free) — large MIT/CC open SVG icon sets.

Fonts

Google Fonts — Hundreds of open-licensed (mostly OFL) families.
The League of Moveable Type — Curated open-source typefaces.
Open Font Library — Community libre-font repository.
Velvetyne, Open Foundry, Use & Modify — libre type foundries.

Images & photos

Openverse — 800M+ openly licensed and public-domain images and audio (CC Search successor).
Wikimedia Commons — ~110 million freely licensed media files.
Flickr Creative Commons — Large CC-licensed photo pool.
Unsplash, Pexels, Pixabay, StockSnap — large free-to-use photo libraries.
NYPL Public Domain Collections — public-domain digitised images.
Smithsonian Open Access — 4.5M+ CC0 images and records.
Rijksmuseum and The Met Open Access — CC0 fine-art image collections.
unDraw, Open Doodles — open illustration sets.

CC media (audio / video)

Free Music Archive — open/CC music library.
YouTube and SoundCloud Creative Commons pools.

Colour & design systems

Leonardo (Adobe, open source) — accessible colour-contrast generator.
Color Hunt and colors.lol — open colour palettes.

11. Data preservation, archives & “data rescue”

Resources focused on preserving open datasets — increasingly important given the 2025 US federal data removals.

Harvard Library Innovation Lab – Data.gov Archive — 311,000+ datasets (~16 TB) harvested from Data.gov in 2024–2025; updated daily.
DataLumos (ICPSR) — Crowdsourced repository for archiving at-risk US government data.
Internet Archive – Wayback Machine — Web-scale archive including government data pages.
End of Term Web Archive — Archives US federal websites at each administration’s end.
Source Cooperative — Data-publishing utility hosting open datasets and archives (incl. the Data.gov archive).

12. Standards, tooling & community

Not data sources themselves — the infrastructure that produces and packages open data.

Open Data Commons — Standard open-data licences (ODbL, ODC-By, PDDL).
CKAN — Leading open-source data-portal platform; powers Data.gov, data.gov.uk and many national portals.
Open Data Kit (ODK) — Open-source field data-collection tools.
Frictionless Data — OKFN tooling and specs (Data Packages).
DataCite — Global DOI registration for research outputs.
Schema.org Dataset vocabulary — Markup that makes datasets findable by Google Dataset Search.
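Because so many of the portals in this thread run on CKAN, one search pattern works across most of them: CKAN's Action API exposes a package_search endpoint that returns a JSON envelope with success and result fields. Below is a minimal sketch that builds the search URL and reads dataset titles from a response. The portal address is a placeholder, the sample response is hand-written, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal_base, query, rows=5):
    """Build a CKAN Action API search URL for the given portal."""
    params = urlencode({"q": query, "rows": rows})
    return portal_base.rstrip("/") + "/api/3/action/package_search?" + params


def dataset_titles(response_text):
    """Return dataset titles from a package_search response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [
        dataset.get("title") or dataset.get("name", "")
        for dataset in payload["result"]["results"]
    ]


# Hand-written stand-in for a portal response; the datasets are invented.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2025", "title": "Air quality 2025"},
            {"name": "air-quality-stations", "title": "Air quality stations"},
        ],
    },
})

# "portal.example.org" is a placeholder, not a real CKAN portal.
search_url = package_search_url("https://portal.example.org/", "air quality")
titles = dataset_titles(SAMPLE_RESPONSE)
print(search_url)
print(titles)
```

Individual portals may disable or rate-limit the API, so check a portal's own documentation before relying on it.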
