High-value datasets (HVDs) are defined in Commission Implementing Regulation (EU) 2023/138. They are public sector documents and information that, under the Open Data Directive, are considered to have particular value for society, the environment, and the economy. High-value datasets must be made available free of charge and in machine-readable form through APIs. The Implementing Regulation is based on the Open Data Directive, which is explained in more detail in step 2 of the operating model. If a document or material does not fall within the scope of the Directive, it is not considered a high-value dataset under the Regulation either.
Licence
High-value datasets covered by the Directive and the Regulation must be made available under a licence that allows their unrestricted reuse. Suitable licences include the CC0 licence or, alternatively, the Creative Commons BY 4.0 licence or an equivalent or less restrictive open licence.
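As an illustration, the licence requirement could be checked automatically when datasets are catalogued. The following is a minimal sketch, not an official validation rule: the function name licence_status is hypothetical, and it assumes that licences are recorded as SPDX short identifiers (CC0-1.0 and CC-BY-4.0). Whether another licence is "equivalent or less restrictive" requires legal judgement, so the sketch only flags such cases for manual review.

```python
# Licences named in the text, written as SPDX short identifiers (an assumption
# about how the catalogue records licences).
ACCEPTED_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


def licence_status(licence_id: str) -> str:
    """Classify a licence identifier recorded for a high-value dataset.

    Returns "accepted" for the two licences named in the text and
    "needs manual review" for everything else, because equivalence to
    CC BY 4.0 cannot be decided from an identifier alone.
    """
    normalised = licence_id.strip().upper()
    if normalised in ACCEPTED_LICENCES:
        return "accepted"
    return "needs manual review"


if __name__ == "__main__":
    for candidate in ["CC0-1.0", "cc-by-4.0", "CC-BY-NC-4.0"]:
        print(candidate, "->", licence_status(candidate))
```

A check like this is best treated as a first filter: an "accepted" result confirms one of the named licences, while any other result means a person should assess the licence terms.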
Metadata
Public sector bodies with high-value datasets, as defined by the Regulation, must ensure that the datasets are designated as high-value datasets in their metadata descriptions. The Implementing Regulation also contains detailed, sector-specific requirements for metadata. For more information about metadata and how it should be described, see step 6 of the operating model.
What are high-value datasets?
According to the Implementing Regulation, high-value datasets are divided into six thematic categories:
- Geospatial
- Earth observation and environment
- Meteorological
- Statistics
- Companies and company ownership
- Mobility
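The designation requirement and the six categories above can be sketched as a simple metadata check. This is an illustrative example only: the field names is_high_value_dataset and hvd_category, the category slugs, and the function hvd_designation_problems are all hypothetical and do not come from the Regulation or from any particular metadata standard. In practice, the metadata model described in step 6 of the operating model determines how the designation is actually recorded.

```python
from enum import Enum


class HvdCategory(Enum):
    """The six thematic categories, with hypothetical slug values."""

    GEOSPATIAL = "geospatial"
    EARTH_OBSERVATION_AND_ENVIRONMENT = "earth-observation-and-environment"
    METEOROLOGICAL = "meteorological"
    STATISTICS = "statistics"
    COMPANIES_AND_COMPANY_OWNERSHIP = "companies-and-company-ownership"
    MOBILITY = "mobility"


def hvd_designation_problems(record: dict) -> list[str]:
    """Return a list of problems found in a dataset's metadata record.

    An empty list means the record is designated as a high-value dataset
    and names one of the six thematic categories.
    """
    problems = []

    # The dataset must be explicitly designated as high-value (hypothetical field).
    if record.get("is_high_value_dataset") is not True:
        problems.append("not designated as a high-value dataset")

    # The category must be one of the six thematic categories (hypothetical field).
    valid_categories = {category.value for category in HvdCategory}
    category = record.get("hvd_category")
    if category not in valid_categories:
        problems.append(f"unknown or missing category: {category!r}")

    return problems


if __name__ == "__main__":
    example = {"is_high_value_dataset": True, "hvd_category": "mobility"}
    print(hvd_designation_problems(example))
```

The sketch covers only the designation and the thematic category. The sector-specific metadata requirements in the Implementing Regulation are not modelled here.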
What are the benefits of opening high-value datasets?
According to the European Commission, high-value datasets significantly reduce the barriers to entry into European data-driven markets and increase the reuse of datasets. They help promote research, the creation of new digital services, and the improvement of existing services or business processes.
The reuse of geospatial and mobility data can open up business opportunities in the logistics and transport sectors and make public services more efficient, for example by using knowledge of traffic flows to improve transport.
Earth observation, environmental, and meteorological data (e.g. radar data, air quality, soil pollution, biodiversity) can be used to support research and knowledge-based decision-making, especially in combating climate change and its impacts.
Statistical data (e.g. labour market, demographic structure, industrial production) makes it easier to predict the impacts of, for example, possible policy measures.
Company and company ownership data increases market transparency and allows private investments and public support to be targeted more accurately. The wider availability of information about businesses has clear social benefits, for example in fighting crime (including financial crime), increasing civic participation, and promoting transparency in business.
Personal data
The Open Data Directive and the Regulation pertaining to high-value datasets do not apply to documents whose availability or disclosure has been restricted on the basis of the protection of personal data. Compliance with the GDPR must always be ensured with regard to the processing of personal data.