Data Provider Guidelines
Requirements
Table 1 outlines the interoperability guidelines for EO projects and data providers who wish to deliver their datasets to APEx for integration within the ESA Project Results Repository (PRR), both for long-term preservation and for use within the APEx Project Environments. Fulfilling these requirements ensures seamless integration, discoverability, and usability of the datasets across the ESA EO ecosystem, facilitating broader access and reusability within the EO community.
Most of these requirements focus on standardising dataset metadata, formats, and access methods to ensure compatibility with existing tools and support their efficient exploitation. In particular, datasets should adhere to well-established EO data standards and provide consistent, machine-readable metadata descriptions.
APEx supports integration primarily through recognised standards such as STAC (SpatioTemporal Asset Catalog) and cloud-native data formats. This ensures that almost any EO dataset can be made available as a ready-to-use resource in the ESA PRR and used through the APEx tooling.
Overall, the objective is to streamline and simplify the delivery of high-quality, interoperable EO datasets to APEx, fostering wider adoption and enabling advanced use cases in downstream applications.
| ID | Requirement | Description |
|---|---|---|
| DATA-REQ-01 | EO project results consisting of raster data shall be delivered as cloud-native datasets. | Where possible, Cloud Optimized GeoTIFF (COG) [1] is preferred. For more complex datasets, CF-compliant netCDF [2] is a good alternative. Use of the still-evolving GeoZarr [3] format requires confirmation by APEx and may result in future incompatibility if the selected flavour is not eventually standardised. Additional recommendations on the use of file formats within the APEx services are available in the APEx documentation. |
| DATA-REQ-02 | EO project results consisting of vector data shall be delivered as cloud-native datasets. | Small datasets can use GeoJSON [4]; FlatGeobuf [5] or GeoParquet [6] is recommended for larger datasets. |
| DATA-REQ-03 | EO project results should be accompanied by metadata in STAC [7] format, including applicable STAC extensions. | The specific STAC profiles will align with the recommendations provided in the Metadata Recommendations section. |
For more details regarding the recommended file formats and their usage within APEx, please refer to the APEx File Format Recommendations.
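As a rough illustration of how the format guidance in Table 1 might be applied, the following sketch maps a dataset type to the formats named there. The function name and the `complex_or_large` flag are hypothetical; Table 1 does not define a size or complexity threshold, so that judgement is left to the caller.

```python
from typing import List


def recommended_formats(kind: str, complex_or_large: bool = False) -> List[str]:
    """Return the formats Table 1 suggests for a dataset type.

    `kind` is "raster" or "vector". `complex_or_large` is a caller-supplied
    judgement (hypothetical flag): Table 1 gives no numeric threshold.
    GeoZarr is left out because its use requires confirmation by APEx.
    """
    if kind == "raster":
        # COG is preferred where possible; CF-compliant netCDF is the
        # alternative for more complex datasets.
        if complex_or_large:
            return ["CF-compliant netCDF"]
        return ["Cloud Optimized GeoTIFF"]
    if kind == "vector":
        # GeoJSON for small datasets; FlatGeobuf or GeoParquet for larger ones.
        if complex_or_large:
            return ["FlatGeobuf", "GeoParquet"]
        return ["GeoJSON"]
    raise ValueError(f"unknown dataset kind: {kind!r}")


print(recommended_formats("vector", complex_or_large=True))
```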
Metadata Recommendations
Format Specific Recommendations
When sharing geospatial datasets in cloud-optimised formats, such as Cloud Optimised GeoTIFF (COG), NetCDF, and Zarr, it is essential to embed as much relevant metadata as possible directly within the files. Although these formats are designed for efficient cloud access, their interoperability potential is enhanced when the files carry rich, standardised metadata aligned with their respective specifications. Doing so not only improves data reuse by third-party tools but also enables more reliable automatic inference of STAC metadata during cataloguing or dataset publication.
APEx recommends that the following details be incorporated into the file metadata:
- The projection system used to present the data within the file
- The nodata value applied
- The unit of measurement for values represented in the dataset
- A definition of the colour map or legend used to visualise the dataset, in the case of categorical data
- Band or variable names and descriptions
For more details and examples on adding this metadata to your results, please consult the documentation of the tools used to generate them (e.g. GDAL, rasterio, …).
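As one possible illustration, the sketch below assembles a `gdal_translate` call that writes a COG while assigning a projection, a nodata value, and free-form metadata items. It assumes GDAL 3.1 or newer (for the COG driver); the file names, EPSG code, and metadata keys are placeholders chosen for this example, not APEx conventions. The code only builds and prints the command; it does not run GDAL.

```python
import shlex
from typing import Dict, List


def build_cog_command(src: str, dst: str, epsg: int, nodata: float,
                      metadata: Dict[str, str]) -> List[str]:
    """Build a gdal_translate argument list that embeds basic file metadata."""
    cmd = [
        "gdal_translate",
        "-of", "COG",                # write a Cloud Optimized GeoTIFF
        "-a_srs", f"EPSG:{epsg}",    # assign the projection system
        "-a_nodata", str(nodata),    # assign the nodata value
    ]
    for key, value in metadata.items():
        # -mo adds a dataset-level metadata item (KEY=VALUE)
        cmd += ["-mo", f"{key}={value}"]
    cmd += [src, dst]
    return cmd


# Placeholder inputs for illustration only.
cmd = build_cog_command(
    "result.tif", "result_cog.tif", epsg=32631, nodata=255,
    metadata={"UNITS": "percent", "DESCRIPTION": "Example land cover fraction"},
)
print(shlex.join(cmd))
```

The `-mo` option only sets dataset-level metadata items. Band names, band-level units, and colour tables are usually written through a library API instead; the tool documentation has the details.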
STAC Metadata Recommendations
The STAC specification provides a comprehensive and interoperable framework for describing geospatial datasets. Within APEx, STAC serves as the foundation to enhance the discoverability, interoperability, and integration of data across a range of platforms, data catalogues (including the ESA Project Results Repository), and tools such as the APEx Geospatial Explorer.
To enhance interoperability, data providers are advised to consistently use a recommended set of STAC-related extensions and best practices. These recommendations come from community input and collaboration with other initiatives, like EarthCODE and EOEPCA, to ensure consistency across projects and promote the adoption of best practices.
Table 2 offers a summary of the suggested metadata. For further details, please refer to the resources listed below.
| ID | Scenario | Level | Requirement | Description |
|---|---|---|---|---|
| General | | | | |
| METADATA-REC-01 | | Collection / Item | The STAC collection and items should use STAC 1.1 or higher. | |
| METADATA-REC-02 | | Collection | The STAC collection must follow the ESA PRR collection specifications. | This guarantees that the collection is compatible for upload and registration in the ESA Project Results Repository. |
| METADATA-REC-03 | | Collection / Item | Collections must be homogeneous: each item has the same assets and uses the same asset keys. | A consistent and homogeneous definition of assets simplifies client-side handling and supports datacube generation. |
| METADATA-REC-04 | | Item | Each item must have at least one asset where the role is set to data. | This allows for accurate identification of assets containing the data. |
| METADATA-REC-05 | | Item | Each asset must include: | These properties help tools and platforms accurately import the dataset. Furthermore, the file properties allow the ESA PRR to perform extra quality checks. |
| METADATA-REC-06 | Data Visualisation | Item | In the case of categorical datasets, the classification extension is recommended to identify the different classes used in the asset. For additional visualisation support, it is recommended to set the title and color_hint properties so that external tools, such as the APEx Geospatial Explorer, can properly visualise the data. | The classification extension supports the proper interpretation of categorical data that is included in the collection, item or asset. |
| METADATA-REC-07 | Data Visualisation | Item / Asset | To support the visualisation of the dataset, the render extension is recommended. The render extension allows the definition of the following properties: | The render extension supplies rendering tools, like the APEx Geospatial Explorer, with key data to auto-configure visualisation settings, including rescaling and colour maps. |
| Datacube Formats (netCDF, Zarr, …) | | | | |
| METADATA-REC-08 | Data Processing | Collection / Item | The datacube extension (v2.x) should be used to properly describe the datacube: | The extension enables correct data parsing into a datacube by the platform or tool. A variable can be a band in EO data or a meteorological variable, such as rain or temperature, in meteorological datasets. |
| Raster Formats (COG) | | | | |
| METADATA-REC-09 | Data Processing, Data Visualisation | Item | The projection extension must be used to identify the CRS, raster bounds and shape. At minimum, the following must be defined: | The projection extension ensures that any tools accessing the data can accurately determine key raster properties without the performance overhead of inspecting the raster file. If the goal is to visualise your data through the APEx Geospatial Explorer, please consider the projections that are currently supported, as defined in EXPLORER-REQ-07. |
| METADATA-REC-10 | Data Processing, Data Visualisation | Item | To incorporate a time dimension, the item must define datetime, start_datetime and end_datetime at the item level. Each property contains a single ISO 8601 value. | These properties enable tools to accurately identify the data's temporal dimension, simplifying search and filtering within the STAC collection. For temporal dimensions, it is recommended to maintain the original level of granularity; data should not be aggregated from daily records to a single label unless specifically instructed by the user or noted in the metadata. When data are combined and overlap exists, the user must state the methodology unless it is already indicated in the metadata. |
| METADATA-REC-11 | Data Processing, Data Visualisation | Item | The bands array must be used to identify band information in the raster; the order given in the array must be kept. | |
| METADATA-REC-12 | | Item | For other dimensions, the datacube extension must be provided. | |
| METADATA-REC-13 | Data Processing, Data Visualisation | Item | The raster extension must be used to accurately specify the following attributes associated with the raster file: | The raster extension offers valuable information about the dataset, eliminating the need to directly access or analyse the data itself. For instance, when visualising, details like scale and offset can help convert raw values into real-world figures. |
| Statistical Data (FlatGeobuf, GeoJSON) | | | | |
| METADATA-REC-14 | Data Visualisation | Item | It is recommended that STAC records for statistics include a datetime property that matches the time stamp of the source data from which the statistics are derived. The description should also refer to the boundary datasets (e.g. NUTS) that the statistics represent. | This information offers clear insights into the statistical data and its applications and also assists in integrating this data into the APEx Geospatial Explorer. |
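
A few of the recommendations in Table 2 can be checked mechanically before delivery. The sketch below, using only the Python standard library, tests METADATA-REC-03 (homogeneous asset keys), METADATA-REC-04 (at least one asset with the `data` role) and METADATA-REC-10 (temporal properties) on plain dictionaries. The helper names, the example URLs and the stripped-down items are assumptions made for illustration; the items omit fields such as geometry and links, and the checks are not a substitute for the ESA PRR validation.

```python
from datetime import datetime
from typing import Dict, Iterable, List

COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def make_item(item_id: str, start: str, end: str,
              asset_keys: Iterable[str] = ("data",)) -> Dict:
    """Build a simplified, incomplete STAC-like item for illustration."""
    return {
        "type": "Feature",
        "stac_version": "1.1.0",
        "id": item_id,
        "properties": {"datetime": start, "start_datetime": start, "end_datetime": end},
        "assets": {
            key: {
                "href": f"https://example.com/{item_id}/{key}.tif",  # placeholder URL
                "type": COG_MEDIA_TYPE,
                "roles": ["data"],
            }
            for key in asset_keys
        },
    }


def asset_keys_are_homogeneous(items: List[Dict]) -> bool:
    """METADATA-REC-03: every item uses the same set of asset keys."""
    key_sets = {frozenset(item.get("assets", {})) for item in items}
    return len(key_sets) <= 1


def has_data_asset(item: Dict) -> bool:
    """METADATA-REC-04: at least one asset lists the 'data' role."""
    return any("data" in asset.get("roles", [])
               for asset in item.get("assets", {}).values())


def temporal_properties_ok(item: Dict) -> bool:
    """METADATA-REC-10: the three temporal properties exist and parse."""
    props = item.get("properties", {})
    for name in ("datetime", "start_datetime", "end_datetime"):
        value = props.get(name)
        if not isinstance(value, str):
            return False
        try:
            # Older Python versions do not accept a trailing "Z".
            datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return False
    return True


items = [
    make_item("jan", "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z"),
    make_item("feb", "2024-02-01T00:00:00Z", "2024-02-29T23:59:59Z"),
]
print(asset_keys_are_homogeneous(items),
      all(has_data_asset(i) for i in items),
      all(temporal_properties_ok(i) for i in items))
```

For real deliveries, a dedicated STAC validation tool is likely a better fit; this sketch only shows what the three recommendations mean at the level of the item JSON.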