The European Space Agency
APEx Application Propagation Environments
APEx - Documentation Portal

File format recommendations

Within the APEx interoperability and compliance guidelines, DATA-REQ-01 specifies the file formats that projects should use when publishing static data. The information on this page complements that requirement with more detailed recommendations on the use of these formats. Projects are encouraged to consult the APEx team if they have a use case that does not fit within these guidelines.

Cloud Optimised GeoTIFF

Cloud Optimised GeoTIFF (COG) is the most widely supported format for geospatial raster data, and also one of the most efficient in terms of access costs. It is recommended as the default option to consider when publishing static data products. When combined with STAC metadata, COG can produce self-describing, FAIR-compliant datasets that are easily consumable by various services.

Generating Cloud Optimised GeoTIFF

If you are not familiar with COG or GeoTIFF generation, it is recommended to format your GeoTIFF files using a recent version of GDAL to ensure that they are compliant:

gdal_translate world.tif world_webmerc_cog.tif -of COG
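
When many files need converting, the command above can be assembled from a script. The following is a minimal sketch, not an official APEx tool: the helper name cog_translate_command is hypothetical, and the creation options in the example (TILING_SCHEME and COMPRESS, both options of GDAL's COG driver) are illustrative choices rather than requirements stated on this page. The sketch only builds and prints the command; it does not run GDAL.

```python
import shlex


def cog_translate_command(src, dst, creation_options=None):
    """Return the gdal_translate argument list that converts src to a COG.

    creation_options is an optional dict of COG driver creation options,
    each passed to GDAL as "-co KEY=VALUE".
    """
    cmd = ["gdal_translate", src, dst, "-of", "COG"]
    for key, value in (creation_options or {}).items():
        cmd += ["-co", f"{key}={value}"]
    return cmd


# Same conversion as the command above, no extra options.
basic = cog_translate_command("world.tif", "world_webmerc_cog.tif")

# Illustrative variant: Web Mercator tiling and DEFLATE compression (assumed choices).
tiled = cog_translate_command(
    "world.tif",
    "world_webmerc_cog.tif",
    {"TILING_SCHEME": "GoogleMapsCompatible", "COMPRESS": "DEFLATE"},
)

print(shlex.join(basic))
print(shlex.join(tiled))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where a recent GDAL is installed.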

Organising spatiotemporal multi-band datasets

Many datasets have multiple bands (or ‘variables’), or have a date associated with them. The general recommendation is to store a single band per GeoTIFF file and to create STAC items with one asset per band. This layout is commonly used by many other datasets and avoids the complexities of multi-band GeoTIFF files, which can be challenging to use.

An exception is ‘RGB’-style products, where three bands are used to represent a single image. In this case, creating a Cloud Optimised GeoTIFF with three bands is an option.

To associate time information, create one GeoTIFF per timestamp and one STAC item per timestamp. The GeoTIFF format has no built-in support for conveying time information, but STAC metadata supports this well.
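
The layout described in this section (one GeoTIFF per band, one STAC item per timestamp) can be sketched as a plain dictionary. This is a minimal illustration using only the Python standard library, not a complete or validated STAC item: the function name build_stac_item, the item id, the band names, the example.com URLs, and the bounding box are all hypothetical.

```python
import json
from datetime import datetime, timezone

# Media type commonly used for COG assets in STAC.
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def build_stac_item(item_id, timestamp, band_hrefs, bbox):
    """Build a minimal STAC item: one timestamp, one COG asset per band."""
    west, south, east, north = bbox
    utc = timestamp.astimezone(timezone.utc)
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # The timestamp lives in the STAC metadata, not in the GeoTIFF.
        "properties": {"datetime": utc.strftime("%Y-%m-%dT%H:%M:%SZ")},
        "links": [],
        "assets": {
            band: {"href": href, "type": COG_MEDIA_TYPE, "roles": ["data"]}
            for band, href in band_hrefs.items()
        },
    }


# Hypothetical two-band product for a single date.
item = build_stac_item(
    "example-2023-06-01",
    datetime(2023, 6, 1, tzinfo=timezone.utc),
    {
        "B04": "https://example.com/2023-06-01/B04.tif",
        "B08": "https://example.com/2023-06-01/B08.tif",
    },
    (5.0, 50.0, 6.0, 51.0),
)
print(json.dumps(item, indent=2))
```

A time series then becomes a list of such items, one per timestamp, each pointing at its own set of single-band files.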

Visualisation in APEx Geospatial Explorer

To optimise visualisation in the APEx Geospatial Explorer, it is recommended to use the GoogleMapsCompatible tiling scheme, which typically uses 256x256 pixel tiles aligned to a global grid. The default Coordinate Reference System (CRS) used in the Geospatial Explorer is the Web Mercator projection (EPSG:3857), so all datasets in this projection are supported. On-the-fly reprojection and/or configuration of a Geospatial Explorer instance for an alternative CRS is feasible, although we advise contacting the APEx team for specific advice when using alternative projections. The BitsPerSample field must accurately reflect the data format. Overviews are essential for performance and should be generated by downsampling by factors of two until the image dimensions are the size of a tile or smaller. These overviews should also be tiled and placed after the main image data to conform with the COG specification.
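
The overview rule in this section (halve the image repeatedly until it fits within one tile) can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function name overview_factors is hypothetical, and rounding odd dimensions up when halving is an assumption made for the example.

```python
def overview_factors(width, height, tile_size=256):
    """Return the power-of-two downsampling factors needed so that the
    smallest overview is no larger than one tile in both dimensions."""
    factors = []
    factor = 2
    w, h = width, height
    while w > tile_size or h > tile_size:
        # Halve each dimension, rounding up (assumption for this sketch).
        w = (w + 1) // 2
        h = (h + 1) // 2
        factors.append(factor)
        factor *= 2
    return factors


# A 10000 x 8000 pixel raster with 256-pixel tiles.
levels = overview_factors(10000, 8000)
print(levels)
```

Recent GDAL versions normally build overviews automatically when writing with the COG driver, so a calculation like this is mainly useful for checking an existing file or for understanding how many overview levels to expect.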

(Geo-)Zarr

Zarr is a format that is gaining traction in the geospatial community, although it is not yet as widely supported as Cloud Optimised GeoTIFF. Its main advantage lies in its ability to store complex multi-dimensional datasets that go beyond a simple 4D (x, y, time, bands) structure. Just like COG, Zarr allows for efficient cloud access.

At the time of writing, however, there are some important caveats:

  • GeoZarr aims to define how to store geospatial data in Zarr format, but this standard is not fully developed and lacks widespread tooling support.
  • Overview pyramids for fast online rendering are not yet supported.
  • By design, Zarr allows for many degrees of freedom, which requires the data producer to have a good understanding of the associated trade-offs.
  • By design, Zarr stores data as separate files in a directory structure, optimising cloud access but making direct downloads less convenient.
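
To illustrate the last two caveats, the sketch below estimates how many chunk objects a given chunking produces and how large each uncompressed chunk is. It is a back-of-the-envelope calculation only: the function name chunk_layout, the array shape, and the two chunking choices are hypothetical, and the estimate ignores compression, metadata files, and the sharding feature of newer Zarr versions.

```python
import math


def chunk_layout(shape, chunks, itemsize):
    """Return (chunks per dimension, total chunk count, uncompressed bytes per chunk)."""
    grid = [math.ceil(size / chunk) for size, chunk in zip(shape, chunks)]
    return grid, math.prod(grid), math.prod(chunks) * itemsize


# Hypothetical float32 cube with dimensions (time, y, x).
shape = (365, 10000, 10000)

# Chunking that favours reading one full map per time step.
spatial = chunk_layout(shape, (1, 1024, 1024), itemsize=4)

# Chunking that favours reading a full time series at one location.
temporal = chunk_layout(shape, (365, 256, 256), itemsize=4)

for name, (grid, count, nbytes) in (("spatial", spatial), ("temporal", temporal)):
    print(f"{name}: grid={grid} chunks={count} bytes_per_chunk={nbytes}")
```

The same data yields very different object counts and chunk sizes depending on the access pattern being optimised, which is the kind of trade-off a data producer has to weigh.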

NetCDF

NetCDF is a self-describing format with some properties similar to Zarr, but it is less optimised for cloud access. It can be useful for exchanging data cubes as single files through traditional methods. However, it is less suitable for conveniently sharing large datasets, for which COG or Zarr are better options.

Statistical Datasets (FlatGeobuf, GeoJSON)

Statistical datasets can be used to store precomputed statistics for dataset variables based on spatial units, such as administrative areas. An example is collecting land cover statistics using boundaries from the Nomenclature of Territorial Units for Statistics (NUTS), as shown in the APEx Geospatial Explorer (Statistics). The guidelines in this section focus on supporting the integration of statistical data for visualisation in the APEx Geospatial Explorer.

The statistical datasets are expected to be vector layers provided in a format that can be parsed into a feature collection following the GeoJSON [1] specification. Currently tested and supported formats are GeoJSON [1] and FlatGeobuf [2]. FlatGeobuf should be used where the statistical dataset is large, as it allows the relevant features to be streamed without downloading the full dataset, which improves performance.

The metadata header of the file should contain the following properties, which define which feature fields in the dataset are used for each purpose.

  • identifierKey: The name of the field that stores the unique identifier for each feature.
  • nameKey: The name of the field that stores the human-readable name for display.
  • levelKey: The name of the field that stores the administrative level number.
  • childrenKey: The name of the field that has a comma-separated list of child feature IDs as declared in identifierKey. Can be the empty string if this is the bottom level.
  • attributeKeys: A comma-separated list of field names that store the statistical data.
  • units: The units as displayed in the UI. This is for UI purposes only and has no effect on the data.
  • visualization_hint: One of the strings histogram, categorised, or continuous, used as a hint to the UI to choose a suitable presentation for the data.

For example, file metadata properties defined as follows:

  • identifierKey: NUTS_ID
  • nameKey: NUTS_NAME
  • levelKey: LEVL_CODE
  • childrenKey: children
  • attributeKeys: Trees, Shrubland, Grassland
  • visualization_hint: categorised

would use the fields NUTS_ID, NUTS_NAME, … in the data to determine the navigation and display of statistics in the Geospatial Explorer. For further guidance, please contact the APEx team through the APEx User Forum.
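
The way these metadata properties point at feature fields can be sketched as follows. This is an illustration of the mapping only, not the Geospatial Explorer's actual implementation: the function name describe_feature is hypothetical, treating the first five properties as required is an assumption, and the sample feature and its attribute values are made up for the example.

```python
# Assumption for this sketch: these five properties are always present.
REQUIRED_KEYS = ("identifierKey", "nameKey", "levelKey", "childrenKey", "attributeKeys")

metadata = {
    "identifierKey": "NUTS_ID",
    "nameKey": "NUTS_NAME",
    "levelKey": "LEVL_CODE",
    "childrenKey": "children",
    "attributeKeys": "Trees, Shrubland, Grassland",
    "visualization_hint": "categorised",
}


def split_list(text):
    """Split a comma-separated string into trimmed, non-empty parts."""
    return [part.strip() for part in text.split(",") if part.strip()]


def describe_feature(feature, metadata):
    """Resolve the fields named in the metadata against one GeoJSON feature."""
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing: {', '.join(missing)}")
    props = feature["properties"]
    # childrenKey may be the empty string at the bottom level.
    children_field = metadata["childrenKey"]
    children = split_list(props.get(children_field, "")) if children_field else []
    return {
        "id": props[metadata["identifierKey"]],
        "name": props[metadata["nameKey"]],
        "level": props[metadata["levelKey"]],
        "children": children,
        "attributes": {k: props[k] for k in split_list(metadata["attributeKeys"])},
    }


# Made-up feature: the attribute values are illustrative, not real statistics.
feature = {
    "type": "Feature",
    "geometry": None,
    "properties": {
        "NUTS_ID": "BE", "NUTS_NAME": "Belgium", "LEVL_CODE": 0,
        "children": "BE1,BE2,BE3",
        "Trees": 1.0, "Shrubland": 2.0, "Grassland": 3.0,
    },
}
summary = describe_feature(feature, metadata)
print(summary)
```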

Datasets that have classifications (such as land use) should have one key:value entry per class, consisting of ‘name’:‘value’, together with a ‘classifications’ entry whose value is a comma-separated string listing all the classification keys, and a ‘total’ entry holding the sum of all class values. This allows bar charts and pie charts to be rendered correctly.

{
  "Bare / sparse vegetation": 3349.349614217657,
  "Built-up": 18474.280639104116,
  "Cropland": 155067.6934300016,
  "Grassland": 140178.79417018566,
  "Herbaceous wetland": 1612.828666906516,
  "Mangroves": 479.46053523623897,
  "Moss and lichen": 499.40601429089236,
  "Permanent water bodies": 8969.837211370474,
  "Shrubland": 7342.96093361589,
  "Snow and ice": 495.7695064816955,
  "Tree cover": 301783.0035618253,
  "Unknown": 1.7258467103820294,
  "total": 638255.1101299465,
  "classifications": "Tree cover,Shrubland,Grassland,Cropland,Built-up,Bare / sparse vegetation,Snow and ice,Permanent water bodies,Herbaceous wetland,Mangroves,Moss and lichen,Unknown"
}

[Figure: WorldCover bar chart example]
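
The ‘total’ and ‘classifications’ entries can be derived from the per-class values rather than typed by hand. The following is a minimal sketch under that assumption: the function name build_classification_properties is hypothetical, and the example uses three of the class values listed above purely for illustration.

```python
import json


def build_classification_properties(class_values):
    """Add 'total' and 'classifications' entries to a dict of per-class values.

    The order of keys in class_values determines the order of the
    comma-separated 'classifications' string.
    """
    for name in class_values:
        if "," in name:
            # A comma in a class name would corrupt the comma-separated list.
            raise ValueError(f"class name contains a comma: {name!r}")
        if name in ("total", "classifications"):
            raise ValueError(f"reserved key used as class name: {name!r}")
    properties = dict(class_values)
    properties["total"] = sum(class_values.values())
    properties["classifications"] = ",".join(class_values)
    return properties


# Three classes taken from the example above, for illustration.
properties = build_classification_properties({
    "Tree cover": 301783.0035618253,
    "Shrubland": 7342.96093361589,
    "Grassland": 140178.79417018566,
})
print(json.dumps(properties, indent=2))
```

Using json.dumps also guarantees that the keys are quoted and the entries comma-separated, so the result can be written directly into the feature properties.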

Datasets that do not have classifications (such as a raster showing soil organic carbon) should contain a selection of the following entries:

  • mean
  • min
  • max

These values will be rendered as a table.

{
  "mean": 437.94353402030356,
  "min": 60,
  "max": 4410
}

[Figure: WorldSoils table example]
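
For datasets without classifications, the summary entries can be computed from the values that fall within each spatial unit. This is a minimal sketch using only the standard library: the function name build_summary_properties is hypothetical, the input values are made up, and how the values are extracted from the raster for each unit is outside its scope.

```python
from statistics import fmean


def build_summary_properties(values):
    """Return the mean, min and max entries for one spatial unit."""
    values = list(values)
    if not values:
        raise ValueError("no values available for this spatial unit")
    return {"mean": fmean(values), "min": min(values), "max": max(values)}


# Made-up pixel values for a single spatial unit.
stats = build_summary_properties([60, 120, 4410])
print(stats)
```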

References

1. GeoJSON
2. FlatGeobuf