The European Space Agency
Home
APEx Application Propagation Environments
Main navigation
  • Algorithm Services
  • Project Environments
    Web Portal
    Geospatial Explorer
    Documentation Hub
    User Forum
    Product Catalogue
    Collaborative Workspaces
  • Resources
    • Algorithm Services Catalogue
    • Data Catalogue
  • Documentation
  • FAQ
  • About APEx
  • News
Contact us
Main navigation
  • Algorithm Services
  • Project Environments
  • Resources
  • Documentation
  • FAQ
  • About APEx
  • News
APEx - Documentation Portal
  1. Guides
  2. Creating openEO based services
  • Welcome
  • On-demand EO services
    • Using openEO service
    • APEx-Compliant Platforms
  • Project Tools
    • Use Cases
    • Geospatial Explorer
    • Project Portal
    • User Workspace
    • Interactive Development Environment
    • Product Catalogue
    • Documentation Portal
    • User Forum
  • Algorithm Services
    • On-Demand EO Services
    • Use Cases
    • Algorithm Porting
    • Algorithm Onboarding
    • Algorithm Upscaling
    • Algorithm Enhancement
    • Toolbox Cloudification
    • Algorithm Intercomparison
  • Guides
    • Creating an APEx account
    • Creating APEx single sign-on token
    • Creating openEO based services
    • Creating EOAP based services
    • Upscaling openEO based services
    • Ingesting STAC metadata in APEx Product Catalogue
    • Linking APEx STAC catalogue with an openEO service
    • File format recommendations
  • Interoperability and Compliance Guidelines
    • Definitions & Actors
    • Algorithm Service Development Options
    • Algorithm Developer and Provider Guidelines
    • Algorithm Hosting Platforms Guidelines
    • Geospatial Explorer
    • Federated Business Model
  1. Guides
  2. Creating openEO based services

Guide for writing openEO User Defined Processes

Creating an openEO User Defined Process

User defined processes (UDPs) are one out of two standardised options that APEx offers to publish algorithms as a service. This guide gives some concrete steps and guidelines to ensure that your UDP works well for your users. These guidelines are written with APEx in mind, but can also serve as a general guide for openEO UDPs. Where needed, recommendations and choices are made to increase uniformity across theAPEx Algorithm Services Catalogue.

For more background on UDP’s, or a basic tutorial on creating them, the open source Python client provides a good starting point.

Keep in mind that APEx offers Algorithm Porting and Algorithm Onboarding support to help you with transforming your algorithm into an openEO UDP and onboarding it onto an APEx-compliant hosting platform.

Example cases

The best way to learn how to write a UDP is to look at existing examples:

  • max_ndvi_composite: UDP code and description.

Organizing your code

A UDP is simply a parametrized version of an openEO process graph, and as such we recommend that you use the same code to develop and test your algorithm, as you use to generate the UDP json. This ensures that your UDP is functionally equivalent to your code. Your code remains your own, and you only need to export the JSON UDP definition to share it with APEx.

It is however advisable for your UDP to link back to a public git repository if available, making your open source code more discoverable.

UDP versioning

APEx requires you to use versioning, to ensure that changes in your algorithm are clearly communicated and visible to users. A good way to do this is to use semantic versioning, which is a widely used versioning scheme.

In combination with git tags, this allows you to easily track different versions of your UDP JSON, and share immutable links to specific versions. In addition, you can also create tags that always point to the latest development or stable version.

One use case is an algorithm change that requires you to change parameters. We refer to such a change as ‘backward incompatible’, because users of the existing algorithm will not be able to switch to the new version without an update on their side. In such a case, you can create a new UDP version, and keep the old one available for users that are not ready to switch yet. Also in the UDP catalog, we recommend to keep both versions side-by-side for a certain time frame before removing the old one.

A changelog is also required, to document the changes between versions. Guidance on how to keep changelogs can be found online.

It is not mandatory to store your UDP JSON in the APEx git repository, especially if the project already maintains an open source (git) repository. The APEx repository is only a suggestion, which can be convenient if you do not have an alternative.

As an example, the image below shows how one project effectively used git tags to version multiple UDP’s that are hosted on GitHub. In this case, these tags exist alongside regular software version tags!

git tags example

How many UDPs should I write?

Deciding on the granularity of your UDP is an important aspect of making your algorithm usable. There is no need to try and fit all possible use cases into a single UDP. Instead, consider to define pieces of functionality that can work with a limited set of parameters, and write a single UDP for each piece.

Parameter conventions

This section provides some recommendations on how to name parameters in your UDP. While these are not mandatory, we recommend to consider them to avoid that users of the APEx Algorithm Services Catalogue would be confused by variations in process parameter names.

Please let us know if you encounter a parameter that could use a convention, and we will add it here.

Spatial filtering

Any algorithm requires spatial filtering, so we can make the life of our users easier if we all use the same naming.

In the standard openEO processes load_collection and load_stac, spatial filtering done through the argument called spatial_extent, which support multiple types: a bounding box object, an inline GeoJSON object, a vector cube, or it can even be left empty (null). For maximum flexibility, compatibility and the best user experience, we recommend to perform spatial filtering in your algorithm:

  • directly in load_collection/load_stac (instead of separate filter_bbox/filter_spatial processes)
  • using a parameter named spatial_extent (unless there are good reasons otherwise) spatial filtering in these processes, and also use spatial_extent as the parameter name unless there are good reasons otherwise.

If you pass on the spatial_extent parameter to all load_collection processes in your UDP, then it is also allowed to perform filtering using vector data. This conveniently allows for advanced spatial filtering use cases!

The Python client has a convenience function called Parameter.spatial_extent() to create this parameter when building your UDP via Python.

Temporal filtering

For openEO processes that support arbitrary temporal ranges, we recommend using temporal_extent as the name of parameter to ensure consistency with other openEO processes, such as load_collection.

Many cases also require a time range with a fixed length. In such a case, you can allow to specify only the start or end date, and use the date_shift process to construct the second date in the temporal interval, ensuring a fixed length. This avoids faulty user input.

Avoid use of save_result

Most openEO process graphs end in a save_resultprocess. However, this is not recommended for UDPs, as the user may want to perform additional processing steps before generating the result. So having a DataCube (raster or vector) as the final output is recommended unless if your service wants to enforce specific settings on how the output is to be generated.

Include default parameters within the UDP

Via the [1] extension, platform specific parameters (also known as job options) can be included directly in the process definition. While the primary recommendation is to avoid using platform specific parameters, it is nevertheless a very good idea to include them when they are necessary.

UDP documentation

The description of your UDP should be quite extensive if you want users to be able to easily assess if it’s suitable for them.

We recommend including these sections:

  1. Description: A short description of what the algorithm does.
  2. Performance characteristics: Information on the computational efficiency of the algorithm. Include a relative cost if available.
  3. Examples: One or more examples of the algorithm in action, using images or plots to show a typical result. Point to a STAC metadata file with an actual output asset to give users full insight into what is generated.
  4. Literature references: If your algorithm is based on scientific literature, provide references to the relevant publications.
  5. Known limitations: Any known limitations of the algorithm, such as the type of data it works best with, or the size of the area it can process efficiently.
  6. Known artifacts: Use images and plots to clearly show any known artifacts that the algorithm may produce. This helps users to understand what to expect from the algorithm.

A template is available to help you structure your documentation.

Integrating your openEO process (UDP) in APEx

Once you have an eligible openEO process, you are ready to integrate it in APEx. At this point, you should have an HTTP link to a JSON document that defines the process, and is publicly accessible. The most common way to do this is to store it in a public git repository. Tagging a release is a good way to ensure that the link remains stable.

Registering your process

Next, you will need to upload a generic JSON to the APEx Algorithm Catalogue to register your process. For now, this is done by creating a pull request in the APEx Algorithm Catalogue GitHub repository.

An example json is provided below. The properties to modify are listed here:

  • id: a unique identifier for your process
  • created and updated timestamps
  • title: a descriptive title
  • description: a detailed description of the process
  • contacts: a list of contacts, with at least one principal investigator
  • keywords: a list of free form keywords
  • themes: Applicable concepts from a scheme. Concepts can be found in the ESA Data Ontology
  • license: the license under which the process is published. You can use the SPDX license list for this. Proprietary licenses are possible, within the terms of your ESA project.

The links section is crucial, the following “rel” values are mandatory:

  • openeo-process: exactly one link to the JSON document that defines the process
  • service: at least one link to an openEO backend that is able to execute the process
  • example: at least one link to a STAC metadata file that shows the output of the process

The type field of these links should be set to application/json. Please provide a descriptive title for each link, allowing to understand what the link is about.


{
  "id": "max_ndvi_composite",
  "type": "Feature",
  "conformsTo": [
    "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core",
    "https://apex.esa.int/core/openeo-udp"
  ],
  "properties": {
    "created": "2024-09-06T00:00:00Z",
    "updated": "2024-09-06T00:00:00Z",
    "type": "service",
    "title": "Max NDVI Composite based on Sentinel-2 data",
    "description": "A compositing algorithm for Sentinel-2 L2A data, ranking observations by their maximum NDVI.",
    "cost_estimate": 1,
    "cost_unit": "platform credits per km\u00b2",
    "keywords": [
      "vegetation",
      "ndvi
    ],
    "language": {
      "code": "en-US",
      "name": "English (United States)"
    },
    "languages": [
      {
        "code": "en-US",
        "name": "English (United States)"
      }
    ],
    "contacts": [
      {
        "name": "Jeroen Dries",
        "position": "Researcher",
        "organization": "VITO",
        "links": [
          {
            "href": "https://www.vito.be/",
            "rel": "about",
            "type": "text/html"
          },
          {
            "href": "https://github.com/jdries",
            "rel": "about",
            "type": "text/html"
          }
        ],
        "contactInstructions": "Contact via VITO",
        "roles": [
          "principal investigator"
        ]
      },
      {
        "name": "VITO",
        "links": [
          {
            "href": "https://www.vito.be/",
            "rel": "about",
            "type": "text/html"
          }
        ],
        "contactInstructions": "SEE WEBSITE",
        "roles": [
          "processor"
        ]
      }
    ],
    "themes": [ {
        "concepts": [
          { "id": "NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI)", url: "https://gcmd.earthdata.nasa.gov/kms/concept/2297a00a-80f5-466e-b28e-b9ca42562d3f?format=json" },
          { "id": "Sentinel-2 MSI",  "url": "https://gcmd.earthdata.nasa.gov/kms/concept/fc57a9a0-a287-4bcf-a517-20811b55596b?format=json" }
        ],
        "scheme": "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"
      }],
    "formats": [
      "GeoTiff", "netCDF"
    ],
    "license": "CC-BY-4.0"
  },
  "linkTemplates": [],
  "links": [
    {
      "rel": "application",
      "type": "application/json",
      "title": "openEO Process Definition",
      "href": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/max_ndvi_composite/openeo_udp/examples/max_ndvi_composite/max_ndvi_composite.json"
    },
    {
      "rel": "service",
      "type": "application/json",
      "title": "CDSE openEO federation",
      "href": "https://openeofed.dataspace.copernicus.eu"
    },
    {
      "rel": "example-output",
      "type": "application/json",
      "title": "Example output",
      "href": "https://radiantearth.github.io/stac-browser/#/external/s3.waw3-1.cloudferro.com/swift/v1/APEx-examples/max_ndvi_denmark/collection.json"
    },
    {
      "rel": "webapp",
      "type": "text/html",
      "title": "OpenEO Web Editor",
      "href": "https://editor.openeo.org/?wizard=UDP&wizard~process=max_ndvi_composite&wizard~processUrl=https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/main/algorithm_catalog/vito/max_ndvi_composite/openeo_udp/max_ndvi_composite.json&server=https://openeofed.dataspace.copernicus.eu"
    }
  ]
}

Defining a validation & benchmark scenario

APEx has the capability to automatically check if your openEO process is working as expected, and if the cost for specific scenarios is sufficiently stable over time. This is a very important feature to avoid that your users have a bad experience. It also helps to ensure that the openEO backend provider you selected is performing well, and that changes to the backend service do not break your process.

APEx requires at least one benchmark scenario, to be able to correctly mark processes that are (temporarily)unavailable. When this happens, the ‘principal investigator’, as defined in the JSON of the previous step, is informed, allowing to take action as desired.

The benchmark scenarios are defined as JSON files in the benchmark_scenarios folder of your service. The schema of these files is defined (as JSON Schema)in the schema/benchmark_scenario.json file.

Example benchmark definition:

[
  {
    "id": "max_ndvi",
    "type": "openeo",
    "backend": "openeofed.dataspace.copernicus.eu",
    "process_graph": {
      "maxndvi1": {
        "process_id": "max_ndvi",
        "namespace": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/f99f351d74d291d628e3aaa07fd078527a0cb631/openeo_udp/examples/max_ndvi/max_ndvi.json",
        "arguments": {
          "temporal_extent": ["2023-08-01", "2023-09-30"],
          ...
        },
        "result": true
      }
    },
    "reference_data": {
      "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
      "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif"
    }
  },
  ...
]

Note how each benchmark scenario references:

  • the target openEO backend to use.
  • an openEO process graph to execute. The process graph will typically just contain a single node pointing with the namespace field to the desired process definition at a URL, following the remote process definition extension. The URL will typically be a raw GitHub URL to the JSON file in the openeo_udp folder, but it can also be a URL to a different location.
  • reference data to which actual results should be compared.

Defining and updating benchmark reference data

A benchmark scenario should define reference data, with which the actual results of benchmark runs should be compared. The reference_data field of a benchmark scenario is a JSON object that maps each of the openEO batch job result assets (by asset name) to a URL where the reference data can be found, e.g.

{
  "reference_data": {
    "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
    "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif"
  }
}

The reference data URLs should be publicly accessible. For example, as suggested by the example, as S3 storage URLs. While you are free to manage the hosting of the reference data yourself, it is recommended to just rely on the APEx infrastructure and workflows as follows.

The APEx benchmarking system will automatically store the actual results (if any)of failed benchmark runs in a public S3 bucket managed by APEx, to allow post-mortem inspection of these results. The URLs of these actual result assets will be reported in the benchmark report and can be used directly as reference data URLs for subsequent benchmark runs. Of course, make sure the actual results are properly validated and acceptable under your standards before using them as reference data for future benchmark runs.

Initial reference data

When you are initially defining a benchmark scenario, you may not have the full set of expected reference data and metadata yet. In this case, it’s fine to leave out the reference_data field, make it an empty mapping object, or use invalid placeholder URLs. When the benchmark scenario is executed and the underlying openEO batch job finishes successfully, the actual batch job results will be downloaded first. Next, the full actual result set will be compared to the reference data, which will be missing or incomplete, causing the benchmark run to fail. This failure will trigger the storage of the actual results in the APEx S3 bucket, and the corresponding URLs will be reported in the (pytest based) benchmark run output of the corresponding GitHub workflow run. It will look like this:

-------------------- `upload_assets` stats: {'uploaded': 2} --------------------
- tests/test_benchmarks.py::test_run_benchmark[max_ndvi]:
  - 'actual/openEO.tif' uploaded to 'https://s3.example/gh-1234!max_ndvi!actual/openEO.tif'
  - 'actual/job-results.json' uploaded to 'https://s3.example/gh-1234!max_ndvi!actual/job-results.json'

These URLs can be used as reference data URLs for subsequent benchmark runs.

Updating reference data

When an algorithm is updated, or the underlying data changes for some reason, a benchmark scenario might start to fail on the reference data comparison. Updating the reference data is practically the same as defining it initially:look up the URLs of the actual results in the benchmark report, validate these new results, and update the benchmark scenario with the new reference data URLs.

References

1.
OpenEO Processing Parameters extension
Creating APEx single sign-on token
Creating EOAP based services