Guide for writing openEO User Defined Processes

User defined processes (UDPs) are one of the two standardized options that APEx offers for publishing algorithms as a service. This guide gives concrete steps and guidelines to ensure that your UDP works well for your users. These guidelines are written with APEx in mind, but can also serve as a general guide for openEO UDPs. Where needed, recommendations and choices are made to increase uniformity across the APEx algorithm catalog.

For more background on UDPs, or a basic tutorial on creating them, the open source openEO Python client provides a good starting point.

Keep in mind that APEx offers an algorithm enhancement service to help you with these steps if needed.

Example cases

The best way to learn how to write a UDP is to look at existing examples:

max_ndvi_composite: UDP code and description.

Organizing your code

A UDP is simply a parametrized version of an openEO process graph. We therefore recommend using the same code to develop and test your algorithm as you use to generate the UDP JSON. This ensures that your UDP is functionally equivalent to your code. Your code remains your own, and you only need to export the JSON UDP definition to share it with APEx.
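The idea of sharing one code path between testing and export can be sketched with the Python standard library alone. This is only an illustration: the function name build_graph, the UDP id example_ndvi, the collection id and the band names are assumptions, and in practice the openEO Python client would normally build the graph for you.

```python
import json


def build_graph(spatial_extent, temporal_extent):
    """Build a small openEO process graph in its flat JSON form.

    Both arguments may be concrete values (for a test run) or
    {"from_parameter": ...} references (for the UDP export).
    """
    return {
        "load1": {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",  # assumed collection id
                "spatial_extent": spatial_extent,
                "temporal_extent": temporal_extent,
                "bands": ["B04", "B08"],  # assumed band names
            },
        },
        "ndvi1": {
            "process_id": "ndvi",
            "arguments": {"data": {"from_node": "load1"}, "nir": "B08", "red": "B04"},
            "result": True,
        },
    }


# Development and testing: call the function with concrete values.
test_graph = build_graph(
    {"west": 5.0, "south": 51.2, "east": 5.1, "north": 51.3},
    ["2023-08-01", "2023-09-30"],
)

# Export: call the same function with parameter references instead.
udp = {
    "id": "example_ndvi",  # hypothetical UDP id
    "parameters": [
        {"name": "spatial_extent", "description": "Area of interest.", "schema": {"type": "object"}},
        {"name": "temporal_extent", "description": "Time range.", "schema": {"type": "array"}},
    ],
    "process_graph": build_graph(
        {"from_parameter": "spatial_extent"},
        {"from_parameter": "temporal_extent"},
    ),
}
udp_json = json.dumps(udp, indent=2)
```

Because both graphs come from the same function, they contain the same nodes and differ only in where the extents come from.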

It is, however, advisable for your UDP to link back to a public Git repository if one is available, as this makes your open source code more discoverable.

How many UDPs should I write?

Deciding on the granularity of your UDP is an important aspect of making your algorithm usable. There is no need to try and fit all possible use cases into a single UDP. Instead, consider defining pieces of functionality that each work with a limited set of parameters, and write a single UDP for each piece.

Parameter conventions

This section provides recommendations on how to name parameters in your UDP. They are not mandatory, but following them prevents users of the algorithm catalog from being confused by variations in process parameter names.

Please let us know if you encounter a parameter that could use a convention, and we will add it here.

Spatial filtering

Almost any algorithm requires spatial filtering, so we can make life easier for our users if we all use the same naming.

In other openEO processes, the spatial filter parameter is called spatial_extent, and we recommend sticking to that name unless you have a more specific one.

If you pass the spatial_extent parameter on to all load_collection processes in your UDP, users can also filter using vector data. This conveniently enables advanced spatial filtering use cases!
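Whether every load_collection node really receives the parameter can be checked on the exported process graph. The helper below is a minimal sketch, not an APEx tool: its name and the two collection ids are assumptions, and it only inspects top-level nodes of the flat graph.

```python
def load_collections_missing_extent(process_graph, parameter="spatial_extent"):
    """Return ids of load_collection nodes that do not forward the parameter."""
    expected = {"from_parameter": parameter}
    return [
        node_id
        for node_id, node in process_graph.items()
        if node.get("process_id") == "load_collection"
        and node.get("arguments", {}).get("spatial_extent") != expected
    ]


# Illustrative graph: the second load_collection uses a hard-coded box.
graph = {
    "load_s2": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # assumed collection id
            "spatial_extent": {"from_parameter": "spatial_extent"},
        },
    },
    "load_s1": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL1_GRD",  # assumed collection id
            "spatial_extent": {"west": 5.0, "south": 51.2, "east": 5.1, "north": 51.3},
        },
    },
}

missing = load_collections_missing_extent(graph)
print(missing)  # ['load_s1']
```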

Temporal filtering

For openEO processes that support arbitrary temporal ranges, we recommend using temporal_extent as the parameter name, to ensure consistency with other openEO processes such as load_collection.

Many cases also require a time range with a fixed length. In such a case, you can let users specify only the start or end date, and use the date_shift process to construct the second date of the temporal interval. This guarantees a fixed length and avoids faulty user input.
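A fixed-length interval could look roughly like the process graph fragment below, written as a Python dictionary. It is a sketch under assumptions: the parameter name start_date, the one-month length and the collection id are illustrative choices, not APEx conventions.

```python
import json

# Hypothetical UDP parameter: the user only provides the start date.
start_date_parameter = {
    "name": "start_date",
    "description": "First day of the one-month compositing window.",
    "schema": {"type": "string", "format": "date", "subtype": "date"},
}

fragment = {
    # Derive the end date from the start date, one month later.
    "dateshift1": {
        "process_id": "date_shift",
        "arguments": {
            "date": {"from_parameter": "start_date"},
            "value": 1,
            "unit": "month",
        },
    },
    # Use both dates as the temporal interval of the data loading step.
    "load1": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # assumed collection id
            "spatial_extent": {"from_parameter": "spatial_extent"},
            "temporal_extent": [
                {"from_parameter": "start_date"},
                {"from_node": "dateshift1"},
            ],
        },
        "result": True,
    },
}

fragment_json = json.dumps(fragment, indent=2)
```

With this layout the user cannot supply an interval of the wrong length, because the end date never appears as a parameter.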

Avoid use of `save_result`

Most openEO process graphs end in a save_result process. However, this is not recommended for UDPs, as the user may want to perform additional processing steps before generating the result. Having a DataCube (raster or vector) as the final output is therefore recommended, unless your service wants to enforce specific settings on how the output is generated.
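A quick way to catch an accidental save_result before publishing is to look at the node flagged as the result in the exported graph. The helper below is a small sketch with a hypothetical name, operating on the flat JSON form of a process graph.

```python
def ends_in_save_result(process_graph):
    """Return True if the node flagged as result is a save_result process."""
    result_nodes = [node for node in process_graph.values() if node.get("result")]
    return any(node.get("process_id") == "save_result" for node in result_nodes)


# Graph that returns a data cube: suitable as a UDP.
cube_graph = {
    "load1": {"process_id": "load_collection", "arguments": {"id": "SENTINEL2_L2A"}},
    "ndvi1": {"process_id": "ndvi", "arguments": {"data": {"from_node": "load1"}}, "result": True},
}

# Same graph with a save_result appended: fine for a job, not recommended for a UDP.
job_graph = {
    "load1": {"process_id": "load_collection", "arguments": {"id": "SENTINEL2_L2A"}},
    "ndvi1": {"process_id": "ndvi", "arguments": {"data": {"from_node": "load1"}}},
    "save1": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "ndvi1"}, "format": "GTiff"},
        "result": True,
    },
}

print(ends_in_save_result(cube_graph))  # False
print(ends_in_save_result(job_graph))   # True
```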

UDP documentation

The description of your UDP should be quite extensive if you want users to be able to easily assess if it’s suitable for them. We recommend including these sections:

Description: A short description of what the algorithm does.
Performance characteristics: Information on the computational efficiency of the algorithm. Include a relative cost if available.
Examples: One or more examples of the algorithm in action, using images or plots to show a typical result. Point to a STAC metadata file with an actual output asset to give users full insight into what is generated.
Literature references: If your algorithm is based on scientific literature, provide references to the relevant publications.
Known limitations: Any known limitations of the algorithm, such as the type of data it works best with, or the size of the area it can process efficiently.
Known artifacts: Use images and plots to clearly show any known artifacts that the algorithm may produce. This helps users to understand what to expect from the algorithm.

A template is available to help you structure your documentation.

Integrating your openEO process (UDP) in APEx

Once you have an eligible openEO process, you are ready to integrate it in APEx. At this point, you should have a publicly accessible HTTP link to a JSON document that defines the process. The most common way to do this is to store the document in a public Git repository. Tagging a release is a good way to ensure that the link remains stable.
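For a file hosted on GitHub, a stable link can be composed from the repository, a fixed reference (a tag or a commit hash) and the file path. The helper name below is an assumption; the values are taken from the benchmark example further down in this guide, which pins a commit.

```python
def raw_github_url(owner, repo, ref, path):
    """Compose a raw.githubusercontent.com URL for a file at a fixed ref.

    `ref` should be a tag or commit hash; a branch name also works,
    but the content behind it can change over time.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


url = raw_github_url(
    "ESA-APEx",
    "apex_algorithms",
    "f99f351d74d291d628e3aaa07fd078527a0cb631",  # commit hash used in the benchmark example
    "openeo_udp/examples/max_ndvi/max_ndvi.json",
)
print(url)
```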

Registering your process

Next, you will need to upload a generic JSON document to the APEx algorithm catalog to register your process. For now, this is done by creating a pull request in the APEx algorithm catalog GitHub repository.

An example JSON document is provided below. The properties to modify are listed here:

id: a unique identifier for your process
created and updated timestamps
title: a descriptive title
description: a detailed description of the process
contacts: a list of contacts, with at least one principal investigator
keywords: a list of free form keywords
themes: Applicable concepts from a scheme. Concepts can be found in the ESA Data Ontology
license: the license under which the process is published. You can use the SPDX license list for this. Proprietary licenses are possible, within the terms of your ESA project.

The links section is crucial. The following “rel” values are mandatory:

openeo-process: exactly one link to the json document that defines the process
service: at least one link to an openEO backend that is able to execute the process
example: at least one link to a STAC metadata file that shows the output of the process

The type field of these links should be set to application/json. Please provide a descriptive title for each link, so that readers can understand what the link is about.


{
  "id": "max_ndvi_composite",
  "type": "Feature",
  "conformsTo": [
    "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core"
  ],
  "properties": {
    "created": "2024-09-06T00:00:00Z",
    "updated": "2024-09-06T00:00:00Z",
    "type": "apex_algorithm",
    "title": "Max NDVI Composite based on Sentinel-2 data",
    "description": "A compositing algorithm for Sentinel-2 L2A data, ranking observations by their maximum NDVI.",
    "cost_estimate": 1,
    "cost_unit": "platform credits per km\u00b2",
    "keywords": [
      "vegetation"
    ],
    "language": {
      "code": "en-US",
      "name": "English (United States)"
    },
    "languages": [
      {
        "code": "en-US",
        "name": "English (United States)"
      }
    ],
    "contacts": [
      {
        "name": "Jeroen Dries",
        "position": "Researcher",
        "organization": "VITO",
        "links": [
          {
            "href": "https://www.vito.be/",
            "rel": "about",
            "type": "text/html"
          },
          {
            "href": "https://github.com/jdries",
            "rel": "about",
            "type": "text/html"
          }
        ],
        "contactInstructions": "Contact via VITO",
        "roles": [
          "principal investigator"
        ]
      },
      {
        "name": "VITO",
        "links": [
          {
            "href": "https://www.vito.be/",
            "rel": "about",
            "type": "text/html"
          }
        ],
        "contactInstructions": "SEE WEBSITE",
        "roles": [
          "processor"
        ]
      }
    ],
    "themes": [ {
        "concepts": [
          { "id": "NORMALIZED DIFFERENCE VEGETATION INDEX (NDVI)" },
          { "id": "Sentinel-2 MSI" }
        ],
        "scheme": "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"
      }],
    "formats": [
      "GeoTiff", "netCDF"
    ],
    "license": "CC-BY-4.0"
  },
  "linkTemplates": [],
  "links": [
    {
      "rel": "openeo-process",
      "type": "application/json",
      "title": "openEO Process Definition",
      "href": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/max_ndvi_composite/openeo_udp/examples/max_ndvi_composite/max_ndvi_composite.json"
    },
    {
      "rel": "service",
      "type": "application/json",
      "title": "CDSE openEO federation",
      "href": "https://openeofed.dataspace.copernicus.eu"
    },
    {
      "rel": "example",
      "type": "application/json",
      "title": "Example output",
      "href": "https://radiantearth.github.io/stac-browser/#/external/s3.waw3-1.cloudferro.com/swift/v1/APEx-examples/max_ndvi_denmark/collection.json"
    }
  ]
}
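The link requirements listed above can be checked before opening a pull request. The function below is a minimal sketch with a hypothetical name, not the validation that APEx itself runs; it only covers the rules on rel, type and title.

```python
from collections import Counter

MANDATORY_RELS = ("openeo-process", "service", "example")


def check_links(record):
    """Return a list of problems found in the links of a catalog record."""
    links = record.get("links", [])
    counts = Counter(link.get("rel") for link in links)
    problems = []
    if counts["openeo-process"] != 1:
        problems.append("expected exactly one 'openeo-process' link")
    for rel in ("service", "example"):
        if counts[rel] < 1:
            problems.append(f"expected at least one '{rel}' link")
    for link in links:
        if link.get("rel") not in MANDATORY_RELS:
            continue
        if link.get("type") != "application/json":
            problems.append(f"link '{link.get('href')}' should have type application/json")
        if not link.get("title"):
            problems.append(f"link '{link.get('href')}' has no title")
    return problems


# Record reduced to its links, with the 'example' link left out on purpose.
record = {
    "links": [
        {
            "rel": "openeo-process",
            "type": "application/json",
            "title": "openEO Process Definition",
            "href": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/max_ndvi_composite/openeo_udp/examples/max_ndvi_composite/max_ndvi_composite.json",
        },
        {
            "rel": "service",
            "type": "application/json",
            "title": "CDSE openEO federation",
            "href": "https://openeofed.dataspace.copernicus.eu",
        },
    ]
}

problems = check_links(record)
print(problems)  # ["expected at least one 'example' link"]
```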

Defining a validation & benchmark scenario

APEx can automatically check whether your openEO process is working as expected, and whether the cost for specific scenarios is sufficiently stable over time. This is a very important feature for preventing a bad experience for your users. It also helps to ensure that the openEO backend provider you selected is performing well, and that changes to the backend service do not break your process.

APEx requires at least one check, so that it can correctly mark processes that are (temporarily) unavailable. When this happens, the ‘principal investigator’, as defined in the JSON of the previous step, is informed and can take action as desired.

The benchmark scenarios are defined as JSON files in the benchmark_scenarios folder. The schema of these files is defined (as JSON Schema) in the schema/benchmark_scenario.json file.

Example benchmark definition:

[
  {
    "id": "max_ndvi",
    "type": "openeo",
    "backend": "openeofed.dataspace.copernicus.eu",
    "process_graph": {
      "maxndvi1": {
        "process_id": "max_ndvi",
        "namespace": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/f99f351d74d291d628e3aaa07fd078527a0cb631/openeo_udp/examples/max_ndvi/max_ndvi.json",
        "arguments": {
          "temporal_extent": ["2023-08-01", "2023-09-30"],
          ...
        },
        "result": true
      }
    },
    "reference_data": {
      "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
      "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif"
    }
  },
  ...
]

Note how each benchmark scenario references:

the target openEO backend to use.
an openEO process graph to execute. The process graph will typically contain just a single node whose namespace field points to the URL of the desired process definition, following the remote process definition extension. The URL will typically be a raw GitHub URL to the JSON file in the openeo_udp folder, but it can also point to a different location.
reference data to which actual results should be compared.
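The three elements listed above can be assembled programmatically when you maintain several scenarios. The sketch below uses a hypothetical helper name and the values from the example definition; the arguments that are elided in that example are left out here as well, so the result is not a complete scenario.

```python
import json


def make_scenario(scenario_id, backend, process_id, namespace_url, arguments, reference_data):
    """Build one benchmark scenario with a single-node process graph."""
    return {
        "id": scenario_id,
        "type": "openeo",
        "backend": backend,
        "process_graph": {
            "udp1": {  # node id is arbitrary
                "process_id": process_id,
                "namespace": namespace_url,  # URL of the remote process definition
                "arguments": arguments,
                "result": True,
            }
        },
        "reference_data": reference_data,
    }


scenario = make_scenario(
    scenario_id="max_ndvi",
    backend="openeofed.dataspace.copernicus.eu",
    process_id="max_ndvi",
    namespace_url="https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/f99f351d74d291d628e3aaa07fd078527a0cb631/openeo_udp/examples/max_ndvi/max_ndvi.json",
    # Only the argument shown in the example; the others are elided there.
    arguments={"temporal_extent": ["2023-08-01", "2023-09-30"]},
    reference_data={
        "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
        "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif",
    },
)

# A scenario file holds a list of scenarios.
scenario_file_content = json.dumps([scenario], indent=2)
```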