Algorithm Provider Guidelines
Table 1 outlines the interoperability prerequisites that algorithm providers, such as EO projects, must satisfy to host their workflows and algorithms within APEx. Meeting these requirements ensures that workflows and algorithms can be successfully integrated and reused within the broader EO community.
Note that the majority of these requirements apply to EO projects that build an on-demand service exposed via an HTTP-based API. Projects that do not need to publish any service are not affected by these requirements.
For creating on-demand services, APEx currently supports two main interface standards: openEO and OGC API Processes. Together, these should cover almost any possible on-demand service. When unsure, you can contact the APEx team for advice.
Finally, note that APEx also provides support to projects that need to fulfil these requirements. This support includes a framework for running automated tests and packages that help enhance your algorithms. These are referred to as propagation services.
In general, the aim is to simplify the process of building high-quality on-demand services rather than to add complexity.
ID | Requirement | Description |
---|---|---|
PROV-REQ-01 | EO project results with respect to raster data shall be delivered as cloud-native datasets. | Where possible, Cloud Optimized GeoTIFF (COG) is preferred. For more complex datasets, CF-compliant netCDF is a good alternative. Use of the still-evolving GeoZarr format requires confirmation by APEx and may result in future incompatibility if the selected flavour is ultimately not standardized. A sketch of writing a COG is provided after this table. |
PROV-REQ-02 | EO project results with respect to vector data shall be delivered as cloud-native datasets. | Small datasets can use GeoJSON; GeoParquet is recommended for larger datasets. A sketch of writing GeoParquet is provided after this table. |
PROV-REQ-03 | EO project results should be accompanied by metadata in STAC format, including applicable STAC extensions. | The specific STAC profiles to be applied will be defined throughout the project. A sketch of generating STAC metadata is provided after this table. |
PROV-REQ-04 | EO project results shall include documentation that addresses the scientific and technical limitations of the algorithm. | For instance, the ability of the algorithm to generalize across space and time, input data requirements, error margins on predicted physical quantities. |
PROV-REQ-05 | The algorithms shall be provided according to one of these options: as an openEO user-defined process (UDP), or as a CWL-based application package. | This ensures that the algorithm can be hosted on one of the APEx-compliant algorithm hosting platforms. The APEx documentation will provide clear guidance and samples demonstrating these two options. |
PROV-REQ-06 | For algorithms to be hosted, the algorithm provider shall demonstrate the code quality via static code analysis tools. | For Python code, tools such as pylint can be used for static code analysis. |
PROV-REQ-07 | For algorithms to be hosted, validated outputs for a given set of input parameters shall be available, preferably on a small area that still allows for relevant testing. | This makes it possible to verify that the algorithm keeps functioning correctly as changes are made. |
PROV-REQ-08 | For algorithms to be hosted, automated tests shall be provided that compare the current output of the software against a persisted sample, for a representative area of interest. | A sketch of such a regression test is provided after this table. |
PROV-REQ-09 | For algorithms to be hosted, a versioning scheme shall be defined, preferably following a standardized approach such as https://semver.org. | |
PROV-REQ-10 | For algorithms to be hosted, the procedure for releasing new versions shall be clearly documented. | |
PROV-REQ-11 | For open source software developed within the project, a changelog shall be maintained by the project. | This outlines significant changes between versions, providing important information for users of your algorithm and the APEx consortium. These explanations help clarify any differences in outcomes or performance that could impact automated testing. |
PROV-REQ-12 | Non-code dependencies, such as custom datasets or machine learning models, shall either be packaged with the software or be clearly listed as external dependencies. | |
PROV-REQ-13 | Algorithms shall expose a list of well-documented parameters, with examples showing valid combinations of parameters. | |
PROV-REQ-14 | Algorithms shall clearly list software library dependencies, separated into testing, development, and minimal set of runtime dependencies. Supported versions or version ranges shall be indicated. | |
PROV-REQ-15 | Runtime dependencies shall be minimized as much as possible. | For instance, libraries required for training a model should not be included in a version for inference. |
PROV-REQ-16 | Code shall be written in a cross-platform manner, supporting both Linux and Windows operating systems. | |
PROV-REQ-17 | Executables shall offer at least one choice of a non-interactive command line interface, or an API for integration into a larger codebase. | A sketch of a minimal command line interface is provided after this table. |
PROV-REQ-18 | Algorithms shall be associated with and tested on at least one APEx-compliant hosting platform. | |
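To illustrate PROV-REQ-01, the following minimal sketch writes a result raster as a Cloud Optimized GeoTIFF. It assumes the rasterio library built against GDAL 3.1 or newer (which provides the dedicated COG driver); the data, extent, and file name are placeholders.

```python
# Minimal sketch: writing a result raster as a Cloud Optimized GeoTIFF (PROV-REQ-01).
# Assumes rasterio with GDAL >= 3.1; all values below are illustrative placeholders.
import numpy as np
import rasterio
from rasterio.transform import from_origin

data = np.random.rand(1, 256, 256).astype("float32")  # placeholder result band
transform = from_origin(west=5.0, north=51.0, xsize=0.001, ysize=0.001)

with rasterio.open(
    "result.tif",
    "w",
    driver="COG",               # GDAL handles internal tiling and overviews
    height=data.shape[1],
    width=data.shape[2],
    count=data.shape[0],
    dtype=data.dtype,
    crs="EPSG:4326",
    transform=transform,
    compress="deflate",
) as dst:
    dst.write(data)
```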
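For PROV-REQ-02, converting a vector result to GeoParquet is a one-liner with geopandas (which requires pyarrow for Parquet support); the file names below are illustrative only.

```python
# Sketch: delivering vector results as GeoParquet (PROV-REQ-02).
# Assumes geopandas with pyarrow installed; file names are placeholders.
import geopandas as gpd

gdf = gpd.read_file("fields.geojson")  # small result, fine as GeoJSON
gdf.to_parquet("fields.parquet")       # cloud-native format for larger datasets
```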
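For PROV-REQ-03, a minimal STAC item can be produced with the pystac library. The sketch below is illustrative: the identifier, geometry, and asset path are placeholders, and the applicable STAC profiles and extensions will be defined throughout the project.

```python
# Sketch: minimal STAC item describing a result raster (PROV-REQ-03).
# Assumes pystac; the id, geometry, and asset href are placeholders.
from datetime import datetime, timezone
import pystac

item = pystac.Item(
    id="example-result-20240101",
    geometry={
        "type": "Polygon",
        "coordinates": [[[5.0, 50.0], [6.0, 50.0], [6.0, 51.0], [5.0, 51.0], [5.0, 50.0]]],
    },
    bbox=[5.0, 50.0, 6.0, 51.0],
    datetime=datetime(2024, 1, 1, tzinfo=timezone.utc),
    properties={},
)
item.add_asset(
    "result",
    pystac.Asset(
        href="result.tif",
        media_type=pystac.MediaType.COG,
        roles=["data"],
    ),
)
item.validate()  # schema check; needs the jsonschema package and schema access
```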
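For PROV-REQ-07 and PROV-REQ-08, a regression test can compare the algorithm's output for a small, fixed area against a persisted reference. The sketch below uses pytest conventions; `run_algorithm` and the reference path are hypothetical placeholders for the project's own entry point and test data.

```python
# Hypothetical regression test sketch (PROV-REQ-07/08). "run_algorithm" and
# "tests/reference.tif" are placeholders, not part of APEx itself.
import numpy as np
import rasterio


def run_algorithm(aoi, output):
    """Placeholder for the project's actual, non-interactive entry point."""
    raise NotImplementedError


def test_output_matches_reference(tmp_path):
    output_path = tmp_path / "output.tif"
    # Small area of interest that still allows for relevant testing.
    run_algorithm(aoi=(5.0, 50.0, 6.0, 51.0), output=output_path)

    with rasterio.open(output_path) as actual, rasterio.open("tests/reference.tif") as expected:
        # A small tolerance avoids failures from harmless changes, e.g. library
        # upgrades that slightly alter floating point results.
        np.testing.assert_allclose(actual.read(), expected.read(), rtol=1e-5)
```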
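PROV-REQ-17 can be met with Python's standard argparse module. The sketch below shows a minimal non-interactive interface; the parameter names are illustrative only.

```python
# Sketch of a non-interactive command line interface (PROV-REQ-17).
# Parameter names are illustrative, not an APEx convention.
import argparse


def main():
    parser = argparse.ArgumentParser(description="Run the algorithm over an area of interest.")
    parser.add_argument(
        "--bbox", nargs=4, type=float, required=True,
        metavar=("WEST", "SOUTH", "EAST", "NORTH"),
        help="Bounding box of the area of interest (EPSG:4326)",
    )
    parser.add_argument("--output", default="result.tif", help="Path of the output raster")
    args = parser.parse_args()
    # ... invoke the algorithm with args.bbox and args.output ...


if __name__ == "__main__":
    main()
```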
Best Practices
The following sections provide best practice guidelines for developing APEx-compliant algorithms. While these guidelines are not mandatory, adhering to them will enhance the integration process and improve the overall experience of using the algorithm.
Parameter naming & typing
APEx proposes to standardise the openEO UDP and CWL parameter names and types that are exposed to the user. This is best illustrated by an example: parameters such as `bounding_box`, `bbox`, `aoi`, and `spatial_extent` likely refer to the same concept. However, without common conventions, algorithms might arbitrarily select one of these variants, complicating the usability of the eventual algorithm library.
At the time of writing, the actual conventions have not yet been defined. This becomes relevant when the first algorithms reach a state where they can be published with a fixed interface. This best practice mostly targets new developments that do not have an existing user base or API.
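As a purely illustrative sketch (again, the actual conventions have not yet been defined), the snippet below uses the openeo Python client to publish a UDP whose spatial parameter consistently uses one agreed name rather than an ad-hoc variant. The backend URL, collection id, and process id are hypothetical.

```python
# Illustrative sketch, assuming the openeo Python client; the backend URL,
# collection id, and process id are hypothetical, and "spatial_extent" is
# NOT an official APEx convention (none has been fixed yet).
import openeo
from openeo.api.process import Parameter

connection = openeo.connect("https://openeo.example.org").authenticate_oidc()

# One agreed name instead of ad-hoc variants like "bbox" or "aoi".
spatial_extent = Parameter(
    name="spatial_extent",
    description="Bounding box to process.",
    schema={"type": "object", "subtype": "bounding-box"},
)

cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent,
    bands=["B04", "B08"],
)
red = cube.band("B04")
nir = cube.band("B08")
ndvi = (nir - red) / (nir + red)

connection.save_user_defined_process(
    user_defined_process_id="example_ndvi",
    process_graph=ndvi,
    parameters=[spatial_extent],
)
```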
Licensing requirements
For algorithms to be hosted and curated, APEx requires the ability to execute the algorithms as on-demand services on an APEx-compliant algorithm hosting platform. This is straightforward for fully open-source algorithms, such as those licensed under the Apache 2.0 license. However, for algorithms with more restrictive licenses or those dependent on artifacts like trained machine learning models, the algorithm provider must be able to license APEx to execute the service without incurring additional costs beyond normal resource usage. Without such a license, the automated benchmarking and testing provided by APEx may need to be disabled for the service in question.