Algorithm Developer and Provider Guidelines
Requirements
Table 1 outlines the interoperability requirements that algorithm providers, such as EO projects, must meet to host their workflows and algorithms within APEx. Satisfying these requirements allows APEx to integrate the workflows and algorithms successfully and ensures their reusability within the broader EO community. Note that the majority of these requirements apply to EO projects that build an on-demand service to be exposed via an HTTP-based API.
In terms of creating on-demand services, APEx currently supports two main interface standards: openEO [1] and OGC API Processes [2], as described in section 3. Together, these two standards should cover almost any on-demand service.
Finally, note that APEx also provides support to projects that need to fulfil these requirements. This support includes implementing the guidelines in this document, offering a framework to run automated tests, and providing packages to help improve the performance of algorithms. These are referred to as APEx Algorithm Services.
In general, the aim is to simplify the process of building high-quality on-demand services rather than to add complexity.
ID | Requirement | Description |
---|---|---|
PROV-REQ-01 | EO project results with respect to raster data shall be delivered as cloud-native datasets. | Where possible, Cloud Optimized GeoTIFF [3] is preferred. For more complex datasets, CF-compliant netCDF [4] is a good alternative. Use of the still-evolving GeoZarr [5] format requires confirmation by APEx and may result in future incompatibility if the selected flavour is not eventually standardised. |
PROV-REQ-02 | EO project results with respect to vector data shall be delivered as cloud-native datasets. | Small datasets can use GeoJSON [6]; GeoParquet [7] is recommended for larger datasets. |
PROV-REQ-03 | EO project results with respect to data should be accompanied by metadata in STAC [8] format, including applicable STAC extensions. | The specific STAC profiles to be applied will be defined throughout the project. |
PROV-REQ-04 | Algorithms and applications shall include documentation that addresses the scientific and technical limitations of the algorithm. | For instance, the ability of the algorithm to generalize across space and time, input data requirements, error margins on predicted physical quantities. |
PROV-REQ-05 | The algorithms shall be provided according to one of these options: | This ensures that the algorithm can be hosted on one of the APEx-compliant algorithm hosting platforms. The APEx documentation will provide clear guidance and samples demonstrating these two options. |
PROV-REQ-06 | For algorithms to be hosted, the algorithm provider shall demonstrate the code quality via static code analysis tools. | For Python code, tools such as pylint can be used for static code analysis. |
PROV-REQ-07 | For algorithms to be hosted, validated outputs for a given set of input parameters shall be made available by the algorithm provider, preferably on a small area that still allows for relevant testing. | This makes it possible to validate that the algorithm still functions correctly as changes are made. |
PROV-REQ-08 | For algorithms to be hosted, automated tests shall be provided that compare the current output of the software against a persisted sample, for a representative area of interest. | These tests enable APEx to automate the periodic validation of algorithms, ensuring they remain functionally available even after the project has finished. |
PROV-REQ-09 | For algorithms to be hosted, a versioning scheme shall be defined, preferably following a standardized approach such as https://semver.org. | The versioning scheme provides a clear framework for communicating algorithm updates to users. By adopting standard practices, projects can highlight breaking changes, ensuring that users have accurate expectations and can adapt accordingly. |
PROV-REQ-10 | For algorithms to be hosted, the procedure for releasing new versions should be clearly documented. | Clear documentation ensures that updates and new versions of algorithms are consistently and correctly released, reducing errors and providing transparency for users who rely on the latest features and fixes. It also helps maintain version control, which is crucial for reproducibility and compliance. |
PROV-REQ-11 | For open source software developed within the project, a changelog should be maintained by the project. | This outlines significant changes between versions, providing important information for users of your algorithm and the APEx consortium. These explanations help clarify any differences in outcomes or performance that could impact automated testing. |
PROV-REQ-12 | Non-code dependencies such as custom datasets or machine learning models shall either be packaged with the software or be clearly listed as external dependencies. | This approach prevents issues related to missing dependencies and ensures that users can easily set up the environment for proper execution. It also promotes transparency and simplifies the deployment process. |
PROV-REQ-13 | Algorithms shall expose a list of well-documented parameters, with examples showing valid combinations of parameters. | Good documentation improves the usability of the algorithms by providing users with clear guidance on how to configure them correctly. Well-documented parameters and examples reduce the risk of incorrect usage. |
PROV-REQ-14 | Algorithms shall clearly list software library dependencies, separated into testing, development, and minimal set of runtime dependencies. Supported versions or version ranges shall be indicated. | By clearly listing and categorizing dependencies, users can quickly set up the necessary environment, avoid conflicts, and ensure the algorithm functions as intended across different scenarios. |
PROV-REQ-15 | Runtime dependencies shall be minimized as much as possible. | For instance, libraries required for training a model should not be included in a version for inference. |
PROV-REQ-16 | Code should be written in a cross-platform manner, supporting at least Linux. | The support for Linux is considered crucial to enable the deployment of the code on a cloud-based environment. |
PROV-REQ-17 | Executables shall offer at least one of the following: a non-interactive command line interface, or an API for integration into a larger codebase. | A non-interactive command line interface or API enables seamless automation and integration with other components and services. |
PROV-REQ-18 | Algorithms shall be associated with and tested on at least one APEx compliant hosting platform. | Testing and deployment on an APEx-compliant platform guarantees that the algorithm performs correctly within the intended environment and allows the algorithm to be executed as an on-demand service. |
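
The regression testing described in PROV-REQ-07 and PROV-REQ-08 can be sketched as follows. This is a minimal illustration and not the APEx testing framework: the function names, the JSON layout of the persisted sample (band name mapped to rows of pixel values), the tolerances, and the toy `run_algorithm` stand-in are all assumptions made for the example.

```python
import json
import math


def load_reference(path):
    """Load a persisted sample: a JSON object mapping band names to rows of pixel values."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def compare_to_reference(current, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Return a list of mismatch descriptions; an empty list means the outputs agree."""
    mismatches = []
    for band in sorted(set(current) | set(reference)):
        if band not in current or band not in reference:
            mismatches.append(f"{band}: present in only one of the two outputs")
            continue
        cur_rows, ref_rows = current[band], reference[band]
        if len(cur_rows) != len(ref_rows):
            mismatches.append(f"{band}: row count {len(cur_rows)} != {len(ref_rows)}")
            continue
        for r, (cur_row, ref_row) in enumerate(zip(cur_rows, ref_rows)):
            if len(cur_row) != len(ref_row):
                mismatches.append(f"{band}[{r}]: column count {len(cur_row)} != {len(ref_row)}")
                continue
            for c, (cur, ref) in enumerate(zip(cur_row, ref_row)):
                # Compare with a tolerance so harmless floating-point noise does not fail the test.
                if not math.isclose(cur, ref, rel_tol=rel_tol, abs_tol=abs_tol):
                    mismatches.append(f"{band}[{r}][{c}]: {cur} != {ref}")
    return mismatches


def run_algorithm(params):
    """Hypothetical stand-in for the real algorithm: a toy NDVI on a fixed 2x2 area."""
    red = [[0.10, 0.20], [0.30, 0.40]]
    nir = [[0.50, 0.60], [0.70, 0.80]]
    scale = params["scale"]
    ndvi = [
        [scale * (n - r) / (n + r) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red, nir)
    ]
    return {"ndvi": ndvi}


def check_against_sample(params, reference_path):
    """Run the algorithm with fixed parameters and fail loudly if it drifts from the sample."""
    mismatches = compare_to_reference(run_algorithm(params), load_reference(reference_path))
    assert not mismatches, "output differs from persisted sample:\n" + "\n".join(mismatches)
```

In practice the persisted sample would be a validated output file for a small, representative area of interest, stored alongside the input parameters that produced it, so that the same check can be repeated automatically after every change.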
Best Practices
The following sections provide best practice guidelines for developing APEx-compliant algorithms. While these guidelines are not mandatory, adhering to them will enhance the integration process and improve the overall experience of using the algorithm.
Parameter naming & typing
APEx proposes to standardise openEO UDP [9] and CWL [11] parameter names and types that are exposed to the user. This is best illustrated by an example: parameters such as bounding_box, bbox, aoi, and spatial_extent likely refer to the same concept. However, without common conventions, each algorithm might arbitrarily pick one of these variants, making the eventual algorithm library harder to use.
At the time of writing, the actual conventions have not yet been defined. This becomes relevant when the first algorithms reach a state where they can be published with a fixed interface. This best practice mostly targets new developments that do not have an existing user base or API.
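
To illustrate why a shared convention helps, the sketch below maps the name variants mentioned above onto a single parameter name. Because the actual APEx conventions are not yet defined, the choice of `spatial_extent` as the canonical name, the alias table, and the `normalize_parameters` helper are purely hypothetical.

```python
# Hypothetical alias table: the canonical name "spatial_extent" is an assumption
# made for this example, not an APEx convention.
CANONICAL_ALIASES = {
    "spatial_extent": ("spatial_extent", "bounding_box", "bbox", "aoi"),
}


def normalize_parameters(params, aliases=CANONICAL_ALIASES):
    """Rename known parameter aliases to their canonical name; leave other names untouched."""
    lookup = {
        alias: canonical
        for canonical, names in aliases.items()
        for alias in names
    }
    normalized = {}
    for name, value in params.items():
        canonical = lookup.get(name, name)
        if canonical in normalized:
            # Two variants of the same concept were supplied; refuse to guess which one wins.
            raise ValueError(
                f"parameter '{canonical}' supplied more than once (via '{name}')"
            )
        normalized[canonical] = value
    return normalized
```

With such a mapping, a call like `normalize_parameters({"bbox": [4.0, 50.0, 5.0, 51.0], "year": 2020})` would yield a dictionary keyed by `spatial_extent` and `year`. A wrapper of this kind is only a stopgap: once conventions exist, exposing the agreed names directly in the interface is the simpler option.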
Licensing requirements
For algorithms to be hosted and curated, APEx requires the ability to execute the algorithms as on-demand services on an APEx-compliant algorithm hosting platform. This is straightforward for fully open-source algorithms, such as those licensed under the Apache 2.0 license. However, for algorithms with more restrictive licenses or those dependent on artefacts like trained machine learning models, the algorithm provider must be able to license APEx to execute the service without incurring additional costs beyond the normal resource usage. Without such a license, the automated benchmarking and testing provided by APEx may need to be disabled for the service in question.