📘
Page Summary

Step-by-step instructions on how to build an Eratos Model

Example model code and metadata files

We recommend going through this process in its entirety using our example scripts before trying it on your own model; you never know what you will wish you had known at the start.

Step 1: Write the model as code

The power of the Eratos Operator model is its flexibility. Any code that can be written in R or Python can be packaged up and hosted on the platform.

🚧
To ensure flexibility, several conventions are enforced:

The filename that contains the main function must be called entry.py (for Python) or entry.R (for R).

The code must be structured as a function.

The @operator tag must be provided above the main function and include the desired ERN,

e.g. @operator('ern:e-pn.io:resource:eratos.operators.your-model-name-here')

Outputs should be formatted as a resource (more information below).
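The decorator convention above can be checked before upload. The following is a minimal sketch using only the Python standard library; the helper name find_operator_erns is hypothetical and is not part of the Eratos SDK. It parses the source text without executing it, so the eratos package does not need to be installed.

```python
import ast


def find_operator_erns(source: str) -> list[str]:
    """Return the ERN strings passed to @operator(...) on top-level functions."""
    erns = []
    tree = ast.parse(source)  # parses only; nothing in the source is executed
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Name)
                and dec.func.id == "operator"
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
                and isinstance(dec.args[0].value, str)
            ):
                erns.append(dec.args[0].value)
    return erns


# Illustrative source text standing in for the contents of entry.py
sample = '''
@operator('ern:e-pn.io:resource:eratos.operators.your-model-name-here')
def run(context, latitude, longitude):
    return {}
'''

erns = find_operator_erns(sample)
print(erns)
```

An empty result means no top-level function carries the @operator tag, which would break the conventions listed above.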

Model creation tips and tricks

To access data within your model, we recommend using the Eratos Data Access Methods described here. Additional datasets can also be added to the model package.
Inputs and outputs are optional. For example, in an IoT use case you may be ingesting data and writing it directly to data structures.
A key change from scripts that run locally is the addition of the context input parameter. This object securely stores your credentials inside the Eratos adapter while your operator is running. The adapter is retrieved from it as follows:
- adapter_object = context['adapter']

Outputs should be formatted as a Resource. This means that the output data is pushed to Eratos and only the output ERN is returned by the model. This allows both the Eratos Frontend and other models in a workflow to access the data. An example is shown below:

Output Resource Generation

data_frame = your_data  # Your data goes here; it must provide a to_csv() method (e.g. a pandas DataFrame)

with tempfile.TemporaryDirectory() as td:  # Create a temporary directory to hold the output file
    final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')
    data_frame.to_csv(final_output_df_fname)

    # Create the Resource object with metadata
    final_output_res = context['adapter'].Resource(content={
        '@type': 'ern:e-pn.io:schema:dataset',
        'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
        'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
        'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
        'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
        'file': 'minMaxRainTable.csv'
    })

    # Push the data to the Resource object
    final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})

    outputs = {
        'minMaxRainTable': final_output_res,
    }

# This snippet sits inside the operator function, which returns the outputs
return outputs

An Example entry.py

import os
import pandas as pd
import numpy as np
import tempfile

from eratos.resource import Resource
from eratos.operator import operator

@operator('ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location')
def get_daily_max_min_rain(context,latitude,longitude, startDate,endDate):

    location = f'POINT ({longitude} {latitude})'
    #1. Request access to the data resources in Eratos
    max_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature')
    min_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.mintemperature')
    rainfall_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall')
    #2 Convert resource object into gridded data object
    gridded_max_temp_data = max_temp_data.data().gapi()
    gridded_min_temp_data = min_temp_data.data().gapi()
    gridded_rainfall_data = rainfall_data.data().gapi()
    
    # Build the list of dates covered by the request
    date_generated_list = pd.date_range(startDate, endDate, freq="D")
    date_range = date_generated_list.strftime("%Y-%m-%d").to_list()
    
    # Query each dataset to extract the desired data
    # max_temp, as found in the dataset variables
    extracted_max_temp_data = gridded_max_temp_data.get_timeseries_at_points(
        gridded_max_temp_data.get_key_variables()[0], [location], startDate, endDate)
    
    # min_temp, as found in the dataset variables
    extracted_min_temp_data = gridded_min_temp_data.get_timeseries_at_points(
        gridded_min_temp_data.get_key_variables()[0], [location], startDate, endDate)
    
    # daily_rain, as found in the dataset variables
    extracted_rainfall_data = gridded_rainfall_data.get_timeseries_at_points(
        gridded_rainfall_data.get_key_variables()[0], [location], startDate, endDate)
    
    data_dict = {"date":date_range,'max_temp (C)':extracted_max_temp_data[0],'min_temp (C)':extracted_min_temp_data[0],
                 'daily_rain (mm)':extracted_rainfall_data[0]}
    data_frame = pd.DataFrame(data_dict)
    
    print('Generating the output resources.')
    with tempfile.TemporaryDirectory() as td:
        final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')

        data_frame.to_csv(final_output_df_fname)
    
        final_output_res = context['adapter'].Resource(content={
            '@type': 'ern:e-pn.io:schema:dataset',
            'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
            'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
            'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
            'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
            'file': 'minMaxRainTable.csv'
        })
        final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})

        outputs = {
            'minMaxRainTable': final_output_res,
        }

    return outputs
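The example above passes its inputs straight to the data query. A minimal sketch of checking them first, using only the standard library, is shown here. The helper names wkt_point and daily_dates are hypothetical and not part of the Eratos SDK, and the coordinate ranges assume WGS84 degrees.

```python
from datetime import date, timedelta


def wkt_point(latitude: float, longitude: float) -> str:
    """Build a WKT point. WKT puts x (longitude) before y (latitude)."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return f"POINT ({longitude} {latitude})"


def daily_dates(start_date: str, end_date: str) -> list[str]:
    """Return every date from start_date to end_date inclusive, as YYYY-MM-DD strings."""
    start = date.fromisoformat(start_date)  # raises ValueError on a malformed date
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("endDate is before startDate")
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


location = wkt_point(-35.0, 149.0)
dates = daily_dates("2021-09-15", "2021-09-17")
print(location)
print(dates)
```

Failing early with a clear message tends to be easier to debug than an error raised part-way through a data request.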

Step 2: Create the operator.yaml File

The operator.yaml file defines all the metadata for the Operator, including inputs, outputs, ERN, and the area for which the model is defined (if applicable).

operator.yaml

The operator.yaml defines the inputs, outputs, and other key metadata:

@id: The Eratos Resource Name (ERN) for the operator, effectively its ID in the Eratos system
@type: The resource type. This will always be ern:e-pn.io:schema:operator, as this is the operator descriptor
@geo: The geometry that defines the area where this operator is valid (WGS84)
name: The name of the operator
description: The description of the operator, explaining to users what it does
type: How the operator is packaged. Currently container is the only option
inputs: The inputs of the operator
- name: The variable name, which links to the variable name in entry.py
- type: The data type of the variable; see the Input and Output Types Table below
- description: The description of the variable; example inputs can be useful here
- required: Whether this variable is required for the operator to run: True or False
- label: The name of the variable displayed on the front end (optional)
outputs: The outputs of the operator, with the same definitions as the inputs
senapsModel: The ID in our workflow management platform, Senaps. It is often the same as @id above without the 'ern:e-pn.io:resource:' prefix
senapsInstanceProfile: The compute and memory required to run the operator. Refer to the Compute Table below; running costs are directly related to the profile chosen

Key links between operator.yaml and entry.py/R

To ensure a coherent link between the Operator and metadata files, several rules are enforced:

The variable names of the inputs and outputs in the operator.yaml and the entry.py must be identical (excluding the context variable).
The @id field in operator.yaml and the @operator tag in entry.py must contain the same ERN.
The @type field must be set to ern:e-pn.io:schema:operator.
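These rules can be checked locally before upload. The sketch below assumes the operator.yaml has already been loaded into a dictionary (for example with a YAML parser such as PyYAML's yaml.safe_load); the function name check_operator_metadata is hypothetical. It compares input names only, because output names are the keys of the dictionary returned at run time and cannot be read from the function signature.

```python
import inspect

OPERATOR_TYPE = "ern:e-pn.io:schema:operator"


def check_operator_metadata(func, metadata: dict, operator_ern: str) -> list[str]:
    """Return a list of problems linking an entry function to its operator.yaml content."""
    problems = []
    # Parameter names of the entry function, excluding the context variable
    params = [p for p in inspect.signature(func).parameters if p != "context"]
    yaml_inputs = [item["name"] for item in metadata.get("inputs", [])]
    if sorted(params) != sorted(yaml_inputs):
        problems.append(
            f"inputs differ: entry.py has {params}, operator.yaml has {yaml_inputs}"
        )
    if metadata.get("@id") != operator_ern:
        problems.append("@id does not match the @operator ERN")
    if metadata.get("@type") != OPERATOR_TYPE:
        problems.append(f"@type must be {OPERATOR_TYPE}")
    return problems


# Stand-in for the function defined in entry.py (decorator omitted for a local check)
def get_daily_max_min_rain(context, latitude, longitude, startDate, endDate):
    return {}


metadata = {
    "@id": "ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location",
    "@type": "ern:e-pn.io:schema:operator",
    "inputs": [
        {"name": name}
        for name in ("latitude", "longitude", "startDate", "endDate")
    ],
}

problems = check_operator_metadata(get_daily_max_min_rain, metadata, metadata["@id"])
print(problems)
```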

operator.yaml file creation tips and tricks

Because the variable names must match those in entry.py, they are subject to the naming constraints of the coding language. The label field was added so that the front end can display a more user-friendly name than the one stored on the back end. When label is present it is displayed; when it is not, the name is displayed.
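The fallback from label to name can be expressed in a single line. This is an illustrative sketch only; the function display_name is hypothetical and how the front end implements this is not shown in this guide.

```python
def display_name(variable: dict) -> str:
    """Return the label if one is set, otherwise fall back to the name."""
    return variable.get("label") or variable["name"]


with_label = display_name({"name": "startDate", "label": "Target Start Date"})
without_label = display_name({"name": "latitude"})
print(with_label)
print(without_label)
```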

Example operator.yaml file

"@id": ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:operator
"@geo": POLYGON((112 -44.99365234375, 112 -10, 154.99609375 -10, 154.99609375 -44.99365234375, 112 -44.99365234375))
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: |
    An operator to generate a timeseries csv of rainfall, Min & Max Temperature data at a given location.
type: Container
inputs:
    - name: latitude
      type: number
      description: The Latitude of the point of interest.
      required: True
    - name: longitude
      type: number
      description: The Longitude of the point of interest.
      required: True
    - name: startDate
      type: string
      description: The Target start date, eg. 2021-09-15
      label: Target Start Date
      required: True
    - name: endDate
      type: string
      description: The Target end date, eg. 2021-10-15
      label: Target End Date
      required: True
outputs:
    - name: minMaxRainTable
      type: resource
      description: Daily Min Temp, Max Temp and Rainfall data for given lat lon.
senapsModel: eratos.operators.get-min-max-rain-at-location
senapsInstanceProfile: S

Input and Output Types Table

The input or output type can be one of:

Value	Description
`string`	A UTF-8 character string.
`number`	A double precision floating point value.
`boolean`	A logical boolean.
`date`	An ISO 8601 date.
`timestamp`	An ISO 8601 timestamp.
`resource`	A resource.
`geometry`	Either Well-Known Text geometry or a resource with geometry attached.
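For local testing, the simpler types in this table can be approximated with standard-library checks. This is a sketch under assumptions: how the platform itself validates inputs is not described here, the name matches_type is hypothetical, and resource and geometry are left out because checking them needs platform knowledge.

```python
from datetime import date, datetime


def _is_iso_date(value) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_iso_timestamp(value) -> bool:
    # Lenient: datetime.fromisoformat also accepts a date with no time part,
    # and older Python versions reject some valid ISO 8601 forms such as a "Z" suffix.
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "date": _is_iso_date,
    "timestamp": _is_iso_timestamp,
}


def matches_type(value, type_name: str) -> bool:
    """Return True if value looks like the declared operator.yaml type."""
    check = CHECKS.get(type_name)
    if check is None:
        raise ValueError(f"no local check for type {type_name!r}")
    return check(value)


print(matches_type(-35.3, "number"))
print(matches_type("15/09/2021", "date"))
```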

Compute Table

senapsInstanceProfile can be one of:

Value	vCPU	Memory
`XS`	`0.25`	`256MB`
`S`	`0.25`	`512MB`
`M`	`0.5`	`1GB`
`L`	`1.0`	`2GB`
`XL`	`1.0`	`4GB`
`2XL`	`2.0`	`8GB`
`3XL`	`2.0`	`16GB`
`4XL`	`4.0`	`32GB`
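Since running costs follow the profile size, it can help to pick the smallest profile that covers a model's needs. The sketch below copies the values from the table above and assumes 1GB is 1024MB; the function name smallest_profile is hypothetical.

```python
# (profile, vCPU, memory in MB), in the same ascending order as the Compute Table
PROFILES = [
    ("XS", 0.25, 256),
    ("S", 0.25, 512),
    ("M", 0.5, 1024),
    ("L", 1.0, 2048),
    ("XL", 1.0, 4096),
    ("2XL", 2.0, 8192),
    ("3XL", 2.0, 16384),
    ("4XL", 4.0, 32768),
]


def smallest_profile(min_vcpu: float, min_memory_mb: int) -> str:
    """Return the first (smallest) profile meeting both requirements."""
    for name, vcpu, memory_mb in PROFILES:
        if vcpu >= min_vcpu and memory_mb >= min_memory_mb:
            return name
    raise ValueError("no profile is large enough for these requirements")


choice = smallest_profile(min_vcpu=0.25, min_memory_mb=300)
print(choice)
```

Because the table grows in both vCPU and memory from top to bottom, the first match is also the smallest one.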

Step 3: Add Optional Additions

Required Packages File

A Required Packages file (requirements.txt) lists each package your model needs, along with the required version or provider. Packages that are already included with the Eratos SDK do not need to be listed here. The following are in the base image and also do not need to be specified:

Native Python/R packages
pandas and numpy

Example file:

pandas~=1.3.5
numpy~=1.23.4
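The ~= specifier used in the example file is the "compatible release" operator from PEP 440: pandas~=1.3.5 accepts 1.3.5 and later patch releases but not 1.4.0. The sketch below illustrates that rule for plain dotted numeric versions only; the function compatible_release is hypothetical, and for real use a dedicated library such as packaging is a more robust choice.

```python
def compatible_release(installed: str, spec: str) -> bool:
    """True if `installed` satisfies `~=spec` for plain dotted numeric versions."""
    inst = tuple(int(part) for part in installed.split("."))
    base = tuple(int(part) for part in spec.split("."))
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    # Must be at least the stated version and share every component except the last
    return inst >= base and inst[: len(base) - 1] == base[:-1]


print(compatible_release("1.3.9", "1.3.5"))
print(compatible_release("1.4.0", "1.3.5"))
```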

Data Folder

A folder of your own data that is required to run the operator. Often these are configuration files for packages used inside the operator, which can be read in on execution.

Step 4: Block Descriptor File

We now need to define the metadata that allows the model to be visible on the front end as a block. This is done in the Block_Descriptor.yaml file.

The Block_Descriptor.yaml defines the key product information. All of the fields below are required:

@id: The Eratos Resource Name (ERN) for the block, effectively its ID in the Eratos system
@type: The resource type: this will always be 'ern:e-pn.io:schema:block' as this is for the block descriptor
name: The display name of the operator block
description: The description of the operator, explaining to users what it is doing.
dependsOn: The ERNs of the Eratos datasets that this operator depends on
creator: The creator of the operator
licenses: The license the operator falls under
pricing: The pricing model for the operator, e.g. Included, pay-per-use, subscription
tags: List of Eratos tags that assist with filtering, searching, and cataloguing on the community

Block_Descriptor.yaml example:

"@id": ern:e-pn.io:resource:eratos.blocks.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:block
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: An operator to generate a timeseries csv of 5km rainfall, Min & Max Temperature data at a given location.
dependsOn:
    - ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall
    - ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature
    - ern:e-pn.io:resource:eratos.blocks.silo.mintemperature

primary: ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
creator: ern:e-pn.io:resource:eratos.creator.eratos
licenses: 
    - ern:e-pn.io:resource:eratos.licenses.eratoscommunity
pricing: 
    - ern:e-pn.io:resource:eratos.pricing.included
tags: []
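A quick local check of the required fields can catch omissions before upload. This sketch assumes the descriptor has already been loaded into a dictionary by a YAML parser; the function name check_block_descriptor is hypothetical, and the key list is taken from the field descriptions above.

```python
BLOCK_TYPE = "ern:e-pn.io:schema:block"
REQUIRED_KEYS = [
    "@id", "@type", "name", "description", "dependsOn",
    "creator", "licenses", "pricing", "tags",
]


def check_block_descriptor(descriptor: dict) -> list[str]:
    """Return a list of problems found in a loaded Block_Descriptor.yaml."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in descriptor]
    if "@type" in descriptor and descriptor["@type"] != BLOCK_TYPE:
        problems.append(f"@type must be {BLOCK_TYPE}")
    return problems


descriptor = {
    "@id": "ern:e-pn.io:resource:eratos.blocks.get-min-max-rain-at-location",
    "@type": "ern:e-pn.io:schema:block",
    "name": "Get Daily Rainfall, Min & Max temperature at given latitude longitude.",
    "description": "An operator to generate a timeseries csv.",
    "dependsOn": ["ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall"],
    "creator": "ern:e-pn.io:resource:eratos.creator.eratos",
    "licenses": ["ern:e-pn.io:resource:eratos.licenses.eratoscommunity"],
    "pricing": ["ern:e-pn.io:resource:eratos.pricing.included"],
    "tags": [],
}

problems = check_block_descriptor(descriptor)
print(problems)
```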

Step 5: Format Your Model Folder Structure and Upload

It's time to compile all the files. Before upload, your file structure should be similar to the figure below. Note that not all files are required: the requirements.txt file and the data folder are optional.

Note: the data folder in this example can be named anything and can be nested, provided you specify the paths correctly in your code.
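Before sending the folder for upload, a short script can confirm the expected files are present. This sketch makes an assumption that should be checked against the folder layout you were given: it expects entry.py (or entry.R), operator.yaml and Block_Descriptor.yaml to sit at the top level of the model folder. The function name check_model_folder is hypothetical, and the demonstration uses a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path


def check_model_folder(folder) -> list[str]:
    """Return a list of expected files missing from the top level of the folder."""
    root = Path(folder)
    problems = []
    if not ((root / "entry.py").is_file() or (root / "entry.R").is_file()):
        problems.append("missing entry.py or entry.R")
    # Assumption: both metadata files live next to the entry file
    for required in ("operator.yaml", "Block_Descriptor.yaml"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")
    return problems


with tempfile.TemporaryDirectory() as td:
    root = Path(td)
    for name in ("entry.py", "operator.yaml"):
        (root / name).write_text("")  # empty placeholder files for the demo
    problems_before = check_model_folder(root)
    (root / "Block_Descriptor.yaml").write_text("")
    problems_after = check_model_folder(root)

print(problems_before)
print(problems_after)
```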

R support

The only difference between the R and the Python operators is the entry file. If the above Python operator were rewritten in R with identical inputs and outputs, all other supporting files would remain unchanged.

Upload the model

Congrats! Your work is done!

Contact [email protected] to upload your model!