How to Build an Eratos Model
Page Summary
- Step-by-step instructions on how to build an Eratos Model
- Example model code and metadata files
- We recommend working through this process in its entirety with our example scripts before trying it on your own model. It surfaces the details you would otherwise wish you had known at the start.
Step 1: Write the model as code
The power of the Eratos Operator model is its flexibility. Any code that can be written in R or Python can be packaged up and hosted on the platform.
To ensure flexibility, several conventions are enforced:
- The file that contains the main function must be named entry.py (for Python) or entry.R (for R).
- The code must be structured as a function.
- The @operator tag must be placed above the main function and include the desired ERN, e.g. @operator('ern:e-pn.io:resource:eratos.operators.your-model-name-here')
- Outputs should be formatted as a resource (more information below).
Model creation tips and tricks
- To access data within your model, we recommend using the Eratos Data Access Methods described here. Additional datasets can also be added to the model package.
- Inputs and outputs are optional. For example, in an IoT use case you may ingest data and write it to data structures directly.
- A key change from scripts that run locally is the addition of the context input parameter. This object securely stores your credentials inside the Eratos adapter while your operator is running, and the adapter is retrieved with adapter_object = context['adapter']. See the Resource requests at the start of the example entry.py below.
- Outputs should be formatted as a Resource. This means that the output data is pushed to Eratos and the model returns only the output ERN. This allows both the Eratos front end and other models in a workflow to access the data. An example is shown below.

Output Resource Generation
data_frame = your_data  # Your data goes here; in this instance it must provide a to_csv method
with tempfile.TemporaryDirectory() as td:
    # Create a temporary file to store the data in
    final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')
    data_frame.to_csv(final_output_df_fname)
    # Create the Resource object with metadata
    final_output_res = context['adapter'].Resource(content={
        '@type': 'ern:e-pn.io:schema:dataset',
        'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
        'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
        'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
        'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
        'file': 'minMaxRainTable.csv'
    })
    # Push the data to the Resource object while the temporary file still exists
    final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})
outputs = {
    'minMaxRainTable': final_output_res,
}
return outputs
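One detail of the output pattern above is worth spelling out: tempfile.TemporaryDirectory removes the directory and its contents when the with block exits, so the push has to happen inside the block. The following is a minimal sketch using only the standard library. The push argument is a hypothetical stand-in for the SDK upload call, and the row data is made up for illustration.

```python
import csv
import os
import tempfile


def write_and_push(rows, push):
    """Write rows to a temporary CSV and hand its path to `push`.

    `push` is a hypothetical stand-in for the SDK upload call. It is
    invoked inside the `with` block, while the file still exists.
    """
    with tempfile.TemporaryDirectory() as td:
        csv_path = os.path.join(td, 'minMaxRainTable.csv')
        with open(csv_path, 'w', newline='') as fh:
            writer = csv.writer(fh)
            writer.writerow(['date', 'daily_rain (mm)'])
            writer.writerows(rows)
        pushed = push(csv_path)  # the file exists at this point
    # The temporary directory and the CSV have been deleted here.
    return csv_path, pushed


def fake_push(path):
    # Stand-in upload: report how many bytes would have been sent.
    return os.path.getsize(path)


csv_path, pushed_bytes = write_and_push([('2021-09-15', 1.2)], fake_push)
```

Moving the push call outside the with block would fail, because the file no longer exists once the block has exited.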
An Example entry.py
import os
import pandas as pd
import numpy as np
import tempfile
from eratos.resource import Resource
from eratos.operator import operator  # provides the @operator decorator used below

@operator('ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location')
def get_daily_max_min_rain(context, latitude, longitude, startDate, endDate):
    location = f'POINT ({longitude} {latitude})'

    # 1. Request access to the data resources in Eratos
    max_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature')
    min_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.mintemperature')
    rainfall_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall')

    # 2. Convert each resource object into a gridded data object
    gridded_max_temp_data = max_temp_data.data().gapi()
    gridded_min_temp_data = min_temp_data.data().gapi()
    gridded_rainfall_data = rainfall_data.data().gapi()

    # 3. Build the list of daily dates covered by the query
    date_generated_list = pd.date_range(startDate, endDate, freq="D")
    date_range = date_generated_list.strftime("%Y-%m-%d").to_list()

    # 4. Query each dataset to extract the desired data
    # max_temp, as found in the dataset variables
    extracted_max_temp_data = gridded_max_temp_data.get_timeseries_at_points(
        gridded_max_temp_data.get_key_variables()[0], [location], startDate, endDate)
    # min_temp, as found in the dataset variables
    extracted_min_temp_data = gridded_min_temp_data.get_timeseries_at_points(
        gridded_min_temp_data.get_key_variables()[0], [location], startDate, endDate)
    # daily_rain, as found in the dataset variables
    extracted_rainfall_data = gridded_rainfall_data.get_timeseries_at_points(
        gridded_rainfall_data.get_key_variables()[0], [location], startDate, endDate)

    # 5. Assemble the table (index 0 selects the single requested point)
    data_dict = {"date": date_range,
                 'max_temp (C)': extracted_max_temp_data[0],
                 'min_temp (C)': extracted_min_temp_data[0],
                 'daily_rain (mm)': extracted_rainfall_data[0]}
    data_frame = pd.DataFrame(data_dict)

    # 6. Write the table to a temporary file and push it to a new Resource
    print('Generating the output resources.')
    with tempfile.TemporaryDirectory() as td:
        final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')
        data_frame.to_csv(final_output_df_fname)
        final_output_res = context['adapter'].Resource(content={
            '@type': 'ern:e-pn.io:schema:dataset',
            'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
            'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
            'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
            'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
            'file': 'minMaxRainTable.csv'
        })
        # Push while the temporary file still exists
        final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})
    outputs = {
        'minMaxRainTable': final_output_res,
    }
    return outputs
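Because the adapter arrives through the context parameter, the body of an operator can be exercised locally by passing in a substitute object. The sketch below is one way to do that, not part of the Eratos SDK: StubAdapter and StubResource are hypothetical test doubles, describe_inputs is a toy function with the same signature shape as an entry function, and the @operator decorator is left off so the example runs without the SDK installed.

```python
class StubResource:
    """Hypothetical test double that records how it was created."""

    def __init__(self, ern=None, content=None):
        self.ern = ern
        self.content = content


class StubAdapter:
    """Hypothetical stand-in for the object stored in context['adapter']."""

    def __init__(self):
        self.requested = []

    def Resource(self, ern=None, content=None):
        # Record the request instead of contacting the platform.
        self.requested.append(ern)
        return StubResource(ern=ern, content=content)


def describe_inputs(context, latitude, longitude):
    """Toy operator body: takes context first, then the declared inputs."""
    adapter = context['adapter']
    rainfall = adapter.Resource(
        ern='ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall')
    location = f'POINT ({longitude} {latitude})'
    return {'location': location, 'source': rainfall.ern}


stub = StubAdapter()
result = describe_inputs({'adapter': stub}, latitude=-35.3, longitude=149.1)
```

A stub like this only checks the wiring of your own code. It says nothing about how the real adapter behaves.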
Step 2: Create the operator.yaml File
The operator.yaml file defines all the metadata for the Operator, including inputs, outputs, ERN, and the area for which the model is defined (if applicable).
The operator.yaml clearly defines the inputs, outputs, and other key metadata:
- @id: The Eratos Resource Name (ERN) for the operator, effectively its ID in the Eratos system.
- @type: The resource type. This will always be ern:e-pn.io:schema:operator, as this file is the operator descriptor.
- @geo: The geometry that defines the space where this operator is valid (WGS84).
- name: The name of the operator.
- description: The description of the operator, explaining to users what it does.
- type: How the operator is packaged up. Currently container is the only option.
- inputs: The inputs of the operator. Each input has:
  - name: The variable name, which links to the variable name in entry.py.
  - type: The data type of the variable. See the Input and Output Types Table below.
  - description: The description of the variable. Example inputs can be useful here.
  - required: Whether this variable is required for the operator to run: True or False.
  - label: The name of the variable displayed on the front end (optional).
- outputs: The outputs of the operator, with the same definitions as the inputs.
- senapsModel: The ID in our workflow management platform, Senaps. It is often the same as @id above without the 'ern:e-pn.io:resource:' section.
- senapsInstanceProfile: The size of compute and memory required to run the operator. Refer to the Compute Table below. Running costs are directly related to the profile chosen.
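The relationship between @id and senapsModel can be sketched as a prefix strip. This only illustrates the common case described above (the text says the two are often, not always, related this way). The function name senaps_model_from_ern is a hypothetical helper.

```python
ERN_RESOURCE_PREFIX = 'ern:e-pn.io:resource:'


def senaps_model_from_ern(ern):
    """Drop the resource prefix from an operator ERN (common case only)."""
    if not ern.startswith(ERN_RESOURCE_PREFIX):
        raise ValueError(f'not a resource ERN: {ern}')
    return ern[len(ERN_RESOURCE_PREFIX):]


model_id = senaps_model_from_ern(
    'ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location')
```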
Key links between operator.yaml and entry.py/R
To ensure a coherent link between the Operator and metadata files, several rules are enforced:
- The variable names of the inputs and outputs in the operator.yaml and the entry.py must be identical (excluding the context variable).
- The @id flag and the @operator flag must contain the same ID.
- The @type flag must be set to ern:e-pn.io:schema:operator.
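These rules can be checked before upload. The sketch below is a hypothetical helper, not an Eratos tool. It assumes the operator.yaml has already been loaded into a dictionary with a YAML parser, and that the function is inspected before any decorator is applied. Only the input names can be compared this way, since output names are not visible in the function signature.

```python
import inspect

OPERATOR_TYPE = 'ern:e-pn.io:schema:operator'


def check_operator_links(func, decorator_ern, operator_meta):
    """Return a list of problems found (empty if the rules hold)."""
    problems = []

    # Function parameters, minus the platform-supplied context argument.
    params = {p for p in inspect.signature(func).parameters if p != 'context'}
    yaml_inputs = {item['name'] for item in operator_meta.get('inputs', [])}
    if params != yaml_inputs:
        mismatched = sorted(params ^ yaml_inputs)
        problems.append(f'input names differ: {mismatched}')

    if operator_meta.get('@id') != decorator_ern:
        problems.append('@id does not match the @operator ERN')
    if operator_meta.get('@type') != OPERATOR_TYPE:
        problems.append(f'@type must be {OPERATOR_TYPE}')
    return problems


def get_daily_max_min_rain(context, latitude, longitude, startDate, endDate):
    return {}  # body omitted; only the signature matters here


ERN = 'ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location'
meta = {
    '@id': ERN,
    '@type': OPERATOR_TYPE,
    'inputs': [{'name': n}
               for n in ('latitude', 'longitude', 'startDate', 'endDate')],
}
problems = check_operator_links(get_daily_max_min_rain, ERN, meta)
```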
operator.yaml file creation tips and tricks
- As noted above, the variable names of the inputs and outputs in the operator.yaml and the entry.py must be identical (excluding the context variable). Because programming languages constrain variable names, the label flag was added. It allows the front end to display a more user-friendly name for the input variable than the one stored on the back end. When label is present it is displayed; when it is not, the name is displayed.
Example operator.yaml file
"@id": ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:operator
"@geo": POLYGON((112 -44.99365234375, 112 -10, 154.99609375 -10, 154.99609375 -44.99365234375, 112 -44.99365234375))
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: |
  An operator to generate a timeseries csv of rainfall, Min & Max Temperature data at a given location.
type: Container
inputs:
  - name: latitude
    type: number
    description: The Latitude of the point of interest.
    required: True
  - name: longitude
    type: number
    description: The Longitude of the point of interest.
    required: True
  - name: startDate
    type: string
    description: The Target start date, eg. 2021-09-15
    label: Target Start Date
    required: True
  - name: endDate
    type: string
    description: The Target end date, eg. 2021-10-15
    label: Target End Date
    required: True
outputs:
  - name: minMaxRainTable
    type: resource
    description: Daily Min Temp, Max Temp and Rainfall data for given lat lon.
senapsModel: eratos.operators.get-min-max-rain-at-location
senapsInstanceProfile: S
Input and Output Types Table
The type of an input or output can be one of the following:
Value | Description |
---|---|
string | A UTF-8 character string. |
number | A double precision floating point value. |
boolean | A logical boolean. |
date | An ISO 8601 date. |
timestamp | An ISO 8601 timestamp. |
resource | A resource. |
geometry | Either Well-Known Text geometry or a resource with geometry attached. |
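To catch mismatched inputs before a run, the simpler types in this table can be checked locally. The sketch below is a hypothetical helper and does not reproduce how the platform validates inputs. It covers only string, number, boolean, date and timestamp, and relies on Python's fromisoformat parsers, which accept only a subset of ISO 8601 on older Python versions.

```python
from datetime import date, datetime


def matches_type(value, type_name):
    """Rough local check of a value against a declared input type."""
    if type_name == 'string':
        return isinstance(value, str)
    if type_name == 'number':
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == 'boolean':
        return isinstance(value, bool)
    if type_name in ('date', 'timestamp'):
        parser = date.fromisoformat if type_name == 'date' else datetime.fromisoformat
        try:
            parser(value)
        except (TypeError, ValueError):
            return False
        return True
    # resource and geometry are not covered by this sketch.
    raise ValueError(f'type not covered by this sketch: {type_name}')


checks = [
    matches_type('2021-09-15', 'date'),   # ISO 8601 date
    matches_type('15/09/2021', 'date'),   # not ISO 8601
    matches_type(True, 'number'),         # a boolean is not a number here
]
```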
Compute Table
senapsInstanceProfile can be one of the following:
Value | vCPU | Memory |
---|---|---|
XS | 0.25 | 256MB |
S | 0.25 | 512MB |
M | 0.5 | 1GB |
L | 1.0 | 2GB |
XL | 1.0 | 4GB |
2XL | 2.0 | 8GB |
3XL | 2.0 | 16GB |
4XL | 4.0 | 32GB |
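Since running costs follow the profile, it can help to pick the smallest profile that meets your needs. The sketch below encodes the table above and returns the first row that satisfies both requirements. The helper name smallest_profile is hypothetical, and the conversion of 1GB to 1024MB is an assumption.

```python
# (profile, vCPU, memory in MB), in the same order as the Compute Table.
# Assumes 1GB = 1024MB.
PROFILES = [
    ('XS', 0.25, 256),
    ('S', 0.25, 512),
    ('M', 0.5, 1024),
    ('L', 1.0, 2048),
    ('XL', 1.0, 4096),
    ('2XL', 2.0, 8192),
    ('3XL', 2.0, 16384),
    ('4XL', 4.0, 32768),
]


def smallest_profile(min_vcpu, min_memory_mb):
    """Return the first (smallest) profile meeting both requirements."""
    for name, vcpu, memory_mb in PROFILES:
        if vcpu >= min_vcpu and memory_mb >= min_memory_mb:
            return name
    raise ValueError('no profile in the table is large enough')


profile = smallest_profile(min_vcpu=0.5, min_memory_mb=3000)
```

The table is ordered so that both vCPU and memory never decrease from one row to the next, which is why the first match is also the smallest.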
Step 3: Add Optional Additions
Required Packages File
A Required Packages file (the requirements.txt file referred to in Step 5) lists each package your operator needs, along with the required version or provider. Packages that are already in the Eratos SDK do not need to be defined here. The following packages are in the base image and do not need to be specified:
- Native Python/R packages
- pandas and numpy
Example file:
pandas~=1.3.5
numpy~=1.23.4
Data Folder
A folder of your own data that is required to run the operator. Often these are configuration files for certain packages used inside the operator, which can be read in upon execution.
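Reading a file from the data folder might look like the sketch below. The file name data/config.json and the helper load_config are hypothetical. Inside a real entry.py the base directory would typically be Path(__file__).parent, which assumes the data folder is packaged next to the entry file. The example builds a throwaway folder so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_config(base_dir, relative_path='data/config.json'):
    """Read a JSON config file stored under the packaged data folder."""
    config_path = Path(base_dir) / relative_path
    with config_path.open() as fh:
        return json.load(fh)


# Demonstration with a temporary folder standing in for the model package.
with tempfile.TemporaryDirectory() as td:
    data_dir = Path(td) / 'data'
    data_dir.mkdir()
    (data_dir / 'config.json').write_text(json.dumps({'threshold': 5}))
    config = load_config(td)
```

Building the path from a known base directory, rather than relying on the current working directory, keeps the lookup stable wherever the operator is run.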
Step 4: Block Descriptor File
We now need to define the metadata that allows the model to be visible on the front end as a block. This is done in the Block_Descriptor.yaml file.
The Block_Descriptor.yaml clearly defines the key product information. All of the fields below are required:
- @id: The Eratos Resource Name (ERN) for the block, effectively its ID in the Eratos system.
- @type: The resource type. This will always be 'ern:e-pn.io:schema:block', as this file is the block descriptor.
- name: The display name of the operator block.
- description: The description of the operator, explaining to users what it does.
- dependsOn: The ERNs of the Eratos datasets that this operator depends on.
- creator: The creator of the operator.
- licenses: The license the operator falls under.
- pricing: The pricing model for the operator, e.g. included, pay-per-use, subscription.
- tags: A list of Eratos tags that assist with filtering, searching, and cataloguing on the community.
Block_Descriptor.yaml example:
"@id": ern:e-pn.io:resource:eratos.blocks.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:block
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: An operator to generate a timeseries csv of 5km rainfall, Min & Max Temperature data at a given location.
dependsOn:
- ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall
- ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature
- ern:e-pn.io:resource:eratos.blocks.silo.mintemperature
primary: ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
creator: ern:e-pn.io:resource:eratos.creator.eratos
licenses:
- ern:e-pn.io:resource:eratos.licenses.eratoscommunity
pricing:
- ern:e-pn.io:resource:eratos.pricing.included
tags: []
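Since every field listed for the block descriptor is required, a quick presence check can catch omissions before upload. The sketch below is a hypothetical helper that assumes the YAML has already been loaded into a dictionary. It checks only that the keys exist and that @type has the expected value.

```python
BLOCK_TYPE = 'ern:e-pn.io:schema:block'
REQUIRED_BLOCK_KEYS = [
    '@id', '@type', 'name', 'description', 'dependsOn',
    'creator', 'licenses', 'pricing', 'tags',
]


def check_block_descriptor(descriptor):
    """Return a list of problems found in a loaded block descriptor."""
    problems = [f'missing key: {key}'
                for key in REQUIRED_BLOCK_KEYS if key not in descriptor]
    if '@type' in descriptor and descriptor['@type'] != BLOCK_TYPE:
        problems.append(f'@type must be {BLOCK_TYPE}')
    return problems


descriptor = {
    '@id': 'ern:e-pn.io:resource:eratos.blocks.get-min-max-rain-at-location',
    '@type': BLOCK_TYPE,
    'name': 'Get Daily Rainfall, Min & Max temperature at given latitude longitude.',
    'description': 'An operator to generate a timeseries csv.',
    'dependsOn': ['ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall'],
    'creator': 'ern:e-pn.io:resource:eratos.creator.eratos',
    'licenses': ['ern:e-pn.io:resource:eratos.licenses.eratoscommunity'],
    'pricing': ['ern:e-pn.io:resource:eratos.pricing.included'],
    'tags': [],  # an empty list still counts as present
}
block_problems = check_block_descriptor(descriptor)
```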
Step 5: Format Your Model Folder Structure and Upload
It's time to compile all the files. Before upload, your file structure should be similar to the figure below. Note that not all files are required: the requirements.txt file and the data folder are optional.
Note: the data folder in this example can be named anything and be nested provided you specify the paths correctly in your code.
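A small pre-upload check can confirm the expected files are present. The sketch below is a hypothetical helper. It assumes the entry file, operator.yaml and Block_Descriptor.yaml all sit at the top level of the model folder, which may differ from the layout the platform expects. The optional requirements.txt and data folder are not checked.

```python
import tempfile
from pathlib import Path

# Assumed top-level layout; adjust to match your actual folder structure.
REQUIRED_FILES = ['operator.yaml', 'Block_Descriptor.yaml']
ENTRY_FILES = ['entry.py', 'entry.R']


def missing_model_files(folder):
    """List the required files that are absent from the model folder."""
    folder = Path(folder)
    missing = []
    if not any((folder / name).is_file() for name in ENTRY_FILES):
        missing.append('entry.py or entry.R')
    missing.extend(name for name in REQUIRED_FILES
                   if not (folder / name).is_file())
    return missing


# Demonstration with a temporary folder that lacks the block descriptor.
with tempfile.TemporaryDirectory() as td:
    (Path(td) / 'entry.py').write_text('# entry point\n')
    (Path(td) / 'operator.yaml').write_text('name: example\n')
    missing = missing_model_files(td)
```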

R support
The only difference between the R and the Python operators is the entry file. If the above Python operator were rewritten in R with identical inputs and outputs, all other supporting files would remain unchanged.
Upload the model
Congrats! Your work is done!
Contact [email protected] to upload your model!