How to Build an Eratos Model


Page Summary

  • Step-by-step instructions on how to build an Eratos Model
  • Example model code and metadata files
  • We recommend working through this process in its entirety using our example scripts before trying it on your own model; you never know what you will wish you had known at the start.

Step 1: Write the model as code

The power of the Eratos Operator model is its flexibility. Any code that can be written in R or Python can be packaged up and hosted on the platform.


To support this flexibility, several conventions are enforced:

  • The file that contains the main function must be named entry.py (for Python) or entry.R (for R).
  • The code must be structured as a function.
  • The @operator tag must be placed above the main function and include the desired ERN
    • i.e. @operator('ern:e-pn.io:resource:eratos.operators.your-model-name-here')
  • Outputs should be formatted as a Resource (more info below).
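A minimal skeleton that follows these conventions is sketched below. Outside the Eratos runtime the SDK decorator may not be importable, so this sketch defines a hypothetical no-op stand-in named `operator` that only records the ERN; in a real entry.py you would use the SDK's own decorator instead. The function name, `some_input` and `someOutput` are placeholders.

```python
# entry.py (skeleton)

# Hypothetical stand-in for the Eratos SDK decorator, for local experiments only.
# It attaches the ERN to the function and returns the function unchanged.
def operator(ern):
    def decorate(fn):
        fn.ern = ern
        return fn
    return decorate


@operator('ern:e-pn.io:resource:eratos.operators.your-model-name-here')
def main(context, some_input):
    # `context` is supplied by Eratos; every other parameter is an operator input.
    result = some_input * 2  # placeholder model logic
    # Return a dict keyed by output name (a real operator returns Resources here).
    return {'someOutput': result}
```

Locally, `main({}, 3)` returns `{'someOutput': 6}`, which is enough to check the function shape before wiring in real data access.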

Model creation tips and tricks

  • To access data within your model, we recommend using the Eratos Data Access Methods. Additional datasets can also be added to the model package.

  • Inputs and outputs are optional. For example, in an IoT use case you may be ingesting data and writing it directly to data structures.

  • A key difference from scripts that run locally is the addition of the context input parameter. This object securely stores your credentials inside the Eratos adapter while your operator is running. See the context['adapter'] calls in the example entry.py below.

    • adapter_object = context['adapter']
  • Outputs should be formatted as a Resource. This means the output data is pushed to Eratos and only the output's ERN is returned from the model. This allows both the Eratos front end and other models in a workflow to access the data. An example is shown below:

    • Output Resource Generation

      data_frame = your_data  # Your data goes here; it must provide a to_csv method (e.g. a pandas DataFrame)

      with tempfile.TemporaryDirectory() as td:  # Create a temporary directory to hold the file
          final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')
          data_frame.to_csv(final_output_df_fname)

          # Create the Resource object with metadata
          final_output_res = context['adapter'].Resource(content={
              '@type': 'ern:e-pn.io:schema:dataset',
              'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
              'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
              'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
              'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
              'file': 'minMaxRainTable.csv'
          })

          # Push the data to the Resource object (before the temporary directory is deleted)
          final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})

          outputs = {
              'minMaxRainTable': final_output_res,
          }
      return outputs
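Because `context['adapter']` only exists inside the Eratos runtime, it can help to exercise the output-handling pattern above locally with a stand-in. The sketch below is an assumption, not part of the Eratos SDK: `FakeAdapter`, `FakeResource` and `FakeData` are hypothetical classes that mimic only the calls used in the snippet (`Resource(content=...)` and `.data().push_objects(node, files)`) and record what was pushed. To stay dependency-free, `write_table` writes its CSV with the standard `csv` module instead of pandas.

```python
import csv
import os
import tempfile


class FakeData:
    """Hypothetical stand-in for resource.data(); records pushed files."""

    def __init__(self, store):
        self.store = store

    def push_objects(self, node, files):
        # Read the contents now, because the temporary directory is deleted afterwards.
        for key, path in files.items():
            with open(path, newline='') as fh:
                self.store[key] = (node, fh.read())


class FakeResource:
    """Hypothetical stand-in for context['adapter'].Resource(content=...)."""

    def __init__(self, content, store):
        self.content = content
        self._data = FakeData(store)

    def data(self):
        return self._data


class FakeAdapter:
    """Hypothetical adapter that collects everything pushed during a run."""

    def __init__(self):
        self.pushed = {}

    def Resource(self, content):
        return FakeResource(content, self.pushed)


def write_table(context, rows):
    # Same pattern as the snippet above: write to a temp file, describe it, push it.
    with tempfile.TemporaryDirectory() as td:
        fname = os.path.join(td, 'table.csv')
        with open(fname, 'w', newline='') as fh:
            csv.writer(fh).writerows(rows)
        res = context['adapter'].Resource(content={
            '@type': 'ern:e-pn.io:schema:dataset',
            'name': 'Example table',
            'file': 'table.csv',
        })
        res.data().push_objects('ern::node:au-1.e-gn.io', {'table.csv': fname})
    return {'table': res}
```

For example, `write_table({'adapter': FakeAdapter()}, [['date', 'value'], ['2021-09-15', 1]])` returns an outputs dict whose `table` entry carries the metadata, while the adapter's `pushed` dict holds the CSV text that would have been uploaded.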

An Example entry.py

import os
import pandas as pd
import numpy as np
import tempfile

from eratos.resource import Resource
from eratos.operator import operator

@operator('ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location')
def get_daily_max_min_rain(context, latitude, longitude, startDate, endDate):

    # WKT point: longitude comes first, then latitude
    location = f'POINT ({longitude} {latitude})'

    # 1. Request access to the data resources in Eratos
    max_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature')
    min_temp_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.mintemperature')
    rainfall_data = context['adapter'].Resource(ern='ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall')

    # 2. Convert each resource object into a gridded data object
    gridded_max_temp_data = max_temp_data.data().gapi()
    gridded_min_temp_data = min_temp_data.data().gapi()
    gridded_rainfall_data = rainfall_data.data().gapi()

    # 3. Build the list of dates covered by the request (both endpoints included)
    date_generated_list = pd.date_range(startDate, endDate, freq="D")
    date_range = date_generated_list.strftime("%Y-%m-%d").to_list()

    # 4. Query each dataset to extract the desired data at the location
    # max_temp, as found in the dataset variables
    extracted_max_temp_data = gridded_max_temp_data.get_timeseries_at_points(
        gridded_max_temp_data.get_key_variables()[0], [location], startDate, endDate)

    # min_temp, as found in the dataset variables
    extracted_min_temp_data = gridded_min_temp_data.get_timeseries_at_points(
        gridded_min_temp_data.get_key_variables()[0], [location], startDate, endDate)

    # daily_rain, as found in the dataset variables
    extracted_rainfall_data = gridded_rainfall_data.get_timeseries_at_points(
        gridded_rainfall_data.get_key_variables()[0], [location], startDate, endDate)

    # 5. Combine the series into a single table
    data_dict = {
        'date': date_range,
        'max_temp (C)': extracted_max_temp_data[0],
        'min_temp (C)': extracted_min_temp_data[0],
        'daily_rain (mm)': extracted_rainfall_data[0],
    }
    data_frame = pd.DataFrame(data_dict)

    # 6. Write the table to a temporary file, describe it as a Resource and push it to Eratos
    print('Generating the output resources.')
    with tempfile.TemporaryDirectory() as td:
        final_output_df_fname = os.path.join(td, 'minMaxRainTable.csv')
        data_frame.to_csv(final_output_df_fname)

        final_output_res = context['adapter'].Resource(content={
            '@type': 'ern:e-pn.io:schema:dataset',
            'type': 'ern:e-pn.io:resource:eratos.dataset.type.table',
            'name': f'Daily Min Max Rainfall Table at Lat: {round(latitude,5)} Long: {round(longitude,5)} ',
            'description': 'Daily Min Temp, Max Temp and Rainfall data for given lat lon.',
            'updateSchedule': 'ern:e-pn.io:resource:eratos.schedule.noupdate',
            'file': 'minMaxRainTable.csv'
        })
        # Push while the temporary file still exists
        final_output_res.data().push_objects('ern::node:au-1.e-gn.io', {'minMaxRainTable.csv': final_output_df_fname})

        outputs = {
            'minMaxRainTable': final_output_res,
        }

    return outputs
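The date column only lines up with the extracted series if both cover the same days. `pd.date_range(startDate, endDate, freq="D")` includes both endpoints, so the table has one row per day from startDate to endDate inclusive. A dependency-free sketch of the same list, useful for checking lengths before building the DataFrame (the helper name `daily_dates` is hypothetical):

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Return 'YYYY-MM-DD' strings for every day from start to end, inclusive."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]
```

With the example dates from operator.yaml, `daily_dates('2021-09-15', '2021-10-15')` yields 31 dates, so each extracted series should also hold 31 values.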

Step 2: Create the operator.yaml File

The operator.yaml file defines all the metadata for the Operator, including inputs, outputs, ERN, and the area for which the model is defined (if applicable).

operator.yaml

The operator.yaml defines the inputs, outputs, and other key metadata:

  • @id: The Eratos Resource Name (ERN) for the operator, effectively its ID in the Eratos system
  • @type: The resource type. This is always ern:e-pn.io:schema:operator, as this is the operator descriptor
  • @geo: The geometry (WGS84) that defines the area where this operator is valid
  • name: The name of the operator
  • description: A description of the operator, explaining to users what it does
  • type: How the operator is packaged; currently Container is the only option
  • inputs: The inputs of the operator
    • name: The variable name, which must match the corresponding variable name in entry.py
    • type: The data type of the variable (see the Input and Output Types Table below)
    • description: A description of the variable; example inputs can be useful here
    • required: Whether this variable is required for the operator to run: True or False
    • label: The name of the variable displayed on the front end (optional)
  • outputs: The outputs of the operator, using the same fields as the inputs
  • senapsModel: The ID in Senaps, our workflow management platform. This is often the same as @id above without the 'ern:e-pn.io:resource:' prefix
  • senapsInstanceProfile: The compute and memory allocated to run the operator; see the Compute Table below. Running costs are directly related to this choice.
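Since senapsModel is often just @id with the 'ern:e-pn.io:resource:' prefix removed, it can be derived rather than typed by hand. A small sketch (the function name is hypothetical; confirm the result matches what your operator actually needs, since the two are only "often" the same):

```python
ERN_PREFIX = 'ern:e-pn.io:resource:'


def senaps_model_from_ern(ern):
    """Strip the 'ern:e-pn.io:resource:' prefix from an operator ERN."""
    if not ern.startswith(ERN_PREFIX):
        raise ValueError(f'unexpected ERN format: {ern}')
    return ern[len(ERN_PREFIX):]
```

Applied to the example @id, this gives `eratos.operators.get-min-max-rain-at-location`, the senapsModel value used in the example operator.yaml.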

Key links between operator.yaml and entry.py/R

To ensure a coherent link between the Operator and metadata files, several rules are enforced:

  • The variable names of the inputs and outputs in the operator.yaml and the entry.py must be identical (excluding the context variable)
  • The @id field and the @operator tag must contain the same ERN
  • The @type field must be set to ern:e-pn.io:schema:operator
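Several of these rules can be checked before upload. The sketch below assumes operator.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that you can import the main function; it compares the declared input names with the function's parameters (ignoring context), checks that @id matches the ERN given to the decorator, and checks @type. The helper name and passing the decorator's ERN in explicitly are assumptions.

```python
import inspect


def check_operator_links(meta, fn, decorator_ern):
    """Return a list of mismatches between operator.yaml (as a dict) and fn."""
    problems = []
    params = [p for p in inspect.signature(fn).parameters if p != 'context']
    declared = [spec['name'] for spec in meta.get('inputs', [])]
    missing = sorted(set(params) - set(declared))
    extra = sorted(set(declared) - set(params))
    if missing:
        problems.append(f'parameters not declared in operator.yaml: {missing}')
    if extra:
        problems.append(f'inputs with no matching parameter: {extra}')
    if meta.get('@id') != decorator_ern:
        problems.append('@id does not match the @operator ERN')
    if meta.get('@type') != 'ern:e-pn.io:schema:operator':
        problems.append("@type must be 'ern:e-pn.io:schema:operator'")
    return problems
```

An empty list means the inputs and ERN line up; output names still need checking against the keys of the dict the function returns.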

operator.yaml file creation tips and tricks

  • Because variable names in code are constrained by the rules of the programming language, the optional label field lets the front end display a more user-friendly name for a variable than the name stored on the back end. When label is present it is displayed; when it is not, name is displayed.
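The fallback rule can be expressed in one line. This only illustrates the described behaviour using input entries like those in the example operator.yaml; it is not the front end's actual code, and `display_name` is a hypothetical name.

```python
def display_name(input_spec):
    """Name shown on the front end: label if present, otherwise name."""
    return input_spec.get('label', input_spec['name'])


inputs = [
    {'name': 'latitude', 'type': 'number'},
    {'name': 'startDate', 'type': 'string', 'label': 'Target Start Date'},
]
shown = [display_name(spec) for spec in inputs]  # ['latitude', 'Target Start Date']
```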

Example operator.yaml file

"@id": ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:operator
"@geo": POLYGON((112 -44.99365234375, 112 -10, 154.99609375 -10, 154.99609375 -44.99365234375, 112 -44.99365234375))
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: |
    An operator to generate a timeseries csv of rainfall, Min & Max Temperature data at a given location.
type: Container
inputs:
    - name: latitude
      type: number
      description: The Latitude of the point of interest.
      required: True
    - name: longitude
      type: number
      description: The Longitude of the point of interest.
      required: True
    - name: startDate
      type: string
      description: The Target start date, eg. 2021-09-15
      label: Target Start Date
      required: True
    - name: endDate
      type: string
      description: The Target end date, eg. 2021-10-15
      label: Target End Date
      required: True
outputs:
    - name: minMaxRainTable
      type: resource
      description: Daily Min Temp, Max Temp and Rainfall data for given lat lon.
senapsModel: eratos.operators.get-min-max-rain-at-location
senapsInstanceProfile: S

Input and Output Types Table

The type of an input or output can be one of the following:

Value       Description
string      A UTF-8 character string.
number      A double-precision floating-point value.
boolean     A logical boolean.
date        An ISO 8601 date.
timestamp   An ISO 8601 timestamp.
resource    A resource.
geometry    Either Well-Known Text geometry or a resource with geometry attached.
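When testing an operator locally, it can help to check sample inputs against their declared types before running. The mapping below is a hypothetical helper covering only the scalar types (resource and geometry are left out because they depend on Eratos). It uses the standard library's ISO 8601 parsers, which in Python versions before 3.11 accept only a subset of ISO 8601 (for example, no trailing 'Z' on timestamps).

```python
from datetime import date, datetime


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _parses(parser, value):
    try:
        parser(value)
        return True
    except (TypeError, ValueError):
        return False


CHECKS = {
    'string': lambda v: isinstance(v, str),
    'number': _is_number,
    'boolean': lambda v: isinstance(v, bool),
    'date': lambda v: isinstance(v, str) and _parses(date.fromisoformat, v),
    'timestamp': lambda v: isinstance(v, str) and _parses(datetime.fromisoformat, v),
}


def check_inputs(meta, values):
    """Return names of inputs that are missing (if required) or of the wrong type."""
    bad = []
    for spec in meta.get('inputs', []):
        name, kind = spec['name'], spec['type']
        if name not in values:
            if spec.get('required'):
                bad.append(name)
            continue
        check = CHECKS.get(kind)
        if check is not None and not check(values[name]):
            bad.append(name)
    return bad
```

Types without an entry in `CHECKS` are simply skipped, so the helper never rejects a resource or geometry input it cannot judge.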

Compute Table

Where senapsInstanceProfile can be one of the following:

Value   vCPU   Memory
XS      0.25   256 MB
S       0.25   512 MB
M       0.5    1 GB
L       1.0    2 GB
XL      1.0    4 GB
2XL     2.0    8 GB
3XL     2.0    16 GB
4XL     4.0    32 GB
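Because running costs are tied to the profile, a reasonable habit is to choose the smallest profile that covers your model's peak CPU and memory needs. A sketch of that lookup over the table above, treating 1 GB as 1024 MB (the function name is hypothetical, and how you measure peak usage is up to you):

```python
# (profile, vCPU, memory in GB) from the compute table, smallest first.
PROFILES = [
    ('XS', 0.25, 0.25),
    ('S', 0.25, 0.5),
    ('M', 0.5, 1),
    ('L', 1.0, 2),
    ('XL', 1.0, 4),
    ('2XL', 2.0, 8),
    ('3XL', 2.0, 16),
    ('4XL', 4.0, 32),
]


def smallest_profile(vcpu, memory_gb):
    """Return the first (smallest) profile meeting both requirements, or None."""
    for name, cpus, mem in PROFILES:
        if cpus >= vcpu and mem >= memory_gb:
            return name
    return None
```

Because both columns only grow down the table, the first match is also the cheapest; for example, a model needing 1 vCPU and 3 GB lands on XL, and anything above 4 vCPU or 32 GB returns None.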

Step 3: Add Optional Files

Required Packages File

A required packages file (requirements.txt) lists each package and the needed version or provider. Packages already included in the Eratos SDK do not need to be listed here. The following packages are in the base image and don't need to be specified:

  • Native Python/R packages
  • pandas and numpy

Example file:

pandas~=1.3.5
numpy~=1.23.4

Data Folder

A folder of your own data that is required to run the operator. Often these are configuration files for packages used inside the operator, which can be read in at execution time.
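One way to read such files is to resolve them relative to entry.py rather than the current working directory, since the directory the operator is launched from on the platform is not something this page specifies. In the sketch below, the folder name `data`, the JSON format and the helper names are hypothetical; the `base` parameter exists so the helpers can be tested without relying on `__file__`.

```python
import json
from pathlib import Path


def data_path(*parts, base=None):
    """Build a path inside the data folder that sits next to entry.py.

    Resolving from entry.py's own location (rather than the current working
    directory) keeps the path valid wherever the operator is launched from.
    """
    if base is None:
        base = Path(__file__).resolve().parent
    return Path(base, 'data', *parts)


def load_config(name, base=None):
    # Read a JSON configuration file shipped in the data folder.
    with open(data_path(name, base=base)) as fh:
        return json.load(fh)
```

Inside entry.py, `load_config('settings.json')` would then read `data/settings.json` from the model package.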

Step 4: Block Descriptor File

We now need to define the metadata that allows the model to be visible on the front end as a block. This is done in the Block_Descriptor.yaml file.

The Block_Descriptor.yaml defines the key product information; all of the fields below are required:

  • @id: The Eratos Resource Name (ERN) for the block, effectively its ID in the Eratos system
  • @type: The resource type; this is always 'ern:e-pn.io:schema:block', as this is the block descriptor
  • name: The display name of the operator block
  • description: A description of the operator, explaining to users what it does
  • dependsOn: ERNs of the Eratos datasets this operator depends on
  • primary: The ERN of the operator the block presents, i.e. the @id from operator.yaml
  • creator: The creator of the operator
  • licenses: The license(s) the operator falls under
  • pricing: The pricing model for the operator, e.g. Included, pay-per-use, subscription
  • tags: A list of Eratos tags that assist with filtering, searching, and cataloguing in the community

Block_Descriptor.yaml example:

"@id": ern:e-pn.io:resource:eratos.blocks.get-min-max-rain-at-location
"@type": ern:e-pn.io:schema:block
name: Get Daily Rainfall, Min & Max temperature at given latitude longitude.
description: An operator to generate a timeseries csv of 5km rainfall, Min & Max Temperature data at a given location.
dependsOn:
    - ern:e-pn.io:resource:eratos.blocks.silo.dailyrainfall
    - ern:e-pn.io:resource:eratos.blocks.silo.maxtemperature
    - ern:e-pn.io:resource:eratos.blocks.silo.mintemperature

primary: ern:e-pn.io:resource:eratos.operators.get-min-max-rain-at-location
creator: ern:e-pn.io:resource:eratos.creator.eratos
licenses: 
    - ern:e-pn.io:resource:eratos.licenses.eratoscommunity
pricing: 
    - ern:e-pn.io:resource:eratos.pricing.included
tags: []
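In the example, primary is the operator's @id from operator.yaml. A pre-upload check along these lines catches a missing required field or a mismatched primary, assuming both YAML files have already been parsed into dicts. The requirement that primary equal the operator @id is inferred from the example, and the helper name is hypothetical.

```python
REQUIRED_BLOCK_KEYS = ['@id', '@type', 'name', 'description', 'dependsOn',
                       'creator', 'licenses', 'pricing', 'tags']


def check_block_descriptor(block, operator_meta):
    """Return a list of problems found in a parsed Block_Descriptor.yaml."""
    problems = [f'missing key: {key}' for key in REQUIRED_BLOCK_KEYS if key not in block]
    if block.get('@type') != 'ern:e-pn.io:schema:block':
        problems.append("@type must be 'ern:e-pn.io:schema:block'")
    if block.get('primary') != operator_meta.get('@id'):
        problems.append('primary does not match the operator @id')
    return problems
```

Note that an empty `tags: []` still counts as present, which matches the example.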

Step 5: Format Your Model Folder Structure and Upload

It's time to compile all the files. Before upload, your file structure should be similar to the figure below. Not all files are required; the requirements.txt file and the data folder are optional.

Note: the data folder in this example can have any name and can be nested, provided you specify the paths correctly in your code.
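A rough pre-upload checklist can be built from the file names used on this page. This sketch assumes entry.py (or entry.R), operator.yaml and Block_Descriptor.yaml all sit at the top level of the model folder; adjust it if your layout differs. requirements.txt and the data folder are optional, so they are not checked. The helper name is hypothetical.

```python
from pathlib import Path

REQUIRED = ['operator.yaml', 'Block_Descriptor.yaml']
ENTRY_FILES = ['entry.py', 'entry.R']


def missing_files(model_dir):
    """List required files absent from model_dir (optional files are ignored)."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    if not any((root / name).is_file() for name in ENTRY_FILES):
        missing.append('entry.py or entry.R')
    return missing
```

An empty list from `missing_files('path/to/model')` means the core files are in place.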

R support

The only difference between R and Python operators is the entry file: if the Python operator above were rewritten in R with identical inputs and outputs, all other supporting files would remain unchanged.

Upload the model

Congrats! Your work is done!

Contact [email protected] to upload your model!