Create and Run a Processor

This guide is here to help you get up and running with Processing (Intellect) and its processors. Whether you're brand new or just need a refresher, we’ve got you covered with clear steps, simple explanations, and a real example to pull it all together.

This guide is a simplified extract of what is already described here, so it is particularly useful for users who are not familiar with Docker.

We recommend reading the sections in the order in which they are presented.

What Is a Processor

A processor is any type of service, such as an algorithm or a workflow, that takes input data and produces output in batch mode. This lets users analyze or process large volumes of data efficiently.

How To Access Intellect

Processing can be accessed from the ESA PAL main page by clicking on "Processing (Intellect)". You will then be redirected to a login page.

Required Files (Sources)

To build your own processor, you will need four main sources or files:

A Dockerfile
An entrypoint
The script you want to run within the processor
A requirements file (not always needed)

As noted above, this guide assumes that you have limited knowledge of Docker. For this reason, some pieces of code are treated as "default" parts, i.e. you can leave them unchanged when creating your own processor.

Dockerfile

The Dockerfile defines the skeleton of your processor. Here you define the OS version, all the needed directories, and how the other three main files interact with each other. Below is an example Dockerfile that you can use as a starting point.

FROM ubuntu:22.04

LABEL maintainer="ASCEND"

ENV TZ=Etc/UTC

RUN echo $TZ > /etc/timezone

# Install system dependencies and Python libs from Ubuntu repos
RUN apt-get update && apt-get install --yes --no-install-recommends \
    jq zip unzip gdal-bin python3-gdal python3-venv \
    && rm -rf /var/lib/apt/lists/*

# --------- Set up worker environment variables and directories ---------
ARG WORKERDIR=/home/worker
ARG INDIR="$WORKERDIR/workDir/inDir"
ARG OUTDIR="$WORKERDIR/workDir/outDir"
ARG PROCDIR="$WORKERDIR/procDir"
ARG WPS_PROPS="$WORKERDIR/workDir/WPS-INPUT.properties"

# Create directories for processing and I/O
RUN mkdir -p $INDIR $OUTDIR $PROCDIR

ENV IN_DIR="$INDIR"
ENV OUT_DIR="$OUTDIR"
ENV PROC_DIR="$PROCDIR"
ENV WORKERDIR="$WORKERDIR"
ENV WPS_PROPS="$WPS_PROPS"

# Copy requirements.txt and create venv
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m venv /home/worker/procDir/venv \
    && /home/worker/procDir/venv/bin/pip install --upgrade pip \
    && GDAL_VERSION=$(gdal-config --version) /home/worker/procDir/venv/bin/pip install -r /tmp/requirements.txt

ENV PATH="/home/worker/procDir/venv/bin:$PATH"

# Copy your code into the container
COPY * ${PROCDIR}/

# Make sure your entrypoint script is executable
RUN chmod +x $PROCDIR/basic_entrypoint.py

# Set working directory
WORKDIR $PROCDIR

# python3 below resolves to the venv interpreter, because the venv bin directory was prepended to PATH above

# Run your app using venv Python
ENTRYPOINT ["python3", "/home/worker/procDir/basic_entrypoint.py"]

The line FROM ubuntu:22.04 in the Dockerfile defines the base image from which your custom Docker image is built. This base image provides the foundational operating system (OS) environment, in this case Ubuntu 22.04. You are not limited to Ubuntu: you can use any publicly available Docker image as your base, such as a Python-based one. The choice depends on the requirements of your application and the environment you want to replicate.

The lines:

ARG WORKERDIR=/home/worker
ARG INDIR="$WORKERDIR/workDir/inDir"
ARG OUTDIR="$WORKERDIR/workDir/outDir"
ARG PROCDIR="$WORKERDIR/procDir"
ARG WPS_PROPS="$WORKERDIR/workDir/WPS-INPUT.properties"

define the directories from which your processor reads its inputs and in which it saves its outputs (INDIR and OUTDIR), while WPS_PROPS defines the location of the configuration file. It is recommended to leave these lines (WORKERDIR, INDIR, OUTDIR, PROCDIR and WPS_PROPS) as they are, since the tool expects input and output data to be located in subdirectories named "inDir" and "outDir". The WPS_PROPS line defines an argument that the backend uses to map your processor’s inputs and outputs. Specifically, the backend expects a file called WPS-INPUT.properties at this location, which contains key information about how your processor should handle input data. By including this line in your Dockerfile, you ensure that your processor is compatible with the platform’s execution environment and can correctly receive and process the inputs provided by users. If this line is missing, the backend won’t be able to properly configure or run your processor, which could lead to errors or unexpected behavior.
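As a quick sanity check, the directory layout described above can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper name worker_layout is hypothetical and not part of the platform, and the paths are simply the defaults from the Dockerfile shown earlier.

```python
import posixpath


def worker_layout(workerdir: str = "/home/worker") -> dict:
    """Rebuild the paths that the sample Dockerfile derives from WORKERDIR.

    Hypothetical helper for illustration only; the container itself gets
    these values from the ENV instructions in the Dockerfile.
    """
    workdir = posixpath.join(workerdir, "workDir")
    return {
        "IN_DIR": posixpath.join(workdir, "inDir"),
        "OUT_DIR": posixpath.join(workdir, "outDir"),
        "PROC_DIR": posixpath.join(workerdir, "procDir"),
        "WPS_PROPS": posixpath.join(workdir, "WPS-INPUT.properties"),
    }


# Print the layout so it can be compared with the Dockerfile
for name, path in worker_layout().items():
    print(f"{name:10s} {path}")
```

If you change WORKERDIR in the Dockerfile, every derived path moves with it, which is why the guide recommends leaving these lines untouched.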

You can name the requirements file as you like. If you rename it, make sure to update the name in the COPY instruction (COPY requirements.txt /tmp/requirements.txt) and, if you also change the destination path, in the pip install command below:

RUN python3 -m venv /home/worker/procDir/venv \
    && /home/worker/procDir/venv/bin/pip install --upgrade pip \
    && GDAL_VERSION=$(gdal-config --version) /home/worker/procDir/venv/bin/pip install -r /tmp/requirements.txt

Finally, the entrypoint file is made executable and set as the container entrypoint:

RUN chmod +x $PROCDIR/basic_entrypoint.py

ENTRYPOINT ["python3", "/home/worker/procDir/basic_entrypoint.py"]

Here too, you can rename your entrypoint as you prefer, as long as you update the name in the lines above.

Entrypoint

Here too, you can use the entrypoint file shown below as a template for your processor.

#!/home/worker/procDir/venv/bin/python3

import os
import logging
import subprocess
import zipfile
import glob
from jproperties import Properties

logging.basicConfig(
    encoding="utf-8",
    level=logging.INFO,
    format="%(asctime)s %(levelname)-8s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
)

PROC_DIR = os.environ.get("PROC_DIR")
WORKERDIR = os.environ.get("WORKERDIR")
IN_DIR = os.environ.get("IN_DIR")
OUT_DIR = os.environ.get("OUT_DIR")
WPS_PROPS = os.environ.get("WPS_PROPS")


def check_prop_file(file: str) -> bool:
    '''
    Check that the properties file is not empty.
    Args:
        file: path to the properties file
    Returns:
        True if at least one parameter is defined, False otherwise
    '''
    configs = Properties()

    with open(file, 'rb') as read_prop:
        configs.load(read_prop)
    has_params = len(configs) > 0
    if not has_params:
        logging.info("No input parameters given")
    return has_params

def get_parameter(name: str, file: str) -> str:
    '''
    Extract an input parameter as defined in the GUI
    when launching the code.
    Args:
        name: name of the parameter, e.g. bbox
        file: path to the input parameter file
    Returns:
        a string that defines the given parameter,
        e.g. "[44.265, 12.470, 43.220, 13.913]"
    Raises:
        KeyError: if the parameter is not present in the file
    '''
    configs = Properties()
    with open(file, 'rb') as read_prop:
        configs.load(read_prop)

    # configs.get returns None for a missing key, so fail with a clear message
    entry = configs.get(name)
    if entry is None:
        raise KeyError(f"Parameter '{name}' not found in {file}")
    return entry.data

def main():

    python_exec = os.path.join(PROC_DIR, "venv", "bin", "python3") 

    script_path = os.path.join(PROC_DIR, "fastcopier.py")

    # Optional: Check if files exist

    if not os.path.exists(python_exec):
        logging.error(f"Python executable not found at {python_exec}")
        return
    if not os.path.exists(script_path):
        logging.error(f"Script not found at {script_path}")
        return
    
    # Prepare inputs here if your script needs parameters from WPS_PROPS

    # Run the script
    cmd = [python_exec, script_path, f"{IN_DIR}/input", f"{OUT_DIR}/output"]
    logging.info(f"Running command: {' '.join(cmd)}")

    try:
        subprocess.run(cmd, check=True)
        logging.info("Script executed successfully.")
    except subprocess.CalledProcessError as e:
        logging.error(f"Script execution failed with code {e.returncode}")

if __name__ == "__main__":
    main()

In the following lines:

PROC_DIR = os.environ.get("PROC_DIR")
WORKERDIR = os.environ.get("WORKERDIR")
IN_DIR = os.environ.get("IN_DIR")
OUT_DIR = os.environ.get("OUT_DIR")
WPS_PROPS = os.environ.get("WPS_PROPS")

the environment variables set in the Dockerfile are read into Python variables for use within the entrypoint file. Then, the function check_prop_file verifies that the properties file contains at least one parameter, and get_parameter reads the value of a named input parameter from it. The path of the script you want to run as a processor (in this case "fastcopier.py") is defined in the line:

script_path = os.path.join(PROC_DIR, "fastcopier.py")
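To make the role of the properties file more concrete, here is a minimal standard-library sketch of how key=value lines can be read. It is not the jproperties implementation used by the entrypoint (it ignores escapes and line continuations), the function name parse_properties is hypothetical, and the sample contents are an assumption about what a WPS-INPUT.properties file may look like.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments.

    Illustrative only: the real entrypoint relies on jproperties,
    which supports the full Java properties syntax.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip empty lines and comment lines
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


# Hypothetical file contents, for illustration only
sample = 'kernelsz="5"\ninput=/home/worker/workDir/inDir/input\n'
params = parse_properties(sample)
print(params)
print("file has parameters:", len(params) > 0)
```

Note that values may arrive wrapped in quotes, which is why an entrypoint may need to strip them before use.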

One line that deserves particular attention is the following:

cmd = [python_exec, script_path, f"{IN_DIR}/input", f"{OUT_DIR}/output"]

This line constructs the command that runs your processing script, passing the input and output directories as arguments. The folder names (input and output) must match the IDs defined in your processor’s input and output configuration.
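The relationship between the IDs and the command can be sketched as a small helper. The function build_command and its parameter names are hypothetical; the only thing taken from the guide is that the last path component of each argument must equal the corresponding input or output ID.

```python
import posixpath


def build_command(python_exec, script_path, in_dir, out_dir,
                  input_id="input", output_id="output", extra_args=()):
    """Build the argument list that is passed to subprocess.run.

    input_id and output_id must match the IDs declared in the
    processor's input and output definitions.
    """
    return [
        python_exec,
        script_path,
        posixpath.join(in_dir, input_id),
        posixpath.join(out_dir, output_id),
        *[str(arg) for arg in extra_args],
    ]


cmd = build_command(
    "/home/worker/procDir/venv/bin/python3",
    "/home/worker/procDir/fastcopier.py",
    "/home/worker/workDir/inDir",
    "/home/worker/workDir/outDir",
)
print(" ".join(cmd))
```

Keeping the IDs as explicit parameters makes it harder to forget to update the command when an ID is renamed in the definitions.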

Requirements File

The requirements file is a .txt file that lists all the packages that must be installed to run your code. You can think of it as the list of libraries or packages that you would normally install with pip. For example, suppose that your algorithm needs the following packages to run:

rasterio
pandas
numpy

You can then write a simple .txt file with one package per line, as shown below:

rasterio
pandas
numpy
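Each line of a requirements file may also pin an exact version with the name==version syntax, which makes image builds reproducible. The sketch below splits such lines into name and version; the helper names are hypothetical, and only the plain name and name==version forms are handled.

```python
def split_requirement(line: str):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, sep, version = line.strip().partition("==")
    return name.strip(), (version.strip() if sep else None)


def read_requirements(text: str) -> list:
    """Parse requirement lines, skipping blank lines and comments."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#"):
            entries.append(split_requirement(line))
    return entries


# Example contents mixing unpinned and pinned packages
for name, version in read_requirements("rasterio\npandas\nnumpy==2.3.2\n"):
    print(name, "->", version if version else "latest available")
```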

Inputs and Outputs Definition

Your inputs and outputs are defined in the respective tabs after sources, i.e. under Input definitions and Output definitions. You can define more than one input or output by clicking on the green "plus" symbol.

Here you can assign an ID and a title to your input. As mentioned before, the ID that you choose must match the one used in:

cmd = [python_exec, script_path, f"{IN_DIR}/input", f"{OUT_DIR}/output"]

The same applies to the output ID. The title will then appear above the data selection bar when you run your processor.

You can choose different types of input:

String
Number
Enum
Catalogue Product
AOI (Area Of Interest): WKT format, e.g. POLYGON ((37.5 65.75, 42.5 65.75, 44.5 68.25, 38.5 68.25, 37.5 65.75))
Date: in ISO 8601 format, e.g. "2022-10-01T00:00:00.000Z"
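Inside your script, AOI and Date inputs arrive as strings and need to be parsed. The following sketch uses only the standard library; the function names are hypothetical, the WKT parser handles only a single-ring POLYGON like the example above, and whether values arrive wrapped in quotes is an assumption.

```python
import re
from datetime import datetime, timezone


def parse_iso_date(value: str) -> datetime:
    """Parse a date such as 2022-10-01T00:00:00.000Z into a UTC datetime."""
    cleaned = value.strip().strip('"')
    parsed = datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_wkt_polygon(wkt: str) -> list:
    """Return the (x, y) vertices of a single-ring WKT POLYGON."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a single-ring WKT polygon: {wkt!r}")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    return points


aoi = "POLYGON ((37.5 65.75, 42.5 65.75, 44.5 68.25, 38.5 68.25, 37.5 65.75))"
ring = parse_wkt_polygon(aoi)
print("vertices:", len(ring), "closed ring:", ring[0] == ring[-1])
print("date:", parse_iso_date("2022-10-01T00:00:00.000Z").isoformat())
```

For polygons with holes or other geometry types, a dedicated geometry library would be a safer choice than a regular expression.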

Similarly, for the output you can choose a defined format:

Tif: type of "raster image file" that includes geographic metadata. File Extension should be .tif (or .TIF)
ShapeFile: "vector data format" for geographic information system
Other

Currently, the supported image formats are .tif and .TIF (not .tiff).
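Because a .tiff extension is not accepted for Tif outputs, it can help to check file names before the processor finishes. A minimal sketch, assuming the check is strictly on the two extensions listed above; the helper name unsupported_outputs is hypothetical.

```python
import os

# Extensions accepted for Tif outputs, as stated in this guide
SUPPORTED_EXTENSIONS = (".tif", ".TIF")


def unsupported_outputs(filenames):
    """Return the file names whose extension is not exactly .tif or .TIF."""
    return [
        name for name in filenames
        if os.path.splitext(name)[1] not in SUPPORTED_EXTENSIONS
    ]


# Example with made-up file names
print(unsupported_outputs(["ndvi.tif", "mask.TIF", "scene.tiff"]))
```

In a real processor you could pass os.listdir of the output folder to this function and rename or reject the offending files.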

It is highly recommended to create a Cloud Optimized GeoTIFF (COG) directly, or to convert your GeoTIFF to one with gdalwarp.

To convert not_cog.tif into now_a_cog.tif you can refer to the following line of code:

gdalwarp -of COG -t_srs EPSG:4326 -co COMPRESS=DEFLATE not_cog.tif now_a_cog.tif
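If you prefer to run the conversion from your Python script, the same command can be launched with subprocess. This is a sketch under the assumption that gdalwarp is available on PATH inside the container (the sample Dockerfile installs gdal-bin); the function names are hypothetical.

```python
import shutil
import subprocess


def cog_command(src, dst, srs="EPSG:4326", compress="DEFLATE"):
    """Build the gdalwarp argument list used in the command shown above."""
    return [
        "gdalwarp", "-of", "COG",
        "-t_srs", srs,
        "-co", f"COMPRESS={compress}",
        src, dst,
    ]


def convert_to_cog(src, dst):
    """Run gdalwarp; raises if the tool is missing or the conversion fails."""
    if shutil.which("gdalwarp") is None:
        raise RuntimeError("gdalwarp not found on PATH")
    subprocess.run(cog_command(src, dst), check=True)


# Show the command without executing it
print(" ".join(cog_command("not_cog.tif", "now_a_cog.tif")))
```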

You can find a more detailed explanation of the supported input and output types here.

Run a Processor

To run your processor, go to "Process some data" and choose the type of processing you want to perform (depending on the type of processor you implemented). Then select your processor from the service list on the right side of the screen; to find it more easily, you can filter the results by service type or owner. Once you have selected your processor, provide the input you want to process and select the folder in which to save the output. Finally, click on "Run service" to start the processing and wait until your algorithm completes its tasks.

Practical Examples

This section walks through an example of how to create a processor. First, open the "Develop" tab on the top bar or, equivalently, click on "Integrate your own algorithm" and then "Processing services".

Example 1 - Gaussian Filter

In this first example, you will create a processor that applies a Gaussian filter to a .tif image. The aim is to take one or more .tif images as input and produce the corresponding filtered image or images as output. Below are two Python scripts, one for the filtering (fastprocessor.py) and one for copying the images (fastcopier.py):

#!/usr/bin/env python3
"""
fastprocessor.py - Apply Gaussian smoothing to .TIF images.

Features:
- Accepts both single files and directories.
- Applies Gaussian filtering with a user-defined kernel size
  (the value is passed to scipy as the Gaussian sigma).
- Automatically adjusts kernel size to an odd number.
- Handles errors and logs important information.
"""

import sys
import os
import logging
import argparse
import numpy as np
import rasterio
from scipy.ndimage import gaussian_filter

# ----------------- Logging Configuration -----------------
logging.basicConfig(
    stream=sys.stdout,
    level=os.environ.get("LOG_LEVEL", "INFO"),
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# ----------------- Image Processing Function -----------------
def process_image(input_file, output_file, kernel_size):
    """
    Applies Gaussian smoothing to a single raster image.
    """
    try:
        logging.info(f"Processing image: {input_file}")

        # Check if file exists
        if not os.path.isfile(input_file):
            raise FileNotFoundError(f"Input file does not exist: {input_file}")

        # Ensure kernel_size is odd
        if kernel_size % 2 == 0:
            kernel_size += 1
            logging.info(f"Adjusted kernel_size to next odd number: {kernel_size}")

        # Read raster
        with rasterio.open(input_file) as src:
            img = src.read()
            profile = src.profile

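        # Apply Gaussian smoothing per band.
        # Note: kernel_size is passed to scipy as sigma (standard deviation in pixels),
        # not as the width of the filter window.
        img_out = np.empty_like(img)
        for b in range(img.shape[0]):
            img_out[b] = gaussian_filter(img[b], sigma=kernel_size)

        # Ensure destination folder exists (skip when the path has no directory part)
        out_dir = os.path.dirname(output_file)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)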

        # Save processed raster
        with rasterio.open(output_file, 'w', **profile) as dst:
            dst.write(img_out)

        logging.info(f"Processed image saved: {output_file}")
        return 0

    except Exception as e:
        logging.error(f"Error processing image {input_file}: {e}")
        return 1

# ----------------- Main Entry Point -----------------
def main():
    parser = argparse.ArgumentParser(description="Smooth .TIF images using Gaussian filter")
    parser.add_argument("input_path", help="Path to the input file or folder")
    parser.add_argument("output_path", help="Path for the processed file or folder")
    parser.add_argument("kernel_size", type=int, help="Gaussian kernel size (integer)")

    args = parser.parse_args()

    input_path = args.input_path
    output_path = args.output_path
    kernel_size = args.kernel_size

    logging.info(f"Received kernel_size argument: {args.kernel_size} (type: {type(args.kernel_size)})")


    # Handle directory vs. single file input
    if os.path.isdir(input_path):
        logging.info(f"Input is a directory: {input_path}")
        os.makedirs(output_path, exist_ok=True)

        tif_files = [f for f in os.listdir(input_path) if f.lower().endswith(".tif")]
        if not tif_files:
            logging.warning(f"No .TIF files found in {input_path}")
            sys.exit(1)

        # Process each .TIF file in the directory
        for tif in tif_files:
            in_file = os.path.join(input_path, tif)
            out_file = os.path.join(output_path, tif)
            ret = process_image(in_file, out_file, kernel_size)
            if ret != 0:
                sys.exit(ret)

        logging.info("Batch processing complete.")
        sys.exit(0)

    elif os.path.isfile(input_path):
        logging.info(f"Input is a single file: {input_path}")
        sys.exit(process_image(input_path, output_path, kernel_size))

    else:
        logging.error(f"Invalid input path: {input_path}")
        sys.exit(1)

if __name__ == "__main__":
    main()

# fastcopier.py - Copy a single file or a whole folder to a destination
import shutil   # High-level file operations (copying, moving, deleting)
import os       # Interacting with the operating system (path checks, directories)
import sys      # Access to system-specific functions (exit codes, stdout, etc.)
import argparse # Handling command-line arguments
import logging  # Logging events and debugging information



def fast_copier(source_folder, destination_folder):
    # Checks if source_folder is a directory or file
    # copies entire folder or copies a single file 
    try:
        if os.path.isdir(source_folder):
            shutil.copytree(source_folder, destination_folder)
        else:
            shutil.copy(source_folder, destination_folder)
        print(f"Copied from {source_folder} to {destination_folder}")
        return 0
    except Exception as e:
        print(f"Error copying: {e}")
        return 1



if __name__ == '__main__':
    # Logging
    logging.basicConfig(stream=sys.stdout, level=os.environ.get("LOG_LEVEL", "INFO"))

    # Argument parser, expecting two command-line arguments:
    # input_path -> source file or folder
    # output_path -> destination file or folder
    parser = argparse.ArgumentParser(description='Copy a file or folder to a destination')
    parser.add_argument('input_path', help='The inputs location')
    parser.add_argument('output_path', help='The processing outputs file path')
    args = parser.parse_args()
  
    # Executes the copier 
    sys.exit(fast_copier(
        source_folder=args.input_path,
        destination_folder=args.output_path
    ))
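One detail of fastprocessor.py is worth spelling out: the kernel_size argument is passed to gaussian_filter as sigma, so it controls the standard deviation of the Gaussian rather than the width of the filter window. In SciPy the window is truncated at truncate standard deviations (4.0 by default), with a radius of int(truncate * sigma + 0.5). The sketch below shows the resulting window width and one possible way to derive a sigma from a desired width; the helper names and the width-to-sigma mapping are assumptions for illustration, not part of the original scripts.

```python
def kernel_width_from_sigma(sigma: float, truncate: float = 4.0) -> int:
    """Width in pixels of the window SciPy uses for a given sigma."""
    radius = int(truncate * sigma + 0.5)
    return 2 * radius + 1


def sigma_from_kernel_width(width: int, truncate: float = 4.0) -> float:
    """One possible mapping from an odd window width to a sigma.

    Chosen so that kernel_width_from_sigma returns the same width back.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    radius = (width - 1) // 2
    return radius / truncate


# With the script as written, a kernel_size of 5 means sigma=5
print("window width for sigma=5:", kernel_width_from_sigma(5))
# A sigma that yields a 5-pixel window under the mapping above
print("sigma for a 5-pixel window:", sigma_from_kernel_width(5))
```

If you want kernel_size to behave as a window width, you could convert it with a helper like sigma_from_kernel_width before calling gaussian_filter; other libraries use different conventions for this conversion.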

Once the main scripts have been defined, you can integrate them into a processor by following the same order as in the sections above:

Dockerfile
Entrypoint
Requirements File
Inputs and Outputs Definition

Dockerfile

Starting from the sample Dockerfile provided above, you now need to add some system dependencies and Python libraries:

# Install system dependencies and Python libs from Ubuntu repos
RUN apt-get update && apt-get install --no-install-recommends -y \
        python3 \
        python3-pip \
        python3-venv \
        python3-dev \
        build-essential \
        gdal-bin \
        libgdal-dev \
        tree \
        file \
        curl \
        wget \
        ca-certificates \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

Entrypoint

The changes applied to the entrypoint file involve, in this case, only the main function:

def main():

    python_exec = os.path.join(PROC_DIR, "venv", "bin", "python3") 
    copier_path = os.path.join(PROC_DIR, "fastcopier.py")
    processor_path = os.path.join(PROC_DIR, "fastprocessor.py") 
    
    # Optional: Check if files exist

    if not os.path.exists(python_exec):
        logging.error(f"Python executable not found at {python_exec}")
        return
    if not os.path.exists(copier_path):
        logging.error(f"Script not found at {copier_path}")
        return
    if not os.path.exists(processor_path):
        logging.error(f"Script not found at {processor_path}")
        return
    
    # Prepare inputs
    #input=get_parameter("input", WPS_PROPS)
    kernel_size = get_parameter("kernelsz", WPS_PROPS) 
    clean_kernel = kernel_size.strip('"') 
    print(f"Clean kernel {clean_kernel}")

    # Run fastcopier.py
    cmd = [python_exec, copier_path, f"{IN_DIR}/input", f"{OUT_DIR}/output_001"]
    logging.info(f"Running command: {' '.join(cmd)}")

    try:
        subprocess.run(cmd, check=True)
        logging.info("Script executed successfully.")
    except subprocess.CalledProcessError as e:
        logging.error(f"Script execution failed with code {e.returncode}")

    # Run fastprocessor.py
    cmd_process = [python_exec, processor_path, f"{IN_DIR}/input", f"{OUT_DIR}/output_002", clean_kernel]
    print("DEBUG - Command to execute:", cmd_process)
    logging.info(f"Running processor: {' '.join(cmd_process)}")
    try:
        subprocess.run(cmd_process, check=True)
        logging.info("Processor executed successfully.")
    except subprocess.CalledProcessError as e:
        logging.error(f"Processor failed with code {e.returncode}")
        return

First, the paths of the two Python scripts are defined (copier_path and processor_path). Then, in the "Prepare inputs" block, the kernel size is read from the properties file with get_parameter("kernelsz", WPS_PROPS) and stripped of its surrounding quotes. Also note that the two scripts produce two outputs, so two output IDs must be declared (output_001 and output_002).
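The kernel size read from the properties file is a string, possibly wrapped in quotes, which is why the entrypoint strips them before passing the value on. A slightly more defensive version could also validate the value up front. This is a sketch with a hypothetical helper name, not part of the original entrypoint.

```python
def parse_kernel_size(raw: str) -> int:
    """Strip whitespace and quotes, then convert to a positive integer.

    Raises ValueError if the value is not a positive integer.
    """
    cleaned = raw.strip().strip('"').strip("'")
    value = int(cleaned)  # raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError(f"kernel size must be positive, got {value}")
    return value


# Example: a quoted value as it might appear in the properties file
print(parse_kernel_size('"5"'))
```

Validating in the entrypoint gives a clear error message early, instead of a failure inside the processing script.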

Since there are two .py files to run, some steps are repeated once for each script: the path definitions (copier_path and processor_path) and the command construction followed by the subprocess.run call.
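The repeated run-and-log blocks can be factored into a small helper. The sketch below is one possible arrangement and not the original entrypoint: run_step and run_all are hypothetical names, and unlike the main function above, this version stops at the first failing step and returns a non-zero code.

```python
import logging
import subprocess
import sys


def run_step(cmd, label):
    """Run one command and log the outcome; return True on success."""
    logging.info("Running %s: %s", label, " ".join(cmd))
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        logging.error("%s failed with code %s", label, exc.returncode)
        return False
    except OSError as exc:
        # Raised, for example, when the executable does not exist
        logging.error("%s could not be started: %s", label, exc)
        return False
    logging.info("%s executed successfully.", label)
    return True


def run_all(steps):
    """Run (label, cmd) pairs in order; return 0 if all succeed, else 1."""
    for label, cmd in steps:
        if not run_step(cmd, label):
            return 1
    return 0


# Demonstration with a harmless command; a real entrypoint would pass
# the fastcopier.py and fastprocessor.py commands here instead.
exit_code = run_all([("demo step", [sys.executable, "-c", "pass"])])
print("exit code:", exit_code)
```

Returning a status code lets the entrypoint finish with sys.exit(run_all(...)), so a failure is visible to whatever launched the container.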

Requirements File

The required packages to correctly run this example are listed here below:

jproperties==2.1.2
rasterio==1.4.3
scipy==1.16.1
numpy==2.3.2

Inputs and Outputs Definition

You will need two kinds of inputs for this example:

a .tif image
a kernel size (the kernel size for the Gaussian filter, read in the entrypoint with the ID kernelsz)

Then you need two kinds of output as well:

Copy of the original image
Filtered image (Smoothed image after applying Gaussian Filter)

Let's select one .tif image to filter:

The resulting filtered image, in this case, is shown below:

Additional Resources

Docker official documentation: https://docs.docker.com/get-started