Batch
This plugin is exclusively available on the Cloud and Enterprise editions of Kestra.
type: "io.kestra.plugin.ee.gcp.runner.Batch"
Task runner that executes a task inside a job in Google Cloud Batch.
This plugin is only available in the Enterprise Edition (EE).
This task runner is container-based, so the containerImage property must be set.
You need the 'Batch Job Editor' and 'Logs Viewer' roles to use it.
To access the task's working directory, use the {{ workingDir }} Pebble expression or the WORKING_DIR environment variable. Input files and namespace files will be available in this directory.
To generate output files, you can either use the task's outputFiles property and create a file with the same name in the task's working directory, or create any file in the output directory, which can be accessed with the {{ outputDir }} Pebble expression or the OUTPUT_DIR environment variable.
To use the inputFiles, outputFiles, or namespaceFiles properties, make sure to set the bucket property. The bucket serves as an intermediary storage layer for the task runner. Input and namespace files are uploaded to the Cloud Storage bucket before the task runs. Similarly, the task runner stores outputFiles in this bucket during the task run. At the end, the task runner makes those files available for download and preview from the UI by sending them to internal storage.
The task runner will generate a folder in the configured bucket for each task run. You can access that folder using the {{ bucketPath }} Pebble expression or the BUCKET_PATH environment variable.
Warning: unlike other task runners, this task runner does not run the task in the working directory but in the root directory. You must use the {{ workingDir }} Pebble expression or the WORKING_DIR environment variable to access files.
Note that when the Kestra Worker running this task is terminated, the batch job will still run until completion. After restarting, the Worker will resume processing on the existing job unless resume is set to false.
Examples
Execute a Shell command.
id: new-shell
namespace: company.team
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
    commands:
      - echo "Hello World"
Pass input files to the task, execute a Shell command, then retrieve output files.
id: new-shell-with-file
namespace: company.team
inputs:
  - id: file
    type: FILE
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{inputs.file}}"
    outputFiles:
      - out.txt
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      bucket: "{{vars.bucket}}"
    commands:
      - cp {{workingDir}}/data.txt {{workingDir}}/out.txt
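The second way of producing output files described above (writing into the output directory rather than declaring outputFiles) could look roughly like the following sketch. The flow id, the vars.* variables, and the ubuntu image are illustrative assumptions, and it is assumed that a bucket is also required in this case.

```yaml
id: batch-output-dir
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      # Assumption: output files transit through the bucket, so it is set here
      bucket: "{{vars.bucket}}"
    commands:
      # Files created in the output directory are expected to be collected as outputs
      - echo "Hello World" > {{outputDir}}/out.txt
```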
Properties
delete
- Type: boolean
- Dynamic: ❌
- Required: ✔️
- Default: true
Whether the job should be deleted upon completion.
machineType
- Type: string
- Dynamic: ✔️
- Required: ✔️
- Default: e2-medium
The GCP machine type.
region
- Type: string
- Dynamic: ✔️
- Required: ✔️
The GCP region.
resume
- Type: boolean
- Dynamic: ❌
- Required: ✔️
- Default: true
Whether to reconnect to the current job if it already exists.
bucket
- Type: string
- Dynamic: ✔️
- Required: ❌
Google Cloud Storage bucket used to upload (inputFiles and namespaceFiles) and download (outputFiles) files.
A bucket is mandatory if you want to use these properties.
completionCheckInterval
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: 5.000000000
- Format: duration
Determines how often Kestra should poll the container for completion. By default, the task runner checks every 5 seconds whether the job is completed. You can set this to a lower value (e.g. PT0.1S = every 100 milliseconds) for quick jobs and to a higher value (e.g. PT1M = every minute) for long-running jobs. Setting this property to a higher value will reduce the number of API calls Kestra makes to the remote service. Keep that in mind if you see API rate limit errors.
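As a rough illustration of tuning the polling interval for a long-running job, the sketch below raises completionCheckInterval to one minute. The flow id, the vars.* variables, the ubuntu image, and the sleep command are placeholders chosen for the example.

```yaml
id: batch-long-running
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      # Poll once per minute instead of every 5 seconds to make fewer API calls
      completionCheckInterval: PT1M
    commands:
      # Stand-in for a job that runs for about 30 minutes
      - sleep 1800
```

The trade-off is that Kestra may notice completion up to one polling interval after the job has actually finished.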
computeResource
- Type: Batch-ComputeResource
- Dynamic: ❌
- Required: ❌
Compute resource requirements.
ComputeResource defines the amount of resources required for each task. Make sure your tasks have enough compute resources to successfully run. If you also define the types of resources for a job to use with the InstancePolicyOrTemplate field, make sure both fields are compatible with each other.
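A minimal sketch of setting compute resources alongside a machine type is shown below. It assumes the n2-standard-2 machine type (2 vCPUs, 8 GiB of memory) and requests values that fit within it. The flow id, the vars.* variables, and the container image are illustrative only.

```yaml
id: batch-compute-resource
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      machineType: n2-standard-2
      computeResource:
        # milliCPU units: 2000 corresponds to 2 vCPUs, the whole VM in this case
        cpu: "2000"
        # MiB: half of the 8192 MiB the machine type provides
        memory: "4096"
    commands:
      - echo "Hello World"
```

The values are quoted because the cpu and memory properties are typed as strings.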
entryPoint
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
Container entrypoint to use.
lifecyclePolicies
- Type: array
- SubType: Batch-LifecyclePolicy
- Dynamic: ❌
- Required: ❌
Lifecycle management schema applied when any task in a task group fails.
Currently, only one lifecycle policy is supported. When the lifecycle policy condition is met, the action in the policy is executed. If the task execution result does not match the defined lifecycle policy, the default policy applies: if the exit code is 0, the task exits; if the task ends with a non-zero exit code, it is retried up to max_retry_count times.
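One way to combine a lifecycle policy with retries is sketched below: failures are retried by default, but a specific exit code fails the task immediately. The exit code 42 is an arbitrary value picked for illustration, as are the flow id, the vars.* variables, and the container image.

```yaml
id: batch-lifecycle-policy
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      # Default policy: non-zero exit codes are retried, here at most 3 times
      maxRetryCount: 3
      lifecyclePolicies:
        # Exception: exit code 42 (hypothetical) fails the task without retrying
        - action: FAIL_TASK
          actionCondition:
            exitCodes:
              - 42
    commands:
      - echo "Hello World"
```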
maxRetryCount
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum: >= 0
- Maximum: <= 10
Maximum number of retries on failures.
The default is 0, which means never retry.
networkInterfaces
- Type: array
- SubType: Batch-NetworkInterface
- Dynamic: ❌
- Required: ❌
Network interfaces.
projectId
- Type: string
- Dynamic: ✔️
- Required: ❌
The GCP project ID.
reservation
- Type: string
- Dynamic: ✔️
- Required: ❌
Compute reservation.
scopes
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default: [https://www.googleapis.com/auth/cloud-platform]
The GCP scopes to be used.
serviceAccount
- Type: string
- Dynamic: ✔️
- Required: ❌
The GCP service account key.
waitForLogInterval
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: 5.000000000
- Format: duration
Additional time after the job ends to wait for late logs.
waitUntilCompletion
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: 3600.000000000
- Format: duration
The maximum duration to wait for the job completion, unless the task timeout property is set, in which case it takes precedence over this property.
Google Cloud Batch will automatically time out the job upon reaching this duration, and the task will fail.
Outputs
Definitions
io.kestra.plugin.ee.gcp.runner.Batch-LifecyclePolicyAction
Properties
exitCodes
- Type: array
- SubType: integer
- Dynamic: ❌
- Required: ❌
Exit codes of a task execution.
If there is more than one exit code, the condition is met and the action is executed when the task exits with any of the exit codes in the list.
io.kestra.plugin.ee.gcp.runner.Batch-LifecyclePolicy
Properties
action
- Type: string
- Dynamic: ❌
- Required: ❌
- Possible Values:
ACTION_UNSPECIFIED
RETRY_TASK
FAIL_TASK
UNRECOGNIZED
Action on task failures based on different conditions.
actionCondition
- Type: Batch-LifecyclePolicyAction
- Dynamic: ❌
- Required: ❌
Conditions for actions to deal with task failures.
io.kestra.plugin.ee.gcp.runner.Batch-NetworkInterface
Properties
network
- Type: string
- Dynamic: ✔️
- Required: ✔️
Network identifier in the format projects/HOST_PROJECT_ID/global/networks/NETWORK.
subnetwork
- Type: string
- Dynamic: ✔️
- Required: ❌
Subnetwork identifier in the format projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET
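The two identifier formats above might be used as in the following sketch. The vars.hostProjectId, vars.network, and vars.subnet variables are hypothetical names introduced for this example, as are the flow id and the container image.

```yaml
id: batch-network
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      networkInterfaces:
        # network is required; subnetwork is optional
        - network: "projects/{{vars.hostProjectId}}/global/networks/{{vars.network}}"
          subnetwork: "projects/{{vars.hostProjectId}}/regions/{{vars.region}}/subnetworks/{{vars.subnet}}"
    commands:
      - echo "Hello World"
```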
io.kestra.plugin.ee.gcp.runner.Batch-ComputeResource
Properties
bootDisk
- Type: string
- Dynamic: ❌
- Required: ❌
Extra boot disk size for each task.
cpu
- Type: string
- Dynamic: ❌
- Required: ❌
The milliCPU count.
Defines the amount of CPU resources per task in milliCPU units. For example, 1000 corresponds to 1 vCPU per task. If undefined, the default value is 2000. If you also define the VM's machine type using the machineType property in the InstancePolicy field or inside the instanceTemplate in the InstancePolicyOrTemplate field, make sure the CPU resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the n2-standard-2 machine type, which has 2 vCPUs, you can set the cpu to no more than 2000. Alternatively, you can run two tasks on the same VM if you set the cpu to 1000 or less.
memory
- Type: string
- Dynamic: ❌
- Required: ❌
Memory in MiB.
Defines the amount of memory per task in MiB units. If undefined, the default value is 2048. If you also define the VM's machine type using the machineType property in the InstancePolicy field or inside the instanceTemplate in the InstancePolicyOrTemplate field, make sure the memory resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the n2-standard-2 machine type, which has 8 GiB of memory, you can set the memory to no more than 8192.