ForEachItemSplit

yaml
type: "io.kestra.plugin.core.flow.ForEachItem$ForEachItemSplit"

Properties

batch

items

  • Type: string
  • Dynamic:
  • Required:

Outputs

splits

  • Type: string
  • Required:
  • Format: uri

Definitions

io.kestra.plugin.core.flow.ForEachItem-Batch

Properties

bytes
  • Type: string
  • Dynamic: ✔️
  • Required:

Split a large file into multiple chunks with a maximum file size of bytes.

Can be provided as a string such as "10MB" or "200KB", or as a number of bytes. This lets you split a large file into smaller chunks by lines and process the chunks in parallel. For example, MySQL by default limits the size of a single query to 16MB, so a bulk insert query with more than 16MB of input data will fail. Splitting the input data is a common way around this limitation: by dividing a large data set into chunks smaller than the max_allowed_packet size (e.g., 10MB), you can insert the data in several smaller queries. This avoids the query size limit and can also be more efficient and manageable in terms of memory use, especially for very large datasets.
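
As a minimal sketch of byte-based batching, the fragment below sets bytes to 10MB so that each chunk stays under the 16MB limit from the MySQL example. The task id split_by_size and the items expression, which points at a hypothetical upstream task named extract, are assumptions for illustration and do not come from this page.

```yaml
# Sketch only: the id and the items expression are hypothetical placeholders.
id: split_by_size
type: "io.kestra.plugin.core.flow.ForEachItem$ForEachItemSplit"
# URI of the file to split, taken here from an assumed upstream task named "extract".
items: "{{ outputs.extract.uri }}"
batch:
  # Upper bound on the size of each chunk; the file is split by lines.
  bytes: "10MB"
```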

partitions
  • Type:
    • integer
    • string
  • Dynamic: ✔️
  • Required:
rows
  • Type:
    • integer
    • string
  • Dynamic: ✔️
  • Required:
separator
  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: \n

The separator used to split a file into chunks. By default, it's a newline \n character. If you are on Windows, you might want to use \r\n instead.
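
For a file with Windows line endings, a batch definition might look like the following sketch. The rows value is an arbitrary illustration, not a recommendation. The separator is written as a double-quoted YAML string so that \r\n is read as a carriage return followed by a line feed rather than as literal backslash characters.

```yaml
batch:
  # Illustrative value only.
  rows: 1000
  # Double quotes make YAML interpret the escape sequences.
  separator: "\r\n"
```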