Split
type: "io.kestra.plugin.core.storage.Split"
Split a file from Kestra's internal storage into multiple files.
Examples
Split a file by size.
id: "split"
type: "io.kestra.plugin.core.storage.Split"
from: "kestra://long/url/file1.txt"
bytes: 10MB
Split a file by row count.
id: "split"
type: "io.kestra.plugin.core.storage.Split"
from: "kestra://long/url/file1.txt"
rows: 1000
Split a file into a defined number of partitions.
id: "split"
type: "io.kestra.plugin.core.storage.Split"
from: "kestra://long/url/file1.txt"
partitions: 8
Properties
from
- Type: string
- Dynamic: ✔️
- Required: ✔️
The URI of the file to split, for example a kestra:// URI pointing to Kestra's internal storage.
bytes
- Type: string
- Dynamic: ✔️
- Required: ❌
Split a large file into multiple chunks, each no larger than the given size.
Can be provided as a string such as "10MB" or "200KB", or as a number of bytes. This lets you split a large file into smaller chunks of whole lines and process them in parallel. For example, MySQL caps the size of a single query with its max_allowed_packet setting; if that limit is 16MB, a bulk insert query with more than 16MB of input data will fail. Splitting the input data into smaller chunks is a common way around this limitation: by dividing a large data set into chunks smaller than max_allowed_packet (e.g., 10MB), you can insert the data with several smaller queries. Besides avoiding the query size limit, this approach can be more efficient and keeps memory usage manageable for very large datasets. In short, splitting the file by bytes lets you bulk-insert chunks of, say, 10MB in parallel instead of hitting the limit.
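To make the byte-based strategy concrete, here is a minimal Python sketch that packs whole lines into chunks with a size cap. It is an illustration under stated assumptions, not the plugin's implementation. parse_size, records_with_separator and split_by_bytes are hypothetical helpers, and treating "KB" and "MB" as powers of 1024 is an assumption.

```python
import re

# Assumption: binary units (1KB = 1024 bytes); the plugin's exact unit rules may differ.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(value):
    """Convert 1000, "1000", "200KB" or "10MB" into a number of bytes (hypothetical helper)."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)?\s*", value.upper())
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit or "B"]


def records_with_separator(data, separator):
    """Split data into records, keeping the separator at the end of each record."""
    parts = data.split(separator)
    records = [part + separator for part in parts[:-1]]
    if parts[-1]:  # the last record had no trailing separator
        records.append(parts[-1])
    return records


def split_by_bytes(data, max_bytes, separator=b"\n"):
    """Pack whole records into chunks of at most max_bytes (hypothetical helper).

    A record longer than max_bytes is never cut; it gets a chunk of its own.
    """
    chunks, current = [], b""
    for record in records_with_separator(data, separator):
        # Close the current chunk if this record would push it over the limit.
        if current and len(current) + len(record) > max_bytes:
            chunks.append(current)
            current = b""
        current += record
    if current:
        chunks.append(current)
    return chunks


print(split_by_bytes(b"aaaa\nbbbb\ncccc\n", parse_size("10B")))  # [b'aaaa\nbbbb\n', b'cccc\n']
```

In this sketch, each chunk could be sent as a separate bulk insert that stays under the database's query size limit. Concatenating the chunks restores the original data.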
partitions
- Type: integer or string
- Dynamic: ✔️
- Required: ❌
The number of files (partitions) to split the file into.
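One way to picture partitions is a round-robin assignment of lines to a fixed number of output files. This keeps the files within one line of each other in size. The Python sketch below illustrates that idea. split_into_partitions is a hypothetical helper, and round-robin is an assumed strategy that the plugin may not use.

```python
def split_into_partitions(lines, partitions):
    """Assign lines round-robin to a fixed number of partitions (hypothetical helper)."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    buckets = [[] for _ in range(partitions)]
    for index, line in enumerate(lines):
        # Line 0 goes to partition 0, line 1 to partition 1, and so on, wrapping around.
        buckets[index % partitions].append(line)
    return buckets


print(split_into_partitions(["a", "b", "c", "d", "e", "f", "g"], 3))
# [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```

Unlike bytes and rows, this fixes the number of output pieces up front, and their size grows with the input.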
rows
- Type: integer or string
- Dynamic: ✔️
- Required: ❌
The maximum number of rows in each split file.
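Splitting by rows amounts to cutting the input into consecutive runs of at most rows lines. The minimal Python sketch below illustrates this with itertools.islice. split_by_rows is a hypothetical helper, not part of Kestra.

```python
from itertools import islice


def split_by_rows(lines, rows):
    """Yield consecutive chunks of at most `rows` lines each (hypothetical helper)."""
    if rows < 1:
        raise ValueError("rows must be at least 1")
    iterator = iter(lines)
    # islice takes the next `rows` items; an empty chunk means the input is exhausted.
    while chunk := list(islice(iterator, rows)):
        yield chunk


print([len(chunk) for chunk in split_by_rows(range(2500), 1000)])  # [1000, 1000, 500]
```

In this sketch, only the last chunk can be shorter than rows.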
separator
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default: \n
The separator used to split a file into chunks. By default, it is a newline (\n) character. If the file uses Windows line endings, you might want to use \r\n instead.
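Choosing the wrong separator leaves stray characters in every record. The Python sketch below splits the same Windows-style text on \n and on \r\n. split_records is a hypothetical helper. It only shows how the separator choice changes record boundaries, not how the plugin writes its output files.

```python
def split_records(text, separator="\n"):
    """Split text on separator, dropping the empty record after a trailing separator."""
    records = text.split(separator)
    if records and records[-1] == "":
        records.pop()
    return records


windows_text = "id,name\r\n1,alice\r\n2,bob\r\n"
print(split_records(windows_text))          # ['id,name\r', '1,alice\r', '2,bob\r']
print(split_records(windows_text, "\r\n"))  # ['id,name', '1,alice', '2,bob']
```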
Outputs
uris
- Type: array
- SubType: string
- Required: ❌
The URIs of the split files in Kestra's internal storage.
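Because uris is a list, each split file can be handed to an independent worker, which is what enables the parallel bulk inserts described under bytes. The Python sketch below illustrates that fan-out with a thread pool. It is not how Kestra schedules tasks. load_chunk and the example URIs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(uri):
    """Hypothetical per-chunk work, e.g. running one bulk INSERT for the file at `uri`."""
    return f"loaded {uri}"


def process_in_parallel(uris, max_workers=4):
    """Call load_chunk on every URI concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_chunk, uris))


# Hypothetical URIs standing in for the task's uris output.
uris = ["kestra://example/chunk-0.txt", "kestra://example/chunk-1.txt"]
print(process_in_parallel(uris))
```

Executor.map preserves input order, so results line up with the URIs even when chunks finish out of order.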