IonToParquet

```yaml
type: "io.kestra.plugin.serdes.parquet.IonToParquet"
```

Read a file containing Ion-serialized data and convert it to Parquet.

Examples

Download a CSV file, aggregate it with DuckDB, and store the result as a Parquet file.

```yaml
id: ion_to_parquet
namespace: company.team

tasks:
  - id: download_csv
    type: io.kestra.plugin.core.http.Download
    description: salaries of data professionals from 2020 to 2023 (source ai-jobs.net)
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv

  - id: avg_salary_by_job_title
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary),2) AS avg_salary
      FROM read_csv_auto('{{ workingDir }}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true

  - id: result
    type: io.kestra.plugin.serdes.parquet.IonToParquet
    from: "{{ outputs.avg_salary_by_job_title.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Salary",
        "namespace": "com.example.salary",
        "fields": [
          {"name": "job_title", "type": "string"},
          {"name": "avg_salary", "type": "double"}
        ]
      }
```

Properties

`from`

Type: string
Dynamic: ✔️
Required: ✔️

Source file URI

`schema`

Type: string
Dynamic: ✔️
Required: ✔️

The Avro schema associated with the data

`compressionCodec`

Type: string
Dynamic: ❌
Required: ❌
Default: GZIP
Possible Values:
- UNCOMPRESSED
- SNAPPY
- GZIP
- ZSTD

The compression codec to use
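
As an illustration, a task could select a non-default codec as shown below. This is a minimal sketch: the task id `to_parquet_zstd`, the upstream task name `previous_task`, and the schema are hypothetical placeholders, not part of the plugin.

```yaml
# Sketch only: `previous_task` and the schema are hypothetical.
- id: to_parquet_zstd
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.previous_task.uri }}"
  compressionCodec: ZSTD  # one of UNCOMPRESSED, SNAPPY, GZIP (default), ZSTD
  schema: |
    {
      "type": "record",
      "name": "Salary",
      "fields": [
        {"name": "job_title", "type": "string"},
        {"name": "avg_salary", "type": "double"}
      ]
    }
```

Because the property is not dynamic, the codec has to be written as a literal value rather than a templated expression. Omitting it leaves the default, GZIP, in effect.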

`dateFormat`

Type: string
Dynamic: ✔️
Required: ❌
Default: yyyy-MM-dd[XXX]

Format to use when parsing dates

`datetimeFormat`

Type: string
Dynamic: ✔️
Required: ❌
Default: yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]

Format to use when parsing datetime values

Default value is yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
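
The sketch below shows how `dateFormat` and `datetimeFormat` might be overridden for day-first input. It assumes that the patterns follow Java `DateTimeFormatter` syntax, as the bracketed optional sections in the defaults suggest, and that date and timestamp columns are declared with Avro logical types. The task id, the upstream task name `previous_task`, the field names, and the schema are all hypothetical.

```yaml
# Sketch only: names and schema are hypothetical; patterns assume
# Java DateTimeFormatter syntax.
- id: to_parquet_dates
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.previous_task.uri }}"
  dateFormat: "dd/MM/yyyy"               # e.g. 31/12/2023
  datetimeFormat: "dd/MM/yyyy HH:mm:ss"  # e.g. 31/12/2023 23:59:00
  schema: |
    {
      "type": "record",
      "name": "Event",
      "fields": [
        {"name": "event_date", "type": {"type": "int", "logicalType": "date"}},
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
      ]
    }
```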

`decimalSeparator`

Type: string
Dynamic: ✔️
Required: ❌
Default: .

Character to recognize as decimal point (e.g. use ‘,’ for European data).

Default value is '.'
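
For data that uses a comma as the decimal point, the property could be set as in the following sketch. The task id, the upstream task name `previous_task`, and the schema are hypothetical, and the example assumes the numeric values arrive as text that still needs parsing.

```yaml
# Sketch only: names and schema are hypothetical.
- id: to_parquet_eu
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.previous_task.uri }}"
  decimalSeparator: ","  # treat "," as the decimal point in values such as 1234,56
  schema: |
    {
      "type": "record",
      "name": "Price",
      "fields": [
        {"name": "product", "type": "string"},
        {"name": "amount", "type": "double"}
      ]
    }
```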

`dictionaryPageSize`

Type: integer
Dynamic: ❌
Required: ❌
Default: 1048576

Maximum dictionary page size, in bytes

`falseValues`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: ["f", "false", "disabled", "0", "off", "no", ""]

Values to consider as false

`inferAllFields`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Try to infer all fields

If true, boolean and null values are inferred for all fields using `trueValues`, `falseValues`, and `nullValues`. If false, they are inferred only for fields declared in the schema as boolean or null.
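
A minimal sketch of how these settings might combine is shown below. The task id, the upstream task name `previous_task`, the custom value lists, and the schema are hypothetical. With `inferAllFields` left at `false`, only the `active` field, which is declared as a nullable boolean, would be subject to boolean and null inference; `name` would be kept as a plain string.

```yaml
# Sketch only: names, value lists, and schema are hypothetical.
- id: to_parquet_flags
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.previous_task.uri }}"
  inferAllFields: false        # infer only on fields declared as boolean or null
  falseValues: ["n", "no", "0"]
  nullValues: ["", "NA"]
  schema: |
    {
      "type": "record",
      "name": "Account",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "active", "type": ["null", "boolean"]}
      ]
    }
```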

`nullValues`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: ["", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "1.#IND", "1.#QNAN", "NA", "n/a", "nan", "null"]