
HDFS Writer

HDFS Writer writes files in formats such as TextFile, ORC, and Parquet to specified paths in an HDFS file system. The written file content can be associated with Hive tables.

Configuration Example

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 2,
        "bytes": -1
      }
    },
    "content": {
      "reader": {
        "name": "streamreader",
        "parameter": {
          "column": [
            {
              "value": "Addax",
              "type": "string"
            },
            {
              "value": 19890604,
              "type": "long"
            },
            {
              "value": "1989-06-04 00:00:00",
              "type": "date"
            },
            {
              "value": true,
              "type": "bool"
            },
            {
              "value": "test",
              "type": "bytes"
            },
            {
              "value": "['tag1', 'tag2', 'tag3']",
              "type": "string"
            },
            {
              "value": "{'loc':'HZ','num':'12'}",
              "type": "string"
            }
          ],
          "sliceRecordCount": 1000
        }
      },
      "writer": {
        "name": "hdfswriter",
        "parameter": {
          "defaultFS": "hdfs://xxx:port",
          "fileType": "orc",
          "path": "/user/hive/warehouse/writerorc.db/orcfull",
          "fileName": "xxxx",
          "column": [
            {
              "name": "col1",
              "type": "string"
            },
            {
              "name": "col2",
              "type": "int"
            },
            {
              "name": "col3",
              "type": "string"
            },
            {
              "name": "col4",
              "type": "boolean"
            },
            {
              "name": "col5",
              "type": "string"
            },
            {
              "name": "col6",
              "type": "array<string>"
            },
            {
              "name": "col7",
              "type": "map<string,string>"
            }
          ],
          "writeMode": "overwrite",
          "fieldDelimiter": "\u0001",
          "compress": "SNAPPY"
        }
      }
    }
  }
}
```
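The same writer block can be adapted for plain-text output by changing `fileType` and picking a readable delimiter. The sketch below is illustrative, not taken from the example above: the `path`, `fileName`, compression codec, and column list are placeholder assumptions.

```json
"writer": {
  "name": "hdfswriter",
  "parameter": {
    "defaultFS": "hdfs://xxx:port",
    "fileType": "text",
    "path": "/user/hive/warehouse/writertext.db/textfull",
    "fileName": "xxxx",
    "column": [
      { "name": "col1", "type": "string" },
      { "name": "col2", "type": "int" }
    ],
    "writeMode": "append",
    "fieldDelimiter": ",",
    "compress": "GZIP"
  }
}
```

For text files `fieldDelimiter` matters (it separates the columns on each line), whereas for binary formats such as ORC and Parquet it is not needed, as described in the parameter table below.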

Parameters

| Configuration | Required | Data Type | Default Value | Description |
| --- | --- | --- | --- | --- |
| path | Yes | string | None | File path to write |
| defaultFS | Yes | string | None | Detailed description below |
| fileType | Yes | string | None | File type, detailed description below |
| fileName | Yes | string | None | Filename to write, used as a prefix |
| column | Yes | list<map> | None | List of fields to write |
| writeMode | Yes | string | None | Write mode, detailed description below |
| skipTrash | No | boolean | false | Whether to skip the trash; related to the writeMode configuration |
| fieldDelimiter | No | string | `,` | Field delimiter for text files; not needed for binary files |
| encoding | No | string | utf-8 | File encoding; currently only utf-8 is supported |
| nullFormat | No | string | None | Characters that represent null; e.g. if the user configures `"\\N"`, a source value of `\N` is treated as a null field |
| haveKerberos | No | boolean | false | Whether to enable Kerberos authentication; if enabled, the following two items must also be configured |
| kerberosKeytabFilePath | No | string | None | Keytab file path for Kerberos authentication, e.g. `/your/path/addax.service.keytab` |
| kerberosPrincipal | No | string | None | Principal for Kerberos authentication, e.g. `addax/node1@WGZHAO.COM` |
| compress | No | string | None | File compression format, see below |
| hadoopConfig | No | map | None | Advanced Hadoop client settings, such as HA configuration |
| preShell | No | list | None | Shell commands to execute before writing data, e.g. `hive -e "truncate table test.hello"` |
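When the target cluster runs NameNode high availability, `hadoopConfig` is where the standard HDFS client failover settings go, with `defaultFS` pointing at the nameservice instead of a single NameNode. The nameservice name `cluster1` and the host names below are placeholders; the property keys are the standard Hadoop HDFS HA client settings.

```json
"defaultFS": "hdfs://cluster1",
"hadoopConfig": {
  "dfs.nameservices": "cluster1",
  "dfs.ha.namenodes.cluster1": "nn1,nn2",
  "dfs.namenode.rpc-address.cluster1.nn1": "node1.example.com:8020",
  "dfs.namenode.rpc-address.cluster1.nn2": "node2.example.com:8020",
  "dfs.client.failover.proxy.provider.cluster1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}
```

With this in place the client resolves `hdfs://cluster1` through the failover proxy provider, so writes keep working after a NameNode failover.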