
S3 Reader

The S3 Reader plugin reads data from Amazon S3 storage. It is implemented on top of the official S3 SDK, version 2.0.

The plugin can also read from storage services that are compatible with the S3 protocol, such as MinIO.

Configuration Example

The following sample configuration reads the listed objects (two named files plus two wildcard patterns) from S3 storage and prints their contents:

json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1,
        "bytes": -1
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": {
      "reader": {
        "name": "s3reader",
        "parameter": {
          "endpoint": "https://s3.amazonaws.com",
          "accessId": "xxxxxxxxxxxx",
          "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
          "bucket": "test",
          "object": [
            "1.csv",
            "aa.csv",
            "upload_*.csv",
            "bb_??.csv"
          ],
          "column": [
            "*"
          ],
          "region": "ap-northeast-1",
          "fileFormat": "csv",
          "fieldDelimiter": ","
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }
  }
}

Parameters

| Configuration | Required | Data Type | Default Value | Description |
| --- | --- | --- | --- | --- |
| endpoint | Yes | string | None | S3 server endpoint address, e.g. s3.xx.amazonaws.com |
| region | Yes | string | None | S3 server region, e.g. ap-southeast-1 |
| accessId | Yes | string | None | Access ID |
| accessKey | Yes | string | None | Access Key |
| bucket | Yes | string | None | Bucket to read |
| object | Yes | list | None | Objects to read; multiple entries and wildcard patterns are allowed, see the description below |
| column | Yes | list | None | Column information of the objects to read; refer to the column description in RDBMS Reader |
| fieldDelimiter | No | string | `,` | Field delimiter for reading; only a single character is supported |
| compress | No | string | None | File compression format; the default is no compression |
| encoding | No | string | utf8 | File encoding format |
| writeMode | No | string | nonConflict | |
| pathStyleAccessEnabled | No | boolean | false | Whether to enable path-style access mode |
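
As a rough aid for reviewing a job file before submitting it, the required parameters in the table above can be checked with a few lines of Python. This is only a sketch: the helper names `missing_required` and `delimiter_is_single_char` are hypothetical and not part of the plugin, and the checks mirror the table rather than the plugin's own validation logic.

```python
# Required keys, copied from the "Required" column of the parameter table.
REQUIRED = ("endpoint", "region", "accessId", "accessKey",
            "bucket", "object", "column")


def reader_params(job: dict) -> dict:
    """Return the reader's parameter block from a job definition."""
    return job["job"]["content"]["reader"]["parameter"]


def missing_required(job: dict) -> list:
    """List required s3reader parameters that are absent or empty."""
    params = reader_params(job)
    return [key for key in REQUIRED if params.get(key) in (None, "", [])]


def delimiter_is_single_char(job: dict) -> bool:
    """The table states that fieldDelimiter supports a single character only."""
    return len(reader_params(job).get("fieldDelimiter", ",")) == 1


# Example job with "region" deliberately left out.
job = {
    "job": {
        "content": {
            "reader": {
                "name": "s3reader",
                "parameter": {
                    "endpoint": "https://s3.amazonaws.com",
                    "accessId": "xxxxxxxxxxxx",
                    "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
                    "bucket": "test",
                    "object": ["1.csv", "upload_*.csv"],
                    "column": ["*"],
                    "fieldDelimiter": ",",
                },
            }
        }
    }
}

print(missing_required(job))          # ['region']
print(delimiter_is_single_char(job))  # True
```

In practice the job would be loaded from its JSON file with `json.load` instead of being written inline.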

object

When specifying a single object, the plugin can currently only use single-threaded data extraction.
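
The wildcard entries in the sample configuration (`upload_*.csv`, `bb_??.csv`) look like shell-style globs. The sketch below illustrates how such patterns would select object keys, assuming that `*` matches any run of characters and `?` matches exactly one. The plugin's actual matching rules may differ (for example in how `/` is treated), and the object keys and the `expand_patterns` helper are made up for illustration.

```python
from fnmatch import fnmatchcase


def expand_patterns(patterns, keys):
    """Return the keys that match at least one pattern, in listing order."""
    return [key for key in keys
            if any(fnmatchcase(key, pattern) for pattern in patterns)]


# Hypothetical bucket listing.
keys = ["1.csv", "aa.csv", "upload_2024.csv",
        "bb_01.csv", "bb_001.csv", "notes.txt"]

# The "object" list from the sample configuration.
patterns = ["1.csv", "aa.csv", "upload_*.csv", "bb_??.csv"]

matched = expand_patterns(patterns, keys)
print(matched)
# ['1.csv', 'aa.csv', 'upload_2024.csv', 'bb_01.csv']
```

Under these assumptions `bb_001.csv` is skipped because `??` requires exactly two characters between `bb_` and `.csv`.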