AWS S3 Integration

When will the data be dumped?

The exact time cannot be guaranteed, because it depends on how much data needs to be processed each day. If you plan to trigger your own systems when the data dump completes, we recommend starting your batch job at around 21:00 UTC.
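If your batch job is scheduled from code, the helper below is a minimal Python sketch that computes the next 21:00 UTC start time. The name `next_batch_run` is hypothetical. Because the completion time varies from day to day, you may still want the job to confirm that the day's folder exists before processing it.

```python
from datetime import datetime, time, timedelta, timezone

# Recommended start time from this FAQ; adjust if your dumps regularly finish later.
RUN_AT_UTC = time(21, 0)


def next_batch_run(now: datetime) -> datetime:
    """Return the next 21:00 UTC at or after `now` (`now` must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), RUN_AT_UTC, tzinfo=timezone.utc)
    if candidate < now_utc:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

For example, calling it at 22:30 UTC on January 1st, 2020 returns 21:00 UTC on January 2nd, 2020.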

Which path will the data be dumped to?

Different paths are used for web, app and tracking link data.

  • Web events: s3://{bucket_name}/{app_name}/web/{version}/date={YYYY-MM-DD}/
  • App events: s3://{bucket_name}/{app_name}/app/{version}/date={YYYY-MM-DD}/
  • Tracking link events: s3://{bucket_name}/{app_name}/tracking-link/{version}/date={YYYY-MM-DD}/

For example, if event data from an app called "airbridge" is dumped to an S3 bucket called "airbridge_data" on January 1st, 2020, it will be dumped to the following paths.

  • Web events: s3://airbridge_data/airbridge/web/v1.1/date=2020-01-01/
  • App events: s3://airbridge_data/airbridge/app/v1.1/date=2020-01-01/
  • Tracking link events: s3://airbridge_data/airbridge/tracking-link/v1.1/date=2020-01-01/
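The path templates above can be assembled with a small helper. This is a minimal Python sketch; `dump_prefix` and `DATA_TYPES` are illustrative names, not part of any Airbridge SDK, and the version string (e.g. "v1.1") must match the one used for your integration.

```python
from datetime import date

# Data types as they appear in the documented path templates.
DATA_TYPES = ("web", "app", "tracking-link")


def dump_prefix(bucket_name: str, app_name: str, data_type: str,
                version: str, day: date) -> str:
    """Build the S3 folder URI that holds one day of dumped events."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    # date.isoformat() yields YYYY-MM-DD, matching the date={YYYY-MM-DD} segment.
    return f"s3://{bucket_name}/{app_name}/{data_type}/{version}/date={day.isoformat()}/"
```

For example, `dump_prefix("airbridge_data", "airbridge", "app", "v1.1", date(2020, 1, 1))` returns "s3://airbridge_data/airbridge/app/v1.1/date=2020-01-01/", the app events path from the example above.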

Why are there several files in a folder?

To process several terabytes of raw data quickly, raw data queries are run in parallel. As a result, several files, each smaller than 128 MB, are dumped per day. Since all files share the same column format, make sure your application loads all of the files in the folder (for example, with the * wildcard), not just one of them.
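Below is a minimal Python sketch of loading every file for one day. It assumes the date folder has already been copied to local disk (for example with `aws s3 cp <folder URI> <local dir> --recursive`), that the files are uncompressed UTF-8 CSV, and that each file starts with the same header row. These assumptions and the `load_day` name are hypothetical; adjust them to what you actually find in your bucket.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_day(folder: str) -> tuple[list[str], list[list[str]]]:
    """Read every file in a downloaded date folder and return (header, rows).

    Assumes each file is uncompressed UTF-8 CSV that starts with the same header row.
    """
    header: list[str] | None = None
    rows: list[list[str]] = []
    # Load all files (*), not just the first one.
    for path in sorted(Path(folder).glob("*")):
        if not path.is_file():
            continue
        # newline="" lets the csv module handle line breaks inside quoted values.
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different column format")
            rows.extend(reader)
    return header or [], rows
```

Checking that every file's header matches the first one is a cheap guard, since all files in a folder are expected to share the same column format.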

What does "v1.1" mean in the data dump path?

Versioning tells clients that the column format has changed (e.g. new or changed columns). When a new version is about to be released, clients will be notified at least two weeks before the change is applied.
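Because the version is part of the path, a pipeline can check it before loading and stop if the column format it was built for has changed. This is a minimal Python sketch; `EXPECTED_VERSION`, `version_of` and `check_version` are hypothetical names, and the regular expression assumes the version segment sits directly before `date=`, as in the documented paths.

```python
import re

# The version the downstream pipeline was built against (hypothetical setting).
EXPECTED_VERSION = "v1.2"

# Matches e.g. "/v1.1/date=" and captures "v1.1".
_VERSION_SEGMENT = re.compile(r"/(v\d+(?:\.\d+)*)/date=")


def version_of(key: str) -> str:
    """Extract the version segment (e.g. 'v1.1') from a dump path or URI."""
    match = _VERSION_SEGMENT.search(key)
    if match is None:
        raise ValueError(f"no version segment in {key!r}")
    return match.group(1)


def check_version(key: str, expected: str = EXPECTED_VERSION) -> None:
    """Fail loudly if the dump uses a column-format version the pipeline does not expect."""
    found = version_of(key)
    if found != expected:
        raise RuntimeError(f"column format changed: expected {expected}, found {found}")
```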

Which columns are wrapped in double-quotes (") in the CSV file?

The CSV files follow the RFC 4180 standard.
If a value contains a line break (CRLF), a single row may appear to span several lines. This can be prevented by reading the data with a parser that correctly handles double-quoted values.
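The difference between splitting on line breaks and using a quote-aware reader can be seen with Python's standard `csv` module. The record below is made up for illustration.

```python
import csv
import io

# One record whose "memo" value contains a CRLF line break.
data = 'id,memo\r\n1,"first line\r\nsecond line"\r\n'

# Splitting on line breaks treats the single record as two lines.
naive_lines = data.splitlines()

# csv.reader honours the double-quotes and keeps the record intact.
# newline="" stops the text layer from translating the embedded CRLF.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_lines))  # 3 (header + 2 fragments of one record)
print(rows[1])           # ['1', 'first line\r\nsecond line']
```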

v1.2 (Integrations after September 1st, 2020)

  • Header columns are always wrapped in double-quotes (").
  • String type columns are wrapped in double-quotes unless the value is null. A double-quote inside the string is escaped by doubling it (""). If the string contains a line break, the value continues on the next line, inside the same double-quotes.
  • Number type and boolean type columns are not wrapped in double-quotes. If the value is null, the field is left empty.

Below is an example of each column type. All of the data below belongs to a single record.

line","column header test"
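Under the v1.2 rules, a null string is written as an empty unquoted field, while an empty (non-null) string is written as "". Python's `csv.reader` returns '' for both, so a client that needs to keep this distinction has to track whether each field was quoted. Below is a minimal hand-written RFC 4180 parser sketch; `parse_csv` and `to_v12_value` are hypothetical helpers. Treating every unquoted empty field as null also matches the documented rule for number and boolean columns.

```python
from __future__ import annotations


def parse_csv(text: str) -> list[list[tuple[str, bool]]]:
    """Split RFC 4180 text into records of (value, was_quoted) pairs."""
    records: list[list[tuple[str, bool]]] = []
    record: list[tuple[str, bool]] = []
    field: list[str] = []
    quoted = in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')  # "" inside quotes is an escaped quote
                i += 1
            elif ch == '"':
                in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and line breaks are literal here
        elif ch == '"':
            in_quotes = quoted = True
        elif ch == ",":
            record.append(("".join(field), quoted))
            field, quoted = [], False
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as one record separator
            record.append(("".join(field), quoted))
            records.append(record)
            record, field, quoted = [], [], False
        else:
            field.append(ch)
        i += 1
    if field or quoted or record:  # last record without a trailing line break
        record.append(("".join(field), quoted))
        records.append(record)
    return records


def to_v12_value(value: str, was_quoted: bool) -> str | None:
    """v1.2 rule: an unquoted empty field is null; anything else keeps its text."""
    if not was_quoted and value == "":
        return None
    return value
```

In practice you would pass in the whole file contents, read with `open(path, newline="", encoding=...)` so that line breaks inside values are not translated.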

v1.1 (Integrations before September 1st, 2020)

  • Columns are wrapped in double-quotes only if the value contains a line break, a double-quote, or a comma. S3 integration CSV files follow the RFC 4180 format; please refer to RFC 4180 for more details.
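If you need v1.1-style sample files, for example to test an ingestion pipeline, Python's `csv.writer` with `csv.QUOTE_MINIMAL` appears to apply the same rule: it quotes a field only when the field contains the delimiter, the quote character, or a line-break character, and doubles any embedded quotes. This is a minimal sketch; `write_v11_row` is a hypothetical helper.

```python
import csv
import io


def write_v11_row(values: list) -> str:
    """Format one row with minimal quoting, as described for v1.1."""
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL (the default) quotes a field only when it contains the
    # delimiter, the quote character, or a line-break character.
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(values)
    return buf.getvalue()
```

For example, `write_v11_row(["plain", "a,b", 42])` returns 'plain,"a,b",42\r\n'.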