The intermix.io Collector extracts data from customer data environments. The Collector will first stage the data in an S3 bucket owned by the customer.
By default, the intermix.io Collector does not delete S3 data from the customer bucket. This can be a problem for organizations that want to delete old data to reduce S3 storage costs.
A data retention policy can be configured via the Docker environment variable DAYS_TO_RETAIN_DATA.
Who Should Use this Procedure
This article is for organizations that wish to implement data retention on Amazon S3 and that launched an intermix.io Collector without setting a data retention policy.
Steps to Update the Collector When Using the intermix.io EC2 Instance
If your Collector was created by the intermix.io CloudFormation script, then follow these instructions. If you are running the container within an orchestration system, then follow your internal procedures to add Docker environment settings.
- SSH into the Collector instance
- Get a root shell
- Edit the file "/collector.env" and add a line that sets DAYS_TO_RETAIN_DATA.
- The value for DAYS_TO_RETAIN_DATA should be a positive integer equal to the number of days to retain data.
For example, if DAYS_TO_RETAIN_DATA=7, then 7 days of data will be retained in the customer's S3 bucket.
- If the value is '0', or if the setting DAYS_TO_RETAIN_DATA does not exist, then data will not be deleted and will be retained forever.
- Stop the container (it will be restarted automatically within 5 minutes)
root# docker stop intermix_collector
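The edit to "/collector.env" can also be scripted. The following is a minimal sketch rather than an official intermix.io tool: set_retention_days is a hypothetical helper name, and the sketch assumes the file holds plain KEY=value lines, as Docker env files do. It rejects anything that is not a non-negative integer and replaces any existing DAYS_TO_RETAIN_DATA line instead of adding a duplicate.

```shell
#!/bin/sh
# Hypothetical helper (not part of the intermix.io tooling):
# set DAYS_TO_RETAIN_DATA in a Docker env file.
# Usage: set_retention_days ENV_FILE DAYS
set_retention_days() {
    env_file=$1
    days=$2

    # Accept only non-negative integers. Per the article, 0 means "retain forever".
    case $days in
        ''|*[!0-9]*)
            echo "error: DAYS must be a non-negative integer" >&2
            return 1
            ;;
    esac

    tmp_file=$(mktemp) || return 1

    # Copy every line except an existing DAYS_TO_RETAIN_DATA setting.
    if [ -f "$env_file" ]; then
        grep -v '^DAYS_TO_RETAIN_DATA=' "$env_file" > "$tmp_file" || true
    fi

    # Append the new setting, then write the result back in place so the
    # file keeps its owner and permissions.
    echo "DAYS_TO_RETAIN_DATA=$days" >> "$tmp_file"
    cat "$tmp_file" > "$env_file" && rm -f "$tmp_file"
}

# Example, run as root on the Collector instance:
#   set_retention_days /collector.env 7 && docker stop intermix_collector
```

Writing to a temporary file first means an invalid value never touches the env file, and the container is stopped only if the update succeeded.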
That’s it. When the Collector restarts, it will begin deleting data older than the configured number of days. This process may take a couple of days, depending on the total amount of data in the bucket.
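Because the cleanup can take a while, it may help to check how many old objects remain in the bucket. The sketch below is one possible approach under stated assumptions: count_older_than is a hypothetical helper, my-collector-bucket is a placeholder bucket name, and object age is approximated by the last-modified date that the AWS CLI reports, which may not be the criterion the Collector itself uses.

```shell
#!/bin/sh
# Hypothetical helper: read `aws s3 ls --recursive` output on stdin
# (lines of "YYYY-MM-DD HH:MM:SS SIZE KEY") and print how many objects
# have a last-modified date earlier than the cutoff (YYYY-MM-DD).
# ISO dates compare correctly as plain strings.
count_older_than() {
    awk -v cutoff="$1" '$1 < cutoff { n++ } END { print n + 0 }'
}

# Example (GNU date syntax; bucket name is a placeholder):
#   cutoff=$(date -u -d "7 days ago" +%Y-%m-%d)
#   aws s3 ls "s3://my-collector-bucket" --recursive | count_older_than "$cutoff"
```

If the count shrinks on successive runs, the retention setting is taking effect.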