The intermix.io Collector is a software agent that runs in the customer's AWS environment. It extracts metadata from a Redshift cluster and moves it to an S3 bucket that is accessible by intermix.io. intermix.io then retrieves the data from this S3 bucket to provide the service.
The Collector is an alternative way to use the Intermix service if you have specific requirements that prevent you from using the standard extraction method.
When to Use the Collector
Some customers have specific security and privacy requirements which require the use of the Collector. Use the Collector if:
2. for security reasons, you are unable to add an inbound firewall rule to have our system access your Redshift cluster, or
3. for security reasons, you are unable to provide Intermix with a superuser account name / password on your Redshift cluster.
The Collector requires superuser access to your Redshift database. The superuser account login and password will never be sent to intermix.io. The credentials are stored on the EC2 instance that hosts the Collector. You can monitor Intermix query activity by searching for queries tagged with the string “IntermixSQL”. Click here for instructions on creating a superuser account on Redshift.
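As a rough illustration of the monitoring step, the sketch below builds a SQL statement that lists recent queries whose text contains the “IntermixSQL” tag. It assumes the tag appears in the query text recorded in Redshift's `STL_QUERY` system table (column `querytxt`); the function names `build_tagged_query_sql` and `is_tagged` are hypothetical and not part of any intermix.io tooling.

```python
# Sketch only: assumes the "IntermixSQL" tag is present in STL_QUERY.querytxt.
# The helper names below are illustrative, not part of the Collector.

TAG = "IntermixSQL"


def build_tagged_query_sql(tag: str = TAG, limit: int = 50) -> str:
    """Return SQL that selects the most recent queries containing the tag."""
    # Restrict the tag to letters and digits so it is safe to embed in SQL.
    if not tag.isalnum():
        raise ValueError("tag must be alphanumeric")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT query, userid, starttime, endtime, TRIM(querytxt) AS querytxt "
        "FROM stl_query "
        f"WHERE querytxt LIKE '%{tag}%' "
        "ORDER BY starttime DESC "
        f"LIMIT {int(limit)};"
    )


def is_tagged(query_text: str, tag: str = TAG) -> bool:
    """Return True if an already-fetched query text carries the tag."""
    return tag in query_text


if __name__ == "__main__":
    # Print the statement so it can be run from any SQL client as a superuser.
    print(build_tagged_query_sql())
```

The generated statement can be pasted into any SQL client connected to the cluster. Note that `STL_QUERY` stores only the beginning of each query text, so a tag placed late in a very long statement might not match.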
AWS Infrastructure Provided by Customer
The following assets will be created via a CloudFormation script provided by intermix.io.
|Asset|Purpose|
|---|---|
|IAM User|Reads and writes files to the customer S3 bucket.|
|EC2 Instance|Hosts the Collector agent, which runs as a Docker container. It will be an M4.MEDIUM size.|
|Subnet|The subnet must have access to Redshift and (optionally) a route to the internet.|
|Public IP|(Optional) If the subnet does not have a route to the internet (e.g. via a NAT), then the EC2 instance must be assigned a public IP.|
|S3 Bucket|Stores the Redshift data.|
Sign up for intermix.io
Go to the signup page and create an account. Then follow the confirmation link sent to your email to finish creating your user account.
Request a Bucket Key from intermix.io Support
Email email@example.com to request a bucket key. This key uniquely identifies your cluster in our system and will be provided to you by intermix.io.
Go to AWS CloudFormation Console
1. Ensure you are in the same region as your Redshift cluster. If you don't know the Region, ask your AWS admin.
2. Select "Create Stack" from the upper left corner.
4. You should now be presented with a set of fields that require your input.
- The EC2 instance requires a route to your Redshift cluster, as well as an outbound route to the internet.
- The chosen subnet should be in the same VPC as your Redshift cluster.
- The security group should allow outbound access to dockerhub.com (used for Docker image hosting), papertrail.com (used for sending system logs from the instance so we can help monitor its uptime), and S3.
- If the subnet has an outbound route to the internet (for example, via a NAT gateway), then you can select "No" for public IP. Otherwise, select "Yes" for a public IP.
- See the intermix.io Collector Security Guide for details on how the Scrubber works.
5. Select "Next" and enter optional parameters.
Finally, you will be asked to confirm the creation of the stack.
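The networking rules in step 4 can be summarized as a small pre-flight check. The sketch below is only an illustration of that decision logic: the function names, the dictionary layout, and the VPC identifiers are hypothetical, and nothing here calls AWS.

```python
# Sketch only: encodes the networking guidance from step 4 as plain logic.
# All names and example IDs are hypothetical; no AWS calls are made.

# Outbound destinations the security group should allow, per the guide.
REQUIRED_OUTBOUND = ("dockerhub.com", "papertrail.com", "S3")


def public_ip_choice(subnet_has_internet_route: bool) -> str:
    """Return the value to select for the public IP field."""
    # A subnet with its own outbound route (e.g. a NAT gateway) needs no public IP.
    return "No" if subnet_has_internet_route else "Yes"


def check_placement(subnet_vpc_id: str, redshift_vpc_id: str,
                    subnet_has_internet_route: bool) -> dict:
    """Summarize the stack inputs and flag a subnet in the wrong VPC."""
    problems = []
    if subnet_vpc_id != redshift_vpc_id:
        problems.append("subnet should be in the same VPC as the Redshift cluster")
    return {
        "public_ip": public_ip_choice(subnet_has_internet_route),
        "required_outbound": list(REQUIRED_OUTBOUND),
        "problems": problems,
    }


if __name__ == "__main__":
    # Example: subnet shares the cluster's VPC but has no NAT route.
    print(check_placement("vpc-example", "vpc-example", False))
```

Running the check before filling in the stack parameters makes it easier to spot a subnet in the wrong VPC, which would otherwise only surface once the Collector fails to reach the cluster.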
The Collector will automatically update itself when a newer version is available. Updates are handled by a scheduled task on the EC2 instance, which polls dockerhub.com to determine whether a new version of the intermix.io Collector Docker image is available. If so, a script pulls the new image and restarts the container.
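The poll, pull, and restart cycle described above can be sketched as follows. This is not the actual update script shipped with the Collector: the function `update_if_newer` and its callable parameters are hypothetical stand-ins for whatever the scheduled task really does to query the registry, pull the image, and restart the container.

```python
# Sketch only: the real update script is not shown in this guide.
# The three callables are hypothetical hooks supplied by the caller.
from typing import Callable


def update_if_newer(
    running_version: str,
    fetch_latest_version: Callable[[], str],
    pull_image: Callable[[str], None],
    restart_container: Callable[[str], None],
) -> bool:
    """Run one polling cycle; return True if an update was applied."""
    latest = fetch_latest_version()
    if latest == running_version:
        # Nothing new in the registry, so leave the container running.
        return False
    # Pull first, then restart, so a failed pull leaves the old container up.
    pull_image(latest)
    restart_container(latest)
    return True


if __name__ == "__main__":
    # Dry run with stand-in hooks that only print what they would do.
    updated = update_if_newer(
        "1.0",
        lambda: "1.1",
        lambda v: print(f"pulling image {v}"),
        lambda v: print(f"restarting container on {v}"),
    )
    print("updated:", updated)
```

Passing the registry and container operations in as callables keeps the control flow testable without Docker installed; a scheduled task would simply invoke one cycle per run.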
The CloudFormation stack will create an instance based on a custom AMI which will automatically start the Collector. Ensure the CloudFormation script completes successfully.
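One way to confirm that the stack completed is to look at its status in the CloudFormation console, or to fetch it programmatically and classify it. The sketch below covers only the classification step, using the standard CloudFormation stack-creation status codes; the function name `classify_stack_status` is hypothetical, and fetching the status (for example with an AWS SDK) is left out.

```python
# Sketch only: classifies a CloudFormation stack status string.
# How the status is fetched (console, CLI, or SDK) is outside this example.

FAILURE_STATUSES = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def classify_stack_status(status: str) -> str:
    """Map a stack status to 'succeeded', 'in_progress', 'failed', or 'unknown'."""
    if status == "CREATE_COMPLETE":
        return "succeeded"
    if status == "CREATE_IN_PROGRESS":
        return "in_progress"
    if status in FAILURE_STATUSES:
        return "failed"
    # Statuses from later updates or deletions are not handled here.
    return "unknown"


if __name__ == "__main__":
    for s in ("CREATE_IN_PROGRESS", "CREATE_COMPLETE", "ROLLBACK_COMPLETE"):
        print(s, "->", classify_stack_status(s))
```

A status of `ROLLBACK_COMPLETE` means creation failed and the resources were removed, so the stack would need to be deleted and created again after fixing the parameters.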
Nothing remains for you to do. You will be sent an email from intermix.io when your data is ready to view in the dashboard.