When running Elvis 6 within Amazon Web Services (AWS), it is possible to use S3 as a storage engine.
Configuration considerations
When using Amazon S3 as a storage engine for Elvis, please take note of the following:
- When running the complete Elvis installation on AWS, Amazon S3 is the recommended storage solution. As well as being cost-effective, it ensures availability and proper performance.
- When running Elvis together with a WoodWing Enterprise integration and both are set up on AWS, please also view Integrating Elvis 6 in Enterprise Server 10 on AWS - Load balancer information.
- When running the Elvis installation in a non-Amazon environment (such as a local office or a data center), consider the network speed between the Elvis installation and Amazon S3 and how it affects the speed of uploading and downloading assets:
  - A slow connection to AWS is suitable only for storing archived files that are accessed occasionally.
  - A fast connection makes it possible to store frequently requested production files. See Using S3 as archive storage when running Elvis 6 in a non-Amazon environment.
- When using Amazon S3 as a storage engine, be aware that the feature that automatically creates an Elasticsearch backup references the sharedDataLocation by default in its elasticsearch.backup.location setting. If the Search nodes will not have a file-system-based shared data location, disable this backup. Instead, you can manually configure and manage an S3-based backup repository in Elasticsearch, as sketched below.
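As a rough sketch only: an S3-based snapshot repository can be registered through the Elasticsearch snapshot API, assuming the repository-s3 plugin is installed on the Search nodes. The repository name elvis-s3-backup, the bucket your-backup-bucket, and the base_path are placeholder values, and the available settings differ per Elasticsearch version:
# Placeholder names; requires the repository-s3 plugin on the Search nodes
PUT _snapshot/elvis-s3-backup
{
  "type": "s3",
  "settings": {
    "bucket": "your-backup-bucket",
    "base_path": "elasticsearch-snapshots"
  }
}
A snapshot can then be taken on a schedule you manage yourself with, for example, PUT _snapshot/elvis-s3-backup/snapshot-1.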
AWS setup
Note: Experience in working with AWS is needed to determine the correct setup for your requirements. We strongly advise using an infrastructure management tool such as AWS CloudFormation or Terraform to create, update, manage, and document your deployment.
Ports and Security Group
Ensure that the Security Group for the Elvis nodes has the proper TCP port configuration:
- 5701, only accessible from the Elvis nodes Security Group itself.
- 9300, only accessible from the Elvis nodes Security Group itself.
- 80, accessible from the Security Group of the Load Balancer. This assumes that HTTPS termination on the Load Balancer is used.
An alternative to port 80 can be configured when HTTPS termination on the Load Balancer is not used.
The Security Group name is also used for discovery between the Elvis nodes; see the cluster.join.aws.securityGroupName configuration property mentioned below.
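As an illustration of how these rules could be expressed with an infrastructure management tool, here is a minimal CloudFormation (JSON) sketch. The resource names ElvisNodesSecurityGroup, LoadBalancerSecurityGroup, and Vpc are assumptions, and the self-referencing rules are split into separate AWS::EC2::SecurityGroupIngress resources to avoid a circular dependency:
"ElvisNodesSecurityGroup": {
  "Type": "AWS::EC2::SecurityGroup",
  "Properties": {
    "GroupDescription": "Elvis nodes",
    "VpcId": { "Ref": "Vpc" },
    "SecurityGroupIngress": [
      { "IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
        "SourceSecurityGroupId": { "Ref": "LoadBalancerSecurityGroup" } }
    ]
  }
},
"ElvisHazelcastIngress": {
  "Type": "AWS::EC2::SecurityGroupIngress",
  "Properties": {
    "GroupId": { "Ref": "ElvisNodesSecurityGroup" },
    "SourceSecurityGroupId": { "Ref": "ElvisNodesSecurityGroup" },
    "IpProtocol": "tcp", "FromPort": 5701, "ToPort": 5701
  }
},
"ElvisElasticsearchIngress": {
  "Type": "AWS::EC2::SecurityGroupIngress",
  "Properties": {
    "GroupId": { "Ref": "ElvisNodesSecurityGroup" },
    "SourceSecurityGroupId": { "Ref": "ElvisNodesSecurityGroup" },
    "IpProtocol": "tcp", "FromPort": 9300, "ToPort": 9300
  }
}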
IAM Role and Policy
Set up an IAM Role that grants permissions to the EC2 instances that are running Elvis. The role should be assigned to the EC2 instance(s); when running on an EC2 instance, Elvis will automatically assume that role to connect to S3 and other AWS services.
At a minimum, the following IAM policy statements must be configured for the Role.
Grant access to S3:
{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::your-storage-bucket/*",
    "arn:aws:s3:::your-storage-bucket"
  ]
}
Allow the nodes to query the AWS environment to find other Elvis nodes with the same Security Group and form a cluster:
{
  "Effect": "Allow",
  "Action": [
    "ec2:DescribeInstances"
  ],
  "Resource": [
    "*"
  ]
}
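For completeness, a minimal full policy document combining both statements (the bucket name is a placeholder) would look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-storage-bucket/*",
        "arn:aws:s3:::your-storage-bucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances"],
      "Resource": ["*"]
    }
  ]
}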
Load Balancer
We advise using an EC2 Application Load Balancer if you need one. Make sure that sticky sessions are turned on; this improves cache efficiency within Elvis.
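As a hedged example, when the Target Group for the Elvis nodes is defined in CloudFormation (JSON), sticky sessions can be enabled through target group attributes; the surrounding AWS::ElasticLoadBalancingV2::TargetGroup resource is assumed:
"TargetGroupAttributes": [
  { "Key": "stickiness.enabled", "Value": "true" },
  { "Key": "stickiness.type", "Value": "lb_cookie" }
]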
Configuring Elvis
On each node, configure the Elvis S3- and AWS-related configuration properties.
Step 1. Configure the node-config.properties.txt file:
# Since IPs are typically dynamic on EC2, the tcpip.members method does not work well.
cluster.join.tcpip.enabled=false
# Instead, use the aws join method. Specify the correct region and the name of the Security Group assigned to the Elvis nodes.
cluster.join.aws.enabled=true
cluster.join.aws.region=eu-west-1
cluster.join.aws.securityGroupName=sg-...
# With S3 as the storage engine, this location is only used for temporary files; point it at fast local storage.
sharedDataLocation=
# Set this to local; otherwise Elvis will wait at startup until shared-data is mounted.
fileStoreType=local
Step 2. Also add the following to the node-config.properties.txt file:
For Elvis 6.16 or higher:
# Set storage engine type to S3
storage.engine.type=S3
# The name of your S3 bucket
storage.engine.s3.bucket=
# Region of the S3 bucket e.g. eu-west-1
storage.engine.s3.region=
For Elvis 6.15 or lower:
# Set storage engine type to S3
storage.engine.type=S3
# The name of your S3 bucket
storage.engine.s3.bucket=
# S3 endpoint e.g. s3-eu-west-1.amazonaws.com
storage.engine.s3.endpoint=
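To tie Step 1 and Step 2 together, a complete node-config.properties.txt sketch for Elvis 6.16 or higher could look as follows; the region, bucket name, and temporary storage path are example values, and the Security Group value must match your own setup:
# Cluster discovery via AWS instead of fixed IPs
cluster.join.tcpip.enabled=false
cluster.join.aws.enabled=true
cluster.join.aws.region=eu-west-1
cluster.join.aws.securityGroupName=sg-...
# Only used for temporary S3 files; example path on fast local storage
sharedDataLocation=/fast-local-disk/elvis-temp
fileStoreType=local
# S3 storage engine, with an example bucket and region
storage.engine.type=S3
storage.engine.s3.bucket=example-elvis-storage
storage.engine.s3.region=eu-west-1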
Comment
Do you have corrections or additional information about this article? Leave a comment! Do you have a question about what is described in this article? Please contact Support.
4 comments
If one creates a snapshot repository named "elvis-shared-data" with a location or type setting that differs from what was set initially, will Elvis still create snapshots at the scheduled time, simply using that identifier?
Hi Sybrand. Looking at the code, the Elvis server deletes and re-adds the snapshot repository before each backup, to ensure that it uses the settings as configured in the Elvis server, for example in case you change the backup location. This operation does not affect the contents of the snapshot repository, but it does force the "elvis-shared-data" repository to be an 'fs' type repository. So your proposed trick won't work.
Nico, do you think a feature request makes sense? Currently we use EFS as a shared file system, and the only reason we need it is for the ES snapshot repo. We currently schedule an "s3 sync --delete" to get the snapshot somewhere sane. With a 14:1 cost ratio per GB/month for EFS:S3, it's very much worth being able to write directly to an S3-based snapshot repo.
Hello Siebrand. The S3 backup plugin for Elasticsearch has an issue that makes it less than ideal: the incremental backup tries to read all existing backup snapshot files before it makes a new backup, causing it to become slower and slower at performing incremental backups as the backup gets larger.
So either you have to delete old snapshots, or make full backups every time.
We use the latter method for doing hourly backups on Swivle. But you will definitely want to add some monitoring, keep an eye on the backup duration, and set alarms for when it takes too long.
Including the monitoring, it might make more sense to set up the Elasticsearch S3 backup yourself.
You may still file a feature request for it.