Cost-effective S3 for Subscription Data Distribution

Scenario

  1. You are a company that has data that you want to offer to customers.
  2. The amount of data is of significant size (where definitions of significance vary).
  3. You sell access to this data; your customers are paying you. If they stop paying, they should no longer have access to the data.
  4. You do not want to limit your paying customers’ ability to transfer these datasets as often as they wish. Customers should be able to access the data conveniently.
  5. You do not have an unlimited budget to pay for internet data transfer feeds. If a customer wants to download the data 100,000 times a week, okay, but you do not want to pay that bill or take on the credit risk involved with billing the customer for their usage.

Solution / Architecture

This solution presumes a single entitlement to the data per bucket. A single entitlement means that any customer can retrieve any data in the bucket as often as they’d like. If you have a scenario where there are multiple entitlements, see the section “Multi-Entitlement Adaptations.”

  1. Create a KMS Customer Master Key.
  2. Create an S3 bucket to store and serve the data to customers.

    1. Designate the S3 bucket as “Requester Pays.”
    2. Configure the bucket’s default encryption as SSE-KMS with the KMS Customer Master Key, and enable S3 Bucket Keys.

      • Any newly created keys in the S3 bucket will use the KMS Customer Master Key by default.
      • If there are existing keys in the bucket, you will need to ensure they use SSE-KMS.
    3. Store the data in the S3 bucket.
    4. Ensure the ACL on the S3 keys (i.e., the data) is marked so that any authenticated AWS user can retrieve the data.

      • The KMS service will enforce access. If the requesting user is not permitted to use the CMK to decrypt the object, the S3 request will fail.
  3. For users that download data, create KMS grants to the customer master key (CMK) as necessary to entitle access to the data.

    • To create the grant, you need the ARN of the AWS identity the customer will use. The customer can obtain this ARN by calling the GetCallerIdentity method of the AWS Security Token Service (STS).
  4. When customers unsubscribe, you must revoke the grant previously created on the CMK.
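
The bucket configuration in step 2 can be sketched with boto3 as follows. This is a minimal sketch under assumptions, not a definitive implementation: the function name and its arguments are hypothetical, and `s3` is assumed to be a client created with `boto3.client("s3")`.

```python
def configure_distribution_bucket(s3, bucket, cmk_arn):
    """Mark the bucket Requester Pays and default it to SSE-KMS with Bucket Keys.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    `bucket` and `cmk_arn` are placeholders for your own resources.
    """
    # Step 2.1: requesters, not the bucket owner, pay for requests and transfer.
    s3.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )
    # Step 2.2: new objects default to SSE-KMS under the CMK, with Bucket Keys on.
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": cmk_arn,
                    },
                    "BucketKeyEnabled": True,
                }
            ]
        },
    )
```

Objects that were already in the bucket before this call keep their previous encryption settings, so they would still need to be rewritten with SSE-KMS.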

Requirements

  1. All customers must obtain an AWS account with credentials to access the data. This AWS account will incur the S3 request and internet data transfer charges (if any).

    • Your customers may not be aware that they can incur charges to their AWS account in this manner.
  2. Customers need to pass an additional request header (x-amz-request-payer: requester) on calls to the S3 GetObject and ListObjects methods to signal that they acknowledge they will be paying for the requests.
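
On the customer side, boto3 exposes the requester-pays acknowledgement as the `RequestPayer` parameter. The following is a minimal sketch of listing and downloading with it. The function names are hypothetical, and `s3` is assumed to be a boto3 S3 client created with the customer’s own credentials.

```python
def list_keys(s3, bucket, prefix=""):
    """Yield every object key under `prefix`, acknowledging Requester Pays."""
    kwargs = {"Bucket": bucket, "Prefix": prefix, "RequestPayer": "requester"}
    while True:
        page = s3.list_objects_v2(**kwargs)
        for entry in page.get("Contents", []):
            yield entry["Key"]
        if not page.get("IsTruncated"):
            return
        # Ask for the next page of results on the following call.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def download(s3, bucket, key):
    """Return the object's bytes; the caller's AWS account is billed."""
    response = s3.get_object(Bucket=bucket, Key=key, RequestPayer="requester")
    return response["Body"].read()
```

Without `RequestPayer`, S3 rejects requests against a Requester Pays bucket, which is a common first stumbling block for customers.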

Limitations

  1. Grants for KMS Customer Master Keys are eventually consistent. There may be a delay before changes to CMK grants are effective.

    • You may need to implement retry logic so that customer requests are retried until a new grant takes effect.
  2. There is a limit of 50,000 grants per CMK. If you have more than 50,000 customers, this may cause a complication. But then again, success brings challenges.
  3. In enterprise environments where IAM policies restrict AWS Users’ ability to access S3 buckets, there may be the need for the IAM policies to be modified to allow access to the S3 bucket.

    • Access to the destination bucket is first governed by the IAM policies of the calling AWS identity.
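
The retry logic mentioned under the first limitation could look something like the sketch below. It rests on an assumption that should be verified: that a grant which has not yet propagated surfaces as an exception carrying an "AccessDenied" error code in the shape used by botocore’s ClientError. The function name and its parameters are hypothetical.

```python
import time


def with_grant_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `operation()`, retrying with backoff while it fails with AccessDenied.

    Assumption: a not-yet-effective grant raises an exception whose
    `response["Error"]["Code"]` is "AccessDenied" (botocore ClientError shape).
    Any other error, or the final failed attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            response = getattr(exc, "response", None) or {}
            code = response.get("Error", {}).get("Code")
            last_attempt = attempt == attempts - 1
            if code != "AccessDenied" or last_attempt:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

A customer whose grant has been revoked sees the same error, so the number of attempts should stay small and bounded.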

Implementation Details

The customer-side implementation of this solution needs to focus on making it easy for the customer to get to the data. To that end, I’d suggest two approaches.

  1. Full API documentation - explain to customers how to make the S3 requests in their applications.
  2. Offer a command-line tool written in a cross-platform language that prompts for AWS credentials (or integrates with the existing AWS CLI credential stores).

    • The tool would allow customers to:

      1. List available S3 keys
      2. Get an S3 key
      3. Synchronize a local directory to the contents of the S3 bucket.
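
For the synchronize feature of such a tool, one piece is deciding which keys actually need to be downloaded, since every request is billed to the customer. The following is a minimal sketch of that decision. `plan_sync` is a hypothetical helper, and comparing file sizes only is a simplifying assumption.

```python
import os


def plan_sync(remote_sizes, local_dir):
    """Return the remote keys that are missing locally or differ in size.

    `remote_sizes` maps S3 key -> size in bytes, as reported by a listing.
    Comparing sizes only is a simplification; a real tool might also compare
    ETags or timestamps.
    """
    to_fetch = []
    for key, size in sorted(remote_sizes.items()):
        # Map "a/b/c.csv" onto local_dir/a/b/c.csv for the local platform.
        local_path = os.path.join(local_dir, *key.split("/"))
        if not os.path.isfile(local_path) or os.path.getsize(local_path) != size:
            to_fetch.append(key)
    return to_fetch
```

Only the keys returned here would then be fetched, which keeps repeat synchronizations inexpensive for the customer.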

If you have a large number of customers (> 50,000), it may be beneficial to offer an API that provisions the CMK grant on demand, and to expire those grants after a fixed amount of time or with a least-recently-used replacement strategy.
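
The least-recently-used strategy could be sketched as below. `GrantPool` is a hypothetical class, `kms` is assumed to be a boto3 KMS client, and the in-memory bookkeeping is an assumption made for brevity; a real service would persist it.

```python
from collections import OrderedDict


class GrantPool:
    """Keep at most `capacity` active grants, evicting the least recently used."""

    def __init__(self, kms, key_id, capacity):
        self.kms = kms  # assumed: boto3.client("kms")
        self.key_id = key_id
        self.capacity = capacity
        self._grants = OrderedDict()  # principal ARN -> grant ID, oldest first

    def ensure_grant(self, principal_arn):
        """Return a grant ID for the principal, creating one if needed."""
        if principal_arn in self._grants:
            self._grants.move_to_end(principal_arn)  # mark as recently used
            return self._grants[principal_arn]
        if len(self._grants) >= self.capacity:
            # Evict the principal that has gone longest without a request.
            _, stale_id = self._grants.popitem(last=False)
            self.kms.revoke_grant(KeyId=self.key_id, GrantId=stale_id)
        grant = self.kms.create_grant(
            KeyId=self.key_id,
            GranteePrincipal=principal_arn,
            Operations=["Decrypt"],
        )
        self._grants[principal_arn] = grant["GrantId"]
        return grant["GrantId"]
```

The customer-facing tool would call the provisioning API, which in turn calls `ensure_grant`, before its first S3 request, and then retry while the new grant propagates.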

On the provider side, there are a few needs to be addressed:

  1. A service that manages the lifecycle of the KMS CMK grants when customers subscribe or unsubscribe for access.

    • This service can likely become a part of the existing subscription or provisioning service.
  2. In the case of >50,000 customers, there may be an additional service that revokes grants when they are inactive.
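
The grant lifecycle service described in item 1 reduces to two KMS calls. Here is a minimal sketch, assuming `kms` is a boto3 KMS client and `grants` is whatever store your subscription system uses to remember grant IDs (a plain dict here). The function names and the grant naming scheme are hypothetical.

```python
def subscribe(kms, key_id, grants, customer_id, principal_arn):
    """Create a Decrypt grant for the customer's AWS identity and record its ID."""
    grant = kms.create_grant(
        KeyId=key_id,
        GranteePrincipal=principal_arn,
        Operations=["Decrypt"],  # S3 needs Decrypt to serve SSE-KMS objects
        Name=f"subscription-{customer_id}",
    )
    grants[customer_id] = grant["GrantId"]
    return grant["GrantId"]


def unsubscribe(kms, key_id, grants, customer_id):
    """Revoke the customer's grant; return False if no grant was on record."""
    grant_id = grants.get(customer_id)
    if grant_id is None:
        return False
    kms.revoke_grant(KeyId=key_id, GrantId=grant_id)
    # Forget the grant only after the revocation call has succeeded.
    del grants[customer_id]
    return True
```

Keeping the grant ID alongside the subscription record avoids having to search the CMK’s grants when a customer cancels.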

Replication and Reliability

S3 will be using the KMS service to decrypt the S3 key’s content on each request. If an AWS region experiences a service interruption, requests will fail. My suggestion is to only use KMS keys in the same AWS region as the S3 bucket. When replicating the data to a different AWS region, store the S3 keys using a KMS key specific to that AWS region.
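
One way to enforce the same-region suggestion is a small guard that runs before a bucket is configured. This sketch relies on the region being the fourth colon-separated field of a KMS key ARN. The helper names and the region-to-key mapping are hypothetical.

```python
def kms_key_region(key_arn):
    """Extract the region from an ARN like arn:aws:kms:REGION:ACCOUNT:key/ID."""
    parts = key_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS key ARN: {key_arn!r}")
    return parts[3]


def key_for_bucket_region(bucket_region, keys_by_region):
    """Return the CMK ARN for a bucket's region, refusing cross-region pairs.

    `keys_by_region` is a hypothetical mapping of region name -> CMK ARN.
    A missing region raises KeyError.
    """
    key_arn = keys_by_region[bucket_region]
    if kms_key_region(key_arn) != bucket_region:
        raise ValueError(
            f"key {key_arn!r} is not in bucket region {bucket_region!r}"
        )
    return key_arn
```

With this in place, a replica bucket in a second region cannot accidentally be pointed at the primary region’s key.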

Scaling Concerns

Utilizing the S3 bucket keys feature reduces the number of API calls to the KMS service because the S3 service caches the KMS information across requests. Without using S3 bucket keys, each customer request makes an individual request to KMS for decryption and is subject to the service limits.

If S3 transfer speed is insufficient, you may want to investigate the use of S3 Transfer Acceleration.

Cost Analysis

Your customers will pay for:

  1. Internet transfer costs, if applicable.
  2. Per-request cost for S3 GetObject.
  3. Per-request cost for listing the objects available in the bucket (S3 ListObjects).
  4. The cost of decrypting the data from the S3 bucket (KMS Decrypt).

    • S3 makes this API call internally.

If your customer is accessing the data from EC2 in the same region as the S3 bucket, there may not be any internet transit fees.

You, as a service provider, will pay for:

  1. Monthly S3 storage fees.
  2. Monthly KMS CMK fees (~$1 a month per CMK).
  3. The KMS API calls to manage the grants on the Customer Master Key (CreateGrant, RevokeGrant).
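
The provider-side total is the sum of the three items above. A minimal sketch of that arithmetic follows. Every price is a required parameter because rates vary by region and change over time; the function and parameter names are hypothetical, and you would plug in the figures from the current AWS price list.

```python
def monthly_provider_cost(storage_gb, storage_price_per_gb,
                          cmk_count, cmk_price_per_month,
                          grant_api_calls, price_per_kms_call):
    """Sum the provider's monthly costs: storage, CMKs, and grant API calls.

    All prices are inputs, not facts: look them up for your region.
    """
    storage = storage_gb * storage_price_per_gb
    keys = cmk_count * cmk_price_per_month
    grant_calls = grant_api_calls * price_per_kms_call
    return storage + keys + grant_calls
```

Note what is absent: no term depends on how often customers download the data, which is the point of the design.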

Multi-Entitlement Adaptations

The preceding solution presumes a single entitlement to the data per S3 bucket because S3 Bucket Keys enforce access to all objects using a single KMS Customer Master Key.

If you have many different products, there are other approaches building on the previously outlined pattern:

  1. When the number of products is small, create a separate bucket for each product and separate KMS CMK for each bucket. There are costs involved for each KMS CMK per month, but you could continue to use S3 Bucket Keys to reduce the number of API calls to the KMS service.

    • Be mindful of the limits of the number of total CMKs per AWS account.
  2. If the number of products is large, you can specify a different CMK to use for each key stored in an S3 Bucket when you create the key with S3’s PutObject method.

    • Again, this is subject to the total number of CMKs allowed per region per AWS account.
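
The per-object approach in item 2 can be sketched as follows, assuming `s3` is a boto3 S3 client. The function name and the product-to-CMK mapping are hypothetical.

```python
def put_product_object(s3, bucket, key, body, product, cmk_by_product):
    """Upload an object encrypted under the CMK that belongs to its product.

    `cmk_by_product` is a hypothetical mapping of product name -> CMK ARN.
    An unknown product raises KeyError rather than falling back to a default.
    """
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=cmk_by_product[product],
    )
```

Entitling a customer to a product is then a grant on that product’s CMK, exactly as in the single-entitlement case.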