An Image Host of Your Own, Part 1
The result of this post is live and running at img.olafalo.net. Check it out! (Please excuse my front-end.)
Intro
A few years ago, starting an image host was a daunting task. The first few users might not have been too hard, but scaling was a problem. How do you reliably serve images to millions of users?
The answer these days is, of course, to let someone else do it. AWS S3 is brilliant for this. You can generally stuff whatever you want into an S3 bucket, and anyone can reliably access it. Add on tools like CloudFront, and now anyone can reliably access it quickly.
In this post, we’ll create a backend for the simplest MVP image host possible. All it will do is allow users to upload an image, then get back a sharable link to that image. The cool part, though, is that this setup could theoretically scale to millions of global users at any time without breaking a sweat, because all the technology we’ll be using (mostly S3, Lambda, and CloudFront) scales very well.
Step 0: Setup
If you want to follow along, you’ll need an AWS account. You’ll probably also want to set up the AWS CLI tool. Some background knowledge would be helpful too, but I’ll explain the basics as we go.
Step 1: S3
I’m going to create 3 S3 buckets:
- Deployments: A private bucket which we use for lambda deployments, more on that later
- StaticWebsite: A public bucket that’s set up as a website, to host the static part of our website (index.html and such)
- ImageBucket: The bucket that the actual images will go into
I make an effort to use CloudFormation for everything, because “make an image host without writing any code!” sounds much better than it actually is, and I didn’t want to spend an entire day taking screenshots of the AWS console. Here’s basestack.yaml, which describes a CFN stack with those three buckets:
Resources:
  Deployments:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
  StaticWebsite:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: PublicRead
  ImageBucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: PublicRead
To deploy this stack, we can use the aws CLI tool:
aws cloudformation deploy --template-file basestack.yaml --stack-name BaseStack --capabilities CAPABILITY_IAM
The --capabilities flag may not be necessary in this case, but you generally need it when you’re changing permissions. The docs have this to say:
Some stack templates might include resources that can affect permissions in your AWS account, for example, by creating new AWS Identity and Access Management (IAM) users. For those stacks, you must explicitly acknowledge their capabilities by specifying this parameter. The only valid values are CAPABILITY_IAM and CAPABILITY_NAMED_IAM.
If the deployment went well, congrats! You have some S3 buckets. I recommend creating a shell script or two for building and deploying, once we have something to build.
Step 2: “Serverless” Just Means Someone Else’s Servers
To allow image uploads, we’re going to create a lambda function that has upload access to our ImageBucket. We’re also going to create an API Gateway endpoint that allows anyone to send image information to our lambda.
Lambdas can be written in several languages, but I went with Go for this one. I’m a big fan of Go; it’s quick, predictable, and there’s no need to sprinkle async keywords everywhere or to put your entire lambda in one long promise chain. Anyway, here’s upload.go:
package main

import (
	"bytes"
	"context"
	"encoding/base64"
	"fmt"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
	// Assumed UUID package: the NewV4() call below matches the v1.2.0 API of
	// satori/go.uuid, where NewV4 returns only a UUID.
	uuid "github.com/satori/go.uuid"
)

var awsSession *session.Session

// Get the name of the upload bucket as a config var
var bucketName = os.Getenv("UPLOAD_BUCKET")

// getSession returns a new session if one did not already exist, and the
// existing session if it had already been created.
func getSession() *session.Session {
	if awsSession != nil {
		return awsSession
	}
	// Initialize a session that the SDK will use to load configuration,
	// credentials, and region from the shared config file (~/.aws/config).
	// Assign with "=" rather than ":=" so the package-level variable is set
	// and the session is actually reused on later invocations.
	awsSession = session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))
	return awsSession
}

// Lambda input structure
type imageEvent struct {
	Image64   string // image contents in base64
	Extension string // file extension, e.g. "png"
}

// Lambda output structure
type uploadResult struct {
	URL string
}

// The lambda function itself
func uploadImage(ctx context.Context, event imageEvent) (uploadResult, error) {
	uploader := s3manager.NewUploader(getSession())
	image, err := base64.StdEncoding.DecodeString(event.Image64)
	if err != nil {
		return uploadResult{}, err
	}
	// Create a key for the image to store in S3, a UUID with an extension
	imageID := uuid.NewV4().String()
	imageKey := fmt.Sprintf("i/%s.%s", imageID, event.Extension)
	imageReader := bytes.NewReader(image)
	uploadParams := &s3manager.UploadInput{
		Bucket: &bucketName,
		Key:    &imageKey,
		Body:   imageReader,
		// I know, I know. We could definitely find a better way to set the
		// content-type, but this works well enough as an MVP.
		ContentType: aws.String(fmt.Sprintf("image/%s", event.Extension)),
	}
	// Do the upload and return the image's key, which doubles as its URL path
	_, err = uploader.Upload(uploadParams)
	if err != nil {
		return uploadResult{}, err
	}
	return uploadResult{URL: imageKey}, nil
}

func main() {
	lambda.Start(uploadImage)
}
I saved this file as upload.go in a directory called image-upload. To build this for use in a lambda, we have to do something like this:
mkdir -p tmp/bin
GOOS=linux GOARCH=amd64 go build -o tmp/bin/image-upload ./image-upload
mkdir -p tmp/dist
zip -j tmp/dist/image-upload.zip tmp/bin/image-upload
This gives us tmp/dist/image-upload.zip, which we can use as the code for the lambda.
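Going back to the content-type comment in upload.go: one possible improvement is to look at the decoded bytes instead of trusting the extension the client sends. The sketch below is only an illustration under assumptions, not what the live site does. The helper name sniffImage and the allowedTypes allowlist are made up for this example; http.DetectContentType is the standard library function that inspects up to the first 512 bytes.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// allowedTypes maps a sniffed MIME type to the extension used in the S3 key.
// This allowlist is an assumption of the sketch, not something upload.go defines.
var allowedTypes = map[string]string{
	"image/png":  "png",
	"image/jpeg": "jpg",
	"image/gif":  "gif",
	"image/webp": "webp",
}

// errUnsupported is returned when the bytes do not look like an allowed image.
var errUnsupported = errors.New("unsupported image type")

// sniffImage is a hypothetical helper. It detects the MIME type from the
// image bytes and returns it together with a matching file extension.
func sniffImage(data []byte) (string, string, error) {
	contentType := http.DetectContentType(data)
	ext, ok := allowedTypes[contentType]
	if !ok {
		return "", "", fmt.Errorf("%w: %s", errUnsupported, contentType)
	}
	return contentType, ext, nil
}

func main() {
	// File signatures alone are enough for detection, so the samples are tiny.
	samples := [][]byte{
		[]byte("\x89PNG\r\n\x1a\n"),
		[]byte("GIF89a"),
		[]byte("just some text"),
	}
	for _, s := range samples {
		contentType, ext, err := sniffImage(s)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %s (.%s)\n", contentType, ext)
	}
}
```

In uploadImage, the two returned values could stand in for event.Extension when building the key and the ContentType, and anything that isn’t a recognized image would be rejected before it reaches S3.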
Now we need another CloudFormation stack, which will contain the lambda and the API Gateway endpoint. The resources we need are:
- The lambda itself
- A role for the lambda that allows uploading to S3
- A permission that allows API Gateway to invoke the lambda
- An API Gateway REST API
- A POST method for that API (the upload endpoint)
- An OPTIONS method for the API (so that we don’t run into CORS issues)
- An API Gateway deployment for our API
Yikes. We also need a few parameters, so that we know the names of our S3 buckets, and so that we can have a deployment prefix that updates each time we deploy. Putting it all together, this is what we get, stack.yaml:
Resources:
  S3UploadRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          -
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        -
          PolicyName: S3UploadPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              -
                Effect: Allow
                Action: s3:*
                Resource: "*"
  ImageUploadLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: image-upload
      Runtime: go1.x
      Environment:
        Variables:
          UPLOAD_BUCKET: !Ref ImageBucket
      Role: !GetAtt S3UploadRole.Arn
      Code:
        S3Bucket: !Ref DeploymentsBucket
        S3Key:
          Fn::Join:
            - "/"
            -
              - !Ref DeployPrefix
              - image-upload.zip
  ImageUploadLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref ImageUploadLambda
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${UploadAPI}/*/POST/*"
    DependsOn:
      - ImageUploadLambda
      - UploadAPI
  UploadAPI:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: UploadAPI
      BinaryMediaTypes:
        - image/jpeg
        - image/png
        - image/gif
        - image/webp
      EndpointConfiguration:
        Types:
          - EDGE
  UploadAPIPost:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: NONE
      HttpMethod: POST
      ResourceId:
        Fn::GetAtt:
          - UploadAPI
          - RootResourceId
      RestApiId: !Ref UploadAPI
      MethodResponses:
        -
          StatusCode: 200
          ResponseModels:
            application/json: 'Empty'
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: false
            method.response.header.Access-Control-Allow-Methods: false
            method.response.header.Access-Control-Allow-Origin: false
        -
          StatusCode: 500
          ResponseModels:
            application/json: 'Empty'
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: false
            method.response.header.Access-Control-Allow-Methods: false
            method.response.header.Access-Control-Allow-Origin: false
      Integration:
        Type: AWS
        IntegrationHttpMethod: POST
        IntegrationResponses:
          -
            StatusCode: 200
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
          -
            StatusCode: 500
            SelectionPattern: .+
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
        Uri:
          Fn::Join:
            - ""
            -
              - "arn:aws:apigateway:"
              - !Ref AWS::Region
              - ":lambda:path/2015-03-31/functions/"
              - !GetAtt ImageUploadLambda.Arn
              - "/invocations"
  UploadAPIOptions:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: NONE
      RestApiId: !Ref UploadAPI
      ResourceId: !GetAtt UploadAPI.RootResourceId
      HttpMethod: OPTIONS
      Integration:
        IntegrationResponses:
          - StatusCode: 200
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
            ResponseTemplates:
              application/json: ''
        PassthroughBehavior: WHEN_NO_MATCH
        RequestTemplates:
          application/json: '{"statusCode": 200}'
        Type: MOCK
      MethodResponses:
        - StatusCode: 200
          ResponseModels:
            application/json: 'Empty'
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: false
            method.response.header.Access-Control-Allow-Methods: false
            method.response.header.Access-Control-Allow-Origin: false
  UploadAPIDeployment:
    Type: AWS::ApiGateway::Deployment
    Properties:
      RestApiId: !Ref UploadAPI
      StageName: v0
    DependsOn: UploadAPIPost
Parameters:
  DeployPrefix:
    Type: String
    Default: "0"
  DeploymentsBucket:
    Type: String
  ImageBucket:
    Type: String
I’ve never heard anyone call CloudFormation “terse”.
Anyway, here’s my deploy.sh that I used to deploy the whole backend in one go:
#!/bin/bash
TEMPLATE_FILE=basestack.yaml
STACK_NAME=BaseStack
function getResourceId {
    aws cloudformation describe-stack-resource \
        --stack-name $STACK_NAME \
        --logical-resource-id $1 \
        | jq --raw-output '.StackResourceDetail.PhysicalResourceId'
}
# create base stack
aws cloudformation deploy \
    --template-file $TEMPLATE_FILE \
    --stack-name $STACK_NAME \
    --capabilities CAPABILITY_IAM
# sync dist folder
DEPLOY_PREFIX=deploy-$(date +%s)
aws s3 sync ./tmp/dist s3://$(getResourceId Deployments)/$DEPLOY_PREFIX --delete
# sync website (once we have one)
# aws s3 sync ./tmp/web s3://$(getResourceId StaticWebsite) --delete
DEPLOY_BUCKET=$(getResourceId Deployments)
IMAGE_BUCKET=$(getResourceId ImageBucket)
TEMPLATE_FILE=stack.yaml
STACK_NAME=ImageHost
aws cloudformation deploy \
    --template-file $TEMPLATE_FILE \
    --stack-name $STACK_NAME \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides DeployPrefix=$DEPLOY_PREFIX \
        DeploymentsBucket=$DEPLOY_BUCKET \
        ImageBucket=$IMAGE_BUCKET
You should be able to deploy the whole backend with one command now, which is neat. We now have an endpoint that you can hit to upload an image and get back a link to it! We can’t actually view the uploaded image, though, because we haven’t set an access policy on the S3 bucket we use for hosting. Let’s set up a CloudFront distribution to get around that (and also get nice CDN junk, as well as SSL and a custom domain).
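To make “hit the endpoint” a bit more concrete, here’s a rough sketch of a Go client. It assumes the JSON request body reaches the lambda unchanged and gets unmarshalled into imageEvent, which is what I’d expect from the non-proxy integration above, though I haven’t shown that being verified here. The names uploadRequest, buildPayload, and postImage are invented for this example, and the endpoint URL is whatever your own API Gateway deployment gives you.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// uploadRequest mirrors the imageEvent struct in upload.go.
type uploadRequest struct {
	Image64   string
	Extension string
}

// buildPayload base64-encodes the raw image bytes and wraps them in the
// JSON body that the lambda expects.
func buildPayload(image []byte, ext string) ([]byte, error) {
	return json.Marshal(uploadRequest{
		Image64:   base64.StdEncoding.EncodeToString(image),
		Extension: ext,
	})
}

// postImage sends the payload to the upload endpoint and returns the raw
// response body (a JSON object with a URL field, if all went well).
func postImage(endpoint string, payload []byte) (string, error) {
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("upload failed: %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// With no arguments, just print a sample payload so the shape is visible.
	if len(os.Args) < 4 {
		payload, err := buildPayload([]byte("hi"), "png")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println(string(payload))
		return
	}
	// Otherwise: go run . <endpoint-url> <image-file> <extension>
	image, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	payload, err := buildPayload(image, os.Args[3])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	result, err := postImage(os.Args[1], payload)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(result)
}
```

Keep in mind that base64 inflates the image by about a third, and that both API Gateway and Lambda cap request sizes, so this approach only suits fairly small images.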
Step 3: Distribute Content (with a Network)
We could make our CloudFront distribution with CloudFormation, but we’ve had plenty of CFN already, and the console is pretty easy for this part. Go to CloudFront in the AWS web console, and click the friendly blue “Create Distribution” button.
We want to create a web distribution. We want an Origin Domain Name of the static website bucket (we’ll make another distribution for the image hosting bucket later). We do want to Restrict Bucket Access, Create a New Identity for the Origin Access Identity, and Yes, Update Bucket Policy. This will allow us to access the contents of the bucket through the CDN. The Default Root Object should probably be index.html. Maybe you also want to force HTTPS, but that’s optional. Everything else should be good by default.
Once that’s done, we can edit the distribution we just created. We should create another origin for the distribution, which points to the image hosting bucket (do the same restriction of bucket access, create another new identity, and let CloudFront update the bucket policy for you). We should also create a new Behavior, with a path pattern of i/*, with an origin of the image hosting bucket.
Once all that is done, we should have our subdomain ([keyboard-mashing].cloudfront.net). This will serve all content from our static website bucket, unless the path matches i/*, in which case it will serve content from our images bucket. Here’s an example of the flow of things:
- Browser requests [whatever].cloudfront.net
- CloudFront returns our hosted index.html as well as any other hosted files
- User uses our web frontend to POST image data to our lambda, which uploads it to the image host S3 bucket
- Lambda returns the image’s key (i/something.png), which is its path under [whatever].cloudfront.net
- CloudFront gets requests for [whatever].cloudfront.net/i/something.png and serves up the requested image from the image host bucket (because the path matches i/*).
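If it helps to see the routing decision as code, here’s a toy model of it. This is not how CloudFront is implemented; it only mimics the one rule we configured, and the function name originFor and the origin labels are invented for the illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Labels for the two origins, named after the buckets in basestack.yaml.
const (
	originStatic = "StaticWebsite"
	originImages = "ImageBucket"
)

// originFor is a hypothetical stand-in for the distribution's behaviors:
// the i/* behavior wins when the request path begins with /i/, and every
// other path falls through to the default behavior.
func originFor(requestPath string) string {
	if strings.HasPrefix(requestPath, "/i/") {
		return originImages
	}
	return originStatic
}

func main() {
	for _, p := range []string{"/", "/index.html", "/i/something.png"} {
		fmt.Printf("%-18s -> %s\n", p, originFor(p))
	}
}
```

Note that a path like /images/logo.png would still go to the static website bucket, since only the /i/ prefix is routed to the images origin.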
Step 4: Congratulations
We’re done with the backend! We now have infrastructure to host a static website and any number of images, as well as an upload API. Next, we need the web front-end that will allow people to actually use our host.
For extra credit, use CloudFront to allow your API to be accessed from the same .cloudfront.net domain. This will remove the requirement for an OPTIONS endpoint, because same-origin requests (same scheme, host, and port, to be clear) don’t need to be pre-flighted.
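As a quick illustration of what counts as “the same” here, this sketch compares two URLs the way the browser’s same-origin rule does: scheme, host, and port all have to match. The function names and the example hostnames are made up, and real browsers handle more edge cases than this does.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// effectivePort returns the explicit port, or the scheme's default port
// when the URL does not specify one.
func effectivePort(u *url.URL) string {
	if p := u.Port(); p != "" {
		return p
	}
	switch u.Scheme {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

// sameOrigin reports whether two absolute URLs share scheme, host, and port.
func sameOrigin(a, b string) (bool, error) {
	ua, err := url.Parse(a)
	if err != nil {
		return false, err
	}
	ub, err := url.Parse(b)
	if err != nil {
		return false, err
	}
	if ua.Scheme == "" || ua.Host == "" || ub.Scheme == "" || ub.Host == "" {
		return false, errors.New("both URLs must be absolute")
	}
	return ua.Scheme == ub.Scheme &&
		strings.EqualFold(ua.Hostname(), ub.Hostname()) &&
		effectivePort(ua) == effectivePort(ub), nil
}

func main() {
	page := "https://example.cloudfront.net/index.html"
	targets := []string{
		"https://example.cloudfront.net/upload",
		"https://abc123.execute-api.us-east-1.amazonaws.com/v0",
	}
	for _, t := range targets {
		same, err := sameOrigin(page, t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s same origin: %v\n", t, same)
	}
}
```

The first target is the extra-credit setup, where the API sits behind the same distribution as the page. The second is the setup from this post, where the page and the API live on different hosts, which is why the OPTIONS method exists.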