An Image Host of Your Own, Part 1

The result of this post is live and running at img.olafalo.net. Check it out! (Please excuse my front-end.)

Intro

A few years ago, starting an image host was a daunting task. The first few users might not have been too hard, but scaling was a problem. How do you reliably serve images to millions of users?

The answer these days is, of course, to let someone else do it. AWS S3 is brilliant for this. You can generally stuff whatever you want into an S3 bucket, and anyone can reliably access it. Add on tools like CloudFront, and now anyone can reliably access it quickly.

In this post, we’ll create a backend for the simplest MVP image host possible. All it will do is allow users to upload an image, then get back a sharable link to that image. The cool part, though, is that this setup could theoretically scale to millions of global users at any time without breaking a sweat, because all the technology we’ll be using (mostly S3, Lambda, and CloudFront) scales very well.

Step 0: Setup

If you want to follow along, you’ll need an AWS account. You’ll probably also want to set up the AWS CLI tool. Some background knowledge would be helpful too, but I’ll explain the basics as we go.

Step 1: S3

I’m going to create 3 S3 buckets:

Deployments: A private bucket that we use for lambda deployments (more on that later)
StaticWebsite: A public bucket that’s set up as a website, to host the static part of our website (index.html and such)
ImageBucket: The bucket that the actual images will go into

I make an effort to use CloudFormation for everything, because “make an image host without writing any code!” sounds much better than it actually is, and I didn’t want to spend an entire day taking screenshots of the AWS console. Here’s basestack.yaml, which describes a CFN stack with those three buckets:

Resources:
  Deployments:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
  StaticWebsite:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: PublicRead
  ImageBucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: PublicRead

To deploy this stack, we can use the aws CLI tool:

aws cloudformation deploy --template-file basestack.yaml --stack-name BaseStack --capabilities CAPABILITY_IAM

The --capabilities flag may not be necessary in this case, but you generally need it when you’re changing permissions. The docs have this to say:

Some stack templates might include resources that can affect permissions in your AWS account, for example, by creating new AWS Identity and Access Management (IAM) users. For those stacks, you must explicitly acknowledge their capabilities by specifying this parameter. The only valid values are CAPABILITY_IAM and CAPABILITY_NAMED_IAM.

If the deployment went well, congrats! You have some S3 buckets. I recommend creating a shell script or two for building and deploying, once we have something to build.

Step 2: “Serverless” Just Means Someone Else’s Servers

To allow image uploads, we’re going to create a lambda function that has upload access to our ImageBucket. We’re also going to create an API Gateway endpoint that allows anyone to send image information to our lambda.

Lambdas can be written in several languages, but I went with Go for this one. I’m a big fan of Go; it’s quick, predictable, and there’s no need to sprinkle async keywords everywhere or to put your entire lambda in one long promise chain. Anyway, here’s upload.go:

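package main

import (
	"bytes"
	"context"
	"encoding/base64"
	"fmt"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
	// Assumed UUID package: the single-value uuid.NewV4() call further down
	// matches the older satori/go.uuid API.
	uuid "github.com/satori/go.uuid"
)

// awsSession caches the session so that warm invocations can reuse it.
var awsSession *session.Session

// Get the name of the upload bucket as a config var
var bucketName = os.Getenv("UPLOAD_BUCKET")

// getSession returns a new session if one did not already exist, and the
// existing session if it had already been created.
func getSession() *session.Session {
	if awsSession != nil {
		return awsSession
	}
	// Initialize a session that the SDK will use to load configuration,
	// credentials, and region from the environment and the shared config
	// file (~/.aws/config).
	// Assign with "=" rather than ":=" so that the package-level variable is
	// set; ":=" would declare a local and the session would never be cached.
	awsSession = session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))
	return awsSession
}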

// Lambda input structure
type imageEvent struct {
	Image64   string // image contents in base64
	Extension string // file extension, e.g. "png"
}

// Lambda output structure
type uploadResult struct {
	URL string
}

// The lambda function itself
func uploadImage(ctx context.Context, event imageEvent) (uploadResult, error) {
	uploader := s3manager.NewUploader(getSession())
	image, err := base64.StdEncoding.DecodeString(event.Image64)
	if err != nil {
		return uploadResult{}, err
	}
	// Create a key for the image to store in S3, a UUID with an extension
	imageID := uuid.NewV4().String()
	imageKey := fmt.Sprintf("i/%s.%s", imageID, event.Extension)
	imageReader := bytes.NewReader(image)
	uploadParams := &s3manager.UploadInput{
		Bucket: &bucketName,
		Key:    &imageKey,
		Body:   imageReader,
		// I know, I know. We could definitely find a better way to set the
		// content-type, but this works well enough as an MVP.
		ContentType: aws.String(fmt.Sprintf("image/%s", event.Extension)),
	}
	// Do the upload and return the image's path. The key doubles as a
	// relative URL under whatever domain ends up serving the bucket.
	_, err = uploader.Upload(uploadParams)
	if err != nil {
		return uploadResult{}, err
	}
	return uploadResult{URL: imageKey}, nil
}

func main() {
	lambda.Start(uploadImage)
}
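The comment in upload.go admits that building the content type from the caller-supplied extension is a shortcut. As a rough sketch of one alternative (not what the lambda above does), the standard library can sniff the type from the decoded bytes, and the extension can then come from an allow-list. The names sniffImage, newImageKey, and allowedTypes are made up for this example, and the random hex ID is a stand-in for the UUID used above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http"
)

// allowedTypes maps sniffed MIME types to the extension used in the S3 key.
// The set is an assumption; adjust it to whatever you want to accept.
var allowedTypes = map[string]string{
	"image/jpeg": "jpg",
	"image/png":  "png",
	"image/gif":  "gif",
	"image/webp": "webp",
}

var errUnsupported = errors.New("unsupported image type")

// sniffImage inspects the leading bytes of the payload and returns the
// detected MIME type plus a matching extension, or an error if the payload
// is not one of the allowed image types.
func sniffImage(data []byte) (string, string, error) {
	contentType := http.DetectContentType(data)
	ext, ok := allowedTypes[contentType]
	if !ok {
		return "", "", errUnsupported
	}
	return contentType, ext, nil
}

// newImageKey builds a key under the i/ prefix from 16 random bytes.
func newImageKey(ext string) (string, error) {
	id := make([]byte, 16)
	if _, err := rand.Read(id); err != nil {
		return "", err
	}
	return fmt.Sprintf("i/%s.%s", hex.EncodeToString(id), ext), nil
}

func main() {
	// The eight-byte PNG signature is enough for detection.
	png := []byte("\x89PNG\r\n\x1a\n")
	contentType, ext, err := sniffImage(png)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	key, err := newImageKey(ext)
	if err != nil {
		fmt.Println("could not create key:", err)
		return
	}
	fmt.Println(contentType, key)
}
```

With something like this in place, the Extension field of the input could be dropped entirely, at the cost of rejecting formats the sniffer doesn’t recognize.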

I saved this file as upload.go in a directory called image-upload. To build this for use in a lambda, we have to do something like this:

mkdir -p tmp/bin
GOOS=linux GOARCH=amd64 go build -o tmp/bin/image-upload ./image-upload
mkdir -p tmp/dist
zip -j tmp/dist/image-upload.zip tmp/bin/image-upload

This gives us tmp/dist/image-upload.zip, which we can use as the code for the lambda.

Now we need another CloudFormation stack, which will contain the lambda and the API Gateway endpoint. The resources we need are:

The lambda itself
A role for the lambda that allows uploading to S3
A permission that allows API Gateway to invoke the lambda
An API Gateway REST API
A POST method for that API (the upload endpoint)
An OPTIONS method for the API (so that we don’t run into CORS issues)
An API Gateway deployment for our API

Yikes. We also need a few parameters, so that we know the names of our S3 buckets, and so that we can have a deployment prefix that updates each time we deploy. Putting it all together, this is what we get in stack.yaml:

Resources:
  S3UploadRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          -
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        -
          PolicyName: S3UploadPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              -
                Effect: Allow
                Action: s3:*
                Resource: "*"
  ImageUploadLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: image-upload
      Runtime: go1.x
      Environment:
        Variables:
          UPLOAD_BUCKET: !Ref ImageBucket
      Role: !GetAtt S3UploadRole.Arn
      Code:
        S3Bucket: !Ref DeploymentsBucket
        S3Key:
          Fn::Join:
            - "/"
            -
              - !Ref DeployPrefix
              - image-upload.zip
  ImageUploadLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref ImageUploadLambda
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${UploadAPI}/*/POST/*"
    DependsOn:
      - ImageUploadLambda
      - UploadAPI
  UploadAPI:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: UploadAPI
      BinaryMediaTypes:
        - image/jpeg
        - image/png
        - image/gif
        - image/webp
      EndpointConfiguration:
        Types:
          - EDGE
  UploadAPIPost:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: NONE
      HttpMethod: POST
      ResourceId:
        Fn::GetAtt:
          - UploadAPI
          - RootResourceId
      RestApiId: !Ref UploadAPI
      MethodResponses:
          -
            StatusCode: 200
            ResponseModels:
              application/json: 'Empty'
            ResponseParameters:
                method.response.header.Access-Control-Allow-Headers: false
                method.response.header.Access-Control-Allow-Methods: false
                method.response.header.Access-Control-Allow-Origin: false
          -
            StatusCode: 500
            ResponseModels:
              application/json: 'Empty'
            ResponseParameters:
                method.response.header.Access-Control-Allow-Headers: false
                method.response.header.Access-Control-Allow-Methods: false
                method.response.header.Access-Control-Allow-Origin: false
      Integration:
        Type: AWS
        IntegrationHttpMethod: POST
        IntegrationResponses:
          -
            StatusCode: 200
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
          -
            StatusCode: 500
            SelectionPattern: .+
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
        Uri:
          Fn::Join:
            - ""
            -
              - "arn:aws:apigateway:"
              - !Ref AWS::Region
              - ":lambda:path/2015-03-31/functions/"
              - !GetAtt ImageUploadLambda.Arn
              - "/invocations"
  UploadAPIOptions:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: NONE
      RestApiId: !Ref UploadAPI
      ResourceId: !GetAtt UploadAPI.RootResourceId
      HttpMethod: OPTIONS
      Integration:
        IntegrationResponses:
        - StatusCode: 200
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
            method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
            method.response.header.Access-Control-Allow-Origin: "'*'"
          ResponseTemplates:
            application/json: ''
        PassthroughBehavior: WHEN_NO_MATCH
        RequestTemplates:
          application/json: '{"statusCode": 200}'
        Type: MOCK
      MethodResponses:
      - StatusCode: 200
        ResponseModels:
          application/json: 'Empty'
        ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: false
            method.response.header.Access-Control-Allow-Methods: false
            method.response.header.Access-Control-Allow-Origin: false
  UploadAPIDeployment:
    Type: AWS::ApiGateway::Deployment
    Properties:
      RestApiId: !Ref UploadAPI
      StageName: v0
    DependsOn: UploadAPIPost
Parameters:
  DeployPrefix:
    Type: String
    Default: "0"
  DeploymentsBucket:
    Type: String
  ImageBucket:
    Type: String

I’ve never heard anyone call CloudFormation “terse”.

Anyway, here’s my deploy.sh that I used to deploy the whole backend in one go:

#!/bin/bash
TEMPLATE_FILE=basestack.yaml
STACK_NAME=BaseStack

function getResourceId {
    aws cloudformation describe-stack-resource \
        --stack-name $STACK_NAME \
        --logical-resource-id $1 \
        | jq --raw-output '.StackResourceDetail.PhysicalResourceId'
}

# create base stack
aws cloudformation deploy \
    --template-file $TEMPLATE_FILE \
    --stack-name $STACK_NAME \
    --capabilities CAPABILITY_IAM

# sync dist folder
DEPLOY_PREFIX=deploy-$(date +%s)
aws s3 sync ./tmp/dist s3://$(getResourceId Deployments)/$DEPLOY_PREFIX --delete

# sync website (once we have one)
# aws s3 sync ./tmp/web s3://$(getResourceId StaticWebsite) --delete

DEPLOY_BUCKET=$(getResourceId Deployments)
IMAGE_BUCKET=$(getResourceId ImageBucket)
TEMPLATE_FILE=stack.yaml
STACK_NAME=ImageHost

aws cloudformation deploy \
    --template-file $TEMPLATE_FILE \
    --stack-name $STACK_NAME \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides DeployPrefix=$DEPLOY_PREFIX \
        DeploymentsBucket=$DEPLOY_BUCKET \
        ImageBucket=$IMAGE_BUCKET

You should be able to deploy the whole backend with one command now, which is neat. We now have an endpoint that you can hit to upload an image and get back a link to it! We can’t actually view the uploaded image, though, because we haven’t set an access policy on the S3 bucket we use for hosting. Let’s set up a CloudFront distribution to get around that (and also get nice CDN junk, as well as SSL and a custom domain).
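To make “an endpoint that you can hit” a bit more concrete, here is a rough sketch of a Go client for it. It assumes the request and response bodies are plain JSON that mirrors the imageEvent and uploadResult structs from upload.go. The names postImage and fakeEndpoint are made up for this example, and fakeEndpoint is a local stand-in so the sketch runs without touching AWS. Against the real thing, the endpoint would be the invoke URL of the v0 stage.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// uploadRequest mirrors the lambda's imageEvent input.
type uploadRequest struct {
	Image64   string
	Extension string
}

// uploadResponse mirrors the lambda's uploadResult output.
type uploadResponse struct {
	URL string
}

// postImage base64-encodes the image, POSTs it as JSON, and returns the
// URL field of the response.
func postImage(endpoint string, image []byte, ext string) (string, error) {
	body, err := json.Marshal(uploadRequest{
		Image64:   base64.StdEncoding.EncodeToString(image),
		Extension: ext,
	})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("upload failed: %s: %s", resp.Status, msg)
	}
	var out uploadResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.URL, nil
}

// fakeEndpoint is a local stand-in for the deployed API. It only checks
// that the payload decodes, then answers with a fixed key.
func fakeEndpoint() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req uploadRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if _, err := base64.StdEncoding.DecodeString(req.Image64); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(uploadResponse{URL: "i/example." + req.Extension})
	}))
}

func main() {
	srv := fakeEndpoint()
	defer srv.Close()

	link, err := postImage(srv.URL, []byte("pretend these are image bytes"), "png")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(link) // i/example.png
}
```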

Step 3: Distribute Content (with a Network)

We could make our CloudFront distribution with CloudFormation, but we’ve had plenty of CFN already, and the console is pretty easy for this part. Go to CloudFront in the AWS web console, and click the friendly blue “Create Distribution” button.

We want to create a web distribution. We want an Origin Domain Name of the static website bucket (we’ll make another distribution for the image hosting bucket later). We do want to Restrict Bucket Access, Create a New Identity for the Origin Access Identity, and Yes, Update Bucket Policy. This will allow us to access the contents of the bucket through the CDN. The Default Root Object should probably be index.html. Maybe you also want to force HTTPS, but that’s optional. Everything else should be good by default.

Once that’s done, we can edit the distribution we just created. We should create another origin for the distribution, which points to the image hosting bucket (do the same restriction of bucket access, create another new identity, and let CloudFront update the bucket policy for you). We should also create a new Behavior, with a path pattern of i/*, with an origin of the image hosting bucket.

Once all that is done, we should have our subdomain ([keyboard-mashing].cloudfront.net). This will serve all content from our static website bucket, unless the path matches i/*, in which case it will serve content from our images bucket. Here’s an example of the flow of things:

Browser requests [whatever].cloudfront.net
CloudFront returns our hosted index.html as well as any other hosted files
User uses our web frontend to POST image data to our lambda, which gets uploaded to the image host S3 bucket
Lambda returns a link to [whatever].cloudfront.net/i/something.png
CloudFront gets requests for [whatever].cloudfront.net/i/something.png and serves up the requested image from the image host bucket (because the path matches i/*).
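The routing decision in that flow can be modeled in a few lines of Go. This is only a simplified sketch of the idea, not how CloudFront is implemented: it handles nothing but trailing-wildcard patterns like i/*, and the behavior type, originFor function, and origin labels are names invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// behavior pairs a path prefix (the part of the pattern before the
// trailing *) with the origin that should serve matching requests.
type behavior struct {
	prefix string
	origin string
}

// Labels for the two origins; the names follow the buckets in this post.
const (
	staticOrigin = "StaticWebsite"
	imageOrigin  = "ImageBucket"
)

// behaviors holds the one extra behavior we created: i/* goes to images.
var behaviors = []behavior{
	{prefix: "/i/", origin: imageOrigin},
}

// originFor returns the origin for a request path. The first matching
// behavior wins; anything else falls through to the default origin.
func originFor(path string) string {
	for _, b := range behaviors {
		if strings.HasPrefix(path, b.prefix) {
			return b.origin
		}
	}
	return staticOrigin
}

func main() {
	for _, p := range []string{"/", "/index.html", "/i/something.png"} {
		fmt.Printf("%-18s -> %s\n", p, originFor(p))
	}
}
```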

Step 4: Congratulations

We’re done with the backend! We now have infrastructure to host a static website and any number of images, as well as an upload API. Next, we need the web front-end that will allow people to actually use our host.

For extra credit, use CloudFront to allow your API to be accessed from the same .cloudfront.net domain. This will remove the requirement for an OPTIONS endpoint, because same-origin requests (the scheme, host, and port must all match exactly) aren’t subject to CORS, so they are never pre-flighted.
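If “exact match” feels fuzzy, here is a small sketch of the comparison. It is an approximation of the browser’s rule, written for illustration only, and it fills in the default ports for http and https. The origin and sameOrigin helpers and both hostnames are made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// origin reduces a URL to scheme://host:port, filling in the default port
// for http and https when none is given.
func origin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		}
	}
	return u.Scheme + "://" + strings.ToLower(u.Hostname()) + ":" + port, nil
}

// sameOrigin reports whether two URLs share scheme, host, and port.
func sameOrigin(a, b string) bool {
	oa, errA := origin(a)
	ob, errB := origin(b)
	return errA == nil && errB == nil && oa == ob
}

func main() {
	// Both hostnames are invented for illustration.
	site := "https://example.cloudfront.net/index.html"
	fmt.Println(sameOrigin(site, "https://example.cloudfront.net/api/upload"))             // true
	fmt.Println(sameOrigin(site, "https://abc123.execute-api.us-east-1.amazonaws.com/v0")) // false
}
```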