CDN Review
This is an overview and analysis of three different CDNs (content delivery networks, also called content distribution networks).
– Cloudfront
– Cloudflare
– Fastly
For each CDN, answer the following questions:
– what is cached in the CDN
– how long it is cached in the CDN
– how long it is cached in the browser
– Load Balancing or sticky rules
– pricing
– interesting facts about how that particular CDN works
Also briefly discuss how web servers (Nginx, Apache) and browsers (Firefox, and Chrome) deal with caching.
– what is cached in the browser
– how long it is cached in the browser
– how do you configure the web server caching settings
CDNs, at least as much as and perhaps even more than other technologies, contain Alice in Wonderland rabbit holes. On the surface, the concept is incredibly simple, or seems so. The CDN caches everything. It distributes it to the edge. It’s a one-click enable. You don’t need to know anything about a CDN. Just turn it on. It works. And to some extent, this is even true. However, once you get into tuning and customizing a CDN, the level of detail becomes quite complex.
AWS CLOUDFRONT
WHAT IS CACHED:
“By default, the response to GET and HEAD methods will be cached at CloudFront edge locations. You may choose to configure your Amazon CloudFront distribution to cache the response for the OPTIONS request. Other HTTP methods (POST, PUT, DELETE, and PATCH) are not cached and simply proxied to the origin by Amazon CloudFront edge locations. ”
This means that static content (GET method) is served from the cache.
Dynamic content (POST method) is sent to the origin.
Dynamic content (GET method) must be evaluated and customized as necessary. By default, cookies, query strings, and headers are all blocked (not forwarded to the origin). This greatly limits serving dynamic content, so they should be enabled for the specific sections of a website that rely on those features. At the same time, enabling cookies globally would greatly impede caching, since CloudFront then caches based on cookie values.
HOW LONG IS IT CACHED:
“By default, each object stays in an edge location for 24 hours before it expires.”
“Amazon CloudFront uses the expiration period you set on your files (through cache control headers) to determine whether it needs to check the origin for an updated version of the file. ”
The answer to this seems fairly clear. Let’s review these in order from the simplest to the more involved case.
– Case: Nothing special is configured on either the origin or CF. CF then caches the object for the default TTL of 24 hours.
– Case: The default TTL in CF is adjusted, for example to 1 hour instead of 24 hours.
– Case: The origin (such as Apache or Nginx) specifies Cache-Control headers for some objects. The Cache-Control headers take precedence over the default TTL.
– Case: CF minimum or maximum TTLs are set. These act as absolute lower and upper bounds and override all other settings. Typically, though, the default TTL takes effect, or the Cache-Control headers if specified.
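The four cases above can be sketched as a small function. This is only an illustration of the precedence described here, not CloudFront's actual implementation; the function name, the parameter names, and the 365-day default for the maximum TTL are assumptions made for the example.

```python
from typing import Optional

DAY = 24 * 60 * 60  # seconds


def cloudfront_edge_ttl(origin_max_age: Optional[int],
                        default_ttl: int = DAY,
                        min_ttl: int = 0,
                        max_ttl: int = 365 * DAY) -> int:
    """Return how many seconds an object would stay at the edge.

    origin_max_age is the max-age sent by the origin in Cache-Control,
    or None when the origin sends no caching headers at all.
    """
    # Without Cache-Control from the origin, the default TTL applies.
    ttl = default_ttl if origin_max_age is None else origin_max_age
    # Minimum and maximum TTL bound whichever value was chosen.
    return max(min_ttl, min(ttl, max_ttl))
```

For example, an origin that sends max-age=60 while the minimum TTL is set to 300 ends up cached for 300 seconds, because the minimum is an absolute bound.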
HOW LONG IS IT CACHED IN THE BROWSER:
Cloudfront will pass the Cache-Control headers back to the browser. See section on browsers later.
LOAD BALANCING OR STICKY RULES:
Load balance the servers behind an ELB. Cloudfront itself does not have Load Balancing.
PRICING:
No minimum fee. Pay per GB transferred, starting at $0.085/GB.
GENERAL INFO:
– you can add CNAMEs (alternate names); the default names look like d111111abcdef8.cloudfront.net
– Amazon Cloudfront marketing literature makes a big deal about now “supporting” dynamic content. In fact, they just mean proxying dynamic requests back to the origin server, which most other CDN competitors in the market also do.
– to add security, create an “origin access identity”, give it access to the bucket, and remove read-all.
– “you must have at least as many cache behaviors (including the default cache behavior) as you have origins.”
– supports geolocation restrictions
– Cloudfront has many features regarding streaming video and RTMP video files
– invalidations are not the recommended Cloudfront strategy. As an alternative “…we recommend that you primarily use object versioning…”
– “If you configure CloudFront to forward query strings to your origin, CloudFront will include the query string portion of the URL when caching the object.”
– “If you configure CloudFront to forward cookies to your origin, CloudFront caches based on cookie values.”
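– query string parameters are not put into a consistent order, which is bad: the same parameters in a different order are cached as separate objects, resulting in more cache entries than necessary.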
– the cache is case sensitive
– “If you configure CloudFront to forward all headers to your origin, CloudFront doesn’t cache the objects associated with this cache behavior. Instead, it sends every request to the origin.”
– you can set a DefaultRootObject, which is like a Document Root index page for an S3 website.
– CF can be set to compress files
– CF automatically adds the X-Forwarded-For header.
– Handling HTTP headers in CF (removing them, adding them) is complicated. The user manual includes a 47-row table of different headers.
– CF customizes the handling of User-Agent: it sets User-Agent to Amazon CloudFront. If the app depends on User-Agent, it will need to be modified to parse headers such as CloudFront-Is-Desktop-Viewer instead.
– query strings can be all, or nothing, or a whitelist.
– cookies can be all, or nothing, or a whitelist.
– Amazon Certificate Manager is free
– host header “CloudFront sets the value to the domain name of the origin that is associated with the requested object.”
– SIGNED URLs:
A question you may have when reading about Cloudfront signed URLs is: what do they protect against, and what don’t they protect against? What are they really good for, and what aren’t they for?
A signed URL will not prevent someone from downloading a file, and then turning around and re-sharing the file elsewhere.
A signed URL will not prevent someone from quickly re-distributing the supposedly secret signed URL itself, as long as it’s within the window of time while the URL is active (unless the signing policy also restricts access by IP address range).
A signed URL scheme will prevent easy browsing of all your web files.
A signed URL scheme will prevent simple sharing and linking of URLs that aren’t signed.
– leveraging the CDN to support multiple distinct origins creates a dependency on the CDN.
– The most interesting conundrum in Cloudfront is that query strings, cookies, and headers must each be configured. Leave them off, and dynamic content is broken. Turn them on, and caching could be broken.
A solution for a somewhat dynamic site:
(query strings:ON, cookies:ON, headers:OFF) just for a subsection of pages, such as admin or shopping cart
(query strings:ON, cookies:OFF, headers:OFF) most of the site. Send a no-cache http header from the app for more dynamic pages.
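Regarding the note above that query strings are not ordered: one way to limit duplicate cache entries is to have the application always emit links with parameters in a consistent order. The following is a minimal sketch of that idea using only the Python standard library; the function name normalize_url is hypothetical, and the path is deliberately left untouched because the cache is case sensitive.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    # keep_blank_values preserves parameters such as "?debug=".
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # The fragment is dropped; browsers never send it to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))
```

With this in place, links generated for ?b=2&a=1 and ?a=1&b=2 become the same URL, so they can share one cached object.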
CLOUDFLARE
WHAT IS CACHED:
In order from higher precedence to lower precedence.
– If the Cache-Control header is set to “private”, “no-store”, “no-cache”, or “max-age=0”, or if there is a cookie in the response, then CloudFlare will not cache the resource.
– POST requests are passed to the origin
– a “don’t cache” pagerule will skip the cache
– a “cache everything” pagerule is a way to cache html
– by default, cloudflare caches static assets (css, js, jpg, etc) and not html
HOW LONG IS IT CACHED:
The default is 2 hours. More details:
In order from higher precedence to lower precedence.
– a pagerule with edge cache ttl
– Cache-Control header with s-maxage, which takes precedence over max-age in a CDN. Cache-Control: s-maxage=200, max-age=60
– Cache control header with max-age. Cache-Control: max-age=60
– CloudFlare will cache static content by default. We will default to the following TTL depending on the return code:
200 301 120m;
302 303 20m;
403 1m;
404 10m;
any 0s;
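The precedence list and the status code defaults above can be expressed as a short lookup. This is a sketch of the rules as described in this section, not Cloudflare code; the function and parameter names are made up for illustration.

```python
from typing import Optional

# Default edge TTLs in seconds, keyed by response status code (from the list above).
DEFAULT_EDGE_TTL = {200: 7200, 301: 7200, 302: 1200, 303: 1200, 403: 60, 404: 600}


def cloudflare_edge_ttl(status: int,
                        page_rule_ttl: Optional[int] = None,
                        s_maxage: Optional[int] = None,
                        max_age: Optional[int] = None) -> int:
    """Pick the edge TTL: page rule first, then s-maxage, then max-age, then defaults."""
    for candidate in (page_rule_ttl, s_maxage, max_age):
        if candidate is not None:
            return candidate
    # Any status code not listed falls under "any 0s".
    return DEFAULT_EDGE_TTL.get(status, 0)
```

For the header Cache-Control: s-maxage=200, max-age=60 and no page rule, this yields 200 seconds at the edge.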
HOW LONG IS IT CACHED IN THE BROWSER:
The default is 4 hours, set by Cloudflare and sent to the browser.
In order from higher precedence to lower precedence.
– Browser Cache TTL pagerule or general Caching rule (if it is longer than the origin’s). CloudFlare will replace the Cache-Control header added by the origin server only if the Browser Cache TTL is longer than the TTL in the Cache-Control header set by the origin.
– origin server’s Cache control header with max-age. Cache-Control: max-age=60
– Browser Cache TTL pagerule or general Caching rule (if it is shorter than the origin’s).
– no browser caching time. (Actually, the browser will invent its own rules. See section on browsers later.)
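In effect, the list above says that the longer of the two values wins when both are present. A minimal sketch of that reading follows; the function and parameter names are assumptions, and None stands for "not set".

```python
from typing import Optional


def cloudflare_browser_ttl(setting_ttl: Optional[int],
                           origin_max_age: Optional[int]) -> Optional[int]:
    """Return the max-age the browser would receive, or None if nothing is sent."""
    if setting_ttl is None:
        # No Browser Cache TTL configured: the origin header passes through.
        return origin_max_age
    if origin_max_age is None:
        # The origin sent no max-age: the Cloudflare setting is used.
        return setting_ttl
    # The setting only replaces the origin value when it is longer.
    return max(setting_ttl, origin_max_age)
```

With the 4 hour default (14400 seconds), an origin max-age of 60 is replaced by 14400, while an origin max-age of one day is left alone.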
LOAD BALANCING OR STICKY RULES:
Load balance the servers behind an ELB or dedicated LB.
Cloudflare is developing its own LB solution at the time of this writing.
PRICING:
free plan available. Free, Pro, Business, Enterprise.
GENERAL INFO:
– cname flattening, allows a root level DNS CNAME.
– how do Cloudflare certs work? The Subject Alternative Name field has 32 entries, 16 for wildcard, 16 for original DNS. Cloudflare is configuring your domain onto its certificate along with many other customers.
– query strings – 3 caching levels: No Query String, Ignore Query String, Standard. The Standard setting is recommended.
This is a simple and clear answer to query strings. You would want to include them in the cache.
Query strings should be enabled, contrary to what Cloudfront does. This is a contradiction: Cloudflare recommends “Standard”,
and Cloudfront defaults to “Off”. Why? Cloudfront has a long history of serving static content from S3, where
query strings serve very little purpose.
– cookies – in the default setup, where Cloudflare doesn’t cache dynamic content, it should just pass the cookie through to the origin.
In the Enterprise Version setup, where you “Cache Everything”, and then “Bypass Cache on Cookie”, it again gets sent back to the origin. It is not cached.
Cloudfront either has cookies turned off, by default, or makes the cookie into part of the cache key, which would cause too much caching.
There is no reason to think Cloudflare automatically makes a cookie part of the cache key. So, having a cookie in a request shouldn’t break static assets.
– headers – “Currently, the only vary header that we support is the accept-encoding header.”
Also, “headers are passed through from the client to the server.” However, for a cache, which by definition serves the same content to many visitors, it could be just as well to block the headers as to send them, retrieve the content, and then ignore headers for subsequent cache hits. Cloudfront blocks headers by default.
– The default setting is to not cache html, and thus dynamic content is always sent to the origin. An optimization for business customers: set Cache Everything to also cache dynamic html, then set “Bypass Cache on Cookie” so that requests for customized pages bypass the cache. At the top of the Page Rules list, add caching for static assets, or they would also be bypassed.
FASTLY
WHAT IS CACHED:
“Default caching behavior of HTTP verbs: By default, the results of GET requests are cached. HEAD requests are not proxied as is, but are handled locally if an object is in cache or a GET is done to the backend to get the object into the cache. Anything other than HEAD or GET requests are proxied and not cached by default.”
https://docs.fastly.com/guides/debugging/using-get-instead-of-head-for-command-line-caching-tests
POST requests are passed to the origin.
“Fastly caches the following response status codes by default. In addition to these statuses, you can force an object to cache under other states using conditions and responses.” 200 203 300 301 302 404 410
https://docs.fastly.com/guides/about-fastly-services/http-status-codes-cached-by-default
“If you don’t want certain resources cached, set the header as follows: Cache-Control: private”
If there is a cookie in the Response it is not cached.
A Cache Setting can prevent items from being cached.
https://docs.fastly.com/guides/performance-tuning/controlling-caching
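The default rules quoted above can be collected into one predicate. This is a sketch of the behavior as described in this section, not Fastly's VCL; the function name is hypothetical, and custom Cache Settings or conditions are deliberately left out.

```python
# Status codes that Fastly caches by default, per the quote above.
CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}


def fastly_caches_by_default(method: str, status: int, response_headers: dict) -> bool:
    """Return True if the response would be cached under the default rules."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    # Only GET results are cached; HEAD is answered from the GET object.
    if method.upper() not in ("GET", "HEAD"):
        return False
    if status not in CACHEABLE_STATUSES:
        return False
    # A cookie in the response prevents caching.
    if "set-cookie" in headers:
        return False
    # Cache-Control: private keeps the object out of the CDN.
    directives = [d.strip().lower() for d in headers.get("cache-control", "").split(",")]
    return "private" not in directives
```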
HOW LONG IS IT CACHED:
In Settings, Fallback TTL is one hour.
If Cache-Control, Surrogate-Control, or Expires has been set by the Origin, that is maintained.
HOW LONG IS IT CACHED IN THE BROWSER:
Browsers will observe the http headers they receive. Therefore, caching time can be set at the origin, or since Fastly allows manipulation of the headers via the UI, that would be the way to set a browser specific TTL such as “Cache-Control: max-age=86400” during a reply.
LOAD BALANCING OR STICKY RULES:
No sticky sessions by default. You would have to switch from a “random director” to a “client director”. It will make a choice based on client.identity.
From a chat session:
“the default director and the only one you can activate via the UI is the random one. So if you want to implement sticky load balancing you’d have to create a new ‘client’ director. by default that would use the client IP”
https://docs.fastly.com/api/config#director
“type integer What type of load balance group to use. Integer, 1 to 4. Values: 1=random, 2=round-robin, 3=hash, 4=client. (default: 1)”
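To illustrate why a client-based director gives sticky behavior while a random one does not, here is a minimal sketch that maps a client identity (assumed here to be the client IP) to a backend by hashing. It is not Fastly's algorithm, and the function name and backend names are invented for the example.

```python
import hashlib


def pick_backend(client_identity: str, backends: list) -> str:
    """Choose a backend deterministically from the client identity."""
    digest = hashlib.sha256(client_identity.encode("utf-8")).digest()
    # The same identity always hashes to the same index, hence the stickiness.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Calling pick_backend("203.0.113.7", ["app1", "app2", "app3"]) repeatedly returns the same backend each time, as long as the backend list does not change.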
PRICING:
minimum of $50 per month. Based on bandwidth and requests.
GENERAL INFO:
– Fastly is based on Varnish.
– “Fastly’s Varnish is based on Varnish 2.1 and our Varnish syntax is specifically compatible with Varnish 2.1.5.”
– They have a free SSL option; however, its usage would be somewhat restrictive because the domain must be “https://.global.ssl.fastly.net/”, not what you’d expect, such as www.example.com.
– “To order TLS service, contact customer support via email at [email protected].” TLS (SSL) is very common, it would be nice if this were automated.
– Fastly has a gzip setting and many others.
– Fastly lets you set headers, as with Varnish, of course.
– “Create a Request Setting” can be used to force a “Pass”. i.e. avoid caching.
– Query strings: they are cached. As with Standard on Cloudflare. https://docs.fastly.com/guides/performance-tuning/making-query-strings-agnostic
“Under normal circumstances, Fastly would consider these URLs different objects that are cached separately: http://example.com http://example.com?asdf=asdf http://example.com?asdf=zxcv”
– Cookies:
From an article: “Fastly’s default VCL is different (from Varnish) and does not pass for Cookie headers.” So, requests with cookies are cached.
If a cookie is sent from the origin, don’t cache:
if (beresp.http.Set-Cookie) {
  set req.http.Fastly-Cachetype = "SETCOOKIE";
  return(pass);
}
Fastly does not use cookies as part of the cache key, by default, so there would be no reason to strip them out of a request to improve caching. Code similar to this would skip caching on cookies: if (req.http.Cookie) { return(pass); } That could be implemented in the UI, rather than VCL, with “Create a request setting”. Add a condition for req.http.Cookie. Also add a condition for interesting URL paths. Then the action is Pass.
– Headers. Both request and response headers can be added, removed, changed, etc.
– Controlling caching from the origin: “Cache-Control: no-cache, no-store” are caching directives to influence browser behavior, not the caching layer. “Cache-Control: private” prevents caching in the CDN.
APACHE and NGINX
There are two sets of features which affect caching:
1. ( Cache-Control: max-age header + Expires: header )
2. ( ETag header + Last-Modified header )
Cache-Control is newer and takes precedence over Expires, so should be preferred. Within Cache-Control, you have max-age and s-maxage. The first one, max-age, will specify the cache setting for both CDNs and browsers. However, if the CDN value should be different, then set s-maxage, and it will be used by the CDN instead.
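The split between max-age and s-maxage can be seen by parsing a header and reading off the two lifetimes. The sketch below handles only simple comma-separated directives; the function names are chosen for illustration.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into a {directive: value-or-None} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def cache_lifetimes(header: str):
    """Return (browser_seconds, cdn_seconds); None means not specified."""
    d = parse_cache_control(header)
    max_age = int(d["max-age"]) if d.get("max-age") else None
    s_maxage = int(d["s-maxage"]) if d.get("s-maxage") else None
    # Browsers ignore s-maxage; shared caches prefer it over max-age.
    return max_age, (s_maxage if s_maxage is not None else max_age)
```

For Cache-Control: s-maxage=200, max-age=60 this gives 60 seconds for the browser and 200 seconds for the CDN.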
While you may expect that “Cache-Control: max-age” would be a primary factor in determining browser caching, it turns out that when Etags are present, the browser often ignores the max-age setting and chooses to re-validate the object every time based on the Etag.
Enabling NGINX Cache-Control:
expires 24h;
Enabling APACHE Cache-Control:
a2enmod expires
ExpiresActive on
ExpiresDefault "access plus 1 day"
Both Chrome and Firefox appear to respect these settings, at first. However on page reload, Firefox persists in re-checking the files with the server. Due to Etags the server returns 304 Not Modified, so there is a bandwidth savings, but still not full cached behavior.
Chrome will serve static assets from cache, but re-check html and receive 304 Not Modified.
If no expiration time is set, Firefox will invent one, based on last modified time. For example, 10% of the time since last modified. But again, the expiration time on the browser side doesn’t seem to be an important factor anymore, if Etags cause revalidation at every reload.
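The "10% of the time since last modified" rule mentioned above can be worked through in a few lines. This is a sketch of the heuristic as described here, not Firefox source code; the function name and the fraction parameter are assumptions.

```python
from datetime import datetime, timedelta, timezone


def heuristic_freshness(response_date: datetime,
                        last_modified: datetime,
                        fraction: float = 0.10) -> timedelta:
    """Estimate how long a response without explicit expiry is treated as fresh."""
    age_at_response = response_date - last_modified
    # A Last-Modified in the future gives no usable estimate.
    if age_at_response < timedelta(0):
        return timedelta(0)
    return age_at_response * fraction
```

A file last modified 10 days before it was served would be considered fresh for roughly 1 day under this rule.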
Enabling NGINX Etags:
“All recent versions of Nginx (as of 2016) will automatically set these…. the response headers Etag + Last-Modified headers will be returned.”
Both Etag and Last-Modified are sent from the server. Both Etag and Last-Modified are returned by the client. Every page refresh checks both Etag and Last-Modified. A change in either one will cause a resend of the content.
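The revalidation behavior described above, where a change in either validator causes a resend, can be sketched as a server-side check. This is a simplification of what the text describes: the validators are compared as exact strings, and the function and parameter names are hypothetical.

```python
from typing import Optional


def revalidate(current_etag: str,
               current_last_modified: str,
               if_none_match: Optional[str] = None,
               if_modified_since: Optional[str] = None) -> int:
    """Return 304 if the client's copy is still valid, otherwise 200."""
    # No validators sent: this is a plain request, so send the full body.
    if if_none_match is None and if_modified_since is None:
        return 200
    # A mismatch in either validator means the content is resent.
    if if_none_match is not None and if_none_match != current_etag:
        return 200
    if if_modified_since is not None and if_modified_since != current_last_modified:
        return 200
    return 304
```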
Enabling APACHE Etags:
“Support for ETag is part of the Apache Core and it is enabled by default.” However in Apache 2.4 Etags are broken by default, because the deflate module appends -gzip to the etag, and all requests result in 200 instead of 304 Not Modified. Apache 2.5 includes a fix for this issue.
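The -gzip problem can be illustrated with a short comparison. The sketch below assumes the ETag is a quoted string with -gzip appended inside the quotes; the helper name is hypothetical, and it only shows why the comparison fails and how stripping the suffix would restore the match. It is not an Apache configuration.

```python
def strip_gzip_suffix(etag: str) -> str:
    """Remove a trailing -gzip from inside a quoted ETag, if present."""
    suffix = '-gzip"'
    if etag.endswith(suffix):
        return etag[:-len(suffix)] + '"'
    return etag


original_etag = '"5a2-1f"'            # what the server computes for the file
client_sends = '"5a2-1f-gzip"'        # what the browser echoes back in If-None-Match

# The raw comparison fails, so the server answers 200 with the full body.
raw_match = client_sends == original_etag
# After stripping the suffix the validators agree, which would allow a 304.
fixed_match = strip_gzip_suffix(client_sends) == original_etag
```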