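What are Expires headers?

Expires headers: the name means what it says, headers that state when a response expires.

Expires headers tell the browser whether it should request a given file from the server or take it from the browser's cache.

Expires headers are designed to use the cache to cut the number of HTTP requests, which in turn cuts the volume of HTTP responses that has to be transferred.

The word "Expires" signals that the header carries a time limit. Only within the specified period does the browser read the data from its cache; once that period has passed, revisiting the same page makes the browser send HTTP requests to the server again and download the files the page needs.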
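To make the time limit concrete, here is a minimal Python sketch of the check a cache could perform on an Expires value. The function name is_fresh and the sample date are my own illustration, not part of any browser's actual implementation, and real browsers also take Cache-Control and other headers into account.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy may be used without asking the server."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable Expires value (for example "0") counts as already expired.
        return False
    if expires_at.tzinfo is None:
        # Treat a date without zone information as UTC (assumption of this sketch).
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


expires = "Thu, 01 Jan 2026 00:00:00 GMT"
print(is_fresh(expires, datetime(2025, 12, 31, tzinfo=timezone.utc)))  # True
print(is_fresh(expires, datetime(2026, 1, 2, tzinfo=timezone.utc)))    # False
```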

It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources. It is redundant to specify both Expires and Cache-Control: max-age, or to specify both Last-Modified and ETag.

LeverageBrowserCaching – from Google
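The guidance quoted above can be turned into a small check. The sketch below is only an illustration under my own naming (caching_header_report and the keys of its result are hypothetical); it looks at a dictionary of response headers and reports whether a freshness header and a validator are present, and whether either pair is doubled up.

```python
def caching_header_report(headers: dict[str, str]) -> dict[str, bool]:
    """Check response headers against the 'one of each' guidance."""
    # Header names are case-insensitive, so normalise them first.
    names = {name.lower(): value for name, value in headers.items()}
    cache_control = names.get("cache-control", "").lower()
    has_max_age = any(
        part.strip().startswith("max-age=") for part in cache_control.split(",")
    )
    has_expires = "expires" in names
    has_last_modified = "last-modified" in names
    has_etag = "etag" in names
    return {
        "has_freshness": has_expires or has_max_age,
        "has_validator": has_last_modified or has_etag,
        "redundant_freshness": has_expires and has_max_age,
        "redundant_validator": has_last_modified and has_etag,
    }


report = caching_header_report({
    "Cache-Control": "public, max-age=2592000",
    "ETag": '"abc123"',
})
print(report)
```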

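Why add Expires headers?

When you visit a website, your browser is responsible for downloading every file it needs from the server. These downloads are the HTTP requests introduced earlier. HTTP is a stateless protocol, so if nothing is done about it the browser can end up requesting the same files from the server over and over when it visits the same page, which puts unnecessary download load on the server. And as pages grow richer, the number of HTTP requests per page keeps growing too.

Setting Expires headers lets the browser read files it has already downloaded from its local cache, which removes many HTTP requests. The page naturally loads much faster as a result.

Two points need to be made clear:

  1. Adding Expires headers reduces HTTP requests only when the browser revisits the same page (or requests the same file again); only then does it read the cached file locally. On a user's first visit, Expires headers have no effect. The first load of a web page therefore usually involves more HTTP requests than later visits.
  2. Expires headers are time-limited. Once the specified expiration time has passed, the browser sends HTTP requests to the server again instead of reading the local cached file.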
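Both points can be seen in a toy simulation. The BrowserCache class below is a hypothetical model written for this post, not a real browser API: the first visit goes to the network, a repeat visit within the lifetime is served from cache, and a visit after expiry goes to the network again.

```python
class BrowserCache:
    """Toy model of a browser cache keyed by URL."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}  # url -> time the copy expires
        self.requests_sent = 0

    def fetch(self, url: str, now: float, lifetime: float) -> str:
        """Return "cache" or "network" depending on where the file comes from."""
        expires_at = self._expires_at.get(url)
        if expires_at is not None and now < expires_at:
            return "cache"
        # First visit, or the cached copy has expired: ask the server again.
        self.requests_sent += 1
        self._expires_at[url] = now + lifetime
        return "network"


cache = BrowserCache()
one_week = 7 * 24 * 3600
print(cache.fetch("/logo.png", now=0, lifetime=one_week))             # network
print(cache.fetch("/logo.png", now=3600, lifetime=one_week))          # cache
print(cache.fetch("/logo.png", now=one_week + 1, lifetime=one_week))  # network
print(cache.requests_sent)                                            # 2
```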

Rules for adding Expires headers

There are two aspects to this rule:

  • For static components: implement “Never expire” policy by setting far future Expires header
  • For dynamic components: use an appropriate Cache-Control header to help the browser with conditional requests

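As we can see, the rule has two parts:

  1. Static files: apply the "never expire" principle by setting an expiration time far enough in the future.
  2. Dynamic files: set an appropriate Cache-Control header. What counts as appropriate? The principle is simple: base it on how often the dynamic content changes. Give frequently modified content a short expiration time and rarely changing content a longer one.

YSlow recommends an expiration time of at least 6 days for static files, while PageSpeed argues for longer: at least 30 days.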

How to add Expires headers?

First decide which files (mainly static files here) should get Expires headers. The files that usually need caching are:

  • Images: jpg, gif, png
  • favicon/ico
  • JavaScript: js
  • CSS: css
  • Flash: swf
  • PDF: pdf
  • Media files: video and audio files

Do not set Expires headers on HTML files. In my own development experience, adding Expires headers to HTML files causes a lot of trouble. If you do add them, keep the expiration time short.

In general, HTML is not static, and shouldn’t be considered cacheable.

PageSpeed – Leverage browser caching

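Once you know which resource files to cache, the next step is to estimate how often each of them changes and set a suitable expiration time. The same principle applies as before: the more often a file changes, the shorter its expiration time; files that rarely change can get a long expiration time, which is the "never expire" principle in practice.

Configuring Expires headers on Apache

The next step is to set the Expires headers on the server. Taking Apache as the example, we add the following code to the .htaccess file (.htaccess is a hidden file in the root directory):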
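As a rough sketch of that decision, the Python below maps file extensions to lifetimes and builds the matching headers. The LIFETIMES table and the caching_headers function are my own illustration with example values; HTML is deliberately left out, in line with the advice above. The sketch emits both Expires and Cache-Control: max-age, which is also what Apache's mod_expires and Nginx's expires directive do, although one of the two is enough.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from pathlib import PurePosixPath

# Illustrative lifetimes per extension; adjust to how often your files change.
LIFETIMES = {
    ".ico": timedelta(days=365),
    ".js": timedelta(days=365),
    ".css": timedelta(days=30),
    ".gif": timedelta(days=30),
    ".png": timedelta(days=30),
    ".jpg": timedelta(days=30),
    ".jpeg": timedelta(days=30),
}


def caching_headers(path: str, now: datetime) -> dict[str, str]:
    """Build caching headers for a file, or none for uncached types."""
    lifetime = LIFETIMES.get(PurePosixPath(path).suffix.lower())
    if lifetime is None:
        return {}  # e.g. HTML: no Expires header in this sketch
    return {
        # now must be timezone-aware UTC for usegmt=True to work.
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(caching_headers("/static/site.css", now))
print(caching_headers("/index.html", now))
```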

<IfModule mod_expires.c>
# Enable expirations (turn Expires headers on)
ExpiresActive On
# Default directive: the default expiration time
# mod_expires also emits "Cache-Control: max-age" for the same period
# (1 month = 2592000 seconds), so no separate Cache-Control line is needed.
ExpiresDefault "access plus 1 month"
# My favicon
ExpiresByType image/x-icon "access plus 1 year"
# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
# CSS
ExpiresByType text/css "access plus 1 month"
# JavaScript
ExpiresByType application/javascript "access plus 1 year"
</IfModule>
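To see what max-age such a time specification corresponds to, here is a small Python sketch that converts the "access plus ..." syntax into seconds. The function expires_seconds is hypothetical, and counting a month as 30 days and a year as 365 days is my assumption about how the intervals add up; it does agree with one month being 2592000 seconds.

```python
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,   # assumption: a month counts as 30 days
    "year": 365 * 86400,   # assumption: a year counts as 365 days
}


def expires_seconds(spec: str) -> int:
    """Convert e.g. 'access plus 1 month' into a lifetime in seconds."""
    tokens = spec.lower().split()
    if not tokens or tokens[0] not in ("access", "now", "modification"):
        raise ValueError(f"unknown base in {spec!r}")
    tokens = tokens[1:]
    if tokens and tokens[0] == "plus":  # the keyword is optional
        tokens = tokens[1:]
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError(f"expected '<number> <unit>' pairs in {spec!r}")
    total = 0
    for count, unit in zip(tokens[0::2], tokens[1::2]):
        factor = SECONDS_PER_UNIT.get(unit.rstrip("s"))
        if factor is None:
            raise ValueError(f"unknown unit {unit!r}")
        total += int(count) * factor
    return total


print(expires_seconds("access plus 1 month"))  # 2592000
print(expires_seconds("access plus 1 year"))   # 31536000
```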

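Configuring Expires headers on Nginx

In practice, a site's static files are nowadays usually served by Nginx, which is also configured as a reverse proxy that forwards dynamic content to Apache. The Nginx configuration looks like this: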

# Add the following to the server block of the Nginx configuration
server {
        # cache static files
        location ~* \.(gif|jpe?g|png|ico|swf)$ {
                # Time units: d = days, h = hours, m = minutes
                expires 168h;
                add_header Pragma public;
                add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        }

        # JS and CSS files change often, so they are cached for only 5 minutes
        location ~* \.(css|js)$ {
                expires 5m;
                add_header Pragma public;
                add_header Cache-Control "public, must-revalidate, proxy-revalidate";
        }
}

A few caching tips mentioned in the PageSpeed documentation are also worth adding here:

Use fingerprinting to dynamically enable caching.

For resources that change occasionally, you can have the browser cache the resource until it changes on the server, at which point the server tells the browser that a new version is available. You accomplish this by embedding a fingerprint of the resource in its URL (i.e. the file path). When the resource changes, so does its fingerprint, and in turn, so does its URL. As soon as the URL changes, the browser is forced to re-fetch the resource. Fingerprinting allows you to set expiry dates long into the future even for resources that change more frequently than that. Of course, this technique requires that all of the pages that reference the resource know about the fingerprinted URL, which may or may not be feasible, depending on how your pages are coded.

Set the Vary header correctly for Internet Explorer.

Internet Explorer does not cache any resources that are served with the Vary header and any fields but Accept-Encoding and User-Agent. To ensure these resources are cached by IE, make sure to strip out any other fields from the Vary header, or remove the Vary header altogether if possible.

Avoid URLs that cause cache collisions in Firefox.

The Firefox disk cache hash functions can generate collisions for URLs that differ only slightly, namely only on 8-character boundaries. When resources hash to the same key, only one of the resources is persisted to disk cache; the remaining resources with the same key have to be re-fetched across browser restarts. Thus, if you are using fingerprinting or are otherwise programmatically generating file URLs, to maximize cache hit rate, avoid the Firefox hash collision issue by ensuring that your application generates URLs that differ on more than 8-character boundaries.

Use the Cache-Control: public directive to enable HTTPS caching for Firefox.

Some versions of Firefox require the Cache-Control: public header to be set in order for resources sent over SSL to be cached on disk, even if the other caching headers are explicitly set. Although this header is normally used to enable caching by proxy servers (as described below), proxies cannot cache any content sent over HTTPS, so it is always safe to set this header for HTTPS resources.

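  • Use fingerprinting to dynamically enable caching: fingerprinting is meant for files that are modified occasionally, at times you cannot predict. I think the simplest application of this technique is the timestamp trick I mentioned earlier. You can give the file a long expiration time and still modify it fairly often; once a version is settled, the long expiration time does its job again.
  • Set the Vary header correctly for Internet Explorer: IE does not cache any resource served with a Vary header that contains fields other than Accept-Encoding and User-Agent. To make sure IE caches resources correctly, strip every other field from the Vary header, or drop the Vary header altogether if you can. On the server side, though, we usually set Vary: Accept-Encoding, and only for text-type resource files.
  • Avoid URLs that cause cache collisions in Firefox: put simply, Firefox can hash URLs that differ only slightly to the same cache key, and it then keeps only one of those files in its disk cache. When you generate file names automatically with fingerprinting or a similar technique, make sure the generated URLs differ in more than just the characters on 8-character boundaries, so that different files do not end up with the same hash key. (I have not run into this in actual development, but the discussion of duplicate resource requests in the Remove duplicate JavaScript and CSS rule is worth a look.)
  • Use the Cache-Control: public directive to enable HTTPS caching for Firefox: another Firefox-specific setting. Adding the public value to the Cache-Control header makes sure Firefox caches resources requested over HTTPS. The Nginx configuration above already sets it: add_header Cache-Control "public, must-revalidate, proxy-revalidate";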
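The fingerprinting tip from the first bullet can be sketched in a few lines of Python. This is only one possible approach, not the method PageSpeed prescribes: the function name fingerprinted_url, the use of SHA-256, and the 12-character digest length are all my own choices. The idea is that the URL changes whenever the file content changes, so the file can safely carry a far-future expiration time.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_url(url_path: str, content: bytes, digest_chars: int = 12) -> str:
    """Embed a short content hash in the file name, e.g. app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(url_path)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = fingerprinted_url("/static/app.js", b"console.log('v1');")
new = fingerprinted_url("/static/app.js", b"console.log('v2');")
print(old)
print(new)
print(old != new)  # True: changed content yields a new URL
```

Every page that references the file has to use the fingerprinted URL, so this normally lives in the build or templating step.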

I am no expert front-end optimization engineer. But after reading through this, I searched Google and YouTube for the keyword and watched the following videos to look for a practical way to solve this problem:

Moz Explanation Link

I also looked at some of the YouTube comments on the standout videos.

In the end I came to this final decision:

DON’T CHANGE ANYTHING!