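A while back I set up a GitHub mirror for some friends. They found cloning through it pleasantly fast, and then asked whether I could also speed up downloads from Microsoft's PDB symbol server. I assumed it would be as simple as the GitHub reverse proxy: just proxy the traffic to Microsoft's msdl server. So I dashed off a config and dropped it onto the same server that hosts the GitHub mirror. It was only today, idly checking the logs, that I noticed Microsoft's PDB server does not serve the PDB files itself. It answers with a 302 redirect to an Azure Storage blob domain, and that domain is what actually returns the binary data. That made my slapdash proxy pointless, since the files never passed through my server to be 'accelerated'.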
The simplest fix is of course to have nginx follow the 302 itself, for example:
server {
    ...
    location / {
        proxy_pass http://backend;
        # You may need to uncomment the following line if your redirects are relative, e.g. /foo/bar
        #proxy_redirect / /;
        proxy_intercept_errors on;
        error_page 301 302 307 = @handle_redirect;
    }
    location @handle_redirect {
        # proxy_pass with a variable target needs a `resolver` directive in scope to resolve the redirect host
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}
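Of course, I'm the kind of person with too much time on my hands, and on the principle that a chance to tinker shouldn't be wasted, I looked into how the blob servers are named. The hosts that msdl redirects to all look like "vsblobprodscussu5shard29.blob.core.windows.net", where the first label (vsblobprodscussu5shard29) is the storage account name and the trailing '29' is the shard number, which ranges from 1 to 99.
The download link that comes back looks like this: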
https://vsblobprodscussu5shard29.blob.core.windows.net/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=BxJ9VQUw8xEBO8iMOW6ktDtXNbXsEmZIiApS2B.......
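So all that's needed is to rewrite the Location of that 302 to a path under a directory of my own choosing, capturing the storage account name with a regular expression, and then have the reverse proxy request the blob file with the signed query string attached.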
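To make the mapping concrete, here is a minimal Python sketch of the two translations involved. It is only an illustration of the idea, not what runs on the server (nginx does this work in the post). The function names `to_mirror_path` and `to_upstream_url` are hypothetical, and the `/msdl/blob` prefix and the `vsblobprodscussu5shard<N>` host pattern are taken from the examples in this post.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Host pattern observed in the redirects described in this post (an assumption
# about what msdl returns; other storage accounts would not match).
BLOB_HOST_RE = re.compile(r"^(vsblobprodscussu5shard[0-9]+)\.blob\.core\.windows\.net$")
MIRROR_PATH_RE = re.compile(r"^/msdl/blob/(vsblobprodscussu5shard[0-9]+)/(.*)$")
MIRROR_PREFIX = "/msdl/blob"  # directory chosen on the mirror


def to_mirror_path(location: str) -> Optional[str]:
    """Map a blob Location URL to a path on the mirror, keeping the query string."""
    parts = urlsplit(location)
    match = BLOB_HOST_RE.match(parts.hostname or "")
    if match is None:
        return None  # not a blob host we know how to rewrite
    path = f"{MIRROR_PREFIX}/{match.group(1)}{parts.path}"
    return f"{path}?{parts.query}" if parts.query else path


def to_upstream_url(mirror_path_and_query: str) -> Optional[str]:
    """Map a mirror path back to the blob URL, carrying the signed query string."""
    path, sep, query = mirror_path_and_query.partition("?")
    match = MIRROR_PATH_RE.match(path)
    if match is None:
        return None
    account, filepath = match.groups()
    return f"https://{account}.blob.core.windows.net/{filepath}{sep}{query}"
```

The round trip is the point: whatever the first function produces, the second must turn back into the original signed URL, otherwise the blob request fails.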
The end result:
curl 'https://msdl.sunflyer.cn/download/symbols/imm32.pdb/265BCB59F6886A8F9F482A4A4903527B1/imm32.pdb' -vL
* Trying x.x.x.x:443...
* TCP_NODELAY set
* Connected to msdl.sunflyer.cn (x.x.x.x) port 443 (#0)
> GET /download/symbols/imm32.pdb/265BCB59F6886A8F9F482A4A4903527B1/imm32.pdb HTTP/2
> Host: msdl.sunflyer.cn
> user-agent: curl/7.68.0
> accept: */*
>
< HTTP/2 302
< server: nginx
< date: Wed, 01 Jun 2022 13:01:12 GMT
< content-length: 0
< location: https://msdl.sunflyer.cn/msdl/blob/vsblobprodscussu5shard29/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=vm5C........
< x-cache: TCP_MISS
< x-msedge-ref: Ref A: AB9FDF8E2EC24843B2FC5106804646E2 Ref B: LAXEDGE1612 Ref C: 2022-06-01T13:01:12Z
<
* Connection #0 to host msdl.sunflyer.cn left intact
* Found bundle for host msdl.sunflyer.cn: 0x562d2e23a030 [can multiplex]
* Using Stream ID: 3 (easy handle 0x562d2e2422f0)
> GET /msdl/blob/vsblobprodscussu5shard29/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=vm5........ HTTP/2
> Host: msdl.sunflyer.cn
> user-agent: curl/7.68.0
> accept: */*
>
< HTTP/2 200
< server: nginx
< date: Wed, 01 Jun 2022 13:01:13 GMT
< content-type: application/octet-stream
< content-length: 241664
< content-language: x-e2eid-8a548e8f-775640f8-92f75c6a-39acad2f-session-0f6f7bb5-6af84688-81185f71-f1a680c2
< last-modified: Thu, 19 Nov 2020 21:38:25 GMT
< accept-ranges: bytes
< etag: "0x8D88CD371ED33BF"
< x-ms-request-id: 33f4d3e6-501e-00e0-1fb7-7570d0000000
< x-ms-version: 2019-07-07
< x-ms-creation-time: Thu, 19 Nov 2020 21:38:25 GMT
< x-ms-lease-status: unlocked
< x-ms-lease-state: available
< x-ms-blob-type: BlockBlob
< x-ms-server-encrypted: true
< access-control-expose-headers: Content-Length
< access-control-allow-origin: *
< x-upstream-server: vsblobprodscussu5shard29.blob.core.windows.net
< x-upstream-path: b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob
<
* Connection #0 to host msdl.sunflyer.cn left intact
The nginx configuration is as follows:
location / {
    proxy_pass https://msdl.microsoft.com;
    proxy_http_version 1.1;
    set $addr $remote_addr;
    proxy_set_header X-Forwarded-For $addr;
    proxy_set_header Connection '';
    # Rewrite the 302 returned by msdl to my own /msdl/blob/<storage account>/<file path>, which then fetches the actual file
    proxy_redirect ~^https://(vsblobprodscussu5shard([0-9]+))\.blob\.core\.windows\.net/(?<filepath>.*) /msdl/blob/$1/$filepath;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host msdl.microsoft.com;
    proxy_ssl_name msdl.microsoft.com;
    proxy_ssl_server_name on;
}
# The actual reverse proxy to the corresponding blob storage account
# Note: proxy_pass with a variable host name needs a `resolver` directive in scope
location ~ ^/msdl/blob/(vsblobprodscussu5shard([0-9]+))/(?<filepath>.*) {
    set $blobhost $1.blob.core.windows.net;
    add_header X-Upstream-Server $blobhost 'always';
    add_header X-Upstream-Path $filepath 'always';
    # $args must be passed along (the query string carries the blob signature; without it the request returns 404)
    proxy_pass https://$blobhost:443/$filepath$is_args$args;
    # $addr is only set in "location /", so use $remote_addr directly here
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Connection '';
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $blobhost;
    proxy_ssl_name $blobhost;
    proxy_ssl_server_name on;
}
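What's the benefit of doing it this way, you ask? None, really. I was just bored and felt like tinkering (lol). This approach costs some extra performance (because of the regular expression matching), so unless you are as bored as I am, I'd suggest simply following the 302 inside nginx. (big lol)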
Oh, and if you need it, feel free to use this mirror. The address is https://msdl.sunflyer.cn; just substitute it for https://msdl.microsoft.com. For example, the usual address is https://msdl.microsoft.com/download/symbols, which becomes https://msdl.sunflyer.cn/download/symbols
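For a quick check that the substitution works, the sketch below builds a download URL in the same shape as the curl example earlier in this post: `<base>/<pdb name>/<identifier>/<pdb name>`. The helper name `symbol_url` is hypothetical, and the identifier is simply the string from that example, not something computed here.

```python
def symbol_url(base: str, pdb_name: str, identifier: str) -> str:
    """Build a symbol download URL of the form <base>/<pdb>/<identifier>/<pdb>."""
    return f"{base.rstrip('/')}/{pdb_name}/{identifier}/{pdb_name}"


# Same file as in the curl example, requested through the mirror.
MIRROR_BASE = "https://msdl.sunflyer.cn/download/symbols"
url = symbol_url(MIRROR_BASE, "imm32.pdb", "265BCB59F6886A8F9F482A4A4903527B1")
print(url)
```

Swapping `MIRROR_BASE` for https://msdl.microsoft.com/download/symbols yields the equivalent URL on Microsoft's own server, which makes it easy to compare the two.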