A mirror of the Microsoft debug symbol server (msdl PDB server) built on an Nginx reverse proxy

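A while back I set up a GitHub mirror for some friends. They were happy with how fast it made cloning, and asked whether I could do something similar to speed up downloads from Microsoft's PDB server. I assumed it would be as simple as the GitHub reverse proxy: forward the traffic to Microsoft's msdl server and call it a day. So I threw together a config and dropped it on the same server that hosts the GitHub mirror. It was only today, idly reading through the logs, that I noticed Microsoft's PDB server does not serve the PDB files itself. It answers with a 302 redirect to an Azure Storage blob domain, and that is where the binary data actually comes from. That made my quick-and-dirty proxy pointless, since the files never passed through my server to be 'accelerated'.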

The simplest fix is of course to have nginx follow the 302 itself, for example:

server {
    ...

    location / {
        proxy_pass http://backend;
        # You may need to uncomment the following line if your redirects are relative, e.g. /foo/bar
        #proxy_redirect / /;
        proxy_intercept_errors on;
        error_page 301 302 307 = @handle_redirect;
    }

    location @handle_redirect {
        # proxy_pass with a variable needs a `resolver` directive when the target is a hostname
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}

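Of course, I have far too much time on my hands, and on the principle that no chance to tinker should go to waste, I looked into how the blob servers are laid out. The hosts that msdl redirects to all look like "vsblobprodscussu5shard29.blob.core.windows.net". The first label, vsblobprodscussu5shard29, is the storage account name, and the trailing '29' is the shard number, which ranges from 1 to 99.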

The download link that comes back looks like this:

https://vsblobprodscussu5shard29.blob.core.windows.net/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=BxJ9VQUw8xEBO8iMOW6ktDtXNbXsEmZIiApS2B.......

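So all it takes is rewriting that 302 to a directory of my own, using a regular expression to capture the storage account name, and then reverse-proxying the request to the blob with the signed query string attached.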
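To make the rewrite concrete, here is a minimal Python sketch of the same mapping, purely for illustration (the mirror itself does this in nginx). The function name `rewrite_location` and the `prefix` parameter are made up for this example, and the account prefix in the pattern is taken from the sample URL above, so it may not cover every shard family.

```python
import re
from urllib.parse import urlsplit

# Hostname pattern for the blob shards; the account prefix comes from the
# sample URL and is an assumption, other regions may use a different one.
BLOB_HOST = re.compile(r"^(vsblobprodscussu5shard[0-9]+)\.blob\.core\.windows\.net$")


def rewrite_location(location: str, prefix: str = "/msdl/blob") -> str:
    """Map an Azure blob redirect target to a path on the mirror.

    The input is returned unchanged when the host is not a recognised shard.
    """
    parts = urlsplit(location)
    match = BLOB_HOST.match(parts.hostname or "")
    if parts.scheme != "https" or match is None:
        return location
    rewritten = f"{prefix}/{match.group(1)}{parts.path}"
    # Keep the signed query string untouched; the blob request needs it.
    if parts.query:
        rewritten += "?" + parts.query
    return rewritten
```

Anything that does not look like a shard host is passed through untouched, so unrelated redirects keep working.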

The end result:

curl 'https://msdl.sunflyer.cn/download/symbols/imm32.pdb/265BCB59F6886A8F9F482A4A4903527B1/imm32.pdb' -vL
*   Trying x.x.x.x:443...
* TCP_NODELAY set
* Connected to msdl.sunflyer.cn (x.x.x.x) port 443 (#0)
> GET /download/symbols/imm32.pdb/265BCB59F6886A8F9F482A4A4903527B1/imm32.pdb HTTP/2
> Host: msdl.sunflyer.cn
> user-agent: curl/7.68.0
> accept: */*
> 

< HTTP/2 302 
< server: nginx
< date: Wed, 01 Jun 2022 13:01:12 GMT
< content-length: 0
< location: https://msdl.sunflyer.cn/msdl/blob/vsblobprodscussu5shard29/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=vm5C........
< x-cache: TCP_MISS
< x-msedge-ref: Ref A: AB9FDF8E2EC24843B2FC5106804646E2 Ref B: LAXEDGE1612 Ref C: 2022-06-01T13:01:12Z
< 
* Connection #0 to host msdl.sunflyer.cn left intact

* Found bundle for host msdl.sunflyer.cn: 0x562d2e23a030 [can multiplex]
* Using Stream ID: 3 (easy handle 0x562d2e2422f0)
> GET /msdl/blob/vsblobprodscussu5shard29/b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob?sv=2019-07-07&sr=b&si=1&sig=vm5........ HTTP/2
> Host: msdl.sunflyer.cn
> user-agent: curl/7.68.0
> accept: */*
> 
< HTTP/2 200 
< server: nginx
< date: Wed, 01 Jun 2022 13:01:13 GMT
< content-type: application/octet-stream
< content-length: 241664
< content-language: x-e2eid-8a548e8f-775640f8-92f75c6a-39acad2f-session-0f6f7bb5-6af84688-81185f71-f1a680c2
< last-modified: Thu, 19 Nov 2020 21:38:25 GMT
< accept-ranges: bytes
< etag: "0x8D88CD371ED33BF"
< x-ms-request-id: 33f4d3e6-501e-00e0-1fb7-7570d0000000
< x-ms-version: 2019-07-07
< x-ms-creation-time: Thu, 19 Nov 2020 21:38:25 GMT
< x-ms-lease-status: unlocked
< x-ms-lease-state: available
< x-ms-blob-type: BlockBlob
< x-ms-server-encrypted: true
< access-control-expose-headers: Content-Length
< access-control-allow-origin: *
< x-upstream-server: vsblobprodscussu5shard29.blob.core.windows.net
< x-upstream-path: b-4712e0edc5a240eabf23330d7df68e77/0F751EB0FD2CCF2FA0E24E4FD662F0D2BF37968BC915CB4E2B2ABD508E226A5600.blob
< 

* Connection #0 to host msdl.sunflyer.cn left intact

The Nginx config is as follows:

location / {
	proxy_pass https://msdl.microsoft.com;
	proxy_http_version 1.1;
	set $addr $remote_addr;

	proxy_set_header X-Forwarded-For $addr;
	proxy_set_header Connection '';

	# Rewrite the 302 returned by msdl to my own /msdl/blob/<storage account>/<file path>,
	# which then fetches the actual file
	proxy_redirect ~https://(vsblobprodscussu5shard([0-9]+))\.blob\.core\.windows\.net/(?<filepath>.*) /msdl/blob/$1/$filepath;
	proxy_set_header X-Real-IP $remote_addr;
	proxy_set_header Host msdl.microsoft.com;

	proxy_ssl_name msdl.microsoft.com;
	proxy_ssl_server_name on;
}

# The actual reverse proxy to the corresponding blob directory
location ~ ^/msdl/blob/(vsblobprodscussu5shard([0-9]+))/(?<filepath>.*) {
	set $blobhost $1.blob.core.windows.net;

	add_header X-Upstream-Server $blobhost always;
	add_header X-Upstream-Path $filepath always;

	# $args must be passed along: the query string carries the blob signature,
	# and without it the request returns 404.
	# proxy_pass uses a variable here, so nginx needs a `resolver` directive
	# (assumed to be set elsewhere, e.g. in the http block).
	proxy_pass https://$blobhost:443/$filepath$is_args$args;
	# $addr is only set inside `location /`, so use $remote_addr directly here
	proxy_set_header X-Forwarded-For $remote_addr;
	proxy_set_header Connection '';
	proxy_set_header X-Real-IP $remote_addr;
	proxy_set_header Host $blobhost;
	proxy_ssl_name $blobhost;
	proxy_ssl_server_name on;
}
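For readers who find the regex location hard to follow, the sketch below mirrors what it does, in Python and under the same assumptions as the config: only paths whose first segment looks like a vsblobprodscussu5shard account are accepted, and the query string is forwarded as is. The name `upstream_url` is hypothetical and exists only for this illustration.

```python
import re
from typing import Optional

# Same shape as the regex location: account name, then the blob path.
MIRROR_PATH = re.compile(
    r"^/msdl/blob/(vsblobprodscussu5shard[0-9]+)/(?P<filepath>.*)$"
)


def upstream_url(path: str, query: str = "") -> Optional[str]:
    """Translate a mirror path back into the blob URL that gets requested.

    Returns None when the path does not name a recognised shard account.
    """
    match = MIRROR_PATH.match(path)
    if match is None:
        return None
    host = f"{match.group(1)}.blob.core.windows.net"
    url = f"https://{host}/{match.group('filepath')}"
    # Equivalent of $is_args$args: append "?" only when there is a query.
    if query:
        url += "?" + query
    return url
```

Restricting the account name to a fixed pattern matters here: if the first segment were taken verbatim as a hostname, the location would turn into an open proxy for arbitrary hosts.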

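What do I gain from doing it this way, you ask? Nothing. I was just bored and felt like tinkering (lol). This approach wastes some performance on regular expression matching, so unless you are as bored as I am, I recommend just having nginx follow the 302 directly. (lol indeed)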

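Oh, and if you need it, feel free to use this mirror. The address is https://msdl.sunflyer.cn and it is a drop-in replacement for https://msdl.microsoft.com. For example, the usual address is https://msdl.microsoft.com/download/symbols, so change that to https://msdl.sunflyer.cn/download/symbols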
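If you want to fetch a single PDB by hand, the request path follows the layout seen in the curl example: file name, then GUID plus age, then the file name again. A small sketch, assuming the age is appended in uppercase hex without padding (the usual symbol store convention, not something this post verifies); `symbol_url` is a hypothetical helper.

```python
def symbol_url(
    pdb_name: str,
    guid: str,
    age: int,
    base: str = "https://msdl.sunflyer.cn/download/symbols",
) -> str:
    """Build the download URL for a PDB from its GUID and age."""
    # Drop dashes and braces so both "265BCB59-..." and raw hex forms work.
    signature = guid.replace("-", "").strip("{}").upper()
    # Assumption: the age is appended as uppercase hex with no padding.
    signature += format(age, "X")
    return f"{base}/{pdb_name}/{signature}/{pdb_name}"


# Rebuilds the URL used in the curl example.
print(symbol_url("imm32.pdb", "265BCB59F6886A8F9F482A4A4903527B", 1))
```

Pass `base="https://msdl.microsoft.com/download/symbols"` to build the same URL against the official server.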
