About nginx log analysis

Context

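This one came out of my own needs. It happens to be Saturday and I came in to the office with nothing else to do, so here is a proper write-up.

I built a small app that collects my daily location data, WeChat step counts and the like. The server is rented abroad, so it doubles as a proxy for getting past the firewall and as a place to deploy projects, and it gives me some Linux practice along the way. Perfect~ Early on, deploying the project and some static assets often produced 404s, so something in my nginx configuration had to be wrong. Digging through the logs on the server's command line was a hassle, and ssh kept dropping (facepalm). So why not serve the nginx log files as static resources and read them straight from the browser? Perfect~ That step is not hard: it only takes a properly configured location block in nginx.conf, so I'll skip it here.

What worried me was what would happen if the access path leaked. Some basic access restriction was needed, such as a username and password check. Write a login page myself? Store the credentials in a database? Far too inelegant. Luckily nginx has the auth_basic module, which solves the problem neatly~

Some time later, with nothing better to do, I flipped through the nginx logs and, whoa, got a fright! 👇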

{"timestamp":"2018-11-09T19:58:25-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":195,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/dbadmin/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:26-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":202,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/web/phpMyAdmin/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:26-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":197,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/pma/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:26-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":197,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/PMA/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:26-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":199,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/mysql/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:26-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":200,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/mysql2/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:27-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":204,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/phpmyadmin/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:27-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":204,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/phpMyAdmin/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:27-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":205,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/admin/phpmyadmin2/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}
{"timestamp":"2018-11-09T19:58:27-05:00","host":"66.98.120.58","client":"180.180.243.223","size":162,"requestlengh":198,"requesttime":0.000,"responsetime":"-","domain":"66.98.120.58","url":"/mysqladmin/index.php","referer":"-","agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0","status":"404","x_forwarded_for":"-"}

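👆 That is part of today's nginx log. Thanks to my friend at IP 180.180.243.223 (Thailand) for the greeting. The requested paths are things like /mysqladmin/index.php, /admin/mysql/index.php and /admin/PMA/index.php. Too brutal!! I'm just a kid.. Say no more: nginx needs a dynamic blacklist!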

auth_basic

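The official nginx documentation for ngx_http_auth_basic_module covers everything, and there are plenty of tutorials. It is simple enough that I'll skip the details here as well.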

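# Setting up the username and password for access:
# the first time, the password file has to be created, so pass -c
htpasswd -c path/to/password-file username

# once the password file exists, add further users without -c
htpasswd path/to/password-file username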
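
For reference, here is a minimal sketch of what the protected location might look like. The `/logs/` prefix, the realm string and the `/etc/nginx/.htpasswd` path are assumptions for illustration, not values from my actual setup; `auth_basic` and `auth_basic_user_file` are the directive names from ngx_http_auth_basic_module. The function only prints the fragment, so it can be reviewed before being pasted into a server block.

```shell
#!/bin/bash
# Print an nginx location fragment that serves a log directory behind basic auth.
# All three arguments are illustrative placeholders:
#   $1 = URL prefix, $2 = directory holding the logs, $3 = htpasswd file
print_log_location() {
    local url_prefix=$1 log_dir=$2 passwd_file=$3
    cat <<EOF
location ${url_prefix} {
    alias ${log_dir};
    # .log files have no MIME mapping by default; show them as text in the browser
    default_type text/plain;
    auth_basic "Restricted logs";
    auth_basic_user_file ${passwd_file};
}
EOF
}
```

Calling `print_log_location /logs/ /var/log/nginx/ /etc/nginx/.htpasswd` prints a block that can go inside `server { ... }`, followed by `nginx -t` and a reload.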

Adding a dynamic IP blacklist

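For this part I first wanted to use ELK, and then the server crashed. Right, 1 GB of memory. I gave up on that.
So awk it is. Whenever I had run into awk before, the commands were so long that I skipped them without a second look. But I had put this off for too long without finding a better solution, so I gritted my teeth and spent some time reading the documentation. It turned out not to be that hard (at least not for what this section needs). I mainly followed Hjqjk’s Blog-Nginx动态黑名单 (a post on nginx dynamic blacklists); that blog also appears to be built with Hexo, so haha, a kindred spirit~

One thing to note: I had also considered persisting the logs to a database with Java IO, so that I could analyze access behavior with SQL statements and generate charts easily. On reflection that is a fair amount of work, and the log data could eventually outgrow the location and WeChat step data I actually collect, so it goes on the backlog for now (laughing through tears). That is why the nginx log format is configured as JSON 👇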

# JSON log format definition
log_format logstash_json '{"timestamp":"$time_iso8601",'
'"host":"$server_addr",'
'"client":"$remote_addr",'
'"size":$body_bytes_sent,'
'"requestlengh":$request_length,'
'"requesttime":$request_time,'
'"responsetime":"$upstream_response_time",'
'"domain":"$host",'
'"url":"$request_uri",'
'"referer":"$http_referer",'
'"agent":"$http_user_agent",'
'"status":"$status",'
'"x_forwarded_for":"$http_x_forwarded_for"}';
# Enable the logstash_json log format
access_log /var/log/nginx/access.log logstash_json;
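
With this format every string field has the shape `"key":"value"`, so a field can be pulled out by name rather than by position. A minimal sketch follows; `json_field` is a hypothetical helper name of my own, and it assumes the value contains no double quote.

```shell
#!/bin/bash
# Read JSON log lines on stdin and print the value of one string field per line.
# $1 = field name as it appears in log_format, e.g. client, url, status.
# Assumption: the value itself contains no double quote.
json_field() {
    local key=$1
    sed -n 's/.*"'"$key"'":"\([^"]*\)".*/\1/p'
}
```

For example, `json_field client < access.log` lists one client IP per request, and `json_field url < access.log` lists the requested paths.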

Here is my script for filtering IPs 👇

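#!/bin/bash
# (Daily) count requests per client IP; as a rough rule, any IP with more than 20 requests in a day goes on the nginx blacklist
# @latest_date 2018-11-2
# @author Shang

# Blacklist file
conf_path=/pathToBlockips/blockips.conf
# nginx log file
log_path='/pathToAccessLog/access.log-'$(date +%Y%m%d)
# Absolute path of the nginx command
nginx_command=/usr/sbin/nginx
# To filter out legitimate crawlers, add this stage to the pipeline: grep -i -v -E ${spider}|
# spider="Google|Baidu|msnbot|FeedSky|Sogou|360|bing|yahoo"

# My nginx was installed with yum, so the logs are rotated by date automatically and stay small; read the whole file in one go
cat "${log_path}" |
# Split on ',' (the default is whitespace) and print the third field, i.e. "client":"180.180.243.223"
awk -F'[,]' '{print $3}' |
# Split on ':' and print the second field (i.e. "180.180.243.223") with the surrounding '"' stripped
awk -F: '{print substr($2,2,length($2)-2)}' |
# uniq -c only merges adjacent identical lines, so sort first to group each IP together
sort |
# Count, like count(ip) ... group by ip in SQL
uniq -c |
# Sort in descending order, like order by count(ip) desc in SQL
sort -rn |
# If the count is greater than 20, append the IP to the blacklist file
awk '$1 > 20 {print "deny " $2 ";"}' >> "${conf_path}"

# Check that nginx.conf is valid, and only then reload it
if ${nginx_command} -t
then
    ${nginx_command} -s reload
fi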
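
Because the script appends to the blacklist on every run, an IP that misbehaves on several days ends up listed several times. A minimal sketch of one way to avoid that is to merge instead of append; `merge_blocklist` is a hypothetical helper name of my own, not part of the script above.

```shell
#!/bin/bash
# Merge new "deny <ip>;" lines read from stdin into the blacklist file,
# keeping each distinct line exactly once.
# $1 = path of the blacklist file (created if it does not exist yet).
merge_blocklist() {
    local conf=$1 tmp
    tmp=$(mktemp) || return 1
    touch "$conf"
    # sort -u drops duplicates across the existing file and the new lines;
    # writing back with cat keeps the original file's owner and permissions.
    cat "$conf" - | sort -u > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

In the script, the final `>> ${conf_path}` would then become `| merge_blocklist "${conf_path}"`.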

Once the script (auto_add_nginx_blockips.sh) is created, remember to make it executable:

# Add execute permission
chmod u=rwx pathToAuto_add_nginx_blockips.sh

# Test it first
./auto_add_nginx_blockips.sh

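# If everything works, add the scheduled job~
# Edit the current user's crontab
crontab -e
# Then add a job that runs the script every day at 23:50 (standard output of a successful run goes to the 'black hole', otherwise cron sends a mail every day, which I can't stand..)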
50 23 * * * /bin/sh /path/to/auto_add_nginx_blockips.sh 1>/dev/null
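# After :wq the crontab is saved automatically to `/var/spool/cron/$user`

At first the job was not scheduled for 23:50. The culprit was automatic log rotation: rotating is one thing, but the rotated logs also get compressed into .gz files, which makes the script more complicated, and I could never work out when logrotate actually runs the rotation. Baffling. So at first blockips.conf simply never changed. Then I started checking /var/log/cron and found that the job had in fact run 👇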

Nov  9 23:01:01 host run-parts(/etc/cron.hourly)[6988]: starting 0anacron
Nov 9 23:01:01 host run-parts(/etc/cron.hourly)[6997]: finished 0anacron
## the auto_add_nginx_blockips script was executed
Nov 9 23:50:01 host CROND[9407]: (root) CMD (/bin/sh /etc/nginx/conf.d/auto_add_nginx_blockips.sh)
Nov 10 00:01:02 host CROND[9975]: (root) CMD (run-parts /etc/cron.hourly)
Nov 10 00:01:02 host run-parts(/etc/cron.hourly)[9975]: starting 0anacron
...

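Later (I no longer remember where I read it) I learned that when a cron job produces output, the result is delivered by mail and can be read in /var/spool/mail/ under the current username 👇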

From root@host.localdomain  Tue Nov  6 01:30:02 2018
Return-Path: <root@host.localdomain>
X-Original-To: root
Delivered-To: root@host.localdomain
Received: by host.localdomain (Postfix, from userid 0)
id 0CF4D4E1C; Tue, 6 Nov 2018 01:30:02 -0500 (EST)
From: "(Cron Daemon)" <root@host.localdomain>
To: root@host.localdomain
Subject: Cron <root@host> /bin/sh /etc/nginx/conf.d/auto_add_nginx_blockips.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=567>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/0>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
X-Cron-Env: <USER=root>
Message-Id: <20181106063002.0CF4D4E1C@host.localdomain>
Date: Tue, 6 Nov 2018 01:30:02 -0500 (EST)

# The log file path is the problem; my guess is that the file had already been compressed into a .gz..
cat: /var/log/nginx/access.log-20181106: No such file or directory
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

If you run into a similar problem, try looking for clues in the mail~ Done!