Two approaches to nginx log analysis (shell + Python)

Python script

log_format main '$remote_addr - $remote_user [$time_iso8601] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
' "$upstream_addr" "$upstream_status" "$request_time"';

awk '{print $NF}' website.access.log | awk -F '"' '{print $2}' > a.txt

paste -d " " website.access.log a.txt > b.txt

awk '($NF > 1) {print $6 $7 " " $NF}' b.txt > c.txt
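
The three commands above pull the quoted $request_time off the end of each line, append it as a plain column, and keep the requests slower than one second. The same filter can be sketched in Python in a single pass. This is only an illustration: the function name slow_requests and the sample lines are made up, and it assumes $request_time is the last double-quoted value on the line, as in the log_format above. It reports only the request path, not the path plus protocol that $6 $7 prints.

```python
import re

# Assumption: the last double-quoted value on the line is $request_time,
# as in the log_format shown above.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def slow_requests(lines, threshold=1.0):
    """Yield (path, seconds) for requests slower than `threshold` seconds."""
    for line in lines:
        m = LAST_QUOTED.search(line)
        if not m:
            continue
        try:
            seconds = float(m.group(1))
        except ValueError:
            continue  # the value was not a number, e.g. "-"
        if seconds > threshold:
            parts = line.split()
            # awk's $6 (the request path) is index 5 after a whitespace split
            path = parts[5] if len(parts) > 5 else ""
            yield path, seconds


# Hypothetical sample lines in the format above
sample = [
    '10.0.0.1 - - [2017-03-23T00:17:49+08:00] "GET /slow HTTP/1.1" 200 12 '
    '"-" "curl" "-"  "127.0.0.1:9000" "200" "1.532"',
    '10.0.0.2 - - [2017-03-23T00:17:50+08:00] "GET /fast HTTP/1.1" 200 12 '
    '"-" "curl" "-"  "127.0.0.1:9000" "200" "0.020"',
]

for path, seconds in slow_requests(sample):
    print(path, seconds)
```

To run it on a real file, pass the open file object: slow_requests(open("website.access.log")).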

Analyzing nginx logs on Linux with awk, wc, sort, uniq, and grep

b). Field meanings (as described below)

column1:ip_address

column2:log_time

column3:request

column4:status_code

column5:send_bytes

column6:referer

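Requirement 1: count the total number of records, the total number of successes, and the failures by type: 404, 403, 500

awk -F '\t' '{if ($4 > 0) print $4}' data.log | wc -l |
awk '{print "total items:" $1}'

2. Extract the totals for successes and for each failure type

awk -F '\t' '{if ($4 > 0 && $4 == 200) print $4}' data.log | wc -l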
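
The awk command above has to be run once per status code. As a rough sketch, all the totals can be collected in one pass with a counter. The function name status_counts and the sample rows are hypothetical. The code assumes the tab-separated layout described in the field list, with the status code in column 4.

```python
from collections import Counter


def status_counts(lines):
    """Count status codes in tab-separated lines (status code in column 4)."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # skip rows that are too short or whose status is not numeric
        if len(cols) >= 4 and cols[3].isdigit():
            counts[cols[3]] += 1
    return counts


# Hypothetical sample rows: ip, time, request, status, bytes, referer
sample = [
    "1.1.1.1\t2014-01-27 11:28:53\tGET /a.html HTTP/1.1\t200\t512\t-",
    "1.1.1.2\t2014-01-27 11:28:54\tGET /b.html HTTP/1.1\t200\t128\t-",
    "1.1.1.3\t2014-01-27 11:28:55\tGET /missing HTTP/1.1\t404\t0\t-",
    "1.1.1.4\t2014-01-27 11:28:56\tGET /boom HTTP/1.1\t500\t0\t-",
    "malformed row",
]

counts = status_counts(sample)
print("total items:", sum(counts.values()))
for code in ("200", "404", "403", "500"):
    print(code, counts[code])
```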

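Requirement 2: among the errors, find which URLs occur most often, with duplicates collapsed and the results in descending order

awk -F '\t' '{if ($4 > 0 && $4 == 500) print $3}' data.log | awk '{print $2}' | sort | uniq -c | sort -k1,1nr

Requirement 3: count how often each file name appears in the URLs, and include the status code and the referer in the result. Both the URL and the referer contain the / character, which interferes with filtering, so try to work around that.

awk '{print $5,$7,$9}' data.log | grep 200 |
sed 's#.*/\(.*\)#\1#' | sort -k1 | uniq -c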
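
The sed expression above is greedy, so the slashes in the referer can interfere with extracting the file name from the URL. One way around this, sketched below, is to split the columns first and take the file name from the URL path alone. The function name filename_counts and the sample rows are hypothetical, and the code assumes the tab-separated column layout from the field list (request in column 3, status in column 4, referer in column 6).

```python
from collections import Counter
from posixpath import basename
from urllib.parse import urlsplit


def filename_counts(lines, status="200"):
    """Count (file name, status, referer) triples for one status code."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6 or cols[3] != status:
            continue
        request = cols[2].split()  # e.g. ["GET", "/img/logo.png?v=2", "HTTP/1.1"]
        if len(request) < 2:
            continue
        # Only the URL path is reduced to its last segment;
        # the referer is kept whole, slashes included.
        name = basename(urlsplit(request[1]).path)
        counts[(name, cols[3], cols[5])] += 1
    return counts


# Hypothetical sample rows: ip, time, request, status, bytes, referer
sample = [
    "1.1.1.1\tt1\tGET /img/logo.png?v=2 HTTP/1.1\t200\t512\thttp://example.com/page/",
    "1.1.1.2\tt2\tGET /img/logo.png HTTP/1.1\t200\t512\thttp://example.com/page/",
    "1.1.1.3\tt3\tGET /css/site.css HTTP/1.1\t200\t90\t-",
    "1.1.1.4\tt4\tGET /img/none.png HTTP/1.1\t404\t0\t-",
]

for (name, code, referer), n in filename_counts(sample).most_common():
    print(n, name, code, referer)
```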

wc -l access.log | awk '{print $1}'    # total number of requests

awk '{print $1}' access.log | sort | uniq | wc -l    # number of unique IPs

awk -F'[ []' '{print $5}' access.log | sort | uniq -c | sort -rn | head -5    # client requests per second, top 5

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -5    # most frequent client IPs, top 5

awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -5    # most frequently requested URLs, top 5

awk '{if ($12 > 10) {print $7}}' access.log | sort | uniq -c | sort -rn | head -5    # URLs with a response time over 10 seconds, top 5

awk '{if ($13 != 200) {print $13}}' access.log | sort | uniq -c | sort -rn | head -5
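
Most of the one-liners above share one shape: pick a whitespace-separated field, then run sort | uniq -c | sort -rn | head -5. As a rough Python equivalent, the helper below does the same for any field. The name top_field and the sample lines are made up for illustration. The index is 1-based so that it lines up with awk's $N.

```python
from collections import Counter


def top_field(lines, index, n=5):
    """Return the n most common values of whitespace field `index` (1-based)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= index:
            counts[parts[index - 1]] += 1
    return counts.most_common(n)


# Hypothetical sample lines
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /a HTTP/1.1" 200 10',
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /b HTTP/1.1" 200 10',
    '2.2.2.2 - - [23/Mar/2017:00:17:50 +0800] "GET /a HTTP/1.1" 404 0',
]

print(top_field(sample, 1))                 # most frequent client IPs
print(top_field(sample, 7))                 # most frequently requested URLs
print(len({l.split()[0] for l in sample}))  # number of unique IPs
```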

Analyze the behavior of source IPs with more than 50,000 requests

awk '{print $1}' access.log | sort | uniq -c | sort -rn | awk '{if ($1 > 50000) {print $2}}' > tmp.txt

for i in $(cat tmp.txt)
do
    echo "$i" >> analysis.txt
    echo "Access behavior statistics" >> analysis.txt
    grep "$i" access.log | awk '{print $6}' | sort | uniq -c | sort -rn | head -5 >> analysis.txt
    echo "Accessed endpoint statistics" >> analysis.txt
    grep "$i" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -5 >> analysis.txt
    echo -e "\n" >> analysis.txt
done

If the source IPs come through a proxy server, change the address that the first command filters on to the $http_x_forwarded_for address

awk '{print $NF}' access.log | sort | uniq -c | sort -rn | awk '{if ($1 > 50000) {print $2}}' > tmp.txt
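
The shell loop above rereads access.log twice for every heavy IP, and grep matches the address anywhere on the line. A single-pass variant might look like the sketch below. The names heavy_hitters and ip_index and the sample lines are hypothetical. Passing ip_index=-1 takes the address from the last field, mirroring the $NF variant for proxied traffic. The sketch keeps per-IP counters for every address in memory, which is a simplification.

```python
from collections import Counter, defaultdict


def heavy_hitters(lines, threshold=50000, n=5, ip_index=0):
    """Summarise methods ($6) and URLs ($7) for IPs with > threshold requests."""
    total = Counter()
    methods = defaultdict(Counter)
    urls = defaultdict(Counter)
    for line in lines:
        parts = line.split()
        if len(parts) < 7:
            continue
        ip = parts[ip_index].strip('"')
        total[ip] += 1
        methods[ip][parts[5].lstrip('"')] += 1  # awk $6, e.g. "GET
        urls[ip][parts[6]] += 1                 # awk $7
    return {
        ip: {
            "requests": count,
            "methods": methods[ip].most_common(n),
            "urls": urls[ip].most_common(n),
        }
        for ip, count in total.items()
        if count > threshold
    }


# Hypothetical sample lines; the threshold is lowered so the demo prints something
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /a HTTP/1.1" 200 10',
    '1.1.1.1 - - [23/Mar/2017:00:17:50 +0800] "GET /a HTTP/1.1" 200 10',
    '1.1.1.1 - - [23/Mar/2017:00:17:51 +0800] "POST /b HTTP/1.1" 200 10',
    '2.2.2.2 - - [23/Mar/2017:00:17:52 +0800] "GET /a HTTP/1.1" 200 10',
]

for ip, stats in heavy_hitters(sample, threshold=1).items():
    print(ip, stats)
```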

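5. Performance metrics

Concurrent connections

A client sends a request to the server and establishes a TCP connection. The total number of TCP connections the server holds each second is the number of concurrent connections.

PV (page view), UV (unique visitor), unique IPs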
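
As a rough illustration of how these three figures differ, the sketch below derives them from access log lines. It rests on loud assumptions: every logged request is counted as a page view, and a unique visitor is approximated by a distinct (IP, user agent) pair, since the log carries no visitor cookie. The function name traffic_metrics and the sample lines are hypothetical.

```python
def traffic_metrics(lines):
    """Return PV, approximate UV, and unique IP counts for access log lines."""
    pv = 0
    ips = set()
    visitors = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        pv += 1                      # assumption: one request = one page view
        ip = fields[0]
        ips.add(ip)
        quoted = line.split('"')     # like awk -F'"': index 5 is the user agent
        user_agent = quoted[5] if len(quoted) > 5 else ""
        visitors.add((ip, user_agent))  # assumption: visitor = (IP, user agent)
    return {"pv": pv, "uv": len(visitors), "unique_ips": len(ips)}


# Hypothetical sample lines
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 200 1 "-" "UA1"',
    '1.1.1.1 - - [23/Mar/2017:00:17:50 +0800] "GET / HTTP/1.1" 200 1 "-" "UA2"',
    '2.2.2.2 - - [23/Mar/2017:00:17:51 +0800] "GET / HTTP/1.1" 200 1 "-" "UA1"',
]

print(traffic_metrics(sample))
```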

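6. Troubleshooting

1. Tuning parameters when nginx runs out of connections

2. nginx + php-fpm returning 502

3. Analysis of a production nginx "no live upstreams while connecting to upstream" error

4. The curious trailing slash in nginx proxy_pass

5. nginx + Tomcat: zero-byte files caused by a multithreading problem when uploading images with Apache FTPClient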

Case 1
ip - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 302 0 "-" "PycURL/7.19.7"
log_format access '$http_x_real_ip - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" $http_x_forwarded_for';
192.168.21.1 - - [27/Jan/2014:11:28:53 +0800] "GET /2.php HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1707.0 Safari/537.36" "-"192.168.21.128 200 127.0.0.1:9000 0.119 0.119
#log_format main '$remote_addr - $remote_user [$time_local] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" "$http_x_forwarded_for"';
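$http_host: the address (IP or domain name) the user typed into the browser, here 192.168.21.128
$upstream_status: the upstream status, here 200
$upstream_addr: the backend upstream address and port, here 127.0.0.1:9000
$request_time: total time for the page request, here 0.119
$upstream_response_time: the upstream response time within the page request, here 0.119
$10 $body_bytes_sent
$1 $remote_addr
$7 $request
$11 $http_referer
$9 $status
$6 $http_user_agent (when the line is split on double quotes with -F'"')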
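
Positional fields such as $7 or $11 shift whenever the log_format changes, so it can help to parse each line into named fields instead. The sketch below does that with a regular expression. It is an illustration only: the names LINE_RE and parse_line are hypothetical, and the pattern assumes the leading part of the line follows the commented main format above (address, user, [$time_local], quoted request, status, bytes, quoted referer, quoted user agent).

```python
import re

# Assumption: the line starts with the fields of the "main" format above.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


# Hypothetical sample line
sample = ('1.2.3.4 - - [23/Mar/2017:00:17:49 +0800] '
          '"GET / HTTP/1.1" 302 0 "-" "PycURL/7.19.7"')

fields = parse_line(sample)
print(fields["remote_addr"], fields["status"], fields["http_user_agent"])
```

Anything after the user agent (for example the upstream fields) is ignored by this pattern and would need extra groups.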
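1. Total visits
2. Total bandwidth
3. Unique visitors
4. Visits by IP
5. Visits by URL
6. Referrer statistics
7. 404 statistics
8. Search engine crawler visits (Google, Baidu)
9. Search engine referral statistics (Google, Baidu)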
#!/bin/bash
log_path=/home/www.centos.bz/log/access.log.1
domain="centos.bz"
email="log@centos.bz"
maketime=`date +%Y-%m-%d" "%H":"%M`
logdate=`date -d "yesterday" +%Y-%m-%d`
total_visit=`wc -l ${log_path} | awk '{print $1}'`
total_bandwidth=`awk -v total=0 '{total+=$10}END{print total/1024/1024}' ${log_path}`
# asort() is a gawk extension; it returns the number of array elements
total_unique=`awk '{ip[$1]++}END{print asort(ip)}' ${log_path}`
ip_pv=`awk '{ip[$1]++}END{for (k in ip){print ip[k],k}}' ${log_path} | sort -rn | head -20`
url_num=`awk '{url[$7]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20`
# referers that do not point back to our own domain
referer=`awk -v domain=$domain '$11 !~ /http:\/\/[^/]*'"$domain"'/{url[$11]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20`
notfound=`awk '$9 == 404 {url[$7]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20`
# split on double quotes: $6 is the user agent, $4 is the referer
spider=`awk -F'"' '$6 ~ /Baiduspider/ {spider["baiduspider"]++} $6 ~ /Googlebot/ {spider["googlebot"]++}END{for (k in spider){print k,spider[k]}}' ${log_path}`
search=`awk -F'"' '$4 ~ /http:\/\/www\.baidu\.com/ {search["baidu_search"]++} $4 ~ /http:\/\/www\.google\.com/ {search["google_search"]++}END{for (k in search){print k,search[k]}}' ${log_path}`
#echo -e "Overview\nReport generated: ${maketime}\nTotal visits: ${total_visit}\nTotal bandwidth: ${total_bandwidth}M\nUnique visitors: ${total_unique}\n\nVisits by IP\n${ip_pv}\n\nVisits by URL\n${url_num}\n\nReferrer statistics\n${referer}\n\n404 statistics\n${notfound}\n\nCrawler statistics\n${spider}\n\nSearch engine referral statistics\n${search}" | mail -s "$domain $logdate log statistics" ${email}
Case 2
# tar zxvf pymongo-1.11.tar.gz
# cd pymongo-1.11
# python setup.py install
Example: connecting to MongoDB from Python
$ cat conn_mongodb.py
#!/usr/bin/python
# Uses the pymongo 1.11 API installed above (Connection, save, insert).
import pymongo
import random

conn = pymongo.Connection("127.0.0.1", 27017)
db = conn.tage  # select the database
db.authenticate("tage", "123")  # authenticate the user
db.user.drop()  # drop the collection "user"
db.user.save({'id': 1, 'name': 'kaka', 'sex': 'male'})  # insert one document
# insert a batch of documents in a loop
for id in range(2, 10):
    name = random.choice(['steve', 'koby', 'owen', 'tody', 'rony'])
    sex = random.choice(['male', 'female'])
    db.user.insert({'id': id, 'name': name, 'sex': sex})
# print all documents
content = db.user.find()
for i in content:
    print(i)
Writing the Python script
#encoding=utf8
import re

zuidaima_nginx_log_path = "/usr/local/nginx/logs/www.zuidaima.com.access.log"
# an IPv4 address at the start of the line
pattern = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def stat_ip_views(log_path):
    """Return a dict mapping each client IP to its number of requests."""
    ret = {}
    with open(log_path, "r") as f:
        for line in f:
            match = pattern.match(line)
            if match:
                ip = match.group(0)
                ret[ip] = ret.get(ip, 0) + 1
    return ret

def run():
    ip_views = stat_ip_views(zuidaima_nginx_log_path)
    max_ip_view = {}
    for ip in ip_views:
        views = ip_views[ip]
        if len(max_ip_view) == 0:
            max_ip_view[ip] = views
        else:
            _ip = list(max_ip_view.keys())[0]
            _views = max_ip_view[_ip]
            if views > _views:
                max_ip_view[ip] = views
                max_ip_view.pop(_ip)
        print("ip: %s, views: %d" % (ip, views))
    # total number of distinct IPs
    print("total: %d" % len(ip_views))
    # the IP with the most views
    print("max_ip_view: %s" % max_ip_view)

run()
