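Preface
Many people will probably think this is a weird requirement: a crawler should just crawl data, so why bother parsing JS? Too much time on your hands?
Search the web and you will find quite a few questions about this, but most of the people asking have weak JS fundamentals, or weak HTML fundamentals, or weak Ajax fundamentals; in short, weak everything. With basics that shaky, why write a crawler instead of going back and learning the basics properly?
So you are bound to ask: "Excuse me, my friend, why the hell do you need this too? Could it be that you're a technical lightweight?"
Not at all. As a code grunt with more than three years of front-end experience, how could a problem like that stump me? The problem this old-timer ran into today is clearly not that simple.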
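Problem
So what exactly was the problem?
Today I had to crawl an API endpoint, but calling it requires a credential: a token-like value stored in a cookie. The cookie's value is generated by a piece of JS, that JS is itself fetched from a different endpoint, and on top of that the returned JS is dynamic. WTF!!! Developers, what on earth are you up to?
Passer-by A: Seriously? The self-proclaimed veteran can't analyze the JS logic?
Right, I can't. The damn JS is obfuscated and encrypted; I nearly went blind staring at it and still had no idea what it does.
Forget it. I'll just execute it and take the result, whatever the hell it does.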
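Approach
Let me sort out the plan. What needs to happen is actually very simple:
Request endpoint A to get the dynamically generated, obfuscated JS code
Execute the JS code to get the generated cookie value
Request endpoint B, carrying the JS-generated token
Get the result and have fun…
The plan is crystal clear; it feels like it could be done in no time.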
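The four steps can be sketched as a single Python function. This is only a minimal sketch under stated assumptions, not the blogger's actual code: the names `crawl_with_js_token`, `fetch_script`, `run_js` and `fetch_data`, and the cookie name, are all hypothetical. The HTTP requests and the JS execution (PyV8 in this post) are passed in as callables, so the flow itself does not depend on any particular library.

```python
def crawl_with_js_token(fetch_script, run_js, fetch_data, cookie_name):
    """Endpoint A -> execute JS -> endpoint B, following the steps above.

    All parameters are hypothetical placeholders:
      fetch_script()      -> str : request endpoint A, return the obfuscated JS
      run_js(script)      -> str : execute that JS, return the generated cookie value
      fetch_data(cookies) -> any : request endpoint B with the given cookie dict
      cookie_name         -> str : name of the cookie endpoint B expects
    """
    # Step 1: the JS is dynamic, so fetch a fresh copy for every request.
    script = fetch_script()
    # Step 2: don't try to read the obfuscated code; just execute it.
    token = run_js(script)
    if not token:
        raise ValueError("the JS did not produce a cookie value")
    # Step 3: send the generated value to endpoint B as a cookie.
    return fetch_data({cookie_name: token})


if __name__ == "__main__":
    # Stand-ins for the real endpoints and JS engine, for illustration only.
    result = crawl_with_js_token(
        fetch_script=lambda: "/* obfuscated JS from endpoint A */",
        run_js=lambda script: "abc123",
        fetch_data=lambda cookies: {"sent_cookies": cookies},
        cookie_name="token",
    )
    print(result)  # {'sent_cookies': {'token': 'abc123'}}
```

Keeping step 2 behind `run_js` means the JS engine can be swapped out later without touching the request logic.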
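Obstacle
Executing JS in Python? Interesting. Why not just use Node.js?
Because Python is the best language in the world! Bar none!
I found a magical module called PyV8. The machine already has pip, so one install command and done, right?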
pip install pyv8
Don't be suspicious: my machine runs Kali Linux and I'm root, so no sudo is needed.
And then came the error:
pip install -U pyv8
Collecting pyv8
Using cached pyv8-0.5.zip
Building wheels for collected packages: pyv8
Running setup.py bdist_wheel for pyv8 ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying pyv8.py -> build/lib.linux-x86_64-2.7
running build_ext
building '_pyv8' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
In file included from src/exception.cpp:1:0:
src/exception.h:6:16: fatal error: v8.h: No such file or directory
#include <v8.h>
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Failed building wheel for pyv8
Running setup.py clean for pyv8
Failed to build pyv8
Installing collected packages: pyv8
Running setup.py install for pyv8 ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying pyv8.py -> build/lib.linux-x86_64-2.7
running build_ext
building '_pyv8' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
In file included from src/exception.cpp:1:0:
src/exception.h:6:16: fatal error: v8.h: No such file or directory
#include <v8.h>
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-qum4bx/pyv8/
It seems to be caused by a missing v8.h file, but I couldn't make sense of what that meant.
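What the message means, roughly: to resolve `#include <v8.h>`, the compiler looks for the file in each `-I` directory on the command line (`lib/python/inc`, `lib/boost/inc`, `lib/v8/inc`, `/usr/include/python2.7`) and then in the standard system directories such as `/usr/include`. v8.h was in none of them, so the V8 headers the build needs simply weren't there. The Python snippet below only imitates that lookup to make the idea concrete; it is a simplification, not how gcc is actually implemented, and `find_header` is a made-up name.

```python
import os


def find_header(name, include_dirs):
    """Return the first path include_dir/name that exists, else None.

    A simplified imitation of how a C/C++ compiler resolves
    `#include <name>` against its search path (not gcc's real logic).
    Relative directories are resolved against the current directory.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# The -I directories from the failing gcc command, then a typical system dir.
SEARCH_PATH = [
    "lib/python/inc",
    "lib/boost/inc",
    "lib/v8/inc",
    "/usr/include/python2.7",
    "/usr/include",
]

if __name__ == "__main__":
    path = find_header("v8.h", SEARCH_PATH)
    if path is None:
        print("v8.h not found: gcc reports 'fatal error: v8.h: No such file or directory'")
    else:
        print("v8.h found at " + path)
```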
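Solution
A search engine turned up a fix: it turns out PyV8 depends on Boost, which the official project doesn't mention, so that package has to be installed first: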
apt-get update && apt-get install libboost-all-dev
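Once that was installed I retried installing PyV8 and got exactly the same error, so it looked like the only option left was to do it by hand.
Download #
Unzip it, pick the file that matches your system environment, unzip that one as well, and copy the extracted files into
/usr/lib/python2.7/dist-packages/
Then check whether it worked by running the following in a terminal: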
python
import PyV8
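The same check can also be done from inside a script, so a crawler can stop early with a clear message when PyV8 is missing. This is a minimal sketch using only the standard library (it runs on Python 2.7 and 3); the helper name `module_location` is hypothetical.

```python
import importlib


def module_location(name):
    """Import `name` and return the file it was loaded from.

    Returns None if the import fails, e.g. the module is not installed
    or a compiled extension does not match the platform.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules (such as sys) have no __file__ attribute.
    return getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    location = module_location("PyV8")
    if location is None:
        print("PyV8 is not importable; check the files copied into dist-packages")
    else:
        print("PyV8 loaded from " + location)
```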
If no error is raised, it worked and the fun can begin. Below is the JS code I need to parse:
var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 58, 45, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, 40, 45, 118, 45, 124, 113, 98, 117, 105, 59, 40, 48, 40, 126, 42, 60];
eval(function(p, a, c, k, e, d) {
    e = function(c) {
        return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
    };
    if (!''.replace(/^/, String)) {
        while (c--) d[e(c)] = k[c] || e(c);
        k = [function(e) {
            return d[e]
        }];
        e = function() {
            return '\\w+'
        };
        c = 1
    };
    while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]);
    return p
}('6 3=\'\';7(2=0;2