Update 2: installing PySide on Ubuntu
sudo apt-get install python-pyside
sudo apt-get install python3-pyside
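Update: installing PySide on Ubuntu: http://pyside.readthedocs.io/en/latest/building/linux.html
ghost.py (built on WebKit) makes it easy to scrape data that a page generates with JavaScript, for example through JavaScript API calls.
Installing ghost.py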
Step 1: Install PySide (Ubuntu). On CentOS, follow the PySide website (yum install qtwebkit qtwebkit-devel).
sudo apt-get install cmake
sudo apt-get install libqt4-dev
sudo apt-get install qt4-dev-tools
sudo apt-get install qtmobility-dev
sudo apt-get install python2.7-dev
sudo apt-get install libphonon-dev
pip install wheel
wget https://pypi.python.org/packages/source/P/PySide/PySide-1.2.2.tar.gz
tar -xvzf PySide-1.2.2.tar.gz
cd PySide-1.2.2
python setup.py bdist_wheel --qmake=/usr/bin/qmake-qt4
python pyside_postinstall.py -install
Step 1.2: To use ghost.py on a Linux system without X, also install xvfb.
Ubuntu:
sudo apt-get install xvfb
CentOS:
yum install xorg-x11-server-Xvfb
Run the script under xvfb:
xvfb-run --auto-servernum --server-args="-screen 0 1280x760x24" python x.py
Step 2: Install ghost.py
pip install ghost.py
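The xvfb step can also be decided from Python. The following is only a minimal sketch: it assumes that an empty or missing DISPLAY variable means no X server is available, and the helper name build_command is made up for illustration. It only builds the argument list and does not start anything.

```python
import os


def build_command(script, environ=None):
    """Return the argv list for running `script`.

    Assumption: an empty or missing DISPLAY means there is no X server,
    so the script is wrapped in xvfb-run with the options shown above.
    """
    environ = os.environ if environ is None else environ
    if environ.get("DISPLAY"):
        # An X display is already available; run the script directly.
        return ["python", script]
    return [
        "xvfb-run",
        "--auto-servernum",
        # One argv element; equivalent to the shell-quoted
        # --server-args="-screen 0 1280x760x24"
        "--server-args=-screen 0 1280x760x24",
        "python",
        script,
    ]


if __name__ == "__main__":
    print(" ".join(build_command("x.py")))
```

The returned list can be passed to subprocess.call if the script should be launched from another Python process.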
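Inspecting the App Annie site shows that the game list is generated by JavaScript, so the HTML fetched with requests does not contain the rows and cannot be matched with XPath directly. With ghost.py the page is rendered first, and XPath then works as usual.
Scraping apps from the App Annie site with ghost.py and lxml: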
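A small illustration of why the static HTML is not enough. The two markup strings below are invented for this sketch and are not the real App Annie markup: one stands for what a plain HTTP client downloads, the other for the DOM after the script has run. To stay dependency-free it uses the standard library's xml.etree.ElementTree on well-formed snippets rather than lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: what a plain HTTP client would download.
STATIC_HTML = (
    "<html><body>"
    '<div id="container"><table><tbody></tbody></table></div>'
    "<script>loadRows();</script>"
    "</body></html>"
)

# Hypothetical markup: the same page after JavaScript has filled the table.
RENDERED_HTML = (
    "<html><body>"
    '<div id="container"><table><tbody>'
    "<tr><td>Example App A</td></tr>"
    "<tr><td>Example App B</td></tr>"
    "</tbody></table></div>"
    "</body></html>"
)

# ElementTree supports only a subset of XPath; this path stays within it.
ROW_PATH = ".//*[@id='container']//tbody/tr"


def count_rows(markup):
    """Count the table rows that the path matches in `markup`."""
    root = ET.fromstring(markup)
    return len(root.findall(ROW_PATH))


if __name__ == "__main__":
    print(count_rows(STATIC_HTML))
    print(count_rows(RENDERED_HTML))
```

The same path matches nothing in the static markup and both rows in the rendered markup, which is the gap a rendering browser engine closes.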
# -*- coding: utf-8 -*-
from ghost import Ghost
import lxml.html

agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36'
ghost = Ghost(user_agent=agent, wait_timeout=120)
ghost.set_proxy('socks5', '192.168.1.111', 1080)  # use a SOCKS5 proxy
page, extra_resources = ghost.open('https://www.appannie.com/apps/google-play/publisher/20200000600489/?&page=2')
ghost.wait_for_text('data-ref="main"', timeout=60)  # wait until 'data-ref="main"' appears in the page
html = lxml.html.fromstring(ghost.content)
# the <tbody> element that holds the app rows
e = html.xpath('//*[@id="container"]/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/table/tbody')[0]
# print the text of the fourth cell of every row
for tr in e.getchildren():
    print(tr.getchildren()[3].text)
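The row extraction can be tried without ghost.py or network access. The sketch below runs the same kind of XPath lookup on a hand-written HTML string; the markup, the shortened XPath and the helper name fourth_cell_texts are assumptions for illustration, not the real page structure. It requires lxml.

```python
import lxml.html

# Hypothetical markup standing in for the rendered page content.
SAMPLE_HTML = """
<div id="container">
  <table>
    <tbody>
      <tr><td>1</td><td>icon</td><td>free</td><td>Example App A</td></tr>
      <tr><td>2</td><td>icon</td><td>paid</td><td>Example App B</td></tr>
    </tbody>
  </table>
</div>
"""


def fourth_cell_texts(html_text):
    """Return the text of the fourth <td> of every row under #container."""
    doc = lxml.html.fromstring(html_text)
    texts = []
    for row in doc.xpath('//*[@id="container"]//table/tbody/tr'):
        cells = row.xpath('./td')
        # Skip rows that have fewer than four cells instead of raising.
        if len(cells) > 3:
            texts.append(cells[3].text_content().strip())
    return texts


if __name__ == "__main__":
    for name in fourth_cell_texts(SAMPLE_HTML):
        print(name)
```

Using text_content() instead of .text also picks up text nested inside child elements such as links, and the length check avoids an IndexError on header or spacer rows.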