Getting the complete content of a web page in Python (including content loaded dynamically by JS): selenium + phantomjs - 查问我看


PHPer 2023-12-10

https://blog.csdn.net/huwei2003/article/details/107490468
Recommended install: pip install Selenium4R
Selenium4R is a modified fork of Selenium 4 that can be installed directly from within mainland China.

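1 Whether you use requests_html or some other way of fetching a page's source, content loaded dynamically via AJAX does not show up in the result. You have to analyze the interface that the page loads its data from and request that data again, which is sometimes quite inconvenient.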

2 Below we use selenium + phantomjs to fetch everything on the page in one go.
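For contrast, here is a rough sketch of the interface-analysis route from point 1, using only the standard library. The endpoint `API_BASE` and the helper names `build_api_url` and `fetch_json` are hypothetical; the real endpoint has to be read out of the browser's network panel for each site, and its parameters may differ from the page URL's.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only. Look up the real one in the
# browser's developer tools (Network panel) while the page is loading.
API_BASE = "https://example.com/api/news/detail"


def build_api_url(page_url, api_base=API_BASE):
    """Copy the article id from the page URL's query string onto the API URL."""
    query = parse_qs(urlsplit(page_url).query)
    article_id = query["id"][0]
    return api_base + "?" + urlencode({"id": article_id})


def fetch_json(api_url, timeout=10):
    """Request the endpoint and decode its JSON body."""
    with urlopen(api_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_api_url("https://m.fygdrs.com/h5/news.html?t=2&id=67062"))
```

This has to be redone for every site and every interface, which is the inconvenience that rendering the page in a browser avoids.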

1. Download PhantomJS from https://phantomjs.org/download.html and pick the Windows or the Linux build.
2. Once downloaded, just extract the archive. selenium itself can be installed with pip.
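Before wiring the path into selenium, it can help to confirm that the extracted binary is where you think it is. A minimal sketch, assuming a hypothetical helper named `find_phantomjs`; it only checks that the file exists, not that it runs:

```python
import os
import shutil


def find_phantomjs(explicit_path=None):
    """Return the path of an existing phantomjs binary, or None if not found."""
    if explicit_path:
        # On Windows the file is phantomjs.exe, so accept the path
        # with or without the suffix.
        for candidate in (explicit_path, explicit_path + ".exe"):
            if os.path.isfile(candidate):
                return candidate
        return None
    # No explicit path given: fall back to searching PATH.
    return shutil.which("phantomjs")


if __name__ == "__main__":
    path = find_phantomjs()
    print(path or "phantomjs not found; pass the extracted bin path explicitly")
```

If this returns None for the path you extracted to, the `executable_path` passed to selenium will fail as well.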

Code:

import time
from html import unescape  # HTMLParser().unescape() was removed in Python 3.9

from lxml import etree
from lxml import html
from selenium import webdriver  # webdriver.PhantomJS needs Selenium 3.x; Selenium 4 removed it


def getHTMLText(url):
    """Load url in PhantomJS and return the rendered page source."""
    driver = webdriver.PhantomJS(executable_path='E:\\pythontest\\phantomjs-2.1.1-windows\\bin\\phantomjs')  # absolute path to phantomjs
    try:
        driver.get(url)  # load the page
        time.sleep(2)  # give the page's JavaScript time to load its content
        return driver.page_source
    finally:
        driver.quit()  # always shut down the phantomjs process


def getHtmlByXpath(html_str, xpath):
    """Parse html_str and return the list of nodes matching xpath."""
    tree = etree.HTML(html_str)
    return tree.xpath(xpath)


def w_file(filepath, contents):
    """Write contents to filepath, encoded as gb18030."""
    with open(filepath, 'w', encoding='gb18030') as wf:
        wf.write(contents)


def main():
    url = 'https://m.fygdrs.com/h5/news.html?t=2&id=67062'  # URL to visit
    strhtml = getHTMLText(url)  # fetch the rendered HTML
    w_file('E:\\pythontest\\wfile.txt', strhtml)
    strDiv = getHtmlByXpath(strhtml, "//div[@id='Article-content']")
    if strDiv:
        str1 = html.tostring(strDiv[0])  # bytes; non-ASCII text comes back as character references
        print(str1)
        str2 = unescape(str1.decode())  # turn the character references back into readable text
        print(str2)
        w_file('E:\\pythontest\\wfile3.txt', str2)

    print('ok')


if __name__ == '__main__':
    main()
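The code above waits a fixed two seconds for the page's JavaScript, which is too long on fast pages and too short on slow ones. A common alternative is to poll for a condition with a timeout. The sketch below is library-independent; `wait_until` is a hypothetical helper, not part of selenium (selenium ships its own `WebDriverWait` in `selenium.webdriver.support.ui` for the same job).

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


if __name__ == "__main__":
    start = time.monotonic()
    # Stand-in condition: becomes true roughly 0.3 s after start.
    wait_until(lambda: time.monotonic() - start > 0.3, timeout=5, interval=0.1)
    print("condition met")
```

With selenium, the predicate could be something like `lambda: driver.find_elements("xpath", "//div[@id='Article-content']")`, so the function returns as soon as the article container appears. That usage is an assumption about how you would plug it in, not something taken from the original code.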


Updated: 2023-12-10 19:47:52
