[Scraping Code] Scraping Code Examples - 查问我看


PHPer 2020-03-28

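This post records some examples of scraping code (2020-03-28). My earlier Python scraping code is stored elsewhere; I wrote it in 2013 or 2014 (I can't remember which) and will rewrite it when I have time. Contents: QueryList scraping, scraping with a cURL class, scraping with curl functions.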


WEB, WEB backend, scraping

Updated: 2022-05-22 17:18:21


|--[Original] QueryList scraping code example

QueryList scraping code example. Focus on the QueryList parts; the rest can be ignored. 2020-03-28

    // Scrape the traileraddict home page
    public static function collectInfoTraileraddictHome(){
        $s=time();
        $filename=config('website.logs.collect'); // log file that records scraping activity
        $baseUrl='';
        // Target page to scrape, PHPHub tutorial section
        $page = 'https://www.traileraddict.com/'; // traileraddict home page
        // List selector, bxslider
        $rang = '#homemenu >li';
        // Scraping rules
        $rules = array(
            // Article title
            'title' => ['a','title'],
            // Article link
            'url' => ['a','href'],
            // Image
            'source_image' => ['img','src']
        );
        // Scrape
        $data = \QL\QueryList::Query($page,$rules,$rang)->data;

        $rang2= '#top_features >ul >li';
        $rules2 = [
            // Article title
            'title' => ['h2','text'],
            // Article link
            'url' => ['a','href'],
            // Image: strip the CSS wrapper from the style attribute and prepend the scheme
            'source_image' => ['','style','',function($content) use($baseUrl){
                $content=str_replace('background-image:url(', '', $content);
                $content=str_replace(')', '', $content);
                $content='https:'.$content;
                return $content;
            }],
        ];
        $data2 = \QL\QueryList::Query($page,$rules2,$rang2)->data;

        $rang3= '.featured_box';
        $rules3 = [
            // Article title
            'title' => ['a','text'],
            // Article link
            'url' => ['a','href'],
            // Image: this block has none, so the callback returns an empty string
            'source_image' => ['a','href','',function($content) {
                return '';
            }],
        ];
        $data3 = \QL\QueryList::Query($page,$rules3,$rang3)->data;

        $i=0; // number of rows inserted into the database
        $j=0; // number of second-level links already scraped
        $datas=array_merge($data,$data2,$data3);
        $num=count($datas);
        ...
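
The excerpt relies on two ideas: a range selector that picks each list item, and a rule table that maps a field name to a selector plus an attribute. The sketch below illustrates the same idea with PHP's bundled DOM extension rather than QueryList, so it runs without the library or network access. The function names `scrapeByRules` and `styleToImageUrl`, the XPath expressions, and the sample markup are all assumptions made for illustration; this is not QueryList's API.

```php
<?php
// Stand-in for the range + rules idea, using PHP's bundled DOM extension
// instead of QueryList. All names here are hypothetical.

/**
 * @param string $html       HTML to parse
 * @param string $rangeXPath XPath that selects each list item (the "range")
 * @param array  $rules      field => [XPath relative to the item, attribute name or 'text']
 * @return array one associative array per matched item
 */
function scrapeByRules(string $html, string $rangeXPath, array $rules): array
{
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query($rangeXPath) as $item) {
        $row = [];
        foreach ($rules as $field => [$selector, $attribute]) {
            $node = $xpath->query($selector, $item)->item(0);
            if (!$node instanceof DOMElement) {
                $row[$field] = '';                       // nothing matched for this rule
            } elseif ($attribute === 'text') {
                $row[$field] = trim($node->textContent); // text content of the node
            } else {
                $row[$field] = $node->getAttribute($attribute);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

// Same clean-up as the callback in the excerpt: style value -> absolute image URL.
function styleToImageUrl(string $style): string
{
    $url = str_replace(['background-image:url(', ')'], '', $style);
    return 'https:' . $url;
}

// Inline sample markup (made up) so the sketch runs without network access.
$html = '<ul id="homemenu">'
      . '<li><a href="/film/a" title="Film A"><img src="/img/a.jpg"></a></li>'
      . '<li><a href="/film/b" title="Film B"><img src="/img/b.jpg"></a></li>'
      . '</ul>';

$rows = scrapeByRules($html, '//ul[@id="homemenu"]/li', [
    'title'        => ['.//a', 'title'],
    'url'          => ['.//a', 'href'],
    'source_image' => ['.//img', 'src'],
]);

print_r($rows);
```

Keeping the rules as data means the three selector blocks in the excerpt could share one loop, with only the range and the rule table changing between them.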


Updated: 2022-09-01 23:17:15

|--[Original] Scraping with a cURL class I found online earlier


A class someone else wrote specifically for scraping, which I found online earlier. It wraps PHP's curl. 2020-03-28

    class cURL {
        var $headers;
        var $user_agent;
        var $compression;
        var $cookie_file;
        var $proxy;
        var $cookies; // whether a cookie file is used

        /**
         * Initialize
         *
         * @param bool   $cookies
         * @param string $cookie
         * @param string $compression
         * @param string $proxy
         */
        // PHP 8 no longer treats a method named after the class as the constructor
        function __construct($cookies = TRUE, $cookie = 'cookies.txt', $compression = 'gzip', $proxy = '') {
            $this->headers [] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
            $this->headers [] = 'Connection: Keep-Alive';
            $this->headers [] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
            $this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
            $this->compression = $compression;
            $this->proxy = $proxy;
            $this->cookies = $cookies;
            if ($this->cookies == TRUE)
                $this->cookie ( $cookie );
        }

        /**
         * Configure the cookie file
         *
         * @param unknown $cookie_file
         */
        function cookie($cookie_file) {
            if (file_exists ( $cookie_file )) {
                $this->cookie_file = $cookie_file;
            } else {
                // Create the file and keep the handle so that it can be closed again
                $fp = fopen ( $cookie_file, 'w' ) or $this->error ( 'The cookie file could not be opened. Make sure this directory has the correct permissions' );
                $this->cookie_file = $cookie_file;
                fclose ( $fp );
            }
        }

        /**
         * Open a page with GET
         *
         * @param unknown $url
         * @return mixed
         */
        function get($url) {
            $process = curl_init ( $url );
            curl_setopt ( $process, CURLOPT_HTTPHEADER, $this->headers );
            curl_setopt ( $process, CURLOPT_HEADER, 0 );
            curl_setopt ( $process, CURLOPT_USERAGENT, $this->user_agent );
            if ($this->cookies == TRUE) c...
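
The class above is cut off partway through `get()`, so the options it sets after that point are unknown. As a rough sketch of how a GET request with headers, a user agent, a cookie file, compression and an optional proxy might be configured, the example below collects the options in an array and applies them with `curl_setopt_array`. The function names `buildGetOptions` and `httpGet`, the 30-second timeout, and the choice to follow redirects are assumptions, not part of the original class.

```php
<?php
// Hypothetical helpers, not the original class: they only illustrate
// one way to configure a cURL GET request.

/**
 * Collect cURL options for a GET request.
 * A null $cookieFile disables cookie handling; an empty $proxy disables the proxy.
 */
function buildGetOptions(
    array $headers,
    string $userAgent,
    ?string $cookieFile = null,
    string $compression = 'gzip',
    string $proxy = ''
): array {
    $options = [
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_HEADER         => false,        // do not include response headers in the body
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_ENCODING       => $compression, // sent as Accept-Encoding, decoded automatically
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,         // assumption: follow redirects
        CURLOPT_TIMEOUT        => 30,           // assumption: 30-second limit
    ];
    if ($cookieFile !== null) {
        $options[CURLOPT_COOKIEFILE] = $cookieFile; // read cookies from this file
        $options[CURLOPT_COOKIEJAR]  = $cookieFile; // write cookies back when the handle closes
    }
    if ($proxy !== '') {
        $options[CURLOPT_PROXY] = $proxy;
    }
    return $options;
}

/** Run the request and return the body, or throw if cURL reports an error. */
function httpGet(string $url, array $options): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

$options = buildGetOptions(['Accept: text/html'], 'ExampleBot/1.0', 'cookies.txt');
// $html = httpGet('https://www.example.com/', $options); // needs network access
```

Separating option building from the request keeps the configuration easy to inspect and test without touching the network.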


Updated: 2020-03-28 12:50:12

|--[Original] Some scraping functions I wrapped around the curl functions


Some scraping functions I wrapped myself, recorded here. 2020-03-28

    /**
     * () 13N2y19 1203
     * @access public
     * @param $search_str the string to search for
     * @return $contents the fetched page content
     */
    function curl_google($search_str){
        $cookie_file = "google.txt";
        $str_urlencode=urlencode($search_str);
        $url = "http://www.google.com/search?q={$str_urlencode}";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
        $contents = curl_exec($ch);
        curl_close($ch);
        return $contents;
    }

    // $pn is the result offset; it defaults to 0 so that the argument can be omitted
    function curl_baidu($search_str,$pn=0){
        if(isset($pn)===false){
            $pn=0;
        }
        $cookie_file = "baidu.txt";
        // Baidu is queried in GBK, so convert the keyword before URL-encoding it
        $search_str=iconv('utf-8','gbk',$search_str);
        $str_urlencode=urlencode($search_str);
        $url="http://www.baidu.com/s?wd={$str_urlencode}&pn={$pn}";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
        $contents = curl_exec($ch);
        // Convert the GBK response back to UTF-8
        $contents=iconv('gbk','utf-8',$contents);
        curl_close($ch);
        return $contents;
    }

    function curl_baiduzhidao($search_str,$pn=0,$sort=0){
        if(isset($pn)===false){
            $pn=0;
        }
        if(isset($sort)===false){
            $sort=0;
        }
        $cookie_file = "baidu.txt";
        $search_str=iconv('utf-8','gbk',$search_str);
        $str_urlencode=urlencode($search_str);
        // Put the URL-encoded keyword, not the raw GBK bytes, into the query string
        $url="http://zhidao.baidu.com/search?word={$str_urlencode}&lm=0&rn=10&sort={$sort}&ie=gbk&pn={$pn}";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
        $contents = curl_exec($ch);
        $contents=iconv('gbk','utf-8',$contents);
        curl_close($ch);
        return $contents;
    }

    function curl_mtimeid_moviedetails($mtime_id){
        $cookie_file = "mtime.txt";
        $url="http://movie.mtime.com/{$mtime_id}/details.html#menu";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_se...


Updated: 2020-03-28 12:53:08
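
The curl functions in this section repeat the same request setup and differ mainly in the URL they build and in whether the page is GBK-encoded. One possible way to factor that out is sketched below: small URL builders plus a single fetch helper. The names `buildGoogleSearchUrl`, `buildBaiduSearchUrl` and `fetchAsUtf8`, and the fallback user agent used when `$_SERVER['HTTP_USER_AGENT']` is not set (for example on the command line), are assumptions for illustration only.

```php
<?php
// Hypothetical helpers; the function names are not from the original post.

// Build the Google search URL: the keyword is URL-encoded as is.
function buildGoogleSearchUrl(string $keyword): string
{
    return 'http://www.google.com/search?q=' . urlencode($keyword);
}

// Build the Baidu search URL: the keyword is converted to GBK, then URL-encoded.
function buildBaiduSearchUrl(string $keyword, int $pn = 0): string
{
    $gbk = iconv('utf-8', 'gbk', $keyword);
    return 'http://www.baidu.com/s?wd=' . urlencode($gbk) . '&pn=' . $pn;
}

/**
 * Fetch a page with the same cURL options the functions above use.
 * If $sourceEncoding is given (for example 'gbk'), the body is converted to UTF-8.
 *
 * @return string|false the page content, or false on failure
 */
function fetchAsUtf8(string $url, string $cookieFile, ?string $sourceEncoding = null)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Assumption: fall back to a generic user agent when none is available
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    $contents = curl_exec($ch);
    if ($contents === false) {
        return false;
    }
    if ($sourceEncoding !== null) {
        $contents = iconv($sourceEncoding, 'utf-8', $contents);
    }
    return $contents;
}

echo buildGoogleSearchUrl('php curl'), "\n";
echo buildBaiduSearchUrl('php', 10), "\n";
```

With helpers like these, a call such as `curl_baidu('php', 10)` would reduce to `fetchAsUtf8(buildBaiduSearchUrl('php', 10), 'baidu.txt', 'gbk')`.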
