
Fixing the built-in DedeCMS collector's failure to collect URLs whose port is not 80

2024-07-12 09:11:21
Source: reprinted
Contributor: netizen

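DedeCMS collection rarely has to deal with URLs that carry a port, but the few URLs that do cannot be collected. This is a summary of how to fix the DedeCMS collector failing on URLs whose port is not 80: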

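  Problem description: the failure appears when the URL being collected carries a port (the real domain has been replaced with xxx to avoid any appearance of promotion):

  Test collection URL: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1

  The list test returns an array whose links and image URLs have all lost the port:

  List URL tested: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1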

Array
(
    [0] => Array
        (
            [title] => 讲座回放|施奠东—西湖,世界风景园林的
            [link] => http://www.xxx.com/index.php/main/news/15529.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190528/15529.png
        )
    [1] => Array
        (
            [title] => 喜报|恭贺我院2019年度西湖杯荣获佳绩!
            [link] => http://www.xxx.com/index.php/main/news/15528.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190522/15528.jpg
        )
    [2] => Array
        (
            [title] => 讲座预告|西湖——世界风景园林的杰出范
            [link] => http://www.xxx.com/index.php/main/news/15526.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190516/15526.jpg
        )
    [3] => Array
        (
            [title] => 讲座回放|胡理琛—西湖七十年流变忆胜
            [link] => http://www.xxx.com/index.php/main/news/15524.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190513/15524.png
        )
    [4] => Array
        (
            [title] => 讲座回放|彭嘉恒—“南师、禅及其在西方
            [link] => http://www.xxx.com/index.php/main/news/15518.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190507/15518.png
        )
    [5] => Array
        (
            [title] => 讲座预告|胡理琛—西湖七十年流变忆胜
            [link] => http://www.xxx.com/index.php/main/news/15516.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190430/15516.jpg
        )
)

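  These URLs are clearly wrong: they point at port 80 instead of port 89, so they cannot be opened and nothing can be collected.

  After some digging, the cause turned out to be the DedeCMS function that sets the HTML content and its source URL: it builds the base URL from the host alone and leaves out the port.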
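  To see why the port goes missing, here is a minimal stand-alone sketch (not DedeCMS code; the variable names are made up for illustration). It only shows that PHP's parse_url() reports the port under its own key, so a base URL assembled from 'host' alone silently drops it.

```php
<?php
// Hypothetical stand-alone demo; the variable names are illustrative only.
$url = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
$parts = parse_url($url);

// parse_url() returns the port as a separate integer entry,
// so the 'host' entry never contains it.
$hostOnly = $parts['host'];
$hostWithPort = $parts['host'] . ':' . $parts['port'];

echo $hostOnly, "\n";      // host without the port
echo $hostWithPort, "\n";  // host with the port appended by hand
```

  Any relative link resolved against the first value ends up on the default port, which matches the broken list result above.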

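  The fix goes in include/dedehtml2.class.php, in function SetSource, at roughly line 79: add a port check before HomeUrl is assigned. The original post highlighted the added lines in a screenshot, which is not reproduced here; the full function is listed below.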

  Test again and it works: the links now open normally and the content is collected.

  The full function:

function SetSource(&$html, $url = '', $linktype='')
{
    $this->__construct();
    $this->CAtt = new DedeAttribute2();
    $url = trim($url);
    $this->SourceHtml = $html;
    $this->BaseUrl = $url;
    // Work out the document's path relative to the current URL
    $urls = @parse_url($url);
    // lyy: omit the port when it is absent or 80, otherwise append it
    $port = (empty($urls['port']) || $urls['port'] == 80) ? '' : ':'.$urls['port'];
    $this->HomeUrl = $urls['host'].$port;
    $this->BaseUrlPath = $this->HomeUrl.$urls['path'];
    // Strip the trailing file name, then any trailing slash
    $this->BaseUrlPath = preg_replace("/\/([^\/]*)\.(.*)$/","/",$this->BaseUrlPath);
    $this->BaseUrlPath = preg_replace("/\/$/",'',$this->BaseUrlPath);
    if($linktype!='')
    {
        $this->GetLinkType = $linktype;
    }
    if($html != '')
    {
        $this->Analyser();
    }
}
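  The port rule can also be tried outside DedeCMS. The sketch below is one possible stand-alone variant, not part of DedeCMS: the function name hostWithPort is hypothetical, it assumes PHP 7 or later (type hints and the ?? operator), and it goes slightly beyond the patch above by also treating 443 as the default port for https.

```php
<?php
/**
 * Hypothetical helper (not DedeCMS code): return "host" or "host:port",
 * dropping the port only when it is absent or is the scheme's default.
 */
function hostWithPort(string $url): string
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['host'])) {
        return ''; // malformed or relative URL: no host to report
    }

    // Assumed default ports; extend this map if other schemes matter.
    $defaults = ['http' => 80, 'https' => 443];
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $port = $parts['port'] ?? null;

    if ($port === null || (isset($defaults[$scheme]) && $defaults[$scheme] === $port)) {
        return $parts['host'];
    }
    return $parts['host'] . ':' . $port;
}

echo hostWithPort('http://www.xxx.com:89/index.php/main/news/index.html'), "\n";
echo hostWithPort('http://www.xxx.com/index.php'), "\n";
echo hostWithPort('https://www.xxx.com:443/'), "\n";
```

  Running the helper against a few sample URLs before patching the class file is a cheap way to confirm that links keep the non-default port and that ordinary URLs are left unchanged.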
