
      Three Methods for Scraping Data with Python


      The three data-scraping methods

      1. Regular expressions (the re library)
      2. BeautifulSoup (bs4)
      3. lxml

      * Using the page-download function built earlier, fetch the HTML of the target page. We use https://guojiadiqu.bmcx.com/AFG__guojiayudiqu/ as the example.


          from get_html import download

          url = 'https://guojiadiqu.bmcx.com/AFG__guojiayudiqu/'
          page_content = download(url)
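The `download` helper imported from `get_html` was built in an earlier article and is not shown here. The following is only a minimal sketch of what such a helper might look like, assuming the standard-library `urllib`; the parameter names (`user_agent`, `num_retries`, `timeout`) and the retry-on-5xx behaviour are assumptions for illustration, not the original implementation.

```python
import urllib.error
import urllib.request


def download(url, user_agent="Mozilla/5.0", num_retries=2, timeout=10):
    """Return the decoded body at `url`, or None if the request fails.

    Hypothetical stand-in for the `get_html.download` helper used in
    this article; the real helper may behave differently.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # Retry only on server-side (5xx) errors, which are often transient.
        if num_retries > 0 and 500 <= err.code < 600:
            return download(url, user_agent, num_retries - 1, timeout)
        return None
    except urllib.error.URLError:
        # Unreachable host, missing local file, and similar failures.
        return None
```

Because this sketch returns `None` on failure, callers should check the result before handing it to a parser.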

      * Suppose we need to scrape the country name and the overview text from this page. We implement the extraction with each of the three methods in turn.
      1. Regular expressions

          from get_html import download
          import re

          url = 'https://guojiadiqu.bmcx.com/AFG__guojiayudiqu/'
          page_content = download(url)
          # Note: re.findall returns a list of all matches
          country = re.findall('class="h2dabiaoti">(.*?)</h2>', page_content)
          # Grab the table cell that holds the overview, then the paragraphs inside it
          survey_data = re.findall('<tr><td bgcolor="#FFFFFF" id="wzneirong">(.*?)</td></tr>', page_content)
          survey_info_list = re.findall('<p>  (.*?)</p>', survey_data[0])
          survey_info = ''.join(survey_info_list)
          print(country[0], survey_info)
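The regular-expression approach can be tried offline against a small HTML fragment. The sketch below is an illustration only: `SAMPLE_HTML` is made-up markup that reuses the `h2dabiaoti` class and `wzneirong` id from the example above, and `extract_with_re` is a hypothetical helper name. It passes `re.S` so that `.` also matches newlines and returns `None` instead of raising `IndexError` when nothing matches.

```python
import re

# Made-up fragment imitating the structure the patterns above rely on.
SAMPLE_HTML = (
    '<div id="main_content"><h2 class="h2dabiaoti">Afghanistan</h2>'
    '<table><tr><td bgcolor="#FFFFFF" id="wzneirong">'
    '<p>Landlocked country.</p><p>Capital: Kabul.</p>'
    '</td></tr></table></div>'
)


def extract_with_re(html):
    """Return (country, overview) or None if either part is missing."""
    # Non-greedy (.*?) stops at the first closing tag after the match start.
    names = re.findall(r'class="h2dabiaoti">(.*?)</h2>', html, re.S)
    cells = re.findall(r'id="wzneirong">(.*?)</td>', html, re.S)
    if not names or not cells:
        return None
    # Collect the text of every <p> inside the overview cell.
    paragraphs = re.findall(r'<p>\s*(.*?)</p>', cells[0], re.S)
    return names[0].strip(), ''.join(paragraphs)


result = extract_with_re(SAMPLE_HTML)
print(result)
```

Patterns like these are tied to the exact markup, so a small change in the page (an extra attribute, different whitespace) can make them stop matching.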

      2. BeautifulSoup (bs4)

          from get_html import download
          from bs4 import BeautifulSoup

          url = 'https://guojiadiqu.bmcx.com/AFG__guojiayudiqu/'
          html = download(url)
          # Create the BeautifulSoup object
          soup = BeautifulSoup(html, "html.parser")
          # Search by attribute; find() returns the first matching tag
          country = soup.find(attrs={'class': 'h2dabiaoti'}).text
          survey_info = soup.find(attrs={'id': 'wzneirong'}).text
          print(country, survey_info)

      3. lxml

          from get_html import download
          from lxml import etree  # element tree parser

          url = 'https://guojiadiqu.bmcx.com/AFG__guojiayudiqu/'
          page_content = download(url)
          selector = etree.HTML(page_content)  # parsed tree that supports XPath queries
          country_select = selector.xpath('//*[@id="main_content"]/h2')  # returns a list
          for country in country_select:
              print(country.text)
          survey_select = selector.xpath('//*[@id="wzneirong"]/p')
          for survey_content in survey_select:
              print(survey_content.text, end='')

      Output:
      [Figure: program output]
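      Finally, here is the performance comparison of the three methods quoted from the book 《用Python写网络爬虫》 (Web Scraping with Python):
      [Figure: performance comparison of the three methods]
      For reference only.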
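The relative speed of the three methods depends on the page and on the installed library versions, so it can be worth measuring locally. The following is a rough timing sketch using the standard-library `timeit` module. The names `time_extractors` and `regex_extractor` and the sample HTML are assumptions for illustration, and only the regex extractor is registered here; BeautifulSoup and lxml extractors could be added to the dictionary in the same way.

```python
import re
import timeit

# Made-up input: 50 repeated headings, so there is some work to measure.
SAMPLE_HTML = '<h2 class="h2dabiaoti">Afghanistan</h2>' * 50


def regex_extractor(html):
    """Extract every heading text with a regular expression."""
    return re.findall(r'class="h2dabiaoti">(.*?)</h2>', html)


def time_extractors(extractors, html, repeat=3, number=100):
    """Return the best average time per call, in seconds, for each extractor."""
    results = {}
    for name, func in extractors.items():
        # Bind func as a default argument so each lambda keeps its own extractor.
        timings = timeit.repeat(lambda f=func: f(html), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


timings = time_extractors({"re": regex_extractor}, SAMPLE_HTML)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers will differ from machine to machine.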
