Crawling the news article body

Question

Hello! Following up on my previous question: using the method you suggested, I managed to crawl the Daum news URLs, but the CSS selectors for the article body are structured differently from Naver's, so the body is not being crawled. I tried all sorts of approaches before asking, but I have been getting nothing but errors for hours and my head is about to burst. My code is below.

    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://news.daum.net/breakingnews/politics")
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    articles = soup.select("div.cont_thumb")  # extract the 10 news article divs
    for article in articles:
        links = article.select("a.link_txt")
        url = links[0].attrs['href']
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        #print(url)
        content = soup.select_one('#mArticle > #harmonyContainer > section')
        print(content.text)

스타트코딩 · Answer

I answered this under your other question. I'll write the code out again :)

    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://news.daum.net/breakingnews/politics")
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select("strong.tit_thumb > a.link_txt")
    articles = soup.select("div.cont_thumb")  # extract the 10 news article divs
    for link in links:
        title = link.text  # get the text inside the tag
        # print(title)
        url = link.attrs['href']  # get the value of the href attribute
        print(title, url)
        #print(url)
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
        html = response.text
        soup = BeautifulSoup(html, "html.parser")
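Once each article page is parsed, the remaining step is pulling out the body. `select_one` returns `None` when nothing matches, so calling `.text` on the result raises an `AttributeError`, which may be the error in the question. Below is a minimal sketch of a safer lookup, not the answerer's method. The helper `extract_body`, the list `CANDIDATE_SELECTORS`, and the sample HTML are hypothetical; only the first selector comes from the question, and the other two are assumptions about the page layout that should be checked in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Candidate selectors, tried in order. The first is the one from the question;
# the other two are assumptions about the page layout, not verified against Daum.
CANDIDATE_SELECTORS = [
    "#mArticle > #harmonyContainer > section",
    "#harmonyContainer",
    "div.article_view",
]


def extract_body(html, selectors=CANDIDATE_SELECTORS):
    """Return the text of the first selector that matches, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        node = soup.select_one(selector)
        if node is not None:
            # Collapse whitespace so the body prints as clean text.
            return " ".join(node.get_text(" ", strip=True).split())
    return None


# Offline sample standing in for a fetched article page (made up for illustration).
sample_html = """
<div id="mArticle">
  <div class="article_view">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</div>
"""

print(extract_body(sample_html))
print(extract_body("<html><body></body></html>"))
```

Inside the loop this would be called as `extract_body(response.text)`. When it returns `None`, printing the URL and skipping that article keeps the crawl going instead of stopping on the first page whose layout differs.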