Q&A - Inflearn | Community

Ask and Answer

A community of 1.69 million members! Join the discussion.


Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Error after adding the code that removes unnecessary div and p tags

Hello, teacher. I get an error after adding the code that removes unnecessary div and p tags.

    import requests
    from bs4 import BeautifulSoup
    import time

    req_header_dict = {
        # Request header: browser information
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
    }
    response = requests.get("https://search.naver.com/search.naver?where=news&sm=tab_jum&query=%EC%86%90%ED%9D%A5%EB%AF%BC", headers= req_header_dict)
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    articles = soup.select("div.info_group")  # Get the 10 news article divs
    for article in articles:
        links = article.select("a.info")  # The result is a list
        if len(links) >= 2:
            url = links[1].attrs["href"]
            response = requests.get(url, headers= req_header_dict)
            html = response.text
            soup = BeautifulSoup(html, "html.parser")
            # If it is entertainment news
            if "entertain" in response.url:
                title = soup.select_one(".end_tit")
                content = soup.select_one("#articeBody")
            # If it is sports news
            elif "sports" in response.url:
                title = soup.select_one("h4.title")
                content =soup.select_one("#newsEndContents")
                # Remove unnecessary divs inside the article body
                divs = content.select("div")
                for div in divs:
                    div.decompose()
                paragraphs = content.select("p")
                for p in paragraphs:
                    p.decompose()
            else:
                title = soup.select_one(".tit.title_area")
                content = soup.select_one("#newsct_article")
            print("##########링크##########",url)
            print("##########제목##########",title.text.strip())
            print("##########본문##########",content.text.strip())
            time.sleep(0.3)

yhahn02 · 2023.10.01 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 236 · Answers 1
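
The post does not include the error message itself. One plausible cause is that select_one() returned None for the article body, so content.select("div") fails. The following is a minimal sketch of a guard around the cleanup step; the helper name strip_nested and the sample HTML are made up for illustration and are not from the course.

```python
from bs4 import BeautifulSoup


def strip_nested(content, tag_names=("div", "p")):
    """Remove nested tags from content; return False when content is missing."""
    if content is None:
        return False
    for name in tag_names:
        for node in content.select(name):
            node.decompose()
    return True


# Illustrative HTML standing in for a sports article page
html = '<div id="newsEndContents">Body text<div>ad</div><p>reporter</p></div>'
soup = BeautifulSoup(html, "html.parser")
found = soup.select_one("#newsEndContents")
missing = soup.select_one("#no-such-id")  # select_one returns None when nothing matches
strip_nested(found)
```

In the crawler loop, a False result would be a reason to skip the article and print its URL so the selector can be checked in the browser.
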
Unresolved
실습으로 끝장내는 웹 크롤링과 웹 페이지 자동화 & 실전 활용

Problem printing html

    import requests
    from bs4 import BeautifulSoup

    url = "https://naver.com"
    req = requests.get(url)
    html = req.text
    print(html)

This runs in Jupyter Notebook, but in Visual Studio Code the result looks like this:

    PS C:\Users\pw720> & C:/Users/pw720/AppData/Local/Programs/Python/Python311/python.exe
    Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print(html)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'html' is not defined
    >>> print(html)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'html' is not defined
    >>>

It looked like beautifulsoup4 was not installed, so I installed it from cmd. Is it failing because it was not installed properly?

pw7208 · 2023.10.01 · 실습으로 끝장내는 웹 크롤링과 웹 페이지 자동화 & 실전 활용

Votes 0 · Views 422 · Answers 2
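
The >>> prompt in the output shows that only print(html) was typed into an interactive Python session, where the earlier lines had never been run, so the NameError by itself does not point to a failed install. To check the installation separately, here is a small sketch that asks the running interpreter whether it can find the modules. The helper name is_installed is made up; the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Report whether this interpreter can import module_name."""
    return importlib.util.find_spec(module_name) is not None


# Shows which python.exe is running, useful when several are installed
print("interpreter:", sys.executable)
for name in ("requests", "bs4"):
    print(name, "installed" if is_installed(name) else "missing")
```

Saving this as a file and running it the same way as the crawler shows whether that particular interpreter sees the packages.
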
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Naver automatic login

Hello. When I run the Naver automatic login, the login window appears.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    # Automatically update the Chrome driver
    from webdriver_manager.chrome import ChromeDriverManager

    # Keep the browser from closing
    chrome_options = Options()
    chrome_options.add_experimental_option('detach',True)

    # Suppress unnecessary error messages
    chrome_options.add_experimental_option('excludeSwitches',['enable-logging'])

    service = Service(executable_path=ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service,options=chrome_options)

    # Go to the web page
    driver.implicitly_wait(5)
    driver.maximize_window()
    driver.get('https://nid.naver.com/nidlogin.login?mode=form&url=https://www.naver.com/')

    # ID input box
    driver.find_element(By.CSS_SELECTOR,'#id')
    id.click()
    id.send_keys('sand12')

    # Password input box
    pw = driver.find_element(By.CSS_SELECTOR,'#pw')
    pw.click()
    pw.send_keys('yiiit!@')

    # Login button
    login_btn = driver.find_element(By.CSS_SELECTOR,'#log\.login')
    login_btn.click()

sk a · 2023.09.26 · [신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Votes 0 · Views 680 · Answers 2
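
One detail stands out in the posted code: the element found for '#id' is never assigned, so id still refers to Python's built-in function and id.click() cannot work. Below is a minimal sketch with the assignments in place. It assumes driver is a Selenium WebDriver; the function name fill_login_form is made up, and the string "css selector" is the value of By.CSS_SELECTOR. It does not address any bot detection the site may apply after the form is filled in.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def fill_login_form(driver, user_id, password):
    """Type credentials into the #id and #pw boxes and click the login button."""
    id_box = driver.find_element(CSS, "#id")  # keep the returned element
    id_box.click()
    id_box.send_keys(user_id)

    pw_box = driver.find_element(CSS, "#pw")
    pw_box.click()
    pw_box.send_keys(password)

    # Raw string so the backslash that escapes the dot reaches the CSS engine
    driver.find_element(CSS, r"#log\.login").click()
```

It would be called as fill_login_form(driver, "my_id", "my_password") after driver.get(...) has loaded the login page.
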
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Naver automatic login

Hello. When I run the Naver automatic login, the Naver login window appears.

sk a · 2023.09.26 · [신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Votes 0 · Views 1.04k · Answers 2
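Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 기본편

vscode terminal settings

Hello. When I set the vscode terminal to cmd, the first run uses cmd, but the second run executes as python under cmd. The same thing happens no matter how I configure cmd. I would appreciate an answer. Goodbye.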

sk a · 2023.09.26 · [신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Votes 0 · Views 432 · Answers 2
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Extracting Google image URLs - error (cat)

Question: the problem seems to occur when extracting the URL of the large image. I cannot find a solution. The search term is "고양이" (cat).

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    import os
    import urllib.request
    import pyautogui

    # keyword = pyautogui.prompt("검색어를 입력하세요")
    if not os.path.exists("고양이"):
        os.mkdir("고양이")

    # Automatically update the Chrome driver
    from webdriver_manager.chrome import ChromeDriverManager
    import time
    import pyautogui
    import pyperclip

    # Keep the browser from closing
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)

    # Suppress unnecessary error messages
    chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])

    service = Service(executable_path=ChromeDriverManager().install())
    browser = webdriver.Chrome(service=service, options=chrome_options)

    # Go to the web page
    browser.implicitly_wait(10)  # Wait up to 10 seconds for the page to load
    browser.maximize_window()
    #browser = webdriver.Chrome()
    browser.get("https://www.google.co.kr/search?q=%EA%B3%A0%EC%96%91%EC%9D%B4&tbm=isch&ved=2ahUKEwioo8HqscOBAxUM_WEKHdO9CDwQ2-cCegQIABAA&oq=%EA%B3%A0%EC%96%91%EC%9D%B4&gs_lcp=CgNpbWcQAzIECCMQJzIICAAQgAQQsQMyCAgAEIAEELEDMggIABCABBCxAzIICAAQgAQQsQMyCAgAEIAEELEDMggIABCABBCxAzIFCAAQgAQyCAgAEIAEELEDMgUIABCABDoLCAAQgAQQsQMQgwFQ9hJYiRlg7hpoAXAAeACAAY8BiAGMB5IBAzEuN5gBAKABAaoBC2d3cy13aXotaW1nwAEB&sclient=img&ei=eT4QZeiCOoz6hwPT-6LgAw&bih=933&biw=1680")

    before_h = browser.execute_script("return window.scrollY")

    # Infinite scroll
    while True:
        browser.find_element(By.CSS_SELECTOR, "body").send_keys(Keys.END)
        time.sleep(1)
        after_h = browser.execute_script("return window.scrollY")
        if after_h == before_h:
            break
        before_h = after_h

    # Extract the thumbnail image tags
    imgs = browser.find_elements(By.CSS_SELECTOR,".rg_i.Q4LuWd")
    for i, img in enumerate(imgs,1):
        # Click each image to find the large version
        img.click()
        time.sleep(2)
        # Extract the large image
        target = browser.find_element("img.r48jcc.pT0Scc.iPVvYb")
        img_src = target.get_attribute("src")
        # Download the image
        # While crawling, an "http error 403: forbidden" error occurs.
        opener = urllib.request.build_opener()
        opener.addheaders = [("User-Agent","Mozila/5.0")]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(img_src,f"고양이{i}.jpg")  # Save the image

yhahn02 · 2023.09.25 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 299 · Answers 1
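
Two things in the posted loop may be worth checking. find_element() expects a locator strategy followed by the value, as in browser.find_element(By.CSS_SELECTOR, "img.r48jcc.pT0Scc.iPVvYb"), while the post passes only the selector. The file name f"고양이{i}.jpg" also writes next to the script rather than into the 고양이 folder created earlier. The sketch below covers only the download side using the standard library; the helper names image_path and download_image are made up for illustration, and Google's class names may have changed since the post.

```python
import os
import urllib.request


def image_path(folder, index, ext=".jpg"):
    """Build a path such as 고양이/3.jpg inside the download folder."""
    return os.path.join(folder, f"{index}{ext}")


def download_image(img_src, folder, index):
    """Save one image, sending a browser-like User-Agent header."""
    os.makedirs(folder, exist_ok=True)
    request = urllib.request.Request(img_src, headers={"User-Agent": "Mozilla/5.0"})
    path = image_path(folder, index)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Inside the loop this would be called as download_image(img_src, "고양이", i).
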
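Unresolved
[리뉴얼] 파이썬입문과 크롤링기초 부트캠프 [파이썬, 웹, 데이터 이해 기본까지] (업데이트)

Good, good

I like it! Complete beginners cannot even manage the downloads. It helps a lot at the very start. I really like that every single download is explained step by step, which is why I paid for the course. I finished studying but had forgotten the beginning, so I watched it again. Thank you.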

kid4310 · 2023.09.25 · [리뉴얼] 파이썬입문과 크롤링기초 부트캠프 [파이썬, 웹, 데이터 이해 기본까지] (업데이트)

Votes 0 · Views 258 · Answers 1
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Selenium fails to load the web driver

    Microsoft Windows [Version 10.0.19045.3448]
    (c) Microsoft Corporation. All rights reserved.

    C:\Users\user\data>C:/Users/user/AppData/Local/Programs/Python/Python311/python.exe c:/Users/user/data/sel.py
    Traceback (most recent call last):
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\common\driver_finder.py", line 38, in get_path
        path = SeleniumManager().driver_location(options) if path is None else path
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\common\selenium_manager.py", line 76, in driver_location
        browser = options.capabilities["browserName"]
                  ^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'str' object has no attribute 'capabilities'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\Users\user\data\sel.py", line 33, in <module>
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 45, in __init__
        super().__init__(
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 51, in __init__
        self.service.path = DriverFinder.get_path(self.service, options)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\common\driver_finder.py", line 40, in get_path
        msg = f"Unable to obtain driver for {options.capabilities['browserName']} using Selenium Manager."
                                             ^^^^^^^^^^^^^^^^^^^^

오유라 · 2023.09.24 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 6.29k · Answers 1
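
Line 33 of sel.py is not shown in the post, so the cause can only be guessed. The message 'str' object has no attribute 'capabilities' suggests that a string, such as a driver path, ended up where Selenium expects an Options object; in recent Selenium 4 releases the first positional parameter of webdriver.Chrome() is options. The following is a minimal sketch that passes everything by keyword. The function name make_chrome_driver is made up, and the imports sit inside the function only so the sketch can be loaded where Selenium is not installed.

```python
def make_chrome_driver(detach=True):
    """Create a Chrome driver using keyword arguments only (Selenium 4 style)."""
    # Imported here so this sketch can be loaded even where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    chrome_options.add_experimental_option("detach", detach)
    # Pass options by keyword; a bare string as the first argument would be
    # treated as the options object and has no .capabilities attribute.
    return webdriver.Chrome(options=chrome_options)
```

When a specific driver path is needed, it can be wrapped as Service(executable_path=...) and passed as service=..., which is the pattern the other posts on this page use.
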
Unresolved
R로 하는 웹 크롤링 - 입문편

Cannot load a url with the htmltab package

I am taking the lessons on crawling stock data. After installing the htmltab package I entered the url correctly, but I still get the message Error: Couldn't find a table. What could be wrong?

bj4496 · 2023.09.24 · R로 하는 웹 크롤링 - 입문편

Votes 0 · Views 287 · Answers 1
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Question about a crawling error

Hello. I tried crawling with the search term '식품 로봇' (food robot) using the code below. I want to collect every article within the period specified in the URL, but I cannot tell how many pages there are in total, so I entered 2,000 as the page count and ran it. The crawl went well for a while and then an error occurred. How could I fix this?

Error message:

    =======링크=======
     https://n.news.naver.com/mnews/article/025/0003239249?sid=101
    Traceback (most recent call last):
      File "/Users/유저이름/startcoding/Chapter04/11.마지막페이지확인하기.py", line 64, in <module>
        print("=======제목======= \n", title.text.strip())
                                    ^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'text'

Code I ran:

    import requests
    from bs4 import BeautifulSoup
    import time
    import pyautogui
    from openpyxl import Workbook
    from openpyxl.styles import Alignment

    # User input
    keyword = pyautogui.prompt("검색어를 입력하세요")
    lastpage = int(pyautogui.prompt("몇 페이지까지 크롤링 할까요?"))

    # Create the Excel workbook
    wb = Workbook()

    # Create the Excel sheet
    ws = wb.create_sheet(keyword)

    # Adjust the column widths
    ws.column_dimensions['A'].width = 60
    ws.column_dimensions['B'].width = 60
    ws.column_dimensions['C'].width = 120

    # Row number
    row = 1

    # Page number
    page_num = 1
    for i in range(1, lastpage * 10, 10):
        print(f"{page_num}페이지 크롤링 중 입니다.==========================")
        response = requests.get(f"https://search.naver.com/search.naver?sm=tab_hty.top&where=news&query={keyword}&start={i}")
        html = response.text  # The html is in the text of the response
        soup = BeautifulSoup(html, 'html.parser')
        articles = soup.select("div.info_group")  # Extract the 10 news article divs
        # There are 10 articles, so a for loop extracts them one by one
        for article in articles:
            links = article.select("a.info")  # Get the a tags with class info = a list
            if len(links) >= 2:  # If there are two or more links
                url = links[1].attrs['href']  # Extract the href of the second link
                # Send another request
                response = requests.get(url, headers={'User-agent': 'Mozila/5.0'})
                html = response.text
                soup_sub = BeautifulSoup(html, 'html.parser')
                print(url)
                # Check for entertainment news
                if "entertain" in response.url:
                    title = soup_sub.select_one(".end_tit")
                    content = soup_sub.select_one("#articeBody")
                elif "sports" in response.url:
                    title = soup_sub.select_one("h4.title")
                    content = soup_sub.select_one("#newsEndContents")
                    # Remove unnecessary div and p tags inside the article body
                    divs = content.select("div")
                    for div in divs:
                        div.decompose()
                    paragraphs = content.select("p")
                    for p in paragraphs:
                        p.decompose()
                else:
                    title = soup_sub.select_one(".media_end_head_headline")
                    content = soup_sub.select_one("#newsct_article")
                print("=======링크======= \n", url)
                print("=======제목======= \n", title.text.strip())
                print("=======본문======= \n", content.text.strip())
                ws[f'A{row}'] = url  # Column A holds the URL
                ws[f'B{row}'] = title.text.strip()
                ws[f'C{row}'] = content.text.strip()
                # Automatic line wrapping
                ws[f'C{row}'].alignment = Alignment(wrap_text=True)
                row = row + 1
                time.sleep(0.3)
        # Check whether this is the last page
        isLastPage = soup.select_one("a.btn_next").attrs['aria-disabled']
        if isLastPage == 'true':
            print("마지막 페이지 입니다.")
            break
        page_num = page_num + 1
    wb.save(f'{keyword}_result.xlsx')

cherrykim90 · 2023.09.23 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 398 · Answers 2
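
The traceback says title is None, meaning none of the three branches found a title element on that particular page. Over 2,000 pages it is likely that some articles use a layout the selectors do not cover. Below is a minimal sketch that tries several selectors and skips the article when nothing matches; the helper name select_first and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def select_first(soup, selectors):
    """Return the first element matched by any selector, or None."""
    for selector in selectors:
        node = soup.select_one(selector)
        if node is not None:
            return node
    return None


# Illustrative HTML standing in for an article page with a title but no known body id
html = '<h2 class="media_end_head_headline"> Sample title </h2>'
soup_sub = BeautifulSoup(html, "html.parser")

title = select_first(soup_sub, [".end_tit", "h4.title", ".media_end_head_headline"])
content = select_first(soup_sub, ["#articeBody", "#newsEndContents", "#newsct_article"])

if title is None or content is None:
    result = None  # in the crawler loop this would be `continue`
else:
    result = (title.text.strip(), content.text.strip())
```

The same None check applies to soup.select_one("a.btn_next") before reading its aria-disabled attribute.
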
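Resolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Setting the date range for crawled articles

Hello. To limit the news crawler to a specific period, the code has

    response = requests.get("https://search.naver.com/search.naver?where=news&sm=tab_jum&query=검색어")

Should I set the news period as a search option in the browser, then copy the URL of the resulting page and put it inside the " " above? Thank you.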

cherrykim90 · 2023.09.22 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 1.2k · Answers 2
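
If the copied URL approach is used, the search term and page offset can still be swapped in while the period parameters stay untouched. This is a minimal sketch with the standard library; the function name with_params is made up, and the ds/de parameters in the sample URL are placeholders for whatever period parameters the real copied URL contains.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def with_params(copied_url, **overrides):
    """Return copied_url with selected query parameters replaced."""
    parts = urlsplit(copied_url)
    params = parse_qs(parts.query, keep_blank_values=True)
    for key, value in overrides.items():
        params[key] = [str(value)]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# Hypothetical URL copied from the browser; ds/de stand in for the real
# period parameters, which should be taken from the address bar.
copied = "https://search.naver.com/search.naver?where=news&query=test&ds=2023.09.01&de=2023.09.22"
page_2 = with_params(copied, query="식품 로봇", start=11)
```

The result can be passed straight to requests.get(page_2).
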
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

The code was working and now it is not

It definitely worked before, but I must have made a mistake somewhere, because the following error keeps occurring.

    startcoding/Chapter04/11.마지막페이지확인하기.py", line 62, in <module>
        print("=======링크======= \n", url)
                                   ^^^
    NameError: name 'url' is not defined

Even when I go back in the lectures and write the code again, errors now occur from 02.본문내용스크롤 onward:

    Chapter04/02.뉴스본문내용크롤링하기.py", line 17, in <module>
        print(content.text)
              ^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'text'

"10.크롤링결과엑셀저장하기" also runs for a while and then raises this error from page 2:

    startcoding/Chapter04/10.크롤링결과엑셀저장하기.py", line 63, in <module>
        print("=======제목======= \n", title.text.strip())
                                   ^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'text'

What on earth am I doing wrong?

    import requests
    from bs4 import BeautifulSoup
    import time  # Load the time module
    import pyautogui
    from openpyxl import Workbook
    from openpyxl.styles import Alignment

    # User input: food
    keyword = pyautogui.prompt("검색어를 입력하세요")
    lastpage = int(pyautogui.prompt("몇 페이지까지 크롤링 할까요?"))

    # Create the Excel workbook
    wb = Workbook()

    # Create the Excel sheet
    ws = wb.create_sheet(keyword)

    # Adjust the column widths
    ws.column_dimensions['A'].width = 60
    ws.column_dimensions['B'].width = 60
    ws.column_dimensions['C'].width = 120

    # Row number
    row = 1

    # Page number
    page_num = 1
    for i in range(1, lastpage * 10, 10):
        print(f"{page_num}페이지 크롤링 중입니다.===============")
        response = requests.get(f"https://search.naver.com/search.naver?where=news&ie=utf8&sm=nws_hty&query={keyword}&start={i}")
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        articles = soup.select("div.info_group")  # Extract the 10 news article divs (checked with ctrl+F: searching div.info_group gives 10)
        for article in articles:
            links = article.select("a.info")  # List: the a tags whose class is info
            if len(links) >= 2:  # If there are two or more links
                url = links[1].attrs['href']  # Extract the href of the second link
                response = requests.get(url, headers={'User-agent':'Mozila/5.0'})
                html = response.text
                soup = BeautifulSoup(html, 'html.parser')
                # Check for entertainment news
                if "entertain" in response.url:
                    title = soup.select_one(".end_tit")
                    content = soup.select_one("#articeBody")
                elif "sports" in response.url:
                    title = soup.select_one("h4.title")
                    content = soup.select_one("#newsEndContents")
                    # Remove unnecessary divs inside the body (content after the article text)
                    divs = content.select("div")
                    for div in divs:
                        div.decompose()
                    paragraphs = content.select("p")
                    for p in paragraphs:
                        p.decompose()
                else:
                    title = soup.select_one(".media_end_head_headline")
                    content = soup.select_one("#newsct_article")
                print("=======링크======= \n", url)
                print("=======제목======= \n", title.text.strip())
                print("=======본문======= \n", content.text.strip())
                ws[f'A{row}'] = url
                ws[f'B{row}'] = title.text.strip()
                ws[f'C{row}'] = content.text.strip()
                # Automatic line wrapping
                ws[f'C{row}'].alignment = Alignment(wrap_text=True)
                row = row + 1
                time.sleep(0.3)  # Pause for about 0.3 seconds (reduces server load, improves stability)
        page_num = page_num + 1
    wb.save(f'{keyword}_result.xlsx')

2023.09.22 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 270 · Answers 1
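
The post lost its indentation, so the exact cause cannot be read from it. A NameError on url at the print line usually means the print ran without the assignment having run first, for example when the print sits outside the if len(links) >= 2 block and the first article has fewer than two links. The sketch below shows the control flow with plain lists standing in for the result of article.select("a.info"); the function name second_links is made up.

```python
def second_links(articles):
    """Yield the second link of each article that has at least two links."""
    for links in articles:
        if len(links) < 2:
            continue  # nothing to print for this article; `url` is never touched
        url = links[1]
        yield url  # everything that uses `url` stays inside this branch


# Plain lists stand in for the result of article.select("a.info")
sample = [["press"], ["press", "https://n.news.naver.com/a"], []]
urls = list(second_links(sample))
```

If articles itself is empty, which can happen when the search page markup changes, nothing is printed at all, and the div.info_group selector would be the first thing to check in the browser.
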
Unresolved
[리뉴얼] 파이썬입문과 크롤링기초 부트캠프 [파이썬, 웹, 데이터 이해 기본까지] (업데이트)

Crawling news articles

On the page above, I want to print the article title 김혜수 "실패 없을 것 같은 내 이력...", which sits in a <span class="text">. I am trying to print it with select, but I cannot tell what is wrong.

    items = soup.select('ol#topViewArticlesContainer p.title span.text')

I was trying to pick the span tag with class text, under the p tag with class title, under the ol whose id is topViewArticlesContainer. Could you tell me what is wrong?

sangschool · 2023.09.21 · [리뉴얼] 파이썬입문과 크롤링기초 부트캠프 [파이썬, 웹, 데이터 이해 기본까지] (업데이트)

Votes 0 · Views 319 · Answers 1
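
The selector matches the description given, so one way to narrow the problem down is to count matches for each step of the chain and see where the count drops to zero. The sketch below does that for simple space-separated descendant selectors; the function name match_counts and the sample HTML are made up, and the empty list in the sample only illustrates the case where a script fills the container after the page loads.

```python
from bs4 import BeautifulSoup


def match_counts(soup, selector):
    """Count matches for each growing prefix of a descendant selector."""
    steps = selector.split()
    return [(" ".join(steps[:n]), len(soup.select(" ".join(steps[:n]))))
            for n in range(1, len(steps) + 1)]


# Illustrative HTML: the list container exists but is empty
html = '<ol id="topViewArticlesContainer"></ol>'
soup = BeautifulSoup(html, "html.parser")
counts = match_counts(soup, "ol#topViewArticlesContainer p.title span.text")
```

If the first step already returns zero on the real page, printing response.text and searching it for topViewArticlesContainer shows whether the element is in the downloaded HTML at all.
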
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Imports are not working


오유라 · 2023.09.20 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 1 · Views 325 · Answers 3
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Hello. I am proceeding without using Response.

I wrote the code as follows. I proceeded without using Response; regular news articles are printed, but entertainment articles are not.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup

    # Automatically update the Chrome driver
    from webdriver_manager.chrome import ChromeDriverManager
    import time
    import pyautogui
    import pyperclip
    import csv

    # Keep the browser from closing
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)

    # Do not show the Chrome window
    chrome_options.add_argument('--headless')  # Enable headless mode
    chrome_options.add_argument('--disable-gpu')  # Disable GPU acceleration

    # Appear to come from a Mozilla web browser / get around detection and blocking of automated requests
    chrome_options.add_argument("--user-agent=Mozilla/5.0")

    # Suppress unnecessary messages
    chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])

    # Update the driver
    service = Service(executable_path=ChromeDriverManager().install())

    # Apply the options
    browser = webdriver.Chrome(service=service, options=chrome_options)

    news = pyautogui.prompt('뉴스기사 입력 >>> ')
    print(f'{news} 검색')

    # Go to the web page
    path = f'https://search.naver.com/search.naver?where=news&sm=tab_jum&query={news}'

    # Talk to the url
    browser.get(path)

    # Naver returns the html
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    articles = soup.select("div.info_group")  # Extract the 10 news article divs
    for article in articles:
        links = article.select("a.info")
        if len(links) >= 2:  # If there are two or more links
            url = links[1].attrs['href']  # Extract the href of the second link
            # Fetch once more
            browser.get(url)
            html = browser.page_source
            soup = BeautifulSoup(html, 'html.parser')
            # If it is entertainment news -> ? the div layout is different
            if 'entertain' in url:
                title = soup.select_one(".end_tit")
                content = soup.select_one('#articeBody')
            else:
                title = soup.select_one("#title_area")
                content = soup.select_one('#dic_area')  # Get the id of the body for this link
            print("=============링크==========\n", url)
            print("=============제목==========\n", title.text.strip())
            print("=============내용==========\n", content.text.strip())
            time.sleep(0.7)
    print('\nDvlp.H.Y.C.Sol\n')

The output looks like this:

    =============링크==========
     https://n.news.naver.com/mnews/article/382/0001075938?sid=106
    Traceback (most recent call last):
      File "c:\Users\cksth\OneDrive\바탕 화면\Career\크롤링\심화\02.연예뉴스.py", line 71, in <module>
        print("=============제목==========\n", title.text.strip())
    AttributeError: 'NoneType' object has no attribute 'text'

찬솔 · 2023.09.18 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 238 · Answers 1
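
The requests-based versions on this page test response.url, which is the address after any redirect, while this Selenium version tests the original url from the search results. The link printed in the output does not contain "entertain", so the else branch runs even if the browser ended up on an entertainment page. Selenium exposes the loaded address as browser.current_url. A minimal sketch of the branch as a function follows; the name selectors_for is made up, and the redirected address is an assumed form used only for illustration.

```python
def selectors_for(final_url):
    """Pick (title, content) CSS selectors from the URL the browser ended on."""
    if "entertain" in final_url:
        return ".end_tit", "#articeBody"
    return "#title_area", "#dic_area"


# The link from the search results, and a hypothetical address after the redirect
link_url = "https://n.news.naver.com/mnews/article/382/0001075938?sid=106"
redirected = "https://entertain.naver.com/read?oid=382&aid=0001075938"  # assumed form
```

In the loop it would be used as title_sel, content_sel = selectors_for(browser.current_url) right after browser.get(url).
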
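Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Question about StartCoding's code

In the code StartCoding posted in a comment, there is a part labeled "make the driver look at the new window", and without it the code does not work properly. Clicking the Shopping tab does open a new tab, but if a new tab was opened by clicking the Shopping tab, shouldn't the driver already be looking at the new tab?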

이승연 · 2023.09.18 · [신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Votes 0 · Views 324 · Answers 1
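
Selenium keeps sending commands to the window it was already attached to, even when a click opens a new tab on screen, which is why an explicit switch is needed. Below is a minimal sketch of that step. It assumes driver is a Selenium WebDriver; the function name focus_newest_tab is made up, and taking the last handle assumes handles are listed in the order the tabs were opened.

```python
def focus_newest_tab(driver):
    """Point the driver at the most recently opened window or tab."""
    # Assumption: the last handle belongs to the tab that was opened last
    newest = driver.window_handles[-1]
    driver.switch_to.window(newest)
    return newest
```

After this call, find_element() searches the Shopping tab instead of the original page.
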
Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Naver Map crawling error

Hello. Thanks to you I was able to study the crawling course. Afterwards I tried to crawl restaurant information from Naver Map, ran into a problem, and after struggling with it for a while I am leaving a question. There is a lot of data, so after some googling I managed to crawl by opening several windows with multiprocessing. I wrote the main part like this:

    if __name__ == "__main__":
        start_time = time.time()
        num_cores = 6
        pool = multiprocessing.Pool(num_cores)
        # Array of search terms
        keywords = ['서울숲 식당', '건대 식당', '성수 식당', '홍대 식당', '신촌 식당', '이대 식당', '상수 식당', '합정 식당', '한남 식당', '명동 식당']
        pool.map(get_data,keywords)
        pool.close()
        pool.join()
        # print(" ----------------------------------------- ")
        # print(" 실행 소요 시간 : 단위(초) ")
        # print(" ----------------------------------------- ")
        # print(time.time() - start_time)
        # print(" ----------------------------------------- ")
        conn.close()

and the get_data function

    def get_data(keyword):
        browser = webdriver.Chrome()
        table_nm = ""
        type = ''
        URL = 'https://map.naver.com/v5/search/' + keyword
        print(URL)
        browser.get(URL)
        browser.implicitly_wait(10)
        browser.maximize_window()
        # Switch to the iframe (searchIframe)
        switch_frame("searchIframe",browser)
        # Click once inside the iframe
        browser.find_element(By.CSS_SELECTOR,"#_pcmap_list_scroll_container").click()

takes a keyword, searches for restaurants at that location, and then continues crawling. However, when several windows are opened and each first goes to an address such as https://map.naver.com/v5/search/홍대 식당, the information does not show up in the automated Chrome windows. I googled but could not find anyone with the same problem. I would really appreciate your help!

heebum417 · 2023.09.18 · [신규 개정판] 이것이 진짜 크롤링이다 - 기본편

Votes 0 · Views 278 · Answers 1
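Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Running Python code / pip errors, etc.

Hello, and thank you for your kind answer last time. While taking the course I wrote the crawling code and confirmed that it worked. I then installed Python and Visual Studio Code on a different PC and ran the file there, but the crawler does not work properly. At first I installed all the libraries, and the following errors occurred.

[Now resolved] import module(?) error: when I write import requests, the requests part should turn green, but it stays white.

pip install --upgrade error: when I use this command, the upgrade does not proceed and only the message ERROR : You must give at least one requirement to install (see "pip help install") is printed. (Just in case, I typed it on my original PC, where everything worked, and it printed a notice telling me to use a different command. On the current PC only the error appears, with no explanation.)

Running the code should produce an Excel file after crawling, but in the end it does not. Could I get some help?

What I have tried so far:
1) Uninstalled and reinstalled Python and Visual Studio Code, checked the Windows version, etc.
2) Set the Python environment variables (confirmed that the original PC works without setting them)
3) Restarted Visual Studio Code and rebooted the computer
4) Confirmed in cmd that Python is installed correctly
5) Removed and reinstalled pip (could not upgrade it)
6) Cross-checked the code between the original PC and the current PC (no differences found)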

프로그램초 · 2023.09.18 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 2.09k · Answers 2
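
The pip message quoted in the post says that --upgrade was given without a package name after it. A minimal sketch of building the full command for the interpreter that is actually running follows, which also sidesteps the question of which pip is on PATH. The function name pip_upgrade_command is made up for illustration.

```python
import sys


def pip_upgrade_command(package):
    """Build the argument list for upgrading one package with this interpreter's pip."""
    if not package:
        raise ValueError("a package name is required after --upgrade")
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


command = pip_upgrade_command("pip")
# subprocess.run(command, check=True) would execute it
```

Typed by hand, the equivalent is python -m pip install --upgrade pip, or the same line with requests, beautifulsoup4, or openpyxl in place of the last word.
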
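Unresolved
[신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Real estate crawling course event

Hello, teacher. A few days ago I wrote a blog post to take part in the event and sent you an email. Once I finish this course I definitely want to take the real estate one as well. Thank you for such a useful course. Please check your email!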

jerry · 2023.09.18 · [신규 개정판] 이것이 진짜 크롤링이다 - 실전편 (인공지능 수익화)

Votes 0 · Views 304 · Answers 1
Unresolved
ChatGPT 실무에 100% 활용하기

The part where ChatGPT is asked for Naver Shopping lowest-price search code

In the part where ChatGPT is asked for Naver Shopping lowest-price search code, it answers with code that differs from the instructor's.

    import requests
    from bs4 import BeautifulSoup

    # Python can be used to search Naver Shopping for the lowest price of a specific product.
    # Below is a Python code example to get started.

    # Specify the URL of the Naver Shopping search results page for the product.
    product_url = 'https://search.shopping.naver.com/search/all?query=여기에_제품_이름_입력'

    # Send a GET request to the URL.
    response = requests.get(product_url)

    # Check that the request succeeded (status code 200).
    if response.status_code == 200:
        # Parse the HTML content of the page.
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find the elements that contain product information such as name and price.
        product_elements = soup.find_all('div', class_='basicList_info_area__17Xyo')

        if product_elements:
            # Initialize variables that track the lowest price and the product name.
            lowest_price = None
            product_name = None

            for product in product_elements:
                # Extract the product name and price.
                name = product.find('a', class_='basicList_link__1MaTN').text.strip()
                price = product.find('span', class_='price_num__2WUXn').text.strip()

                # Convert the price to an integer (remove the currency symbol, commas, etc.).
                price = int(price.replace('원', '').replace(',', ''))

                # Update when a lower price is found or no lowest price exists yet.
                if lowest_price is None or price < lowest_price:
                    lowest_price = price
                    product_name = name

            if lowest_price is not None and product_name is not None:
                # Print the lowest price and the product name.
                print(f"'{product_name}' 제품의 최저 가격은 {lowest_price} 원입니다.")
            else:
                print("제품 정보를 찾을 수 없습니다.")
        else:
            print("페이지에서 제품 정보를 찾을 수 없습니다.")
    else:
        print("웹페이지 검색에 실패했습니다. URL 또는 네트워크 연결을 확인하세요.")

This is the code GPT provides. What should I do to get the same code as the instructor's?

jypark22c · 2023.09.12 · ChatGPT 실무에 100% 활용하기

Votes 0 · Views 402 · Answers 1
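
Whichever version of the code is used, the price conversion step in the generated code assumes every label looks like "12,900원" and would raise on anything else. A minimal sketch of a more tolerant conversion and lowest-price lookup follows, using made-up sample data rather than a live page; the function names parse_price and cheapest are illustrative only.

```python
import re


def parse_price(text):
    """Turn a price label such as '12,900원' into an int, or None if no digits."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def cheapest(items):
    """Return the (name, price) pair with the lowest parsable price, or None."""
    priced = [(name, parse_price(label)) for name, label in items]
    priced = [(name, price) for name, price in priced if price is not None]
    return min(priced, key=lambda pair: pair[1]) if priced else None


# Made-up sample data: (product name, price label)
sample = [("A", "12,900원"), ("B", "9,500원"), ("C", "가격 문의")]
best = cheapest(sample)
```

The third item has no digits in its label and is simply left out of the comparison.
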
