Question about the Naver Shopping (chicken breast) code

Question

Hello! I was working on the chicken breast assignment and have a question, because only the first product gets printed. On most runs only the first product is printed, but sometimes around 10 products come out, so the result seems to be hit or miss. What could be the problem?

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    import time
    import pandas as pd

    data = []

    # Create the Chrome driver
    driver = webdriver.Chrome()

    # Open the page
    driver.get("https://search.shopping.naver.com/search/all?query=%EB%8B%AD%EA%B0%80%EC%8A%B4%EC%82%B4")

    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    # Height before scrolling
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll to the bottom
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")

        # Wait briefly; without this the loop runs only once instead of repeating
        time.sleep(2)

        # Height after scrolling
        new_height = driver.execute_script("return document.body.scrollHeight")

        # Compare (if / break)
        if new_height == last_height:
            break

        # Update the height before scrolling
        last_height = new_height

    products = soup.select(".product_item__KQayS")

    for product in products:
        # Product name
        name = product.select_one(".product_title__ljFM_").text
        # Link to the detail page
        link = product.select_one(".product_title__ljFM_>a").attrs['href']
        # Price
        price = product.select_one(".price_num__Y66T7").text.split('원')[0].replace(',','')
        print(name, link, price)
        data.append([name, link, price])

m_r_n · Answer

So it was a matter of order after all! I need to finish scrolling so that all the content is loaded, and only then refresh the HTML. Thank you.
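The ordering issue described above can be illustrated without a browser. The following is only a minimal sketch: `FakePage` is a hypothetical stand-in (not part of Selenium) for a page that adds items as it is scrolled, and it assumes that `page_source` returns a plain string snapshot of the page at the moment it is read.

```python
class FakePage:
    """Hypothetical stand-in for a browser page that lazy-loads items."""

    def __init__(self):
        # Only the first item is present right after the page opens.
        self.items = ["item 1"]

    @property
    def page_source(self):
        # Returns a new string each time, reflecting the current items.
        return "".join(f"<li>{name}</li>" for name in self.items)

    def scroll(self):
        # Each scroll "loads" one more item.
        self.items.append(f"item {len(self.items) + 1}")


page = FakePage()

early_html = page.page_source  # captured before scrolling
for _ in range(3):
    page.scroll()
late_html = page.page_source   # captured after scrolling

early_count = early_html.count("<li>")
late_count = late_html.count("<li>")
print(early_count, late_count)  # 1 4
```

The string captured before scrolling never changes afterwards, so anything parsed from it can only see what was on the page at that moment.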

Inflearn AI Intern · Answer

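Hello m_r_n, this is the Inflearn AI Intern.

It sounds like only the first product, or only a limited number of products, is printed when you crawl Naver Shopping. This usually happens when the BeautifulSoup object is created once, early on, and then reused after the page has changed. The search results are loaded dynamically as you scroll, so HTML captured before scrolling contains only the items that happened to be rendered at that moment. That is likely why the count varies between runs.

Cause and solution

- Position of the BeautifulSoup object: in the current code, driver.page_source is read only once, right after driver.get(). The soup object therefore parses the HTML from before the content is loaded and cannot see the full list. The HTML has to be fetched again after scrolling is complete.
- Refreshing the data after scrolling: once scrolling has made the page load more items, read the newly loaded HTML and parse that.

A corrected version of the code looks like this:

    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time

    data = []

    # Create the Chrome driver
    driver = webdriver.Chrome()

    # Open the page
    driver.get("https://search.shopping.naver.com/search/all?query=%EB%8B%AD%EA%B0%80%EC%8A%B4%EC%82%B4")

    # Height before scrolling
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll to the bottom
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
        time.sleep(2)

        # Height after scrolling
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # Refresh the page source now that scrolling is finished
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    # Extract the product information
    products = soup.select(".product_item__KQayS")

    for product in products:
        name = product.select_one(".product_title__ljFM_").text
        link = product.select_one(".product_title__ljFM_>a").attrs['href']
        price = product.select_one(".price_num__Y66T7").text.split('원')[0].replace(',', '')
        print(name, link, price)
        data.append([name, link, price])

In this version, driver.page_source is read after the scroll loop has finished, so the BeautifulSoup object is built from the up-to-date HTML before the product information is extracted.

The following related questions may also be helpful:

- 네이버쇼핑 질문 있습니다! (I have a question about Naver Shopping!)
- 첫페이지 크롤링에서 For 문까지는 작동이 되네요. 그러나 (Crawling the first page works up to the for loop. However)
- 쿠팡 페이지 지정 출력 (Printing a specified Coupang page)

I recommend links to similar answers, based on the instructor's previous answers, to help you understand the issue. This is currently a beta feature, so please bear with me if the answer is not fully satisfactory. 🙏 If you have further questions, please ask a follow-up question. The instructor will answer soon.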
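As a complement to the answer above, the scroll loop can be pulled into a small helper so that the "scroll first, then read the HTML" order is harder to get wrong. This is only a sketch under stated assumptions: `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names that do not appear in the original code, and the helper assumes only that the object passed in has an `execute_script(script)` method, as Selenium's WebDriver does. The `max_rounds` cap is an added safeguard against a page that keeps growing forever.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll until the page height stops growing; return the final height.

    `driver` is any object with an `execute_script(script)` method.
    `pause` and `max_rounds` are illustrative parameters, not Selenium options.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")

    for _ in range(max_rounds):
        # Jump to the bottom so the page starts loading more items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Give the page time to load the new content.
        time.sleep(pause)

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change, so nothing new was loaded.
            break
        last_height = new_height

    return last_height
```

With a real driver, the intended order would be to call `scroll_to_bottom(driver)` first and only afterwards read `driver.page_source` and pass it to `BeautifulSoup`. A fixed `time.sleep` is a simple approach; on a slow connection a longer `pause` may be needed.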