Inflearn Community Q&A

smile9740776

asked

Introduction to Python and Creating Various Automated Applications Using Web Crawling

Extracting necessary data from the web using Python urllib (1)

Fetching 100 cat photos from Google


Hello~

I'd like to complete the exercise of fetching 100 cat photos from Google
using BeautifulSoup,

but I can only fetch 20 cat photos.

My source code is below.
1. What do I need to add to fetch 100 photos?

2. And why can I only fetch 20?

[Source code]

# Issue: how can I fetch 100 images?
import os
import urllib.parse as rep
import urllib.request as req

from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request made through urllib.
opener = req.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
req.install_opener(opener)

savePath = "/Users/kimhyeyeong/Documents/section2/google/"
base = "https://www.google.com/search?q="
input_quote = input("Which image do you want to fetch from Google? ")
quote = rep.quote_plus(input_quote)
end = "&source=lnms&tbm=isch&sa=X&ved=0ahUKEwic4eDlhpjjAhWDwrwKHdbRCeQQ_AUIECgB&biw=1440&bih=766&dpr=2"
url = base + quote + end
res = req.urlopen(url)

# Create the download folder if it does not exist yet.
try:
    os.makedirs(savePath, exist_ok=True)
except OSError:
    print("Failed to create the folder")
    raise

soup = BeautifulSoup(res, "html.parser")
img_list = soup.select("table.images_table > tr > td > a > img")

# Use a separate loop variable so the list itself is not overwritten.
for i, img in enumerate(img_list, 1):
    fullFileName = os.path.join(savePath, str(i) + '.jpg')
    req.urlretrieve(img['src'], fullFileName)

print("Download complete")

python web-crawling

Answer 1

niceman

Instructor

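Hello, smile974.

Google Images is driven by mouse events: if you watch the page in the browser's developer tools, you will see a request <-> response pattern each time you scroll toward the bottom. A single urllib request therefore only receives the first batch of images, which is why you end up with about 20.

I recommend trying this again after you have learned Selenium in the later part of the course.

It will be a somewhat difficult task. If you succeed, I think you will pick up a lot of crawling know-how.

If I get the chance, I will add this example to the course as an exercise.

Thank you.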
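
For anyone who wants to try the scroll-based approach mentioned in the answer above, here is a rough sketch of the idea, not the course's official solution. The function names `collect_image_urls` and `run_with_chrome` are hypothetical, and selecting every `img` element is an assumption, since Google's markup changes often and the page also contains icons and logos. The helper only needs an object with Selenium's `execute_script` method, so it can be tried with a stand-in object before a real browser is involved.

```python
import time
import urllib.parse


def collect_image_urls(driver, wanted=100, max_scrolls=20, pause=1.0):
    """Scroll the page until at least `wanted` image sources are collected.

    `driver` is any object that exposes Selenium's `execute_script` method.
    """
    urls = []
    for _ in range(max_scrolls):
        # Ask the browser for the src of every <img> currently in the DOM.
        # Assumption: a plain 'img' selector; the real page may need a narrower one.
        srcs = driver.execute_script(
            "return Array.from(document.querySelectorAll('img')).map(e => e.src);"
        )
        # Keep non-empty sources, dropping duplicates but preserving order.
        for src in srcs:
            if src and src not in urls:
                urls.append(src)
        if len(urls) >= wanted:
            break
        # Scroll to the bottom so the page requests the next batch of images.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
    return urls[:wanted]


def run_with_chrome(query="cat"):
    """Hypothetical driver code; needs the selenium package and Chrome."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(
            "https://www.google.com/search?q="
            + urllib.parse.quote_plus(query)
            + "&tbm=isch"
        )
        return collect_image_urls(driver, wanted=100)
    finally:
        driver.quit()
```

The list returned by `run_with_chrome()` could then be passed to the same `req.urlretrieve` loop used in the question. Capping the number of scrolls with `max_scrolls` keeps the loop from running forever if the page stops loading new images before 100 are found.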
