Question about collecting YouTube comments

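Hello.
I'm collecting comments with the YouTube API,
and I wrote code that collects up to 50 videos and 500 comments for a given keyword.
However, when I checked the collected data, not all of the comments had been collected.
I used next_page_token, but I can't figure out where things went wrong.
(I'm also wondering whether it could be because of the API quota...!)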

I've attached the code below.
If you don't mind, I'd really appreciate any feedback...!

Thank you for taking a look.
Stay healthy in this cold weather!

<The code>

import csv
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

def youtube_search(api_key, keyword, max_video_results, max_comment_results, published_after=None, published_before=None):
    youtube = build('youtube', 'v3', developerKey=api_key)

    search_response = youtube.search().list(
        q=keyword,
        part='id,snippet',
        maxResults=max_video_results,
        type='video',
        publishedAfter=published_after,
        publishedBefore=published_before
    ).execute()

    video_ids = []
    video_details = []

    

    for item in search_response['items']:
        video_ids.append(item['id']['videoId'])
        video_details.append({'title': item['snippet']['title'], 'description': item['snippet']['description'], 'published_at': item['snippet']['publishedAt']})

    comments = []
    for video_id in video_ids:
        next_page_token = None

        try:
            for _ in range(5):
                comment_response = youtube.commentThreads().list(
                    videoId=video_id,
                    part='snippet',
                    maxResults=max_comment_results,
                    pageToken=next_page_token
                ).execute()

                comments.extend(comment_response['items'])

                next_page_token = comment_response.get('nextPageToken')
                if not next_page_token:
                    break
        except HttpError as error:
            if error.resp.status in [403, 404]:
                continue
            else:
                raise error

    return video_details, comments

def save_to_csv(video_details, comments, output_file):
    with open(output_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Video Title', 'Video Description', 'Published At', 'Comment'])

        for video, comment in zip(video_details, comments):
            writer.writerow([video['title'], video['description'], video['published_at'], comment['snippet']['topLevelComment']['snippet']['textDisplay']])

api_key = "API Key"
keyword = "스니커즈 리셀"
max_video_results = 50
max_comment_results = 500
years = [2019, 2020, 2021, 2022]

# Returns the last day (number of days) of the given month
def days_in_month(year, month):
    if month == 2:
        if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
            return 29
        else:
            return 28
    elif month in [4, 6, 9, 11]:
        return 30
    else:
        return 31

for year in years:
    for month in range(1, 13):
        # Set the start and end timestamps for this month
        first_day = f'{year}-{month:02d}-01T00:00:00Z'
        last_day = f'{year}-{month:02d}-{days_in_month(year, month):02d}T23:59:59Z'

        video_response, comments = youtube_search(api_key, keyword, max_video_results, max_comment_results, published_after=first_day, published_before=last_day)

        output_file = f"youtube_search_results_{year}_{month:02d}.csv"
        save_to_csv(video_response, comments, output_file)
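On the quota question: the loop above runs one search per month for 48 months, plus comment requests for every video each search returns. A rough worst-case estimate can be sketched as follows, assuming the per-call costs listed in the YouTube Data API quota documentation (search.list = 100 units, commentThreads.list = 1 unit) and the default allocation of 10,000 units per day. `estimate_units` is a hypothetical helper, and your project's actual limits may differ.

```python
# Rough worst-case daily-quota estimate for the monthly loop above.
# ASSUMPTION: unit costs from the YouTube Data API quota documentation
# (search.list = 100, commentThreads.list = 1) and the default quota of
# 10,000 units/day; check the quota page of your own project.
SEARCH_COST = 100
COMMENT_PAGE_COST = 1
DAILY_QUOTA = 10_000


def estimate_units(months, videos_per_month, pages_per_video):
    """One search per month plus every comment page requested for every video."""
    per_month = SEARCH_COST + videos_per_month * pages_per_video * COMMENT_PAGE_COST
    return months * per_month


# 4 years x 12 months, 50 videos each, up to 5 pages per video (500 comments at 100 per page)
total = estimate_units(months=4 * 12, videos_per_month=50, pages_per_video=5)
print(total, "units; exceeds default quota:", total > DAILY_QUOTA)
```

Under these assumptions a full run can need up to 16,800 units. Many videos have fewer comments and therefore need fewer pages, but the searches alone cost 4,800 units. Once the quota is used up, the API responds with HTTP 403. Because the except block in the code treats every 403 (which is also what a video with comments disabled returns) as "skip this video", comment requests that fail for quota reasons are skipped silently instead of being reported. The next search.list call sits outside the try block, so it would then stop the script with an HttpError.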

Hello, I'm the Inflearn AI intern.

It looks like your question is about your comment-collection code. After looking through the code, I suspect the way nextPageToken is used may be causing the problem.

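nextPageToken is the value used to request the next page when one exists, and you pass in the value returned with the previous page. In your code, however, nextPageToken is used several times and is reset to None instead of being set to the previous page's value. This could make the collection stop partway through, before all of the comments have been gathered. Once you've fixed this, please let me know if you still run into a problem with the API quota.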
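The pagination pattern described above can be sketched as follows. This is a minimal sketch, not the questioner's code: `collect_comments` and `make_fake_fetch` are hypothetical helpers, and the page-fetching function is passed in so the logic can run without an API key. Within one video, the token returned with each page is passed to the next request. Resetting it to None at the start of each new video, as the loop in the question does, is expected. The page size is capped at 100 because commentThreads.list accepts maxResults values from 1 to 100, so 500 comments for one video take five requests.

```python
from typing import Callable, Optional

# A page is the parsed response of one commentThreads().list(...).execute() call:
# a dict with "items" and, when more pages exist, "nextPageToken".
Page = dict
FetchPage = Callable[[Optional[str], int], Page]

API_PAGE_LIMIT = 100  # commentThreads.list accepts maxResults from 1 to 100


def collect_comments(fetch_page: FetchPage, max_comments: int) -> list:
    """Follow nextPageToken for ONE video until max_comments or the last page."""
    comments = []
    page_token = None  # start at the first page of this video
    while len(comments) < max_comments:
        page_size = min(API_PAGE_LIMIT, max_comments - len(comments))
        page = fetch_page(page_token, page_size)
        comments.extend(page["items"])
        page_token = page.get("nextPageToken")  # carry the token to the next request
        if not page_token:
            break  # no more pages for this video
    return comments[:max_comments]  # trim in case a page returned more than requested


def make_fake_fetch(total):
    """Hypothetical stand-in for the API: serves `total` comments page by page."""
    def fetch_page(page_token, page_size):
        start = int(page_token or 0)
        end = min(start + page_size, total)
        page = {"items": [f"comment {i}" for i in range(start, end)]}
        if end < total:
            page["nextPageToken"] = str(end)
        return page
    return fetch_page


# With the real client (not run here), fetch_page could wrap the call from the question:
#   lambda token, size: youtube.commentThreads().list(
#       videoId=video_id, part='snippet', maxResults=size, pageToken=token).execute()
print(len(collect_comments(make_fake_fetch(730), 500)))  # 500: stops at the limit
print(len(collect_comments(make_fake_fetch(120), 500)))  # 120: stops at the last page
```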

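There are also a few ways to improve the code further. In the comments.extend() call, instead of adding the whole items, you could keep only each item's snippet (for example, item.get('snippet')), which makes the code more concise. In addition, for the try/except block, one approach is to handle only the 403 and 404 errors and re-raise all the others.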
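Keeping only each item's snippet also makes it easy to keep the video ID next to the comment text, because each comment thread's snippet carries a videoId field. This matters for a separate issue in save_to_csv: zip(video_details, comments) pairs the i-th video with the i-th comment and stops at the shorter list. With at most 50 videos per month, at most 50 comment rows are written per CSV file, and most comments end up next to the wrong video. A minimal sketch of joining by ID instead, assuming the video details are extended with a hypothetical video_id key (the question's code does not store one). `comment_rows` is a hypothetical helper, and the sample items are made up and contain only the fields used here.

```python
# Hypothetical sample items shaped like commentThreads.list results (only the fields used below).
threads = [
    {"snippet": {"videoId": "A", "topLevelComment": {"snippet": {"textDisplay": "first on A"}}}},
    {"snippet": {"videoId": "A", "topLevelComment": {"snippet": {"textDisplay": "second on A"}}}},
    {"snippet": {"videoId": "B", "topLevelComment": {"snippet": {"textDisplay": "only on B"}}}},
]
# ASSUMPTION: video details extended with a "video_id" key taken from item['id']['videoId'].
videos = [{"video_id": "A", "title": "Video A"}, {"video_id": "B", "title": "Video B"}]


def comment_rows(video_details, comment_threads):
    """One row per comment, matched to its video through the snippet's videoId."""
    titles = {video["video_id"]: video["title"] for video in video_details}
    rows = []
    for thread in comment_threads:
        snippet = thread.get("snippet", {})  # keep only the part that is needed
        text = snippet["topLevelComment"]["snippet"]["textDisplay"]
        rows.append([titles.get(snippet["videoId"], ""), text])
    return rows


# zip() stops at the shorter list: "only on B" is dropped and "second on A" is paired with Video B.
print(len(list(zip(videos, threads))))     # 2
print(len(comment_rows(videos, threads)))  # 3, each under its own video
```

Each row can then be passed to csv.writer(...).writerow() in place of the zip loop.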

That's all for my feedback. If you have any other questions, feel free to leave a comment anytime and I'll reply. Stay healthy!

Inflearn Community Q&A