묻고 답해요
161만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 과제 제품 현황 분석 : 리텐션 분석
과제. 직접 retention 구해보기(주별 weekly)WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_week_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, event_date, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY ALL ORDER BY diff_of_week ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention FROM retain_base과제. 월별 리텐션 쿼리 작성해보기WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_month_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, -- DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week DATE_DIFF(event_month, first_month, MONTH) AS diff_of_month FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), MONTH) AS first_month, event_date, DATE_TRUNC(event_date, MONTH) AS event_month FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_month, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_month_and_diff GROUP BY ALL ORDER BY diff_of_month ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_month) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_month, user_cnt SAFE_DIVIDE(user_cnt, first_user_cnt) AS retention FROM retain_base 리텐션 연습문제Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요. # 답 -- Weekly Retention을 구하자! (Retention이란, 시간이 흐르면서 다시 제품을 사용하는지 측정하는 지표) -- 첫 방문 후 매주 재방문 비율(weekly retention)을 계산하자! WITH base AS ( SELECT DISTINCT ###### DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, ###### user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY ALL ) -- ORDER BY day_of_week ) SELECT day_of_week, SAFE_DIVIDE(user_cnt, total_user) AS retention FROM retention_base Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요# 답 -- 신규 유저: 첫 활동일 기준 7일 이내 -- 기존 유저: 첫 활동일 기준 7일 이후, 마지막 활동으로부터 90일 이내 -- 휴면 유저: 마지막 활동일 이후 90일 이상 경과 -- 복귀 유저: 90일 이상 비활성화 이후 다시 돌아온 유저 WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs --WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ),user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, #OVER(PARTITION BY user_pseudo_id)를 넣으면 윈도우함수사용과 GROUPBY졸 사용 비교했을 때처럼 중복행이 생긴다 MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( -- 원래라면 CURRENT_DATE()로 해야하지만, 앱로그 데이터의 마지막 날로 설정. SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( -- 90일 이상 제품을 사용하지 않았다면 1, 아니면 0 SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ) -- user_types AS ( SELECT DISTINCT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, us.dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN dormant_history = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ), user_types AS ( SELECT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, MAX(us.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN MAX(us.dormant_history) = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_date, u.last_date, led.current_date ), first_week_and_diff AS ( SELECT ut.user_type, fw.user_pseudo_id, fw.event_date, DATE_DIFF(DATE_TRUNC(fw.event_date, WEEK(MONDAY)), DATE_TRUNC(ut.first_date, WEEK(MONDAY)), WEEK) AS diff_of_week FROM base AS fw JOIN user_types AS ut ON fw.user_pseudo_id = ut.user_pseudo_id ), user_cnt_by_type_and_week AS ( SELECT user_type, diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY user_type, diff_of_week ), retention_base AS ( SELECT user_type, diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER (PARTITION BY user_type ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_by_type_and_week ) SELECT user_type, diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention_rate FROM retention_base ORDER BY user_type, diff_of_week current user의 1주차 리텐션은 0.11부터 시작해서 5~6주차에는 0.15까지 오릅니다.new user의 경우 1주차까지 밖에 리텐션을 구하지 못하고, 0.06입니다.resurrected user의 경우 1주차 리텐션은 0.04부터 시작해서 7주차에는 0으로 빠르게 감소합니다.dormant user의 경우 휴면고객의 리텐션을 구하는 것이 의미가 있는지 잘 모르겠지만, 1주차 리텐션은 0.05부터 시작해서 11주차까지 0으로 감소합니다.리텐션이 그나마 높은 유저는 current user이고, 리텐션이 주마다 서서히 증가하거나 감소하는데 2주차에 오른 것으로 보아 기능 배포나 이벤트가 있지 않았을까 예상해봅니다.Core Event를 “click_payment”라고 설정하고 Weekly Retention을 구해주세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_name = "click_payment" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM ( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY day_of_week ) ) SELECT day_of_week, ROUND(SAFE_DIVIDE(user_cnt, total_user), 2) AS retention FROM retention_base ORDER BY day_of_weekapp_logs 테이블에서 event_name이 click_payment인 것만 필터 걸어서 리텐션 구하기를 진행했는데 이렇게 푸는 것이 맞을까요..?
-
해결됨초보자를 위한 BigQuery(SQL) 입문
2-5 집계 23:42 GROUP BY 연습문제 질문 드려요
포켓몬의 수를 타입 별 집계, 수가 10 이상인 타입만 남기는 쿼리 작성 연습문제에 대하여 질문 드려요. 작성 전에 고민했던 부분은포켓몬 타입은 type 1, 2로 나뉘어 있다.type 2는 null 인 경우가 있다. => where 로 제외할 수 있을까?type 1, 2 을 포함해서 세야 할 것 같다. (총 포켓몬 수보다 합산 값이 클 수 있다)였거든요.Group by 에서 type2를 빼면 통합되지 않았다는 오류가 발생해서 어쩔 수 없이 넣긴 했는데, 제가 원한 형태로 구현되지는 않았습니다. 어떻게 해야 null 을 제외한 type2 의 값까지 전부 셀 수 있을까요?WHERE TYPE2 IS NOT NULL 을 넣게 되면 1개 타입만 가진 개체의 경우 아예 집계를 하지 않는 것 같아서 그 부분도 뺄 수 밖에 없었어요.최종적으로는 1열에 타입, 2열에 해당하는 개체 수가 나오게 구현하고 싶습니다.(바위, 불인 포켓몬의 경우 바위에 1개, 불에 1개 계산되도록이요.바위, 불 포켓몬 + 바위 포켓몬 + 불 포켓몬 = 바위 2개체 불 2개체)SELECT type1, type2, COUNT(id) as cnt FROM `basic.pokemon` GROUP BY type1, type2
-
해결됨초보자를 위한 BigQuery(SQL) 입문
battel 테이블 오류가 뜨면서 테이블 저장이 안되요
Battel 데이터 문제로 보이는데, 설정을 잘못한걸까요?
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[3]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
추가 문제 1번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 1번 문제(15분) -- - 주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요 -- - 단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요 , real_base as ( select * from ( select *, avg(diff_price) as avg_price, from( SELECT transaction_date, category, list_price, (list_price - actual_price) as diff_price FROM item_info as item left join transaction_data trans_date on item.item_id = trans_date.item_id ) as base group by all ) as base2 ) ## 주차별 평균 할인율 가장 높은 기간 --> 2024-01-08 ~ 2024-01-18 -- select -- FORMAT_TIMESTAMP('%Y-%m-%d 23:59:39', PARSE_TIMESTAMP('%m/%d/%Y %H:%M:%S', transaction_date)) AS formatted_date, -- avg_price -- from real_base -- group by -- 1,2 -- order by -- 2 desc ## 카테고리별 평균 할인율이 가장 높았던 기간 select *, round((max_price / list_price) * 100, 2) as `할인율` from ( select distinct FORMAT_TIMESTAMP('%Y-%m-%d 23:59:39', PARSE_TIMESTAMP('%m/%d/%Y %H:%M:%S', transaction_date)) AS formatted_date, category, max(avg_price) as max_price, list_price from real_base group by all order by 3 desc ) as avg_price_table order by 5 desc ## 할인율이 가장 높았던 주차 -- 01/25/2024 -- 02/08/2024 -- 02/11/2024 -- 02/14/2024 -- 02/28/2024 -- select -- transaction_date, -- round((diff_price / list_price) * 100, 2) as `할인율` -- -- transaction_date, -- -- safe_divide(diff_price, list_price) -- from real_base -- group by -- 1,2 -- order by -- 2 desc ## 할인율이 높았던 카테고리 -- Fashion -- select -- category, -- round((diff_price / list_price) * 100, 2) as `할인율` -- -- transaction_date, -- -- safe_divide(diff_price, list_price) -- from real_base -- group by all -- order by -- 2 desc 2번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 2번 문제(10분) 1610 -- - 2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요 -- 날짜 테이블 -- 아이템 테이블 활용 -- 기간에 필터링 걸기 -- 카테고리 -- 매출 -- 카테고리별 구매가 -- 구매가의 합 = 매출 ## 카테고리별 구매가 , category_price as ( select item_date.transaction_date, item.category, item_date.actual_price from item_info as item left join transaction_data as item_date on item.item_id = item_date.item_id group by all ) ## 매출 | 카테고리별 구매가의 합 , `매출_테이블` as ( select *, sum(actual_price) over(partition by category) as `매출` from category_price ) ## 카테고리별 매출에 1월 필터링 select * from `매출_테이블` where transaction_date between '01/08/2024' and '01/28/2024' 3번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 3번 문제(10분) 1646 -- - 유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요 -- 유저별 총 구매 금액 -- 유저 | 총 구매금액 -- 유저별 | 구매금액 -- 구매금액 합산 = 총 구매금액 , `유저별_총구매금액` as ( select user_id, `총 구매금액` from ( select user_id, sum(actual_price) over(partition by user_id) as `총 구매금액` from ( SELECT user_id, actual_price, FROM transaction_data group by all ) as base1 ) base2 group by 1,2 ) -- 200만원 이상 필터링 , `200만원 이상 필터링` as ( select * from `유저별_총구매금액` where `총 구매금액` >= 2000000 ) ## 유저들이 가장 많이 구매한 카테고리 -- 유저 | 카테고리 | 구매 -- 구매가가 있는 카테고리 수를 세면 어떨까? -- 카테고리별 카운트? select category, `카테고리별카운트` from ( select *, sum(`카테고리수`) over(partition by category) as `카테고리별카운트` from ( select user_id, category, count(category) over(partition by user_id) as `카테고리수` from ( select td2.user_id, item2.category, td2.actual_price from transaction_data as td2 left join item_info as item2 on td2.item_id = item2.item_id ) b_base ) as b_base2 ) as b_base3 group by 1,2 order by 2 desc
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[2]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
유저 상태 new user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # New User: 해당 주차에 처음 방문한 유저 -- '유저의 첫 방문 주'와 '유저별 이벤트 발생주'가 같은지 확인 -- 같으면, 1 -- 아니면 0 ## '유저의 첫 방문 주' with user_first_visit as ( select user_pseudo_id, date_trunc(min(event_date), week(monday)) as first_visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1 ) ## '유저의 모든 방문 주' , user_event_week as ( select distinct user_pseudo_id, date_trunc(event_date, week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' ) ## 첫 방문 주와 모든 방문 주 중 같은 데이터가 있다면 '방문(=1)' , new_user_table as ( select ufv.user_pseudo_id, ufv.first_visit_date, uew.visit_date, if(ufv.first_visit_date = uew.visit_date, 1, 0) as new_user from user_first_visit as ufv inner join user_event_week as uew on ufv.user_pseudo_id = uew.user_pseudo_id group by 1,2,3 ) select first_visit_date as week, sum(new_user) as new_user_count from new_user_table group by 1 order by 1 current user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ -- Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 -- 유저별 모든 방문 주 구하기 -- 모든 방문주 - 전주 방문 데이터 = 1 --> current user ## 유저별 모든 방문 주 구하기 with user_visits as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 전주 방문 데이터 , prev_user_table as ( select user_pseudo_id, visit_date, lag(visit_date) over(partition by user_pseudo_id order by visit_date) as previous_visit_user from user_visits ) ## 모든 방문주 - 전주 방문 데이터 = 1 --> current user select visit_date as week, count(distinct user_pseudo_id) as current_user_cnt from( select distinct user_pseudo_id, visit_date, date_diff(visit_date, previous_visit_user, week) = 1 as current_user from prev_user_table ) as base where current_user is not null group by 1 order by 1 resurrected user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) ## 유저별 모든 방문 주 구하기 with user_visits as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 전주 방문 데이터 , prev_user_table as ( select user_pseudo_id, visit_date, lag(visit_date) over(partition by user_pseudo_id order by visit_date) as previous_visit_user from user_visits ) ## (첫 방문이 아님) + (모든 방문 주 데이터 - 이전 방문주 > 1) = Resurrected User -- 첫 방문이 아닌 유저 : 'previous_visit_user is not null' -- 첫 방문이었으면 previous_visit_user가 null이었을테니깐. select visit_date as week, count(user_status) as Resurrected_User from ( select user_pseudo_id, visit_date, previous_visit_user, case when previous_visit_user is not null and date_diff(visit_date, previous_visit_user, week) > 1 then 'Resurrected_User' else null end as user_status from prev_user_table ) as Resurrected_User where user_status is not null group by 1 order by 1 dormant user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 ## 모든 방문 데이터 with base as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 다음 방문주 데이터= , next_visit_date_table as ( select *, lead(visit_date) over(partition by user_pseudo_id order by visit_date) as next_visit_date from base ) ## Dormant User -- Dormant User? -- 다음주 방문 데이터가 null 이거나, -- next_visit_date > 1 week가 아닌 유저 , Dormant_User_table as ( select *, case when next_visit_date is null or date_diff(next_visit_date, visit_date, week) > 1 then 'Dormant_User' else null end as Dormant_User_data from next_visit_date_table ) select visit_date as week, count(Dormant_User_data) as Dormant_User_data_cnt from Dormant_User_table group by 1 order by 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[1]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
[종합 느낀점]1. 날짜 데이터 변환 함수, 날짜 데이터로 추출 하는 함수 등 문제 풀 때 사용하는 함수를 까먹어 애를 먹었습니다.2. 테이블 조인시 첫 테이블을 잘못 선택하면, 쿼리가 길어지고 꼬인다는 것을 느꼈습니다.3. 지표가 어떤 지표느냐에 따라 해석 방법이 달라진 점도 알게 됐습니다. [문제 풀이 소요 시간]1번 문제 : 30분2번 문제 : 12분3번 문제 : 12분 정답이면서도 확장성 있는 쿼리를 작성하다가는 영원히 못풀겠더라고요..그래서 최대한 쿼리 작성을 마무리 짓는데 초점을 뒀습니다.. ㅠㅠ [new_users]위와 같은 시각화 결과가 나왔습니다.눈에 띄는 데이터로는 두 가지가 있습니다:1. 10월 new users 상승 상승 추정 이유: 1. 새로운 마케팅 전략 시행(시기가 추석 근처이니깐 추석을 기념해서 배달앱 쿠폰 지급?)2. 12월 new users 하락 하락 추정 이유: 1. 연말 모임으로 인한 배달 수요 감소 (사실 감소세는 10월달에 고점을 찍은 이후 쭉 하락세이지만, 그 이유를 단정짓지는 못하겠습니다..) [Current User]위와 같은 시각화 결과가 나왔습니다.눈에 띄는 데이터로는 세가지가 있습니다:1. 9월 중순 이후 Current User 상승 상승 추정 이유: 1. 새로운 마케팅 전략 시행(시기가 추석 근처이니깐 추석을 기념해서 배달앱 쿠폰 지급?)2. 12월까지 꾸준한 상승 상승 추정 이유: 1. 기존 유저의 긍정적인 경험이 이어져서 서비스에 잔류 2. 마케팅으로 유입된 신규 유저도 긍정적인 경험을 해서 서비스에 잔류3. 12월 Current User 하락 하락 추정 이유: 1. 연말 모임으로 인한 배달 수요 감소 [Resurrected User]'Current User' 해석과 비슷할 거 같습니다. [Dormant User]1. 9월 -> 10월 급격한 상승 상승 추정 이유: 1. 앞선 이유과 같음. 추석 관련 이벤트 실시. 2. 그러나 계속 상승이 이어지지 않은 점을 고민해 봐야 함.2. 10월 -> 11월 말 유지 유지 추정 이유: 1. 마케팅으로 긍정적인 경험을 한 기존·신규 유저 잔류 2. 하지만 입소문이 나서 계속 상승하지는 않음. 3. 일부 유저만 잔류하고, 나머지 신규 유입 유저는 이벤트만 맛보고 이탈.3. 12월 급격한 상승 상승 추정 이유: 1. 연말에 배달 수요가 감소할 것을 예측해 새로운 마케팅 전략 실시 주차별 리텐션# 리텐션 정의 -- "우리 서비스를 이용한 사람들이 현재는 얼마큼 이용하고 있을까?" /* 1. 무슨 데이터가 필요할까? - 유저 - 날짜 왜 이런 데이터가 필요할까? 이에 대한 답변: '사람들'이니깐, '유저'를 인식할 수 있는 데이터가 필요하다. '현재'이니깐, '시간'을 인식할 수 있는 데이터가 필요하다. ## 1. 유저별 첫 방문 주 구하기*/ with first_visit_table as ( select user_pseudo_id, date_trunc(min(event_date), week(monday)) as first_visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1 ) /*2. 처음 이벤트를 발생 시킨 주는? - 무슨 데이터가 필요할까? 1. 첫 이벤트 발생 날짜 2. 유저 - 왜 이런 데이터가 필요할까? 이에 대한 답변: '이벤트'는 주체가 있어야 발동 되는 거니깐 '유저' 데이터가 필요하다. 또 이벤트 발생이란 결과에는 '날짜'라는 개념도 포함 되어 있기 때문에, '날짜' 데이터가 필요하다. 그중에서도 '처음 이벤트를 발생한 주'의 데이터가 필요하다. 3. 유저가 이벤트를 실행한 주는? - 무슨 데이터가 필요할까? 1. 유저 2. 이벤트 이름(선택) 3. 이벤트 발생 날짜 + 중복제거 왜? 목적이 모든 유저의 리텐션이기 때문에. 중복 제거를 하지 않으면, 모든 유저가 발생시킨 모든 이벤트가 보이기 때문에 가독성이 안 좋기 때문에. - 왜 이런 데이터가 필요할까? 이에 대한 답변: 처음 이벤트를 발생시킨 날짜와 비교하기 위해서 ## 2. 유저별 이벤트 발생 주 구하기*/ , user_event as ( select distinct user_pseudo_id, date_trunc(event_date, week(monday)) as user_event_week from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' ) /*4. '유저가 이벤트 실행한 주 - 유저가 처음 방문한 주' - 무슨 데이터가 필요할까? 1. 유저 2. 첫 이벤트 발생 날짜 3. 이벤트 발생 날짜 4. (이벤트 발생 날짜 - 첫 이벤트 발생 날짜) - 왜 이런 데이터가 필요할까? 이에 대한 답변: 처음 이벤트를 발생시킨 날짜와 비교하기 위해서 ## 3. 유저별 이벤트 발생주와 첫 방문주 차이 구하기*/ , diff_week_table as ( select fvt.user_pseudo_id, fvt.first_visit_date, ue.user_event_week, date_diff(ue.user_event_week, fvt.first_visit_date, week(monday)) as diff_week from first_visit_table as fvt left join user_event as ue on fvt.user_pseudo_id = ue.user_pseudo_id order by 2, 4 ) /*5. 주차별 유저수 구하기 - 왜 위와 같은 생각을 했는가? 이에 대한 답변: 주차수에 따른 유저수를 알아야만 첫 주차에 비해 얼마큼 유저가 이벤트를 발생시키지 않았는지 알 수 있으니깐. - 무슨 데이터가 필요한가? 이에 대한 답변: 1. 이벤트 발생 차이 주 2. 중복 없는 유저 수 ## 4. 주차별 유저수 구하기*/ , unique_user_cnt_table as ( select diff_week, count(distinct user_pseudo_id) as unique_user_cnt from diff_week_table group by 1 order by 1 ) /*6. 모든 행에 첫 주 유저수 적용하기 - 왜 위와 같은 생각을 했는가? 이에 대한 답변: 첫 주 유저수로 주차마다 달라지는 유저수를 나누면, 리텐션을 구할 수 있기 때문에. - 무슨 데이터가 필요한가? 이에 대한 답변: 1. 이벤트 발생 차이주 2. 주차별 중복 없는 유저수 3. 모든 행에 적용하는 첫 주 유저수 ## 5. 모든 행에 첫 주 유저수 적용하기*/ , first_visit_week_data_table as ( select *, first_value(unique_user_cnt) over(order by unique_user_cnt desc) AS first_visit_week_data from unique_user_cnt_table order by 1 ) ## 6. 리텐션 구하기 | 주차별 유저수 / 첫 주 유저수 select *, round(safe_divide(unique_user_cnt, first_visit_week_data), 2) as retention_rate from first_visit_week_data_table
-
해결됨초보자를 위한 BigQuery(SQL) 입문
1-2 환경설정에서 마지막 실습 구문이 작동이 안돼요.
강의 내용 따라서 테이블 생성에는 문제가 없었어요.새 쿼리 탭에서 select* (쿼리에선 탭 1회)from 'basic.pokemon'order by id 로 작성하였는데 구문 오류라고 뜹니다.데이터셋 이름도 basic이고 테이블도 pokemon으로 해두었는데 왜 이러는 걸까요? 혹시나 싶어 'bigquery101-441101.basic.pokemon' 으로도 시도해 보았는데 마찬가지로 구문 오류가 떴습니다. 데이터셋이나 테이블 이름에 space가 들어갔나 싶어, 테이블셋 세부 정보에서 테이블 ID 복사 기능으로 붙여넣어도 작동이 안되는 상황입니다.
-
해결됨초보자를 위한 BigQuery(SQL) 입문
4-5. 3번 시간데이터 연습문제 질문
3. 각 트레이너별로 그들이 포켓몬을 포획한 첫 날(catch_date)을 찾고, 그 날짜를 'DD/MM/YYYY' 형식으로 출력해주세요.해당 문제에서 catch_date는 UTC 기준의 데이터이므로, 한국 기준으로 하려면 catch_datetime을 사용해야 한다고 하셨는데요!테이블을 보면, TIMESTAMP 타입인 catch_datetime만 UTC 기준의 데이터인 것으로 이해했는데DATE 타입인 catch_date가 UTC 기준의 데이터인 이유가 무엇인가요?
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 윈도우 함수
--1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, group by를 사용해서 집곟나ㅡㄴ 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. select *, count(query_date) over(partition by user) as total_query_cnt from advanced.query_logs order by 1, 3 --2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이게 해주세요 with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, rank() over(partition by week_number, team order by query_cnt desc ) as rk from query_cnt_by_team qualify rk = 1 order by 1, 2, 4 desc -- 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, lag(query_cnt, 1) over(partition by user order by week_number) as prev_week_query_cnt from query_cnt_by_team -- 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. select *, sum(query_cnt) over (partition by user order by query_date rows between unbounded preceding and current row) as cumulative_sum from ( select query_date, user, count(user) as query_cnt from advanced.query_logs group by all ) order by 2, 1 -- 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없다면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) select * from raw_data2 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) --6. 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균) select * except(number_of_orders), avg(last_value_orders) over (order by date rows between 2 preceding and current row) as moving_avg from raw_data2 --7. app_logs 테이블에서 Custom session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요. session은 숫자로 (1, 2, 3..) 표시해도 됩니다. -- 2022-08-18의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 with base as( select event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" order by event_timestamp ), diff_data as ( select *, datetime_diff(event_datetime, prev_event_datetime, second) as second_diff from ( select *, lag(event_datetime, 1) over(partition by user_pseudo_id order by event_datetime) as prev_event_datetime from base order by event_datetime ) ) select *, sum(session_start) over(partition by user_pseudo_id order by event_datetime) as session_number from ( select *, case when prev_event_datetime is null then 1 when second_diff >= 20 then 1 else null end as session_start from diff_data )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 과제 윈도우 함수
윈도우 함수 연습문제 1select *, count(*) over(partition by user) as total_query_cnt from `advanced.query_logs`;윈도우 함수 연습문제 2select *, rank() over(partition by team, week_number order by query_cnt desc) as team_rank from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all ) qualify team_rank=1;윈도우 함수 연습문제3select *, lag(query_cnt) over(partition by user order by week_number asc) as prev_week_query_cnt from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all );윈도우 함수 연습문제4select *, sum(query_count) over(partition by user order by query_date asc) as cumulative_query_cnt from ( select user, query_date, team, count(*) as query_count from `advanced.query_logs` group by all );윈도우 함수 연습문제5SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders FROM raw_data;윈도우 함수 연습문제6select *, avg(number_of_orders) over(order by date asc rows between 2 preceding and current row) as moving_average from ( SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders, FROM raw_data ) ;윈도우 함수 연습문제7select *, sum(session_start) over(partition by user_pseudo_id order by event_timestamp asc) as session_id from ( select *, case when time_diff is null then 1 when time_diff >= 20 then 1 else null end as session_start from ( select *, cast((event_timestamp - before_event_timestamp)/1000000 as int) as time_diff from ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, lag(event_timestamp) over(partition by user_pseudo_id order by event_timestamp asc) as before_event_timestamp from `advanced.app_logs` where user_pseudo_id='1997494153.8491999091' and event_date='2022-08-18' ) ) );
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
윈도우 함수 연습 문제
윈도우함수(탐색함수) 연습문제연습문제1-- 문제 1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요 select user_id, visit_month, lead(visit_month) over (partition by user_id order by visit_month asc) as after_visit_month, lead(visit_month, 2) over (partition by user_id order by visit_month asc) as after_visit_month from `advanced.analytics_function_01` order by user_id;연습문제2-- 문제2) user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요. select *, lead(visit_month) over (partition by user_id order by visit_month asc) as after_visit_month, lead(visit_month, 2) over (partition by user_id order by visit_month asc) as after_two_visit_month, lag(visit_month) over(partition by user_id order by visit_month asc) as before_visit_month from `advanced.analytics_function_01` order by user_id, visit_month;연습문제3 -- 3번 유저가 접속했을 때 다음 접속까지의 간격을 구하시오 select *, after_visit_month - visit_month as diff from( select *, lead(visit_month, 1) over (partition by user_id order by visit_month) as after_visit_month from `advanced.analytics_function_01` ) order by user_id, visit_month;윈도우 함수 frame 연습문제SELECT * , SUM(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total , SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM `advanced.orders` ORDER BY order_id 윈도우 함수 연습문제연습문제1-- 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요 단, group by를 사용해서 집계하는것이 아닌 quary_log의 데이터의 우측에 새로운 컬럼을 작성해주세요 select *, count(query_date) over (partition by user) as total_query_cnt from `advanced.query_logs` order by user,query_date;연습문제2-- 2. 주차별로 팀 내에서 쿼리를 많이 실행한수를 구한후 실행 수를 활요해 랭킹을 구해주세요. 단 랭킹이 1등인 사람만 보여주세요. with query_cnt_by_team AS ( select extract(week FROM query_date) as week_number, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) select *, rank() over(partition by week_number, team order by query_cnt desc) AS rk from query_cnt_by_team qualify rk = 1 order by week_number, team, query_cnt desc; 연습문제3-- 3. (2번문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할수 있는 쿼리를 작성해주세요. with query_cnt_by_team AS ( select extract(week FROM query_date) as week_number, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) select *, lag(query_cnt, 1) over(partition by user order by week_number) as prev_week_query_cnt from query_cnt_by_team ; 연습문제4-- 4. 시간에 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요(query_date) select *, sum(query_cnt) over(partition by user order by query_date) as cumulative_sum, sum(query_cnt) over(partition by user order by query_date rows between unbounded preceding and current row) as cumulative_sum2 from( select query_date, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) order by user, query_date;연습문제5-- 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 null로 기록됩니다.이런 데이터에서 null값이라고 되어 있는 부분은 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select *, last_value(number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data;연습문제6-- 6. 5번 문제에서 null을 채운후 2일전 ~ 현재의 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) , filled_data as( select * except(number_of_orders), last_value(number_of_orders ignore nulls) over(order by date) as number_of_orders from raw_data ) select *, avg(number_of_orders) over (order by date rows between 2 preceding and current row) as moving_avg from filled_data연습문제7-- 7) app_logs 테이블에서 custom session을 만들어 주세요 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요 session 숫자로 (1,2,3 ...)표시해도 됩니다. -- 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다. with base as( select event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" ), diff_Data as( select *, from( select *, lag(event_datetime, 1) over(partition by user_pseudo_id order by event_datetime) as prev_event_datetime from base ) ) select *, sum(session_start) over (partition by user_pseudo_id order by event_datetime) as session_num from( select *, case when prev_event_datetime is null then 1 when second_diff >= 20 then 1 else 0 end as session_start from diff_data ) order by event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
윈도우 함수의 탐색 함수 : LEAD, LAG, FIRST_VALUE, LAST_VALUE## 문제1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요. -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_next_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 문제2) user들의 다음 접속 월과 다다음 접속 월,이전 접속 월을 구하는 쿼리를 작성해주세요. -- LAG 함수를 사용할 때 NULL이 나온다 => 그 값은 처음이다! -- LEAD 함수를 사용할 때 NULL이 나온다 => 그 값은 마지막이다! -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_next_month -- , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS before_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 3번 : 유저가 접속했을 때, 다음 접속까지의 간격을 구하시오 -- user_id | visit_month | after_visit_month | diff_month -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) - visit_month AS diff_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 윈도우 함수를 이렇게 쓰는게 좋을까? => 중복된 쿼리는 줄이는 것이 좋을 수 있음 -- 쿼리를 수정할 상황이 생김 => 2번 수정 => 굉장히 많아지면 복잡해지고, 실수하기 좋음 -- 쿼리가 길어지는 것을 무서워하지 말고, 쿼리를 덜 수정할 수 있는 구조를 만들면 좋겠다! -- 윈도우 함수 쓰다보면 쿼리 줄이 길어짐. 감안하고 사용하면 좋겠다 -- -- 그래서 서브쿼리로 만들어보면,, -- SELECT -- * -- , (after_month - visit_month) AS diff_month -- FROM ( -- SELECT -- * -- , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- FROM `advanced.analytics_function_01` -- ) -- ORDER BY user_id ## 추가 문제 : 이 데이터셋을 기준으로 user_id의 첫번째 방문 월, 마지막 방문 월을 구하는 쿼리를 작성해주세요 SELECT user_id , visit_month , FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS first_visit , LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS last_visit FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month윈도우 함수 Frame 연습문제-- amount_total : 전체 SUM -- cumulative_sum : row 시점에 누적 SUM -- cumulative_sum_by_user : row 시점에 유저별 누적 SUM -- last_5_orders_avg_amount : order_id 기준으로 정렬하고, 직전 5개의 주문의 평균 amount -- 집계분석함수() OVER(PARTITION BY ~~~ ORDER BY ~~~ ROWS BETWEEN A and B) SELECT * , SUM(amount) OVER() AS amount_total ## OVER 안에 아무것도 들어가지 않는 경우도 있구나! , SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount -- , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) -- , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) FROM advanced.orders ORDER BY order_id 윈도우 함수의 연습문제## 윈도우 함수 연습문제 ## 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. -- 사용자별 쿼리를 실행한 총 횟수 : COUNT() 전체 실행. -- OVER(PARTITION BY user) -- SELECT -- * -- , COUNT(query_date) OVER(PARTITION BY user) AS total_query_cnt -- FROM `advanced.query_logs` -- ORDER BY query_date ## 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요. -- 주차별로 개인당 실행한 쿼리 횟수 -- 위 쿼리 횟수를 기반으로 랭킹 -- 랭킹을 기반으로 필터링(랭킹=1) -- 문제의 의도 : 원본 데이터 => 1 row마다 데이터가 있고, 그걸 집계해서 사용. GROUP BY => 윈도우 함수 -- SELECT -- * -- , RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank -- FROM ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- QUALIFY team_rank = 1 -- ORDER BY week_number, team -- ## ## WITH 문 풀이 -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- *, -- RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk -- FROM query_cnt_by_team -- -- QUALIFY : 윈도우 함수의 조건을 설정할 때 사용 -- QUALIFY rk = 1 -- ORDER BY week_number, team, query_cnt DESC -- -- 16주차 - AI팀의 케이피, 16주차 - 코칭팀의 카일, 16주차 - 데이터 사이언스팀의 샘 ## 3)(2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요 -- 1주 전의 쿼리 실행 수 => LAG -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- user, -- team, -- week_number, -- query_cnt, -- LAG(query_cnt) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt -- FROM query_cnt_by_team -- ORDER BY user -- -- ans T) -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- *, -- LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt -- FROM query_cnt_by_team ## 4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. -- 누적 쿼리 : 과거의 시간(UNBOUNDED PRECEDING)부터 current row까지 -- 출제 의도 : Default Frame에 대해 알려드리고 싶었음. -- For aggregate analytic functions, if the ORDER BY clause is present but the window frame clause is not, the following window frame clause is used by default: -- SELECT -- *, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum -- FROM ( -- SELECT -- user, -- team, -- query_date, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- -- ans T) -- SELECT -- *, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 -- -- Frame의 Default 값 : UNBOUNDED PRECEDING ~ CURRENDT ROW -- FROM ( -- SELECT -- query_date, -- team, -- user, -- COUNT(user) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- -- QUALIFY cumulative_sum != cumulative_sum2 -- -- -- WHERE, QUALIFY 조건 설정해서 2가지 값이 모두 같은지 비교 => 모두 같으면 != 연산 결과에 반환하는 값이 없을 -- ORDER BY user, query_date ## 5) 다음 데이터는 주문횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. -- LAG로 직전 값 가져오면 되지 않을까? -- number_of_orders가 null 이면, before_number_of_orders를 가져와라! -- 아래 쿼리는 어려운 방법 -- 그 다음 방법 : LAST VALUE를 쓰자! => 값이 없으면 NULL이 뜬다 ! -- FIRST_VALUE, LAST_VALUE => NULL 을 포함해서 연산 -- 출제 의도 : NULL 을 제외해서 연산하고 싶으면 IGNORE NULLS을 쓰면 된다 ! -- WITH raw_data AS ( -- SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL -- SELECT DATE '2024-05-02', 13 UNION ALL -- SELECT DATE '2024-05-03', NULL UNION ALL -- SELECT DATE '2024-05-04', 16 UNION ALL -- SELECT DATE '2024-05-05', NULL UNION ALL -- SELECT DATE '2024-05-06', 18 UNION ALL -- SELECT DATE '2024-05-07', 20 UNION ALL -- SELECT DATE '2024-05-08', NULL UNION ALL -- SELECT DATE '2024-05-09', 13 UNION ALL -- SELECT DATE '2024-05-10', 14 UNION ALL -- SELECT DATE '2024-05-11', NULL UNION ALL -- SELECT DATE '2024-05-12', NULL -- ), -- -- SELECT -- -- date, -- -- IFNULL(number_of_orders, last_value_orders) AS numbers_of_orders -- -- FROM ( -- -- SELECT -- -- *, -- -- -- LAG(number_of_orders) OVER(ORDER BY date) AS prev_orders, ## 마지막 값 NULL !! -- -- LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders -- -- FROM raw_data -- -- ) -- -- -- ans T) -- filled_data AS( -- SELECT -- * EXCEPT(number_of_orders), -- LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders -- FROM raw_data -- ) -- ## 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요 (이동평균) -- -- Frame : 2일 전 => BTWEEN 2 PRECEDING AND CURRENT ROW -- -- 출제 의도 : Frame을 지정할 수 있는가? -- SELECT -- *, -- AVG(number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg -- FROM filled_data ## 7) app_logs 테이블에서 CustomSession을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요. Session은 숫자로(1,2,3…) 표시해도 됩니다. ## 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 WITH base as ( SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' and user_pseudo_id = '1997494153.8491999091' ), base2 as ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) as before_event_datetime FROM base ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) as session_id FROM ( SELECT *, IF(second_diff is NULL or second_diff > 20, 1, NULL) as session_start FROM ( SELECT *, DATETIME_DIFF(event_datetime, before_event_datetime, second) as second_diff FROM base2 ) ) ORDER BY event_timestamp
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
1. 탐색함수 연습문제 문제 1. user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리SELECT user_id, visit_month, lead(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, lead(visit_month,2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2 FROM `advanced.analytics_function_01` ORDER BY user_id문제2. user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리 SELECT user_id, visit_month, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, lead(visit_month,2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2, lag(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lag_visit_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month # LAG 함수를 사용할 때 NULL 이면 그 값은 처음, # LEAD 함수를 사용할 때 NULL 이면 그 값은 마지막 문제3. 유저가 접속했을 때, 다음 접속까지의 간격을 구하기SELECT user_id, visit_month, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, ((lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) ) - visit_month) AS diff_month # 별칭쓴거는 select 안에서 못함 FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month # 이 경우에는 쿼리 많은경우에 수정할 경우 헷갈리기 시작한다. 따라서 서브쿼리로 묶어서 하면 더 편하고 실수 적어짐 # 쿼리 길어진다고 해도 무서워하지 말고 쿼리 덜 수정할 수 있는 구조를 만들자. # 윈도우 함수 쓰면 줄이 쿼리 길어짐. 감안하고 쓰자.문제 4. 이 데이터셋을 기준으로 user_id의 첫번째 접근 월을 구하는 쿼리를 작성하기# 마지막 추가 문제 SELECT *, FIRST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS First_Value, LAST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS Last_Value FROM ( SELECT *, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month ) ORDER BY user_id, visit_month 2. Frame 연습문제 문제 . 회사의 모든 주문량, 누적 주문량, 최근 직전5개 평균 주문량 구하기 SELECT * , SUM(amount) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total, # OVER 안에 아무것도 없으면 전체 출력이다! SUM(amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum, SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_by_user, AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avg order by order_id 3. 연습문제문제 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성하기. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들기SELECT *, COUNT(user) OVER (PARTITION BY user ) AS cnt FROM advanced.query_logs ORDER BY query_date, user문제 2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구하기. 단, 랭킹이 1등인 사람만 결과가 보이도록 하기 # query_date를 바탕으로 주차별로 구분하여 WITH함수로 묶기 WITH week_number_table AS ( SELECT *, EXTRACT(WEEK FROM query_date) AS week_number, # 하루에 유저별 쿼리 이용수 추출 COUNT(*) AS query_cnt, FROM advanced.query_logs GROUP BY ALL ) # 팀 내에서 유저별로 랭킹 구하기 SELECT week_number, team, query_date, query_cnt, RANK() OVER (PARTITION BY week_number,team ORDER BY query_cnt DESC) AS team_rank FROM week_number_table # 1등인 유저만 출력 QUALIFY team_rank = 1 ORDER BY week_number, team 문제 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성하기# 2번쿼리 WITH week_number_table AS ( SELECT *, EXTRACT(WEEK FROM query_date) AS week_number, COUNT(*) AS query_cnt, FROM advanced.query_logs GROUP BY ALL ) SELECT *, # 1주전의 실행수 이므로 LAG를 이용해서 전 week_number의 실행수 구하기 LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM week_number_table 문제 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성하기# 시간의 흐름에 따라 일자별로 쿼리수 묶기 WITH user_query AS ( SELECT *, COUNT(*) AS query_count FROM advanced.query_logs GROUP BY ALL ) # 유저별로 누적 쿼리수 구하기 SELECT *, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM user_query ORDER BY user # Frame 의 Default 값은 UNBOUNDED PRECEDING ~ CURRENT ROW 이다 # 데이터 정합성 확인 할때 2가지 값이 모두 같은지 비교하면 편리하다 ( WHERE, QUALIFY절에 활용 ) 문제 5. 만약 주문 횟수가 없으면 NULL로 기록됨. 이런 데이터에서 NULL값이라고 되어 있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성하기WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) # CASE문을 이용해서 주문량 NULL 값이면 전 날짜의 주문량으로 대치 SELECT *, (CASE WHEN number_of_orders IS NULL THEN LAG(number_of_orders,1) OVER (ORDER BY date) ELSE number_of_orders END ) AS number_of_orders2 FROM raw_data # BUT 맨 마지막값 NULL이 연속2번이라 NULL이 나옴 # LAST_VALUE의 IGNORE NULLS 쓰기 SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders2 FROM raw_data 문제 6. 5번 문제에서 NULL을 채운 후, 2일전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성하기(이동평균)SELECT date, number_of_orders2, AVG(number_of_orders2) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_agv # NULL값 처리한 테이블 서브쿼리로 묶은 다음에 윈도우 함수써서 2틀전~현재 평균 출력하기 FROM ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders2, FROM raw_data ) 문제 7. app_logs 테이블에서 custom session을 만들기. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들기. session은 숫자로 (1, 2, 3 …) 표시해도 됨.WITH START AS ( SELECT event_date, # timestamp 를 서울 표준 시간으로 바꾸기 DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-16' ), # 기존 시간의 차이 구하기 DIFF_DATE AS ( SELECT *, DATETIME_DIFF(event_datetime, before_event_datetime, SECOND) AS second_diff FROM ( SELECT # 전(LAG) 시간을 불러오기 위해 서브쿼리로 묶음 *, LAG(event_datetime, 1 ) OVER ( PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_event_datetime FROM START) ORDER BY event_datetime) # session_start를 누적합 이용하여 session_number 구하 SELECT *, SUM(session_start) OVER (PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_number FROM ( SELECT *, # CASE문을 써서 처음 시작부분 (NULL) 1로 바꾸기 (CASE WHEN second_diff IS NULL THEN 1 # second_diff 가 20초 이상이면 1 아니면 0 WHEN second_diff >= 20 THEN 1 ELSE 0 END) AS session_start FROM DIFF_DATE)
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME설정, QUALITY
윈도우 탐색 함수 연습문제(1) 연습문제 1-- 문제 1) USER의 다음 접속월, 다다음 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next FROM `avdanced.analytics_function_01` (2) 연습문제 2-- 문제 2) USER의 다음 접속월, 다다음 접속 월, 이전 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next, LAG(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS last_month FROM `avdanced.analytics_function_01` 윈도우 함수 FRAME 연습문제연습문제 (1~4)SELECT -- 1)모든 주문량 SUM(amount) OVER() AS amount_total, -- 2)특정주문시점에서 누적주문량 #SUM(amount) OVER(partition by order_date) AS cumulative_sum, SUM(amount) OVER (ORDER BY order_date) AS cumulative_sum, -- 3)고객별 주문 시점에서 누적 주문량 #SUM(amount) OVER(partition by user_id) AS cumulative_sum_by_user, SUM(amount) OVER(partition by user_id ORDER BY order_id) AS cumulative_sum_by_user, -- 4) 최근 직전 5개 평균 주문량 AVG(amount) OVER(ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_5_unbounded_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS all_orders_avg_amount FROM `avdanced.orders` 윈도우 함수(1) 연습문제 1-- 연습문제1) 사용자별 쿼리 실행 횟수 WITH base AS( SELECT user, team, query_date, COUNT(*) OVER(PARTITION BY user) AS total_query_cnt, FROM `avdanced.query_logs` ) SELECT * FROM base(2) 연습문제 2-- 연습문제2) 주차별 팀내 쿼리 실행한 수 (RANK 1만 보이도록) WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` ORDER BY EXTRACT(WEEK FROM query_date) ) SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY total_query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team(3) 연습문제 3WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` #QUALIFY team_rank = 1 ORDER BY EXTRACT(WEEK FROM query_date) ), base3 AS( SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team ) -- 연습문제3) 쿼리 실행 시점 1주전 쿼리 실행 SELECT DISTINCT *, LAG(query_cnt,1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM base2 GROUP BY ALL ORDER BY user, week_number(4) 연습문제 4--연습문제4) SELECT *, SUM(query_count) OVER(PARTITION BY user ORDER BY query_date) AS culmulative_query_count, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 FROM( SELECT DISTINCT *, COUNT(user) OVER(PARTITION BY query_date, user) AS query_count, FROM `avdanced.query_logs` ) ORDER BY user,query_date (5) 연습문제 5WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) FROM raw_data(6) 연습문제 6WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), null_is_lag AS( --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) AS number_of_orders FROM raw_data ) -- 연습문제 6) 이동평균 SELECT *, AVG(nl.number_of_orders) OVER(ORDER BY nl.date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM null_is_lag AS nl(7) 연습문제 7-- 1. TIMESTAMP → DATETIME -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 -- 4. SESSION_ID생성: SESSION_START가 1일 경우 SESSION_ID +1, NULL일 경우 LAG(DATA,1) WITH add_date AS ( -- 1. TIMESTAMP → DATETIME SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp)) AS event_datetime, event_name, user_id, user_pseudo_id, LAG(DATETIME(TIMESTAMP_MICROS(event_timestamp))) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS before_event_datetime FROM `avdanced.app_logs_temp` --,UNNEST(event_params) AS param -- FROM 절 안에서 UNNEST를 사용 WHERE event_date ="2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ), add_diff AS ( -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - SELECT *, DATE_DIFF(event_datetime, before_event_datetime,SECOND) AS second_diff, FROM add_date ), add_session AS( -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 SELECT *, IF(second_diff IS NULL OR second_diff >=20, 1, NULL) AS session_start FROM add_diff ) -- 4. SESSION_ID생성 *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM add_session ORDER BY event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
1.--1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단 Group By를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. select * , count(query_date) over(partition by user) as total_query_cnt from advanced.query_logs order by user, query_date2.--2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단 랭킹이 1등인 사람만 결과가 보이도록 해주세요. select * , rank() over(partition by week_number, team order by query_cnt desc) as team_rank from ( select EXTRACT(WEEK FROM query_date) as week_number , team , user , count(user) as query_cnt from advanced.query_logs group by all ) as base qualify team_rank = 1 order by 1,2,3 3.--3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. select * , lag(query_cnt) over(partition by user order by week_number) as prev_week_query_cnt from ( select EXTRACT(WEEK FROM query_date) as week_number , team , user , count(user) as query_cnt from advanced.query_logs group by all ) as base order by user4.--4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. select * , sum(query_cnt) over(partition by user order by query_date) as cumulative_query_cnt from ( select user , team , query_date , count(user) as query_cnt from advanced.query_logs group by all ) as base order by user5.--5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select * , ifnull(number_of_orders,last_value(number_of_orders ignore nulls) over(order by date)) as last_value_orders from raw_data6.--6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동 평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select date , number_of_orders , avg(number_of_orders) over(order by date rows between 2 preceding and current row) as moving_avg from (select date , ifnull(number_of_orders,last_value(number_of_orders ignore nulls) over(order by date)) as number_of_orders from raw_data) as base7.--7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. Session은 숫자로 (1,2,3...) 표시해도 됩니다 --2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다. with base as ( select event_date , event_timestamp , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , event_name , user_id , user_pseudo_id , lag(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as before_event_datetime from advanced.app_logs ) , diff_data as ( select * , if(second_diff > 20 or second_diff is null, 1, null) as session_start from ( select * , DATETIME_DIFF(event_datetime, before_event_datetime, second) as second_diff from base ) ) select * , sum(session_start) over(order by event_timestamp) as session_num from diff_data -- where event_date = '2022-08-18' -- and user_pseudo_id = '1997494153.8491999091' order by event_timestamp
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
이번주는 평일 주말에 일이 많아서 따라가기 힘들었네요 ㅎㅎ그치만 마무리해서 올립니다!! #유저들의 다음 접속 월, 다다음 접속 월SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month)FROM advanced.analytics_function_01#유저들의 다음 접속월, 다다음 접속 월, 이전 접속 월SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month),LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month)FROM advanced.analytics_function_01ORDER BY user_id#diff 구하기SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) - visit_monthFROM advanced.analytics_function_01ORDER BY user_idSELECT after_visit_month - visit_monthFROM(SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_monthFROM advanced.analytics_function_01ORDER BY user_id)#윈도우함수SELECT *,SUM(amount) OVER () AS amount_total,SUM(amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum,SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_user,AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avgFROM advanced.ordersORDER BY order_id#윈도우함수 연습문제 1SELECT *, COUNT(user) OVER(PARTITION BY user)FROM advanced.query_logs#윈도우함수 연습문제 2WITH nums AS (SELECT EXTRACT(week FROM query_date) AS week_number,team,userFROM advanced.query_logs), numss AS (SELECT week_number, team, user, COUNT(user) AS query_cntFROM numsGROUP BY week_number, team, user), rnks AS (SELECT *, RANK() OVER (PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rkFROM numssORDER BY week_number, team, user)SELECT *FROM rnksWHERE rk = 1#윈도우함수 연습문제 3WITH nums AS (SELECT EXTRACT(week FROM query_date) AS week_number,team,userFROM advanced.query_logs), numss AS (SELECT week_number, team, user, COUNT(user) AS query_cntFROM numsGROUP BY week_number, team, user)SELECT user, team, week_number, query_cnt, LAG(query_cnt) OVER (PARTITION BY user ORDER BY week_number)FROM numssORDER BY user, team#윈도우함수 연습문제 4WITH cntcnt AS (SELECT user, team, query_date, COUNT(user) AS query_countFROM advanced.query_logsGROUP BY user, team, query_date)SELECT user, team, query_date, query_count, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date) AS cntFROM cntcnt#윈도우함수 연습문제 5WITH raw_data AS(SELECT DATE '2024-05-01' AS date,15 AS number_of_orders UNION ALLSELECT DATE '2024-05-02',13 UNION ALLSELECT DATE '2024-05-03',NULL UNION ALLSELECT DATE '2024-05-04',16 UNION ALLSELECT DATE '2024-05-05',NULL UNION ALLSELECT DATE '2024-05-06',18 UNION ALLSELECT DATE '2024-05-07',20 UNION ALLSELECT DATE '2024-05-08',NULL UNION ALLSELECT DATE '2024-05-09',13 UNION ALLSELECT DATE '2024-05-10',14 UNION ALLSELECT DATE '2024-05-11',NULL UNION ALLSELECT DATE '2024-05-12',NULL), raws AS (SELECT date, number_of_orders, LAG(number_of_orders) OVER(ORDER BY date) AS beforesFROM raw_data)SELECT date, COALESCE(number_of_orders, befores)FROM raws#윈도우함수 연습문제 6WITH raw_data AS(SELECT DATE '2024-05-01' AS date,15 AS number_of_orders UNION ALLSELECT DATE '2024-05-02',13 UNION ALLSELECT DATE '2024-05-03',NULL UNION ALLSELECT DATE '2024-05-04',16 UNION ALLSELECT DATE '2024-05-05',NULL UNION ALLSELECT DATE '2024-05-06',18 UNION ALLSELECT DATE '2024-05-07',20 UNION ALLSELECT DATE '2024-05-08',NULL UNION ALLSELECT DATE '2024-05-09',13 UNION ALLSELECT DATE '2024-05-10',14 UNION ALLSELECT DATE '2024-05-11',NULL UNION ALLSELECT DATE '2024-05-12',NULL), raws AS (SELECT date, number_of_orders, LAG(number_of_orders) OVER(ORDER BY date) AS beforesFROM raw_data), dates AS (SELECT date, COALESCE(number_of_orders, befores) AS ordersFROM raws)SELECT date, orders, AVG(orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)FROM dates#윈도우함수 연습문제7마지막 문제는 스스로 풀진 못했고 강의의 힘을 빌려 마무리했습니다! ㅎㅎ
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 2주차 과제] 윈도우 함수 연습 문제
1. 윈도우 함수 탐색 함수 연습 문제1) 유저의 다음 접속 월과 다다음 접속 월 구하기SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead2_visit_month FROM advanced.analytics_function_01 2) 이전 접속월 구하기SELECT user_id , visit_month , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS before_visit_month FROM advanced.analytics_function_01 3) 다음 접속 월까지의 간격 구하기SELECT * , after_visit_month - visit_month AS diff_month FROM ( SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month FROM advanced.analytics_function_01 ) 4) 첫번째와 마지막 방문 월 구하기SELECT user_id , visit_month , FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS first_visit_month , LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS last_visit_month FROM advanced.analytics_function_01 2. 윈도우 함수 Frame 연습 문제SELECT * , SUM(amount) OVER () AS amount_total , SUM(amount) OVER (ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id) AS user_cumulative_sum , AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avg FROM advanced.orders 3. 윈도우 함수 연습문제1) 사용자 별 쿼리를 실행한 총 횟수 구하는 쿼리를 작성해주세요.SELECT *, COUNT(query_date) OVER(PARTITION BY user ORDER BY query_date) AS total_query_cnt # 38 row FROM `advanced.query_logs` 2) 주차별로 팀내에서 쿼리를 많이 실행한 수를 구한후, 실행한수를 활용해 랭킹을 구해주세요. WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) SELECT *, RANK() OVER(PARTITION BY team ORDER BY query_cnt) AS rk FROM base QUALIFY rk = 1 ORDER BY week_number, team, query_cnt DESC 3) 쿼리를 실행한 시점 기준1주전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) SELECT *, LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_cnt FROM base4) 시간의 흐름에 따라,일자별로 유저가 실행한 누적 쿼리수를 작성해주세요SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 FROM( SELECT query_date, team, user, COUNT(user) AS query_cnt FROM advanced.q5) NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data6) 5번문제에서 NULL을 채운 후, 2일전~현재데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균)WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) , filled_date AS( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders FROM raw_data) SELECT *, AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_date7) app_logs테이블에서 CustomSession을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요.WITH base AS( SELECT EVENT_DATE, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = "1997494153.8491999091" ) , diff_date AS(SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff FROM ( SELECT *, LAG(EVENT_DATETIME, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime FROM base ORDER BY base.event_datetime)) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM ( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS session_start FROM diff_date)
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
#1SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2 FROM advanced.analytics_function_01 ORDER BY user_id;#2SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month2, LAG(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS before_visit_month1 FROM advanced.analytics_function_01 ORDER BY user_id, visit_month;#3SELECT *, after_visit_month - visit_month AS diff FROM ( SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, FROM advanced.analytics_function_01 );#4SELECT *, FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first, LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last FROM advanced.analytics_function_01;#5SELECT *, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user, query_date;#6WITH query_cnt_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) AS query_cnt FROM advanced.query_logs GROUP BY week_number,team,user ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk FROM query_cnt_cnt QUALIFY rk = 1 ORDER BY week_number, team, query_cnt DESC;#7WITH query_cnt_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) AS query_cnt FROM advanced.query_logs GROUP BY week_number,team,user ) SELECT *, LAG(query_cnt, 1) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM query_cnt_cnt;#8WITH query_cnt AS ( SELECT *, COUNT(*) AS query_cnt_cnt FROM advanced.query_logs GROUP BY user, team, query_date ) SELECT *, SUM(query_cnt_cnt) OVER (PARTITION BY user ORDER BY query_date ) AS query_cnt_cnt_cnt FROM query_cnt;#9WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS last_value_orders FROM raw_data;#10WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders FROM raw_data ) SELECT *, AVG(number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_data#11WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE 1=1 AND event_date = "2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ) , diff_date AS ( SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff # second_diff 기반으로 새로운 세션의 시작일지, 아닐지를 판단할 수 있음 FROM( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime -- DATETIME_DIFF() => 차이를 구할 수 있음 FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM ( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS session_start FROM diff_data ORDER BY event_datetime )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
1번-- 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. -- 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. SELECT *, COUNT(query_date) over(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user, query_date 2번-- 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. -- 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 WITH query_cnt_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt desc) AS rk FROM query_cnt_team QUALIFY rk = 1 ORDER BY week_number, query_cnt desc; 3번-- 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성헤주세요. WITH query_cnt_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt) OVER(PARTITION BY team, user ORDER BY week_number) AS prev_week_query_cnt FROM query_cnt_team 4번-- 4) 시간의 흐름(query_date)에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. WITH query_cnt_team AS ( SELECT query_date, team, user, COUNT(user) as query_cnt FROM advanced.query_logs GROUP BY ALL ORDER BY user, query_date ) SELECT user, team, query_date, query_cnt, SUM(query_cnt) OVER(PARTITION BY team, user ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM query_cnt_team ORDER BY user, query_date; 5번-- 5) 주문 횟수를 나타낸 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data; 6번-- 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요.(이동 평균) WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data ) SELECT * FROM filled_data; 7번-- 7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. -- Session은 숫자로 (1, 2, 3 ...) 표시해도 됩니다. WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' ), diff_date AS ( SELECT *, DATETIME_DIFF(event_datetime, pre_event_time, SECOND) AS second_diff FROM ( SELECT *, LAG(event_datetime) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC) AS pre_event_time FROM base ) ), session_start AS ( SELECT *, CASE WHEN pre_event_time IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS start_session FROM diff_date ) SELECT *, SUM(start_session) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id FROM session_start ORDER BY event_datetime;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
# 윈도우 함수(탐색 함수) 연습 문제 1 SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_2 FROM advanced.analytics_function_01 ORDER BY user_id; # 윈도우 함수(탐색 함수) 연습 문제 2 SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_2, LAG(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lag_1 FROM advanced.analytics_function_01 ORDER BY user_id;# 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. SELECT *, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY query_date# 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank FROM base QUALIFY team_rank = 1 ORDER BY week_number, team # 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요 WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT user, team, week_number, query_cnt, LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM base# 4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요 WITH base AS ( SELECT user, team, query_date, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM base# 5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요 WITH raw_data AS ( SELECT DATE'2024-05-01'AS date, 15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02', 13 UNION ALL SELECT DATE'2024-05-03', NULL UNION ALL SELECT DATE'2024-05-04', 16 UNION ALL SELECT DATE'2024-05-05', NULL UNION ALL SELECT DATE'2024-05-06', 18 UNION ALL SELECT DATE'2024-05-07', 20 UNION ALL SELECT DATE'2024-05-08', NULL UNION ALL SELECT DATE'2024-05-09', 13 UNION ALL SELECT DATE'2024-05-10', 14 UNION ALL SELECT DATE'2024-05-11', NULL UNION ALL SELECT DATE'2024-05-12', NULL ) SELECT *, IFNULL(number_of_orders, LAG(number_of_orders,1) OVER(ORDER BY date ASC)) AS non_null FROM raw_data; SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date ASC) AS non_null FROM raw_data# 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동 평균) WITH raw_data AS ( SELECT DATE'2024-05-01'AS date, 15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02', 13 UNION ALL SELECT DATE'2024-05-03', NULL UNION ALL SELECT DATE'2024-05-04', 16 UNION ALL SELECT DATE'2024-05-05', NULL UNION ALL SELECT DATE'2024-05-06', 18 UNION ALL SELECT DATE'2024-05-07', 20 UNION ALL SELECT DATE'2024-05-08', NULL UNION ALL SELECT DATE'2024-05-09', 13 UNION ALL SELECT DATE'2024-05-10', 14 UNION ALL SELECT DATE'2024-05-11', NULL UNION ALL SELECT DATE'2024-05-12', NULL ) SELECT *, AVG(number_of_orders) OVER(ORDER BY date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM ( SELECT date, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders FROM raw_data )# 7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. Session은 숫자로(1,2,3...) 표시해도 됩니다 # 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = '1997494153.8491999091' ), diff_data AS ( SELECT *, DATETIME_DIFF(event_datetime, before_1_event_datetime, SECOND) AS second_diff FROM ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC) AS before_1_event_datetime FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id FROM ( SELECT *, CASE WHEN second_diff IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE NULL END AS session_start FROM diff_data )