Question about printing each output layer's shape with OpenCV YOLO

Question

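Hello! This may be a minor question, but I'm curious. When I write the code that runs object detection on a single image as individual cells, without wrapping it in a function, and print the output shapes, all three shapes are printed, one per output layer (the red box). But once I wrap the code in a function, as shown below, only (8112, 85), the shape of the last of the three output layers, is printed, three times. Why does this happen? The object detection results themselves are not wrong: they are identical to what I get when running the cells that are not wrapped in a function. Only the printed shapes seem to be affected. Why is that?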

권 철민 · Answer

There is a typo. In the for statement below, the loop variables should be idx, output, but they were written as idx, ouput. Change ouput to output.

    for idx, ouput in enumerate(cv_out):
        print('output shape:', output.shape)
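A minimal sketch of why this typo would print the last shape three times, assuming the earlier notebook cell ran the same loop at global scope and left `output` bound to the last array. The objects below are hypothetical stand-ins that only carry a `.shape`; the first two shapes are what YOLOv3 normally yields at a 416x416 input and do not come from the post.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the three YOLO output arrays (only .shape matters here).
cv_out = [
    SimpleNamespace(shape=(507, 85)),
    SimpleNamespace(shape=(2028, 85)),
    SimpleNamespace(shape=(8112, 85)),
]

# Earlier notebook cell, run outside any function:
# after the loop ends, `output` still refers to the last element.
for idx, output in enumerate(cv_out):
    pass


def shapes_with_typo(outs):
    seen = []
    for idx, ouput in enumerate(outs):  # typo: the loop variable is `ouput`
        # `output` is not assigned in this function, so Python looks it up
        # in the enclosing scope and finds the stale value from the cell above.
        seen.append(output.shape)
    return seen


def shapes_fixed(outs):
    # With the name spelled consistently, each element's own shape is read.
    return [output.shape for output in outs]


print(shapes_with_typo(cv_out))
print(shapes_fixed(cv_out))
```

This would also be consistent with the detections still being correct: the inner loop in the posted function iterates over `ouput`, the real loop variable, so only the print statement reads the stale name.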

밑바닥개발자 · Answer

Sorry for the late reply. I've attached the code below!

    # Wrap single-image YOLO object detection in a function
    import numpy as np
    import time
    import os
    import cv2

    def get_detected_img(cv_net, img_array, conf_threshold, nms_threshold, use_copied_img=True, is_print=True):
        # Results must be mapped back to the original image size.
        # For an array, rows are the height; the x coordinates returned by detection relate to the width. Don't mix them up!
        height = img_array.shape[0]
        width = img_array.shape[1]

        draw_img = None
        if use_copied_img:
            draw_img = img_array.copy()
        else:
            draw_img = img_array

        # Get the three YOLO output layers
        layer_names = cv_net.getLayerNames()
        outlayer_names = [layer_names[i[0] - 1] for i in cv_net.getUnconnectedOutLayers()]
        #print('out layer names:', outlayer_names)

        # Feed the input image to the loaded YOLO model
        cv_net.setInput(cv2.dnn.blobFromImage(img_array, scalefactor=1/255., size=(416, 416), swapRB=True, crop=False))

        # Run object detection, passing the output layer names so that the output of each one is returned
        start = time.time()
        cv_out = cv_net.forward(outlayer_names)

        green, red = (0, 255, 0), (0, 0, 255)
        class_ids = []
        confidences = []
        boxes = []
        # print('type cv_out:', type(cv_out))

        # Loop over the three output layers one at a time
        for idx, ouput in enumerate(cv_out):
            print('output shape:', output.shape)
            # Process the detection results of each output layer
            for idx2, detection in enumerate(ouput):
                scores = detection[5:]          # scores for the 80 classes
                class_id = np.argmax(scores)    # id of the highest-scoring class
                confidence = scores[class_id]   # confidence score of that class
                if confidence > conf_threshold:
                    # Convert the scaled values (center_x, center_y, width, height) to pixels
                    center_x = int(detection[0] * width)
                    center_y = int(detection[1] * height)
                    o_width = int(detection[2] * width)
                    o_height = int(detection[3] * height)
                    # Top-left corner
                    left = int(center_x - o_width/2)
                    top = int(center_y - o_height/2)
                    class_ids.append(class_id)
                    confidences.append(float(confidence))  # plain float, not np.float
                    boxes.append([left, top, o_width, o_height])

        # Run NMS
        optimal_idx = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

        # Draw each of the bounding boxes kept by NMS
        if len(optimal_idx) > 0:
            for i in optimal_idx.flatten():
                class_id = class_ids[i]
                confidence = confidences[i]
                box = boxes[i]

                left = int(box[0])
                top = int(box[1])
                right = int(left + box[2])
                bottom = int(top + box[3])
                caption = f"{labels_to_names_seq[class_id]}: {confidence :.3f}"
                # Draw the box and add the caption
                cv2.rectangle(draw_img, (left, top), (right, bottom), green, thickness=2)
                cv2.putText(draw_img, caption, (left, top-5), cv2.FONT_HERSHEY_COMPLEX, 0.4, red, 1)

        if is_print:
            print("Detection time:", time.time() - start, "seconds")

        return draw_img
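The coordinate handling in the posted function can be looked at in isolation. The sketch below is illustrative only: `to_pixel_box` is a hypothetical helper name, not part of the posted code, and it assumes (as the posted code does) that a detection row begins with center x, center y, width and height scaled to the range 0 to 1.

```python
def to_pixel_box(detection, img_width, img_height):
    """Convert a scaled (center_x, center_y, w, h) row to [left, top, w, h] in pixels."""
    # x values scale with the image width, y values with the image height.
    center_x = int(detection[0] * img_width)
    center_y = int(detection[1] * img_height)
    box_width = int(detection[2] * img_width)
    box_height = int(detection[3] * img_height)

    # Shift from the box center to its top-left corner.
    left = int(center_x - box_width / 2)
    top = int(center_y - box_height / 2)
    return [left, top, box_width, box_height]


# A box centered in a 400x200 image, a quarter of its width and half of its height.
example_box = to_pixel_box([0.5, 0.5, 0.25, 0.5], img_width=400, img_height=200)
print(example_box)  # [150, 50, 100, 100]
```

The [left, top, width, height] layout matches what the posted code appends to `boxes` before passing them to `cv2.dnn.NMSBoxes`.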

권 철민 · Answer

Hello. If you post the source code showing how you wrote the function, I'll take a look.