Keras_YOLO CPU inference error

Question

Hello, instructor. I'm following along with the helpful answer you posted last time, and I have another question! I trained Keras-YOLO on my desktop GPU to produce an .h5 file. When I run inference with that file on my laptop CPU, I get an error. The model loads correctly, but the error below occurs during inference with YOLO's detect_image(). Is Keras-YOLO simply not usable on a CPU?

-------------- Model training
(The run of 25 training epochs + 2 validation was interrupted once. Below are the results of running 5 epochs of training + 5 of validation from the intermediate checkpoint model_stage_1.h5.)

train_yolo(pretrained_path, annotation_path, classes_path, anchors_path, log_dir, trained_model_name, b_size, epochs_cnt)

pretrained_path: C:/JupyterNotebook/ClothClassified/model_data/cloth_stage_1.h5
Create Tiny YOLOv3 model with 6 anchors and 14 classes.
Load weights C:/JupyterNotebook/ClothClassified/model_data/cloth_stage_1.h5.
Freeze the first 42 layers of total 44 layers.
Train on 28938 samples, val on 3215 samples, with batch size 2.
Epoch 1/5
14469/14469 [==============================] - 2273s 157ms/step - loss: 17.7330 - val_loss: nan
WARNING:tensorflow:From C:\Users\min96\anaconda3\envs\tf115\lib\site-packages\keras\callbacks\tensorboard_v1.py:343: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.
Epoch 2/5
14469/14469 [==============================] - 2078s 144ms/step - loss: 17.6376 - val_loss: nan
Epoch 3/5
14469/14469 [==============================] - 2094s 145ms/step - loss: 17.5342 - val_loss: nan
Epoch 4/5
14469/14469 [==============================] - 2049s 142ms/step - loss: 17.6445 - val_loss: nan
Epoch 5/5
14469/14469 [==============================] - 2042s 141ms/step - loss: 17.6296 - val_loss: nan
Unfreeze all of the layers.
Train on 28938 samples, val on 3215 samples, with batch size 2.
Epoch 6/10
14469/14469 [==============================] - 2111s 146ms/step - loss: 14.0190 - val_loss: 5.6780
Epoch 7/10
14469/14469 [==============================] - 2111s 146ms/step - loss: 11.0151 - val_loss: 4.3427
Epoch 8/10
14469/14469 [==============================] - 2114s 146ms/step - loss: 9.7026 - val_loss: 7.6887
Epoch 9/10
14469/14469 [==============================] - 2115s 146ms/step - loss: 9.0433 - val_loss: 2.4560
Epoch 10/10
14469/14469 [==============================] - 2105s 146ms/step - loss: 8.5110 - val_loss: 7.9146

------------- Model loading
cloth_tiny_yolo = YOLO(model_path=pretrained_path, anchors_path=anchors_path, classes_path=classes_path)

C:/JupyterNotebook/model_data/cloth_final.h5 model, anchors, and classes loaded.

----------------------- Inference
detected_img = yolo.detect_image(img)
plt.imshow(detected_img)

(192, 192, 3)
---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1349       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350                                       target_list, run_metadata)
   1351

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1442                                             fetch_list, target_list,
-> 1443                                             run_metadata)
   1444

UnimplementedError: The Conv2D op currently does not support grouped convolutions on the CPU.
A grouped convolution was attempted to be run because the input depth of 512 does not match the filter input depth of 1
  [[{{node conv2d_49/convolution}}]]

During handling of the above exception, another exception occurred:

UnimplementedError                        Traceback (most recent call last)
in
      1 # with tf.device('/cpu:0'):
----> 2 detected_img = yolo.detect_image(img)
      3 plt.imshow(detected_img)

C:\JupyterNotebook\model_data\keras-yolo3\yolo.py in detect_image(self, image)
    122                 self.yolo_model.input: image_data,
    123                 self.input_image_shape: [image.size[1], image.size[0]],
--> 124                 K.learning_phase(): 0
    125             })
    126

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1178     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1179       results = self._do_run(handle, final_targets, final_fetches,
-> 1180                              feed_dict_tensor, options, run_metadata)
   1181     else:
   1182       results = []

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1357     if handle is None:
   1358       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359                            run_metadata)
   1360     else:
   1361       return self._do_call(_prun_fn, handle, feeds, fetches)

~\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385
   1386   def _extend_graph(self):

UnimplementedError: The Conv2D op currently does not support grouped convolutions on the CPU. A grouped convolution was attempted to be run because the input depth of 512 does not match the filter input depth of 1
  [[node conv2d_49/convolution (defined at C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

Original stack trace for 'conv2d_49/convolution':
  File "C:\Users\min96\anaconda3\envs 115c\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\min96\anaconda3\envs 115c\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel_launcher.py", line 16, in
    app.launch_new_instance()
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
    app.start()
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\kernelapp.py", line 612, in start
    self.io_loop.start()
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "C:\Users\min96\anaconda3\envs 115c\lib\asyncio\base_events.py", line 442, in run_forever
    self._run_once()
  File "C:\Users\min96\anaconda3\envs 115c\lib\asyncio\base_events.py", line 1462, in _run_once
    handle._run()
  File "C:\Users\min96\anaconda3\envs 115c\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\ioloop.py", line 688, in
    lambda f: self._run_callback(functools.partial(callback, future))
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
    ret = callback()
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 814, in inner
    self.ctx_run(self.run)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\ipkernel.py", line 306, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\IPython\core\interactiveshell.py", line 2867, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-packages\IPython\core\interactiveshell.py", line 2895, in _run_cell
    return runner(coro)
  File "C:\Users\min96\anaconda3\envs 115c\lib\site-pa

권 철민 · Answer

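Hello. This looks like it happens because grouped convolution is not supported on the CPU. I looked for where the Keras YOLOv3 (keras-yolo3) code uses a grouped convolution, but it isn't clear. The error message says one was attempted in convolution node 49 (conv2d_49), but as far as I can see, node 49 shouldn't be a grouped convolution. In any case, Keras YOLO does seem to have a problem running on the CPU. Thank you.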
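The error message itself gives the numbers. The Conv2D node receives 512 input channels, but its filter expects an input depth of 1. TensorFlow therefore reads the op as a grouped convolution with 512 / 1 = 512 groups, and the CPU kernel refuses to run it.

It is not obvious where that node comes from. As a rough diagnostic, one could restate the depth check and scan the session graph for Conv2D nodes with that kind of mismatch. This is a sketch, not part of keras-yolo3. `conv2d_depth_check` and `find_depth_mismatched_convs` are hypothetical helper names. The scan assumes only the `tf.Operation` interface (`.type`, `.name`, `.inputs`, `.get_attr()`), so the block does not import TensorFlow itself.

```python
# Hypothetical diagnostic helpers (not part of keras-yolo3 or TensorFlow).
# They assume objects shaped like tf.Operation: .type, .name, .inputs, .get_attr().


def conv2d_depth_check(input_depth, filter_in_depth):
    """Simplified restatement of the depth check TensorFlow's Conv2D makes.

    Returns ("regular", 1), ("grouped", n_groups) or ("invalid", None).
    """
    if input_depth == filter_in_depth:
        return ("regular", 1)
    if filter_in_depth > 0 and input_depth % filter_in_depth == 0:
        # Interpreted as a grouped convolution; the CPU kernel raises
        # UnimplementedError for this case.
        return ("grouped", input_depth // filter_in_depth)
    # Depth not evenly divisible: TensorFlow rejects the op outright.
    return ("invalid", None)


def find_depth_mismatched_convs(operations):
    """List Conv2D ops whose input channel count differs from the filter's.

    Each result is (op_name, input_depth, filter_in_depth, verdict).
    Ops whose shapes are not known when the graph is built are skipped.
    """
    suspects = []
    for op in operations:
        if op.type != "Conv2D":
            continue
        try:
            input_shape = op.inputs[0].shape.as_list()
            filter_shape = op.inputs[1].shape.as_list()
        except ValueError:
            # TensorShape.as_list() raises ValueError when the rank is unknown.
            continue
        # Conv2D filters are laid out as [height, width, in_channels, out_channels];
        # the input's channel axis depends on data_format (NHWC by default).
        channel_axis = 1 if op.get_attr("data_format") == b"NCHW" else -1
        input_depth = input_shape[channel_axis]
        filter_depth = filter_shape[2]
        if input_depth is None or filter_depth is None:
            continue
        if input_depth != filter_depth:
            verdict = conv2d_depth_check(input_depth, filter_depth)
            suspects.append((op.name, input_depth, filter_depth, verdict))
    return suspects


# The numbers from the error message: 512 input channels, filter input depth 1.
print(conv2d_depth_check(512, 1))  # prints ('grouped', 512)
```

The question runs `detect_image` on an object named `yolo`. Assuming that object keeps its Keras session in `sess`, as keras-yolo3's `YOLO` class does via `K.get_session()`, a call like `find_depth_mismatched_convs(yolo.sess.graph.get_operations())` should list `conv2d_49/convolution` if the mismatch is visible in the static graph shapes. The scan may come back empty if the shapes are only known at run time.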