Keras_YOLO CPU inference error

Posted 21.05.18 07:39, 137 views


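Hello, instructor.

I was following along with the kind answer you posted last time, and I have another question!

I trained Keras-yolo on my desktop GPU to produce an .h5 file, and ran into a problem while running inference with it on my laptop CPU.

The model loaded correctly, but the error below occurred during inference with YOLO's detect_image()!

Is keras-YOLO perhaps not usable on a CPU?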

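-------------- Model training

A run of 25 training epochs + 2 validation passes was interrupted once, so this is the result of running 5 training epochs + 5 validation passes starting from the intermediate checkpoint model_stage_1.h5.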

train_yolo(pretrained_path, annotation_path,classes_path, anchors_path, log_dir,trained_model_name, b_size, epochs_cnt)

pretrained_path: C:/JupyterNotebook/ClothClassified/model_data/cloth_stage_1.h5
Create Tiny YOLOv3 model with 6 anchors and 14 classes.
Load weights C:/JupyterNotebook/ClothClassified/model_data/cloth_stage_1.h5.
Freeze the first 42 layers of total 44 layers.
Train on 28938 samples, val on 3215 samples, with batch size 2.
Epoch 1/5
14469/14469 [==============================] - 2273s 157ms/step - loss: 17.7330 - val_loss: nan
WARNING:tensorflow:From C:\Users\min96\anaconda3\envs\tf115\lib\site-packages\keras\callbacks\tensorboard_v1.py:343: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.

Epoch 2/5
14469/14469 [==============================] - 2078s 144ms/step - loss: 17.6376 - val_loss: nan
Epoch 3/5
14469/14469 [==============================] - 2094s 145ms/step - loss: 17.5342 - val_loss: nan
Epoch 4/5
14469/14469 [==============================] - 2049s 142ms/step - loss: 17.6445 - val_loss: nan
Epoch 5/5
14469/14469 [==============================] - 2042s 141ms/step - loss: 17.6296 - val_loss: nan
Unfreeze all of the layers.
Train on 28938 samples, val on 3215 samples, with batch size 2.
Epoch 6/10
14469/14469 [==============================] - 2111s 146ms/step - loss: 14.0190 - val_loss: 5.6780
Epoch 7/10
14469/14469 [==============================] - 2111s 146ms/step - loss: 11.0151 - val_loss: 4.3427
Epoch 8/10
14469/14469 [==============================] - 2114s 146ms/step - loss: 9.7026 - val_loss: 7.6887
Epoch 9/10
14469/14469 [==============================] - 2115s 146ms/step - loss: 9.0433 - val_loss: 2.4560
Epoch 10/10
14469/14469 [==============================] - 2105s 146ms/step - loss: 8.5110 - val_loss: 7.9146

------------- Model load

cloth_tiny_yolo = YOLO(model_path=pretrained_path, anchors_path=anchors_path, classes_path=classes_path)

C:/JupyterNotebook/model_data/cloth_final.h5 model, anchors, and classes loaded.

----------------------- Inference

detected_img = yolo.detect_image(img)
plt.imshow(detected_img)

(192, 192, 3)
UnimplementedError                        Traceback (most recent call last)
~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1349       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350                                       target_list, run_metadata)

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1442                                             fetch_list, target_list,
-> 1443                                             run_metadata)

UnimplementedError: The Conv2D op currently does not support grouped convolutions on the CPU. A grouped convolution was attempted to be run because the input depth of 512 does not match the filter input depth of 1
	 [[{{node conv2d_49/convolution}}]]

During handling of the above exception, another exception occurred:

UnimplementedError                        Traceback (most recent call last)
<ipython-input-23-b6fbc29987b5> in <module>
      1 # with tf.device('/cpu:0'):
----> 2 detected_img = yolo.detect_image(img)
      3 plt.imshow(detected_img)

C:\JupyterNotebook\model_data\keras-yolo3\yolo.py in detect_image(self, image)
    122                 self.yolo_model.input: image_data,
    123                 self.input_image_shape: [image.size[1], image.size[0]],
--> 124                 K.learning_phase(): 0
    125             })

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1178     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1179       results = self._do_run(handle, final_targets, final_fetches,
-> 1180                              feed_dict_tensor, options, run_metadata)
   1181     else:
   1182       results = []

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1357     if handle is None:
   1358       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359                            run_metadata)
   1360     else:
   1361       return self._do_call(_prun_fn, handle, feeds, fetches)

~\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1386   def _extend_graph(self):

UnimplementedError: The Conv2D op currently does not support grouped convolutions on the CPU. A grouped convolution was attempted to be run because the input depth of 512 does not match the filter input depth of 1
	 [[node conv2d_49/convolution (defined at C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

Original stack trace for 'conv2d_49/convolution':
  File "C:\Users\min96\anaconda3\envs\t115c\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\kernelapp.py", line 612, in start
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
  File "C:\Users\min96\anaconda3\envs\t115c\lib\asyncio\base_events.py", line 442, in run_forever
  File "C:\Users\min96\anaconda3\envs\t115c\lib\asyncio\base_events.py", line 1462, in _run_once
  File "C:\Users\min96\anaconda3\envs\t115c\lib\asyncio\events.py", line 145, in _run
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\ioloop.py", line 688, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
    ret = callback()
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 814, in inner
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\ipkernel.py", line 306, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\IPython\core\interactiveshell.py", line 2867, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-packages\IPython\core\interactiveshell.py", line 2895, in _run_cell
    return runner(coro)
  File "C:\Users\min96\anaconda3\envs\t115c\lib\site-pa

1 Answer



This looks like a problem caused by grouped convolution not being supported on the CPU.

I looked for where grouped convolution is used in the Keras Yolo3 code, but it is not clear. The error message says it was used when creating convolution node 49 (conv2d_49), but... hmm, it does not look like it is used when that node is created.
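One way to narrow this down might be to compare, for each convolution layer of the loaded model, the channel depth of its input with the input depth of its kernel, since the error message complains about exactly that pair (512 vs. 1). The following is only a rough sketch: the helper name find_depth_mismatches is made up for illustration, it assumes channels_last data and plain Conv2D layers, and it relies only on the standard Keras layer attributes name, input_shape and get_weights().

```python
def find_depth_mismatches(layers):
    """Return (layer_name, input_depth, kernel_input_depth) for every layer
    whose 4-D kernel does not expect the full depth of its input.

    Assumes channels_last tensors and plain Conv2D kernels laid out as
    (height, width, in_channels, out_channels).
    """
    mismatches = []
    for layer in layers:
        weights = layer.get_weights()
        # Skip layers without weights and layers whose first weight is not
        # a 4-D kernel (e.g. BatchNormalization has 1-D weights).
        if not weights or len(weights[0].shape) != 4:
            continue
        input_shape = layer.input_shape
        # Layers with several inputs report a list of shapes; skip those.
        if not isinstance(input_shape, tuple):
            continue
        input_depth = input_shape[-1]
        kernel_input_depth = weights[0].shape[2]
        if input_depth is not None and input_depth != kernel_input_depth:
            mismatches.append((layer.name, input_depth, kernel_input_depth))
    return mismatches
```

With the objects from the question, this could be called as find_depth_mismatches(yolo.yolo_model.layers). If a layer is reported, its name should show which part of the network was built with an unexpected kernel; if nothing is reported, the mismatch is probably introduced outside the weighted layers, which this sketch does not cover.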

In any case, Keras Yolo seems to have a problem on the CPU.
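Before concluding that, it may be worth ruling out an environment difference: the training log paths point to a conda environment named tf115, while the traceback comes from one named t115c. Whether the two actually differ in package versions is not stated in the question, so this is only a guess. A minimal sketch for listing the relevant versions on each machine follows; the helper name installed_versions and the choice of packages are assumptions made for illustration.

```python
try:
    # Available in the standard library from Python 3.8.
    from importlib.metadata import PackageNotFoundError, version

    def _lookup(name):
        try:
            return version(name)
        except PackageNotFoundError:
            return None
except ImportError:
    # Fallback for older interpreters; pkg_resources ships with setuptools.
    import pkg_resources

    def _lookup(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    return {name: _lookup(name) for name in package_names}


if __name__ == "__main__":
    # Package list is an assumption; adjust it to what the project uses.
    packages = ["tensorflow", "tensorflow-gpu", "Keras", "h5py"]
    for name, ver in installed_versions(packages).items():
        print(name, ver or "not installed")
```

Running this once in each environment and comparing the two outputs side by side would show whether the desktop and the laptop use the same TensorFlow and Keras builds.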


Yes. Thank you for the answer.