Resolved question
Hello, instructor. I hit an error while training keras-yolo3 on the raccoon dataset. Training runs fine while the layers are frozen, but as soon as I unfreeze all layers the error below is raised immediately at the start of training. It looks like a GPU memory problem; since this is my personal GPU, is the only real fix to use a GPU with more memory?
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-6-b92f17d39649> in <module>
     82             epochs=100,
     83             initial_epoch=50,
---> 84             callbacks=[logging, checkpoint, reduce_lr, early_stopping])
     85     model.save_weights(log_dir + 'trained_weights_final.h5')

~\anaconda3\envs\tf113\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~\anaconda3\envs\tf113\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1413             use_multiprocessing=use_multiprocessing,
   1414             shuffle=shuffle,
-> 1415             initial_epoch=initial_epoch)
   1416
   1417     @interfaces.legacy_generator_methods_support

~\anaconda3\envs\tf113\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    211                 outs = model.train_on_batch(x, y,
    212                                             sample_weight=sample_weight,
--> 213                                             class_weight=class_weight)
    214
    215                 outs = to_list(outs)

~\anaconda3\envs\tf113\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1213             ins = x + y + sample_weights
   1214         self._make_train_function()
-> 1215         outputs = self.train_function(ins)
   1216         return unpack_singleton(outputs)
   1217

~\anaconda3\envs\tf113\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2664                 return self._legacy_call(inputs)
   2665
-> 2666             return self._call(inputs)
   2667         else:
   2668             if py_any(is_tensor(x) for x in inputs):

~\anaconda3\envs\tf113\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2634                                 symbol_vals,
   2635                                 session)
-> 2636         fetched = self._callable_fn(*array_vals)
   2637         return fetched[:len(self.outputs)]
   2638

~\anaconda3\envs\tf113\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\anaconda3\envs\tf113\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
    529     # Delete the underlying status object from memory otherwise it stays alive
    530     # as there is a reference to status from this from the traceback due to

ResourceExhaustedError: OOM when allocating tensor with shape[1024,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node training_1/Adam/gradients/conv2d_58/convolution_grad/Conv2DBackpropFilter}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
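For reference, the failing call corresponds to the second (all-layers-unfrozen) training stage of the upstream keras-yolo3 train.py. Below is a rough sketch of that stage with a smaller batch_size, which I understand is the usual first thing to try before moving to a larger GPU, since unfreezing the whole DarkNet-53 body makes the optimizer keep gradient and moment tensors for every conv layer. The variable and helper names (model, lines, num_train, num_val, input_shape, anchors, num_classes, data_generator_wrapper, the callbacks, log_dir) are the ones defined earlier in train.py and are assumed to already be in scope; this is a sketch, not the exact code from my notebook.

from keras.optimizers import Adam

# Unfreeze everything and recompile so the change takes effect.
for i in range(len(model.layers)):
    model.layers[i].trainable = True
model.compile(optimizer=Adam(lr=1e-4),
              loss={'yolo_loss': lambda y_true, y_pred: y_pred})
print('Unfreeze all of the layers.')

batch_size = 4  # reduced from 32; lower it further (or shrink input_shape) if OOM persists
model.fit_generator(
    data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
    steps_per_epoch=max(1, num_train // batch_size),
    validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
    validation_steps=max(1, num_val // batch_size),
    epochs=100,
    initial_epoch=50,
    callbacks=[logging, checkpoint, reduce_lr, early_stopping])
model.save_weights(log_dir + 'trained_weights_final.h5')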
One more question: for the first training stage, are the early layers frozen automatically even if I don't set anything in code, or does the freezing only happen because of explicit code somewhere?
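For context on that question: Keras itself leaves layer.trainable as True by default, so nothing is frozen automatically. In the upstream keras-yolo3 train.py the freezing is done explicitly inside create_model() through its freeze_body argument (2 by default). The snippet below is a paraphrased sketch of that logic from memory, followed by a quick way to count frozen layers; exact names and line counts may differ in your local copy.

# Paraphrased sketch of the freeze logic inside create_model() in the
# upstream keras-yolo3 train.py (check your local copy for the exact code):
if load_pretrained:
    model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
    if freeze_body in [1, 2]:
        # freeze_body=1: freeze the DarkNet-53 backbone only
        # freeze_body=2: freeze everything except the last 3 output layers
        num = (185, len(model_body.layers) - 3)[freeze_body - 1]
        for i in range(num):
            model_body.layers[i].trainable = False
        print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

# Quick sanity check on the model actually being trained:
frozen = sum(1 for layer in model.layers if not layer.trainable)
print('{}/{} layers frozen'.format(frozen, len(model.layers)))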
Thank you.