Cloudera Manager "Unknown health" question

Question

At some point, whenever I open Cloudera Manager, everything started showing up as "Unknown health", as shown below. Restarting does not change anything. For now I am ignoring it and continuing with the course, and nothing seems to be broken so far, but why does it look like this?

After seeing the warning above I searched online and found that running sysctl -w vm.swappiness=10 should fix it, so I entered that in PuTTY. Everything is still in the unknown state, and the warning is still there.

While digging further I wondered whether it was a Service Monitor problem, so I checked. The status history is too long for a screenshot, so I am pasting it as text (newest entries first). Even looking at it, I have no idea what I am supposed to do. What is the problem?

Host health: Concerning
The health test result for SERVICE_MONITOR_HOST_HEALTH has become concerning: The health of this role's host is concerning. The following health tests are concerning: swapping.

5 became Good; 1 still Concerning.
The health test result for SERVICE_MONITOR_ROLE_PIPELINE has become good: 0 messages dropped by the role stage in the Service Monitor Pipeline over the previous 5 minute(s).
The health test result for SERVICE_MONITOR_SCM_DESCRIPTOR_FETCH has become good: The Cloudera Manager descriptor was refreshed 521 millisecond(s) ago.
The health test result for SERVICE_MONITOR_METRIC_SCHEMA_FETCH has become good: The Cloudera Manager metric schema was refreshed 569 millisecond(s) ago.
The health test result for SERVICE_MONITOR_UNEXPECTED_EXITS has become good: This role encountered 0 unexpected exit(s) in the previous 5 minute(s).
The health test result for SERVICE_MONITOR_PAUSE_DURATION has become good: Average time spent paused was 808 millisecond(s) (1.35%) per minute over the previous 5 minute(s).

Host health: Good. 2 still Bad.
The health test result for SERVICE_MONITOR_HOST_HEALTH has become good: The health of this role's host is good.

Process status: Good. 3 still Bad.
The health test result for SERVICE_MONITOR_SCM_HEALTH has become good: This role's status is as expected. The role is started.

Process status: Bad
The health test result for SERVICE_MONITOR_SCM_HEALTH has become bad: This role's host has been out of contact with Cloudera Manager for too long.

3 became Bad; 1 became Concerning; 8 became Good.
The health test result for SERVICE_MONITOR_SCM_DESCRIPTOR_FETCH has become bad: The Cloudera Manager descriptor was refreshed 28 minute(s), 38 second(s) ago. Critical threshold: 2 minute(s).
The health test result for SERVICE_MONITOR_METRIC_SCHEMA_FETCH has become bad: The Cloudera Manager metric schema was refreshed 28 minute(s), 38 second(s) ago. Critical threshold: 2 minute(s).
The health test result for SERVICE_MONITOR_HOST_HEALTH has become bad: The health of this role's host is bad. The following health tests are bad: agent status.
The health test result for SERVICE_MONITOR_SWAP_MEMORY_USAGE has become concerning: 13.4 MiB of swap memory is being used by this role's process. Warning threshold: 200 B.
The health test result for SERVICE_MONITOR_AGGREGATION_RUN_DURATION has become good: The last metrics aggregation run duration is 852 millisecond(s).
The health test result for SERVICE_MONITOR_SCM_HEALTH has become good: This role's status is as expected. The role is started.
The health test result for SERVICE_MONITOR_FILE_DESCRIPTOR has become good: Open file descriptors: 553. File descriptor limit: 65,536. Percentage in use: 0.84%.
The health test result for SERVICE_MONITOR_LOG_DIRECTORY_FREE_SPACE has become good: This role's Log Directory (/var/log/cloudera-scm-firehose) is on a filesystem with more than 10.0 GiB of its space free.
The health test result for SERVICE_MONITOR_HEAP_DUMP_DIRECTORY_FREE_SPACE has become good: This role's Heap Dump Directory (/tmp) is on a filesystem with more than 10.0 GiB of its space free.
The health test result for SERVICE_MONITOR_WEB_METRIC_COLLECTION has become good: The web server of this role is responding with metrics. The most recent collection took 2.2 second(s).
The health test result for SERVICE_MONITOR_HEAP_SIZE has become good: Heap used: 164M. JVM maximum available heap size: 256M. Percentage of maximum heap: 64.06%.
The health test result for SERVICE_MONITOR_STORAGE_DIRECTORY_FREE_SPACE has become good: This role's Service Monitor Storage Directory (/var/lib/cloudera-service-monitor) is on a filesystem with more than 10.0 GiB of its space free.

15 became Unknown.
The health test result for SERVICE_MONITOR_ROLE_PIPELINE has become unknown: Not enough data to test: Test of whether the Service Monitor's role pipeline is dropping messages.
The health test result for SERVICE_MONITOR_SCM_DESCRIPTOR_FETCH has become unknown: Not enough data to test: Test of whether the Cloudera Manager descriptor is up to date.
The health test result for SERVICE_MONITOR_METRIC_SCHEMA_FETCH has become unknown: Not enough data to test: Test of whether the Cloudera Manager metric schema is up to date.
The health test result for SERVICE_MONITOR_AGGREGATION_RUN_DURATION has become unknown: Not enough data to test: Test of whether the metrics aggregation takes too long.
The health test result for SERVICE_MONITOR_SCM_HEALTH has become unknown: Not enough data to test: Test of whether the process state of this role is as expected.
The health test result for SERVICE_MONITOR_UNEXPECTED_EXITS has become unknown: Not enough data to test: Test of whether a role has encountered unexpected exits
The health test result for SERVICE_MONITOR_FILE_DESCRIPTOR has become unknown: Not enough data to test: Test of whether this role has too many open file descriptors.
The health test result for SERVICE_MONITOR_SWAP_MEMORY_USAGE has become unknown: Not enough data to test: Test of whether the role is using swap memory.
The health test result for SERVICE_MONITOR_LOG_DIRECTORY_FREE_SPACE has become unknown: Not enough data to test: Test of whether this role's log directory has enough free space.
The health test result for SERVICE_MONITOR_HEAP_DUMP_DIRECTORY_FREE_SPACE has become unknown: Not enough data to test: Test of whether this role's heap dump directory has enough free space.
The health test result for SERVICE_MONITOR_HOST_HEALTH has become unknown: Not enough data to test: Test of whether the host running this role is healthy.
The health test result for SERVICE_MONITOR_WEB_METRIC_COLLECTION has become unknown: Not enough data to test: Test of whether this role's web server is responding to requests for metrics.
The health test result for SERVICE_MONITOR_PAUSE_DURATION has become unknown: Not enough data to test: Test of whether this role's threads are being scheduled appropriately.
The health test result for SERVICE_MONITOR_HEAP_SIZE has become unknown: Not enough data to test: Test of whether this role needs more heap
The health test result for SERVICE_MONITOR_STORAGE_DIRECTORY_FREE_SPACE has become unknown: Not enough data to test: Test of whether the Service Monitor Storage Directory has enough free space.

Big.D · Answer

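Hello! This is Big.D.

The monitoring information for your services shows as "Unknown health" because the Cloudera Management Service, at the bottom left of the screen, is stopped.

In an earlier lecture I explained that "the pilot PC environment can run short of resources, so it is best to stop the Cloudera Management Service, which only handles monitoring," and we went through the steps to stop it. You probably just missed that part. ^^;

If you start the Cloudera Management Service and wait about 5 to 10 minutes, the monitoring status should display normally again.

Also, please set the memory swap option you changed back to 100 with the following command:

$ sysctl -w vm.swappiness=100

The reason is that the pilot environment is short on memory, so it needs to swap memory out to disk aggressively.

- Big.D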
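
For anyone who wants to confirm the current value before changing it, here is a minimal sketch of the swappiness step above. It assumes a Linux host where /proc/sys/vm/swappiness is readable; the function names and the TARGET and APPLY variables are made up for this example and are not part of the course material or of Cloudera Manager.

```shell
#!/bin/sh
# Sketch: compare vm.swappiness with a target value and change it only if needed.
# TARGET defaults to 100, the value suggested in the answer above.
# TARGET, APPLY and both function names are hypothetical, chosen for this example.

TARGET="${TARGET:-100}"

# Print the value the running kernel is currently using.
current_swappiness() {
    cat /proc/sys/vm/swappiness
}

# Print "keep" when current ($1) equals target ($2), otherwise "change".
swappiness_action() {
    if [ "$1" -eq "$2" ]; then
        echo "keep"
    else
        echo "change"
    fi
}

# Nothing is modified unless APPLY=1 is set, because sysctl -w needs root.
if [ "${APPLY:-0}" = "1" ]; then
    cur="$(current_swappiness)"
    if [ "$(swappiness_action "$cur" "$TARGET")" = "change" ]; then
        sysctl -w "vm.swappiness=$TARGET"
    else
        echo "vm.swappiness is already $TARGET"
    fi
fi
```

Saved under a name of your choice and run as root with APPLY=1, this changes the setting for the running kernel only. As far as I know, sysctl -w does not survive a reboot, so the value would also need to go into /etc/sysctl.conf to make it permanent.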