Prometheus

Hello, instructor. I posted a Prometheus question earlier and started adding
a follow-up as a comment, but decided to write a new post instead.

My question is a bit disorganized, so thank you for bearing with me. Thank you, as always.

I currently have Prometheus and Grafana running on a minikube instance. From there, I want to monitor a separate MongoDB instance and send Slack notifications through Alertmanager.

1. To connect Alertmanager to Slack, I modified the values-prometheus.yaml file you wrote, using the YAML file in the Helm chart's GitHub repository as a reference.

cat <<EOF > values-prometheus.yaml

alertmanager: # 30~33
  enabled: true
  persistentVolume:
    ## If true, alertmanager will create/use a Persistent Volume Claim
    ## If false, use emptyDir
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 2Gi
  replicaCount: 1
  service:
    type: LoadBalancer

## alertmanager ConfigMap entries # 1360
alertmanagerFiles:
  alertmanager.yml:
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/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'
    route:
      group_by: ['monitoring']
      group_wait: 30s
      repeat_interval: 1h
      receiver: default-receiver
      routes:
      - match:
          alertname: DeadMansSwitch
        receiver: 'null'
      - match:
        receiver: 'slack'
        continue: true
    receivers:
    - name: 'null'
    - name: 'slack'
      slack_configs:
       - channel: 'test'
         username: 'prometheus'
         send_resolved: true
         icon_url: https://avatars3.githubusercontent.com/u/3380462
         title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
         text: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - {{ .Labels.severity }}{{ end }}

            *Description:* {{ .Annotations.description }}

            *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* {{ .Value }}
              {{ end }}
            {{ end }}

server:
  enabled: true

  persistentVolume:
    enabled: true
    accessModes:
      - ReadWriteOnce
    mountPath: /data
    size: 100Gi
  replicaCount: 1

  ## Prometheus data retention period (default if not specified is 15 days)
  retention: "15d"  # keep data for 15 days

serverFiles:
  alerting_rules.yml:
    groups:
    - name: example
      rules:
      - alert: HighRequestLatency
        expr: sum(rate(container_network_receive_bytes_total{namespace="kube-logging"}[5m]))>20000
        for: 1m
        labels:
          severity: page
        annotations:
          summary: High request latency

  prometheus.yml:
    rule_files:
      - /etc/config/alerting_rules.yml
      - /etc/config/alerts

EOF

The problem is that after running helm install, the alertmanager pod goes into CrashLoopBackOff
(exit code: 1). The message says the image already exists, and even after searching on Google I could not find a clear solution.

I suspect I made a mistake while writing the YAML file, but I cannot tell what it is, so I am asking here.
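One thing that stands out in the file above is that `route.receiver` points at `default-receiver`, while only `null` and `slack` are defined under `receivers`. Alertmanager refuses to start when a route names an undefined receiver, which would be consistent with an exit code of 1. The pod's previous logs (for example `kubectl logs <alertmanager-pod> -n prometheus --all-containers --previous`, where `<alertmanager-pod>` is a placeholder) should confirm or rule this out. A minimal sketch of the route section, assuming the intent is to send everything except `DeadMansSwitch` to Slack:

```yaml
# Fragment of values-prometheus.yaml; the `global` block with slack_api_url stays as it is.
alertmanagerFiles:
  alertmanager.yml:
    route:
      group_by: ['monitoring']
      group_wait: 30s
      repeat_interval: 1h
      # Assumption: alerts that no child route claims should go to Slack.
      # Whatever name is used here must also exist under `receivers`.
      receiver: 'slack'
      routes:
      - match:
          alertname: DeadMansSwitch
        receiver: 'null'
    receivers:
    - name: 'null'
    - name: 'slack'
      slack_configs:
      - channel: 'test'
        send_resolved: true
```

Separately, the Slack template reads `.Annotations.title` and `.Annotations.description`, while the rule in `alerting_rules.yml` only sets `summary`, so those fields would probably render empty until matching annotations are added to the rule.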

2. To set the MongoDB instance as a scrape target, I copied the mongodb-exporter.yaml file from the Helm chart's GitHub repository as-is.

cat <<EOF > mongodb.yaml
mongodb:
  uri: "mongodb://mongodb0.example.com:27017"
existingSecret:
  name: "MONGO_INITDB_ROOT_PASSWORD"
  key: "secret"
port: "80"

readinessProbe:
  httpGet:
    path: /
    port: metrics
  initialDelaySeconds: 10

metrics:
  enabled: true

  serviceMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s
    namespace:
    additionalLabels: {}
    targetLabels: []
    metricRelabelings: []

EOF

After saving the file, I ran the following:
helm install mongodb prometheus-community/prometheus-mongodb-exporter -f mongodb.yaml -n prometheus
I am not sure whether this is the right approach.
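Whether this works likely depends on how Prometheus itself was installed. A `ServiceMonitor` is a custom resource that only the Prometheus Operator reads. The `server`, `serverFiles`, and `alertmanagerFiles` keys in values-prometheus.yaml suggest the plain `prometheus-community/prometheus` chart, which does not include the operator. Below is a minimal sketch under that assumption, relying on the annotation-based service discovery that chart configures by default. The `service.annotations` key and port 9216 are assumptions to check against the output of `helm show values prometheus-community/prometheus-mongodb-exporter`.

```yaml
# mongodb.yaml (sketch; key names are assumptions to verify against the chart's values.yaml)
mongodb:
  uri: "mongodb://mongodb0.example.com:27017"

serviceMonitor:
  enabled: false   # assumes no Prometheus Operator is installed

service:
  port: 9216       # assumed default port of the exporter
  annotations:
    # Picked up by the prometheus chart's annotation-based scrape jobs
    prometheus.io/scrape: "true"
    prometheus.io/port: "9216"
    prometheus.io/path: "/metrics"
```

`existingSecret` is left out here because the URI is given inline. If a Secret is used instead, note that Kubernetes Secret names must be lowercase DNS-style names, so `MONGO_INITDB_ROOT_PASSWORD` looks more like an environment variable name than a valid Secret name.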

Inflearn Community Q&A