# kube-prometheus on Kubernetes 1.20
Download the release from the GitHub repository:
https://github.com/prometheus-operator/kube-prometheus

```shell
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
tar -zxf v0.8.0.tar.gz
cd kube-prometheus-0.8.0
```

All configuration files live under `manifests/`.
Configure persistent storage. Create a Ceph secret in the `monitoring` namespace so PVCs there can authenticate against Ceph:

```shell
kubectl create secret generic ceph-user-secret --type="kubernetes.io/rbd" \
  --from-literal=key=AQDlGKZgG2xRNxAA4DYniPBpaV5SAyU1/QH/5w== \
  --namespace=monitoring
```
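A common pitfall here: `--from-literal` expects the raw Ceph key, because Kubernetes base64-encodes Secret values itself; pre-encoding the key breaks RBD authentication. A small sketch of the round trip, using the key from the command above:

```shell
# You supply the raw key; kubectl stores it base64-encoded in the Secret object.
KEY='AQDlGKZgG2xRNxAA4DYniPBpaV5SAyU1/QH/5w=='

# What ends up in the Secret's data field:
ENCODED=$(printf '%s' "$KEY" | base64 | tr -d '\n')

# What the rbd volume plugin sees after decoding:
DECODED=$(printf '%s' "$ENCODED" | base64 -d)

[ "$DECODED" = "$KEY" ] && echo "round trip ok"   # -> round trip ok
```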
In `grafana-deployment.yaml`, switch the storage from `emptyDir` to a PVC:

```yaml
      volumes:
      #- emptyDir: {}
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-data
```

Append a PersistentVolumeClaim at the bottom of the file:

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: monitoring
spec:
  storageClassName: dynamic-ceph-rbd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
In `prometheus-prometheus.yaml`, add a `storage` section below the existing fields:

```yaml
  serviceMonitorSelector: {}
  version: 2.26.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: dynamic-ceph-rbd
        resources:
          requests:
            storage: 50Gi
```
Deploy kube-prometheus. Apply `manifests/setup/` first (it registers the CRDs the rest of the manifests depend on), then the remainder:

```shell
kubectl apply -f manifests/setup/
kubectl apply -f manifests/
```
Check the result:

```shell
kubectl get pods -n monitoring
kubectl get svc -n monitoring
kubectl get ep -n monitoring
```

Expose Alertmanager, Grafana, and Prometheus through an Ingress:
```shell
cat > ingress-prometheus.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF
```
```shell
kubectl apply -f ingress-prometheus.yaml
kubectl get ing -n monitoring
```

Point the three hostnames at your ingress controller (via DNS or `/etc/hosts`) before browsing.
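The `kubernetes.io/ingress.class` annotation is deprecated; the `networking.k8s.io/v1` API provides the `ingressClassName` field instead. An equivalent spec, assuming your controller registered an IngressClass named `nginx`, would look like this (one host shown):

```yaml
# Same Ingress expressed with ingressClassName instead of the deprecated annotation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
spec:
  ingressClassName: nginx   # assumes an IngressClass named "nginx" exists
  rules:
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
```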
## Troubleshooting
### kube-state-metrics image pull failure

Change the image in `kube-state-metrics-deployment.yaml` to a reachable mirror:

```yaml
      containers:
      - args:
        - --host=127.0.0.1
        - --port=8081
        - --telemetry-host=127.0.0.1
        - --telemetry-port=8082
        image: bitnami/kube-state-metrics:2.0.0
        name: kube-state-metrics
```
### No data for kube-controller-manager and kube-scheduler

Edit the cluster configuration files `kube-controller-manager.conf` and `kube-scheduler.conf`, set `--bind-address=0.0.0.0`, and restart both services so their metrics ports listen on all interfaces.
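A sketch of that edit, assuming a systemd-style conf file with the flags on one line (the path and variable name below are hypothetical; kubeadm clusters use static pod manifests instead):

```shell
# Hypothetical conf layout for illustration; adjust to your cluster's actual file.
CONF=/tmp/kube-controller-manager.conf
cat > "$CONF" <<'EOF'
KUBE_CONTROLLER_MANAGER_OPTS="--bind-address=127.0.0.1 --leader-elect=true"
EOF

# Rebind to all interfaces so Prometheus can reach the metrics port.
sed -i 's/--bind-address=[0-9.]*/--bind-address=0.0.0.0/' "$CONF"
grep -o -- '--bind-address=[0-9.]*' "$CONF"   # -> --bind-address=0.0.0.0
```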
Then create a headless Service and matching Endpoints for kube-controller-manager:

```shell
cat > kube-controller-namager-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    protocol: TCP
EOF
```
```shell
kubectl apply -f kube-controller-namager-svc-ep.yaml
kubectl get ep -n kube-system
```
And the same for kube-scheduler:
```shell
cat > kube-scheduler-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    protocol: TCP
EOF
```
```shell
kubectl apply -f kube-scheduler-svc-ep.yaml
kubectl get ep -n kube-system
```
Note the labels: they must match the selectors in kube-prometheus's `kubernetes-serviceMonitorKubeScheduler.yaml` and `kubernetes-serviceMonitorKubeControllerManager.yaml`.
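For reference, the relevant part of `kubernetes-serviceMonitorKubeControllerManager.yaml` looks roughly like the excerpt below (verify against your copy of the manifests): the ServiceMonitor selects Services carrying this label, which is why the Service above must declare it.

```yaml
# Excerpt; check your manifests for the exact selector.
spec:
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-controller-manager
```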
Edit `kubernetes-serviceMonitorKubeControllerManager.yaml` and `kubernetes-serviceMonitorKubeScheduler.yaml` to scrape over plain HTTP:

```yaml
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
```
Apply the changes:

```shell
kubectl delete -f kubernetes-serviceMonitorKubeControllerManager.yaml
kubectl apply -f kubernetes-serviceMonitorKubeControllerManager.yaml
kubectl delete -f kubernetes-serviceMonitorKubeScheduler.yaml
kubectl apply -f kubernetes-serviceMonitorKubeScheduler.yaml
```
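Once the ServiceMonitors are reapplied, the Prometheus HTTP API (`GET /api/v1/targets`) should report both jobs as healthy. A sketch of checking that; since it needs a live cluster, a canned response with the same shape is used here (the hostname comes from the Ingress above):

```shell
# In a real cluster: RESPONSE=$(curl -s http://prom.localprom.com/api/v1/targets)
# Canned excerpt shaped like the Prometheus targets API response:
RESPONSE='{"status":"success","data":{"activeTargets":[
  {"labels":{"job":"kube-controller-manager"},"health":"up"},
  {"labels":{"job":"kube-scheduler"},"health":"up"}]}}'

# Count healthy targets.
printf '%s' "$RESPONSE" | grep -o '"health":"up"' | wc -l   # -> 2
```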
### No data for CoreDNS

Check the labels currently on the coredns endpoints:

```shell
kubectl get ep kube-dns -n kube-system -o yaml | grep -A 5 'labels'
```

```yaml
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
```
Edit `kubernetes-serviceMonitorCoreDNS.yaml` so the selector matches one of CoreDNS's actual labels:

```yaml
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      kubernetes.io/name: CoreDNS
```
Apply the changes:

```shell
kubectl delete -f kubernetes-serviceMonitorCoreDNS.yaml
kubectl apply -f kubernetes-serviceMonitorCoreDNS.yaml
```