Grafana Loki Full Deployment Guide (EKS + EBS gp3 + Promtail + Grafana)
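Contents
1. Architecture Overview
2. Prerequisites
3. Storage Configuration
4. Loki Deployment
5. Promtail Deployment
6. External Access
7. Grafana Deployment
8. Quick Verification
9. Dashboard
10. Common Issues
11. Production Recommendations
1) Architecture Overview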
+----------------------+          +------------------------------+
| K8s Nodes            |          | Amazon EBS (gp3)             |
| (DaemonSet)          |          | PVC → PV (RWO, xfs)          |
| Promtail             |   push   +------------------------------+
|  └─ tail /var/log    |  ======> | Loki (Single Binary)         |
|                      |          |  └─ /var/loki ←─ PVC(EBS)    |
+----------------------+          |  └─ Gateway (LoadBalancer)   |
                                  +------------------------------+
                                                  ↑
                                       Grafana / curl / logcli
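- Promtail: runs as a DaemonSet on every node and collects container logs from /var/log/pods
- Loki single binary: a single Pod writes to an EBS-backed PVC in local filesystem mode
- Gateway: the single entry point, exposed externally through an NLB or Ingress
- Grafana: uses Loki as a data source to visualize logs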
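2) Prerequisites
- An existing EKS cluster
- Nodes that can attach EBS volumes, preferably spread across multiple AZs
- kubectl and helm installed
- Install the EBS CSI Driver, then create the loki namespace: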
aws eks create-addon --cluster-name <cluster> --addon-name aws-ebs-csi-driver
kubectl create ns loki || true
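Before moving on, it can help to confirm that the tooling listed above is actually present. The following is a minimal sketch, not part of the original guide: require_cmd is a hypothetical helper name, and the commented aws call assumes the add-on was installed with the name used above.

```shell
# require_cmd is a hypothetical helper: it succeeds only if every
# named command is found on PATH, and reports each missing one.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (on the workstation that manages the cluster):
#   require_cmd aws kubectl helm || exit 1
#   aws eks describe-addon --cluster-name <cluster> \
#     --addon-name aws-ebs-csi-driver --query 'addon.status' --output text
```

The add-on status query should report ACTIVE once the driver is ready; until then, PVCs that use the ebs.csi.aws.com provisioner are unlikely to bind.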
3) Storage Configuration (EBS gp3)
Example StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-loki
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  csi.storage.k8s.io/fstype: xfs
  encrypted: "true"
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
kubectl apply -f storageclass-gp3-loki.yaml
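> ⚠️ Pitfall: keep volumeBindingMode: WaitForFirstConsumer. With Immediate, the volume can be created in an AZ where the Loki Pod cannot be scheduled, which leaves the Pod stuck in Pending. With WaitForFirstConsumer the PVC itself stays Pending until a Pod uses it, which is expected.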
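If several IOPS/throughput profiles are needed, the manifest can be generated rather than copied. This is a sketch under stated assumptions: render_gp3_storageclass is a hypothetical helper, and it reuses the fields of the StorageClass in this section, varying only the name, IOPS and throughput.

```shell
# render_gp3_storageclass NAME IOPS THROUGHPUT
# Hypothetical helper: prints a gp3 StorageClass manifest to stdout.
render_gp3_storageclass() {
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $1
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "$2"
  throughput: "$3"
  csi.storage.k8s.io/fstype: xfs
  encrypted: "true"
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF
}

# Usage:
#   render_gp3_storageclass gp3-loki 3000 125 | kubectl apply -f -
#   kubectl get sc gp3-loki -o jsonpath='{.volumeBindingMode}'
```

The parameters of an existing StorageClass cannot be edited in place, so a different profile needs a new class name.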
4) Loki Deployment (single binary + filesystem + PVC)
Sample values-loki.yaml:
deploymentMode: SingleBinary
# Scale the SimpleScalable targets to zero so only the single binary runs
read: { replicas: 0 }
write: { replicas: 0 }
backend: { replicas: 0 }
singleBinary:
  replicas: 1
  podSecurityContext:
    fsGroup: 10001
    fsGroupChangePolicy: "OnRootMismatch"
  persistence:
    storageClass: gp3-loki
    accessModes: ["ReadWriteOnce"]
    size: 300Gi
loki:
  storage:
    type: filesystem
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: admin
  storage_config:
    filesystem:
      directory: /var/loki/tsdb
  commonConfig:
    replication_factor: 1
    path_prefix: /var/loki
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 168h
chunksCache: { enabled: false }
resultsCache: { enabled: false }
# The chart key is lokiCanary; the Helm test depends on the canary, so both are disabled together
lokiCanary: { enabled: false }
test: { enabled: false }
gateway:
  service:
    type: LoadBalancer
    port: 80
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
      service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
Install Loki:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki grafana/loki -n loki -f values-loki.yaml
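> ⚠️ Pitfalls:
> - 3.5.x and later do not support admin_api_directory
> - A missing fsGroup causes permission errors on the PVC
> - Gateway 500 / empty ring errors usually mean the single-binary and SimpleScalable settings conflict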
5) Promtail Deployment
Example values-promtail.yaml:
rbac: { create: true }
daemonset: { enabled: true }
tolerations: [ { operator: Exists } ]
config:
  server:
    http_listen_port: 3101
    grpc_listen_port: 0
  positions:
    filename: /run/promtail/positions.yaml
  clients:
    - url: http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push
  scrape_configs:
    - job_name: kubernetes-pods
      pipeline_stages:
        - cri: {}
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: replace
          source_labels: [__meta_kubernetes_namespace]
          target_label: namespace
        - action: replace
          source_labels: [__meta_kubernetes_pod_name]
          target_label: pod
        - action: replace
          source_labels: [__meta_kubernetes_pod_container_name]
          target_label: container
        - action: replace
          source_labels: [__meta_kubernetes_pod_node_name]
          target_label: node
        # Log files live at /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log,
        # so the glob needs both the pod UID and the container name
        - action: replace
          source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
          separator: /
          target_label: __path__
          replacement: /var/log/pods/*$1/*.log
        - action: labeldrop
          regex: __meta_kubernetes_pod_label_.+
        - action: labeldrop
          regex: __meta_kubernetes_pod_annotation_.+
helm upgrade --install promtail grafana/promtail -n loki -f values-promtail.yaml
> ⚠️ Pitfalls: the Promtail Pod does not ship curl, so test DNS from a separate Pod; 401 Unauthorized points to multi-tenancy (auth_enabled)
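The __path__ rule has to produce a glob that matches where the kubelet writes container logs, which is /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log. The sketch below reproduces that substitution in shell so the result can be checked on a node; pod_log_glob is a hypothetical helper and the UID and container name in the usage lines are placeholders.

```shell
# pod_log_glob UID CONTAINER
# Hypothetical helper: joins pod UID and container name with "/" and
# substitutes the result into the template /var/log/pods/*$1/*.log.
pod_log_glob() {
  uid=$1
  container=$2
  printf '/var/log/pods/*%s/%s/*.log\n' "$uid" "$container"
}

# Usage (on a node, or in a Pod that mounts /var/log/pods):
#   pod_log_glob 0a1b2c3d app      # prints /var/log/pods/*0a1b2c3d/app/*.log
#   ls $(pod_log_glob "$POD_UID" "$CONTAINER")
```

If ls finds no files for a glob built this way, Promtail will most likely not find them either, which usually points at a missing hostPath mount rather than at Loki.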
6) Loki Gateway / External Access
- In-cluster access (ClusterIP): loki-gateway.loki.svc.cluster.local
- External access (LoadBalancer): NLB with scheme=internet-facing
- Port-forward for debugging:
kubectl port-forward svc/loki-gateway -n loki 3100:80 &
curl -XPOST http://127.0.0.1:3100/loki/api/v1/push -H "Content-Type: application/json" --data-raw '{"streams":[{"stream":{"job":"test"},"values":[["'"$(date +%s)000000000"'","fizzbuzz"]]}]}'
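The push body is easier to reuse when it is built by a small function. This is a minimal sketch, assuming the job name and message contain no quotes or backslashes (no JSON escaping is done); loki_push_payload is a hypothetical helper, and the URLs assume the port-forward above is still running.

```shell
# loki_push_payload JOB MESSAGE [TS_NS]
# Hypothetical helper: prints a Loki push body with one stream and one line.
# TS_NS is a Unix timestamp in nanoseconds and defaults to the current time.
loki_push_payload() {
  job=$1
  msg=$2
  ts=${3:-$(date +%s)000000000}
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$job" "$ts" "$msg"
}

# Usage: push one line, then read it back.
#   loki_push_payload test "hello from curl" |
#     curl -s -XPOST -H "Content-Type: application/json" \
#       --data-binary @- http://127.0.0.1:3100/loki/api/v1/push
#   curl -s -G http://127.0.0.1:3100/loki/api/v1/query_range \
#     --data-urlencode 'query={job="test"}' --data-urlencode 'limit=5'
```

A successful push returns HTTP 204 with an empty body. If auth_enabled is on, both requests also need an X-Scope-OrgID header carrying the tenant ID.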
7) Grafana Deployment and Data Source
Example values-grafana.yaml:
adminUser: admin
# Example value only; replace it before deploying
adminPassword: StrongPassw0rd!
service:
  type: LoadBalancer
  port: 80
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
persistence:
  enabled: true
  size: 10Gi
  storageClassName: gp3-loki
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        isDefault: true
        url: http://loki-gateway.loki.svc.cluster.local
        jsonData:
          maxLines: 1000
helm upgrade --install grafana grafana/grafana -n loki -f values-grafana.yaml
8) Quick Verification
# The URLs below use cluster DNS names, so run the curl commands from a Pod inside the cluster
NS=loki
LOKI_SVC=http://loki.${NS}.svc.cluster.local:3100
GW=http://loki-gateway.${NS}.svc.cluster.local
# Loki readiness (expect 200)
kubectl get pods -n "$NS"
curl -s -o /dev/null -w "%{http_code}\n" "${LOKI_SVC}/ready"
# Gateway readiness (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" "${GW}/loki/api/v1/status/buildinfo"
# Promtail logs
kubectl logs -n "$NS" ds/promtail --since=2m | grep -E '204|error|failed'
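For repeated checks, the status codes can be compared automatically instead of read by eye. The following is a sketch only: expect_status is a hypothetical helper, and the usage lines assume they run somewhere the cluster DNS names resolve.

```shell
# expect_status LABEL EXPECTED ACTUAL
# Hypothetical helper: prints OK or FAIL and returns non-zero on mismatch.
expect_status() {
  if [ "$3" = "$2" ]; then
    echo "OK   $1 ($3)"
    return 0
  fi
  echo "FAIL $1: expected $2, got $3" >&2
  return 1
}

# Usage (inside the cluster):
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     http://loki.loki.svc.cluster.local:3100/ready)
#   expect_status "loki /ready" 200 "$code"
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     http://loki-gateway.loki.svc.cluster.local/loki/api/v1/status/buildinfo)
#   expect_status "gateway buildinfo" 200 "$code"
```

Because the function returns non-zero on a mismatch, it can gate a script with `|| exit 1` or feed a periodic health check.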
9) Dashboard / Logs Starter
- Recommended: Grafana dashboard ID 15141 or 15324
- Import steps: Grafana → Dashboards → Import → enter the dashboard ID → select the Loki data source → Import
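10) Common Issues and Pitfalls
| Problem | Cause | Fix |
|---|---|---|
| Promtail gets 404 / 500 / 502 | Gateway not ready, or the single-binary ring is empty | Check the Loki Pod logs, then restart the Gateway |
| Promtail gets 401 Unauthorized | Multi-tenancy is on (auth_enabled) | Set auth_enabled to false, or configure tenant_id |
| PVC Pending / Pod unschedulable | AZ mismatch or wrong StorageClass | Use volumeBindingMode: WaitForFirstConsumer |
| Loki fails to start | Insufficient permissions on the volume | Set fsGroup to 10001 so the volume is readable and writable |
| No logs in the dashboard | Promtail is not pushing logs | Check that /var/log/pods is mounted into the Promtail Pod |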
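11) Production Recommendations
- Single binary suits testing and small clusters; for production, consider SimpleScalable + S3
- Tune gp3 IOPS and throughput to the workload
- retention_period sets how long logs are kept (168h is 7 days); Loki only deletes old data when compactor retention is enabled (retention_enabled: true)
- Use labeldrop to keep high-cardinality labels under control
- Check /ready and Promtail's 204 responses on a regular basis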
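Retention and volume size are linked: the PVC has to hold roughly one retention window of data plus some headroom. The sketch below is a rough estimate only, assuming the daily figure is the on-disk (compressed) volume in GiB; estimate_pvc_gib and the 40% headroom are illustrative choices, not values from the Loki documentation.

```shell
# estimate_pvc_gib DAILY_GIB RETENTION_HOURS HEADROOM_PCT
# Hypothetical helper: integer estimate of the PVC size in GiB.
# Retention is rounded up to whole days and the result is rounded up.
estimate_pvc_gib() {
  daily=$1
  hours=$2
  headroom=$3
  days=$(( (hours + 23) / 24 ))
  echo $(( (daily * days * (100 + headroom) + 99) / 100 ))
}

# Usage:
#   estimate_pvc_gib 30 168 40   # → 294
```

With 30 GiB per day, 168h retention and 40% headroom the estimate is 294 GiB, which fits the 300Gi volume used in this guide. Since the StorageClass sets allowVolumeExpansion: true, the PVC can be grown later if ingestion increases.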
One-Shot Deployment Summary
# 0) EKS + CSI
aws eks create-addon --cluster-name <cluster> --addon-name aws-ebs-csi-driver
kubectl create ns loki || true
# 1) StorageClass
kubectl apply -f storageclass-gp3-loki.yaml
# 2) Loki
helm upgrade --install loki grafana/loki -n loki -f values-loki.yaml
# 3) Promtail
helm upgrade --install promtail grafana/promtail -n loki -f values-promtail.yaml
# 4) Grafana
helm upgrade --install grafana grafana/grafana -n loki -f values-grafana.yaml
Next Steps
- EFK/ELK log cluster: comparing Elasticsearch + Fluentd logging solutions