EFK / ELK Logging Cluster

Overview

ELK (Elasticsearch + Logstash + Kibana) is the classic open-source stack for log collection, storage, and analysis. EFK replaces Logstash with Fluentd or Fluent Bit, which makes it a better fit for Kubernetes environments.

Components:

  • Elasticsearch: distributed search engine; stores logs and provides full-text search
  • Logstash: data-processing pipeline for log collection, filtering, and transformation (optional)
  • Fluentd / Fluent Bit: log collectors; Fluent Bit is the lightweight option, well suited to K8s
  • Kibana: visualization frontend and graphical management UI for Elasticsearch

Architecture


Application Logs → Fluentd/Fluent Bit → Elasticsearch ← Kibana
                                             ↑
                                       Logstash (optional)

Fluent Bit Deployment Architecture (recommended for K8s)


Pod Logs → Node-Level Fluent Bit → Elasticsearch
ConfigMap → DaemonSet (one instance per Node)

Elasticsearch Deployment

Docker Compose (Single Node)


version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    mem_limit: 4g

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.3
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

volumes:
  es-data:

Kubernetes Deployment (ECK Operator)


# Install the ECK Operator
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml

# Create the namespace and deploy an Elasticsearch cluster
kubectl create namespace logging
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: efk-es
  namespace: logging
spec:
  version: 8.11.3
  nodeSets:
    - name: default
      count: 3
      config:
        node.store.allow_mmap: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: gp3
EOF

# Deploy Kibana
cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: efk-kibana
  namespace: logging
spec:
  version: 8.11.3
  count: 1
  elasticsearchRef:
    name: efk-es
EOF

Production Cluster Configuration


# Key elasticsearch.yml settings
cluster.name: efk-cluster
node.name: ${HOSTNAME}
network.host: 0.0.0.0
discovery.seed_hosts: ["es-0", "es-1", "es-2"]
cluster.initial_master_nodes: ["es-0", "es-1", "es-2"]

# Memory locking (production)
bootstrap.memory_lock: true

# Circuit breakers
indices.breaker.total.limit: 60%
indices.breaker.fielddata.limit: 40%

# Automatic index creation (a cluster-level setting; per-index settings such as
# number_of_replicas must go into index templates, not elasticsearch.yml)
action.auto_create_index: true

Fluent Bit Deployment

Kubernetes DaemonSet


apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/log
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: cr.fluentbit.io/fluent/fluent-bit:2.2.2
          imagePullPolicy: Always
          ports:
            - containerPort: 2020
          env:
            - name: ES_HOST
              value: "efk-es-es-http.logging"
            - name: ES_PORT
              value: "9200"
            - name: ES_USER
              value: "elastic"
            - name: ES_PASS
              valueFrom:
                secretKeyRef:
                  name: efk-es-elastic-user
                  key: elastic
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config

Fluent Bit ConfigMap


apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        Health_Check  On

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # on containerd/CRI-O runtimes use the cri parser instead
        Parser            docker
        Tag               kube.*
        DB                /var/log/flb_kube.db
        Refresh_Interval  5
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On

    [INPUT]
        Name              systemd
        Tag               host.*
        Systemd_Filter    _SYSTEMD_UNIT=docker.service
        DB                /var/log/flb_lib.db

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

    [OUTPUT]
        Name            es
        Match           kube.*
        Host            ${ES_HOST}
        Port            ${ES_PORT}
        HTTP_User       ${ES_USER}
        HTTP_Passwd     ${ES_PASS}
        # ECK serves HTTPS with a self-signed certificate by default
        tls             On
        tls.verify      Off
        Logstash_Format On
        Logstash_Prefix kubernetes
        Retry_Limit     False
        Replace_Dots    On
        Suppress_Type_Name On

    # stdout is for debugging; remove it in production to avoid duplicate output
    [OUTPUT]
        Name            stdout
        Match           *
        Format          json

  parsers.conf: |
    [PARSER]
        Name             docker
        Format           json
        Time_Key         time
        Time_Format      %Y-%m-%dT%H:%M:%S.%L
        Time_Keep        On

    [PARSER]
        Name             json
        Format           json
        Time_Key         timestamp
        Time_Format      %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name             syslog
        Format           regex
        Regex            ^<(?<priority>[0-9]+)>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ ]*?)(?:\[(?<pid>[0-9]+)\])?: (?<message>.*)$
        Time_Key         time
        Time_Format      %b %d %H:%M:%S
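The syslog parser's regex can be sanity-checked offline. A minimal sketch in Python, porting the regex above (named groups become `(?P<...>)`, and the pid bracket group is made fully optional):

```python
import re

# Python port of the Fluent Bit syslog parser regex
SYSLOG_RE = re.compile(
    r"^<(?P<priority>[0-9]+)>(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) "
    r"(?P<host>[^ ]*) (?P<ident>[^ ]*?)(?:\[(?P<pid>[0-9]+)\])?: (?P<message>.*)$"
)

# hypothetical sample line for illustration
line = "<34>Oct 11 22:14:15 web-01 sshd[4721]: Failed password for root"
fields = SYSLOG_RE.match(line).groupdict()
print(fields)
# e.g. host=web-01, ident=sshd, pid=4721, message=Failed password for root
```

Running new log formats through a test like this before shipping the parser avoids silently unparsed records in Elasticsearch.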

Elasticsearch Index Management

Index Template


# Create an index template
curl -X PUT "localhost:9200/_index_template/efk-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["kubernetes-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "efk-policy",
      "index.lifecycle.rollover_alias": "kubernetes"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "log": { "type": "text", "analyzer": "standard" },
        "stream": { "type": "keyword" },
        "kubernetes": {
          "properties": {
            "pod_name": { "type": "keyword" },
            "namespace_name": { "type": "keyword" },
            "container_name": { "type": "keyword" },
            "labels": { "type": "object", "enabled": false }
          }
        }
      }
    }
  }
}'

ILM (Index Lifecycle Management)


curl -X PUT "localhost:9200/_ilm/policy/efk-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50gb"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
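To reason about how long data actually survives under this policy, note that warm/delete `min_age` is measured from rollover, not from ingestion. A small sketch of the arithmetic (assuming rollover fires on `max_age`, i.e. daily volume stays under the 50gb `max_size`):

```python
# Phase ages from the efk-policy above
ROLLOVER_MAX_AGE_DAYS = 7   # hot: roll over after 7 days
DELETE_MIN_AGE_DAYS = 30    # delete: 30 days after rollover

# The oldest document in an index can be up to max_age days older than
# the rollover timestamp, so worst-case document lifetime is the sum.
max_document_lifetime = ROLLOVER_MAX_AGE_DAYS + DELETE_MIN_AGE_DAYS
print(max_document_lifetime)  # 37 days from ingestion to deletion, worst case
```

So "30-day retention" in the policy really means 30 to 37 days of data on disk, which matters when sizing storage.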

Kibana Configuration and Usage

Creating an Index Pattern

1. Kibana → Management → Index Patterns → Create index pattern (called Data Views in Kibana 8.x)

2. Enter kubernetes-* as the index pattern

3. Select @timestamp as the time field

4. Finish

Log Queries in Discover

| Feature | How |
| --- | --- |
| Full-text search | type keywords directly into the search bar |
| Field query | kubernetes.namespace_name:"production" |
| Time range | time picker in the top-right corner |
| Filters | add filter conditions |
| Saved queries | Save → enter a name and description |
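Under the hood, a Discover field filter such as `kubernetes.namespace_name:"production"` compiles to a filter clause in Elasticsearch's query DSL. A sketch of roughly what Kibana sends (the exact body Kibana generates varies by version; field and index names follow the examples in this document):

```python
import json

# Approximate DSL equivalent of the Discover filter plus a 15-minute time range
query = {
    "query": {
        "bool": {
            "filter": [
                {"match_phrase": {"kubernetes.namespace_name": "production"}},
                {"range": {"@timestamp": {"gte": "now-15m", "lte": "now"}}},
            ]
        }
    }
}
print(json.dumps(query, indent=2))
```

Knowing this mapping helps when reproducing a Kibana search with curl or when debugging slow queries via the search profiler.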

Visualization Dashboards

Common visualization types:

  • Pie Chart: log counts grouped by Namespace/Container
  • Data Table: top N log sources by volume
  • Tag Cloud: high-frequency error keywords
  • Markdown: free-form explanatory text

Alert Rules


{
  "metadata": {
    "name": "High Error Rate",
    "tags": ["logging", "critical"]
  },
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["kubernetes-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                { "range": { "@timestamp": { "gte": "now-5m" } } },
                { "match": { "log": "ERROR" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total.value": {
        "gt": 100
      }
    }
  }
}
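The compare condition above fires when the 5-minute ERROR count exceeds 100; its logic reduces to a single comparison against the search payload. A sketch with a hypothetical payload:

```python
def should_alert(payload: dict, threshold: int = 100) -> bool:
    """Mirror of the watch condition: ctx.payload.hits.total.value > threshold."""
    return payload["hits"]["total"]["value"] > threshold

# hypothetical payload shape, as returned by the watch's search input
payload = {"hits": {"total": {"value": 342, "relation": "eq"}}}
print(should_alert(payload))  # True: 342 errors in 5 minutes exceeds 100
```

Note the condition triggers on an absolute count; for bursty workloads an error *rate* (errors divided by total log volume) is usually a less noisy signal.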

Fluentd Deployment (Alternative to Fluent Bit)

Suitable for scenarios that require more complex data processing.

Kubernetes Deployment


apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  replicas: 2
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          # NOTE: the stock fluentd image does not bundle fluent-plugin-elasticsearch;
          # use a fluentd-kubernetes-daemonset image or install the plugin yourself
          image: fluent/fluentd:v1.16.2
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "efk-es-es-http.logging"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          resources:
            limits:
              memory: 512Mi
              cpu: 500m
            requests:
              memory: 256Mi
              cpu: 100m
          volumeMounts:
            - name: config
              mountPath: /etc/fluent/config.d
      volumes:
        - name: config
          configMap:
            name: fluentd-config

Fluentd Configuration Files


# /etc/fluent/config.d/filter.conf
<filter kubernetes.**>
  @type kubernetes_metadata
  kubernetes_url "https://kubernetes.default.svc:443"
  verify_ssl false
  cache_size 1000
  keep_tag false
</filter>

<filter docker.**>
  @type parser
  key_name log
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%L
    time_type string
  </parse>
</filter>

# /etc/fluent/config.d/output.conf
<match kubernetes.**>
  @type elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  logstash_format true
  logstash_prefix kubernetes
  include_tag_key true
  # mapping types were removed in ES 8; do not send a type name
  suppress_type_name true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.buffer
    flush_interval 5s
    chunk_limit_size 2M
    queue_limit_length 256
    overflow_action block
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>
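Worth checking: the buffer settings bound how much data Fluentd can hold while Elasticsearch is unreachable. A quick sketch of the arithmetic, using the chunk and queue limits from the config above:

```python
# Max file buffer = chunk size x queue length
chunk_limit_mb = 2
queue_limit_length = 256

max_buffer_mb = chunk_limit_mb * queue_limit_length
print(max_buffer_mb)  # 512 MB of on-disk buffer per Fluentd instance
```

Because the queue-full action is block, inputs are back-pressured rather than dropped once this fills, so size the buffer to cover your expected Elasticsearch downtime.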

Log Security

User Authentication


# Create a read-only user (the built-in viewer role grants read-only access)
curl -X POST "localhost:9200/_security/user/reader" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "password": "reader_password",
  "roles": ["kibana_read_only"],
  "enabled": true
}'

Field-Level Security


# Create a role that can read only selected fields
curl -X PUT "localhost:9200/_security/role/log-reader" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["kubernetes-*"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["@timestamp", "log", "stream", "kubernetes.*"]
      }
    }
  ]
}'

Transport Encryption (TLS)


# elasticsearch.yml
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: /usr/share/elasticsearch/config/certs/node.key
xpack.security.http.ssl.certificate: /usr/share/elasticsearch/config/certs/node.crt
xpack.security.http.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca.crt

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/certs/node.key
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/certs/node.crt
xpack.security.transport.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca.crt

Performance Tuning

Elasticsearch JVM Tuning


# Recommended heap: 50% of physical RAM, capped below ~32GB (compressed oops)
-Xms16g
-Xmx16g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=30
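The 50%-of-RAM / sub-32GB rule can be captured in a small helper (a sketch; 31GB keeps the JVM safely below the compressed-oops threshold, which sits just under 32GB):

```python
def recommended_heap_gb(physical_ram_gb: int) -> int:
    """Half of physical RAM, capped at 31GB to stay within compressed oops."""
    return min(physical_ram_gb // 2, 31)

print(recommended_heap_gb(32))   # 16
print(recommended_heap_gb(128))  # 31, never more
```

The other half of RAM is deliberately left to the OS page cache, which Lucene relies on for fast segment reads.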

Fluent Bit Performance Tuning


[SERVICE]
    Flush         1          # flush more frequently (lower delivery latency)
    Log_Level     error      # reduce Fluent Bit's own log output
    Parsers_File  parsers.conf

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    DB             /var/log/flb.db
    Mem_Buf_Limit  100MB      # larger in-memory buffer
    Buffer_Chunk_Size  1MB
    Buffer_Max_Size   5MB

Resource Planning Reference

| Cluster size | Daily log volume | ES nodes | Memory per node | Storage |
| --- | --- | --- | --- | --- |
| Small (<10 nodes) | < 50GB | 3 | 16GB | 500GB SSD |
| Medium (10-50 nodes) | 50-200GB | 5 | 32GB | 2TB SSD |
| Large (>50 nodes) | > 200GB | 10+ | 64GB | 10TB SSD |
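A rough formula behind storage sizing (a sketch assuming 30-day retention with one replica, matching the ILM policy and index template above, plus ~20% overhead for index structures; compression and shorter hot retention often bring real usage well below this):

```python
def required_storage_gb(daily_gb: float, retention_days: int = 30,
                        replicas: int = 1, overhead: float = 1.2) -> float:
    """Total cluster storage: raw volume x retention x copies x overhead."""
    return daily_gb * retention_days * (1 + replicas) * overhead

print(required_storage_gb(50))  # 3600.0 GB total across the cluster
```

Divide the result by the node count for a per-node figure, and leave headroom below the disk watermarks (Elasticsearch stops allocating shards at 85% disk usage by default).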

Troubleshooting

Fluent Bit Is Not Collecting Logs


# Check DaemonSet status
kubectl get ds -n logging

# Check Fluent Bit logs
kubectl logs -n logging -l app=fluent-bit --tail=100

# Common causes:
# 1. Insufficient permissions → check the ServiceAccount and RBAC
# 2. ConfigMap mount failure → check the volumes configuration
# 3. Elasticsearch unreachable → check the ES service status

Elasticsearch Cluster RED / Unassigned Shards


# Check shard allocation status
curl -X GET "localhost:9200/_cat/shards?v"

# Explain why a shard is unassigned
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# Common fixes:
# 1. Disk space exhausted → add nodes or clean up old data
# 2. Replicas cannot be allocated → check node health
# 3. Manually allocate a replica
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "kubernetes-000001",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}'

Kibana Cannot Connect to Elasticsearch


# Retrieve the elastic user's password from the ECK secret
kubectl get secret -n logging efk-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d

# Verify connectivity (ECK serves HTTPS with a self-signed certificate)
curl -k -u "elastic:<password>" "https://efk-es-es-http:9200/_cluster/health?pretty"

Loki vs EFK Selection Guide

| Scenario | Recommended |
| --- | --- |
| Kubernetes-native logging, cost-sensitive | Loki + Promtail |
| Frequent full-text search, complex queries | EFK |
| High throughput (>100GB/day) | EFK (Fluentd) |
| Lightweight footprint, existing Prometheus ecosystem | Loki |
| Multi-tenant log isolation | EFK (X-Pack Security) |
| Unified logs + metrics + traces platform | Grafana Stack (Loki/Prometheus/Tempo) |