# EFK / ELK Logging Cluster

## Overview

ELK (Elasticsearch + Logstash + Kibana) is the classic open-source stack for log collection, storage, and analysis. EFK replaces Logstash with Fluentd or Fluent Bit and is the usual choice for Kubernetes environments.

Components:

- Elasticsearch: distributed search engine; stores logs and provides full-text search
- Logstash: data-processing pipeline for collecting, filtering, and transforming logs (optional)
- Fluentd: feature-rich log collector with a large plugin ecosystem
- Fluent Bit: lightweight log collector, well suited to Kubernetes nodes
- Kibana: visualization front end and graphical management UI for Elasticsearch

## Architecture

```text
Application logs → Fluentd/Fluent Bit → Elasticsearch ← Kibana
                          ↑
                   Logstash (optional)
```

Fluent Bit deployment layout (recommended for Kubernetes):

```text
Pod logs → node-level Fluent Bit → Elasticsearch
           (DaemonSet: one instance per node, configured via ConfigMap)
```
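The node-level collector derives pod metadata from the log file names the kubelet writes under `/var/log/containers/`, which follow the pattern `<pod>_<namespace>_<container>-<container-id>.log`. A minimal sketch of that parsing (the function name is illustrative, not part of any Fluent Bit API):

```python
def parse_container_log_name(filename: str) -> dict:
    """Split a kubelet log file name of the form
    <pod>_<namespace>_<container>-<container-id>.log into its parts."""
    stem = filename.removesuffix(".log")
    pod, namespace, rest = stem.split("_", 2)
    # The container ID is hex with no dashes, so the last dash separates it
    # from the container name.
    container, _, container_id = rest.rpartition("-")
    return {"pod": pod, "namespace": namespace,
            "container": container, "container_id": container_id}

meta = parse_container_log_name(
    "nginx-7c5ddbdf54-abcde_production_nginx-0123456789abcdef.log")
print(meta["namespace"])  # → production
```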
## Elasticsearch Deployment

### Single node with Docker Compose

```yaml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      # Security disabled for local testing only; enable it in production
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    mem_limit: 4g
  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.3
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
volumes:
  es-data:
```
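Once the stack is up, readiness is usually judged from `GET /_cluster/health`. A minimal sketch of interpreting that response (the sample JSON below is illustrative, shaped like a real health response):

```python
import json

def cluster_ready(health_json: str) -> bool:
    """Return True when the cluster reports green or yellow status.

    On a single-node cluster, yellow is expected: replica shards can
    never be assigned when there is no second node.
    """
    health = json.loads(health_json)
    return health.get("status") in ("green", "yellow")

# Illustrative response body, as returned by GET /_cluster/health
sample = '{"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}'
print(cluster_ready(sample))  # → True
```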
### Kubernetes deployment (ECK Operator)

```shell
# Install the ECK Operator
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml

# Deploy the Elasticsearch cluster
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: efk-es
  namespace: logging
spec:
  version: 8.11.3
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: gp3
EOF

# Deploy Kibana
cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: efk-kibana
  namespace: logging
spec:
  version: 8.11.3
  count: 1
  elasticsearchRef:
    name: efk-es
EOF
```
### Production cluster configuration

```yaml
# elasticsearch.yml — key settings
cluster.name: efk-cluster
node.name: ${HOSTNAME}
network.host: 0.0.0.0
discovery.seed_hosts: ["es-0", "es-1", "es-2"]
cluster.initial_master_nodes: ["es-0", "es-1", "es-2"]

# Lock the heap in memory (production)
bootstrap.memory_lock: true

# Circuit breakers
indices.breaker.total.limit: 60%
indices.breaker.fielddata.limit: 40%

# Index auto-creation (cluster-level setting). Per-index settings such as
# number_of_replicas cannot go in elasticsearch.yml — set them in an
# index template instead (see the index template section below).
action.auto_create_index: true
```
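The percentage-based breaker settings above translate into absolute byte budgets against the JVM heap. A sketch of that arithmetic (mirroring how Elasticsearch applies the percentages, not an Elasticsearch API):

```python
def breaker_limits(heap_bytes: int, total_pct: float = 0.60,
                   fielddata_pct: float = 0.40) -> dict:
    """Convert percentage breaker limits into byte budgets for a given heap."""
    return {
        "indices.breaker.total.limit": int(heap_bytes * total_pct),
        "indices.breaker.fielddata.limit": int(heap_bytes * fielddata_pct),
    }

# 16 GB heap, matching the JVM tuning section further below
limits = breaker_limits(16 * 1024**3)
print(limits["indices.breaker.total.limit"] // 1024**3)  # → 9 (GiB, truncated)
```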
## Fluent Bit Deployment

### Kubernetes DaemonSet

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  - pods/log
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: cr.fluentbit.io/fluent/fluent-bit:2.2.2
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        - name: ES_HOST
          value: "efk-es-es-http.logging"
        - name: ES_PORT
          value: "9200"
        - name: ES_USER
          value: "elastic"
        - name: ES_PASS
          valueFrom:
            secretKeyRef:
              name: efk-es-elastic-user
              key: elastic
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
```
### Fluent Bit ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Log_Level         info
        Daemon            off
        Parsers_File      parsers.conf
        HTTP_Server       On
        HTTP_Listen       0.0.0.0
        HTTP_Port         2020
        Health_Check      On

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On

    [INPUT]
        Name              systemd
        Tag               host.*
        Systemd_Filter    _SYSTEMD_UNIT=docker.service
        DB                /var/log/flb_lib.db

    [FILTER]
        Name                Kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

    [OUTPUT]
        Name               es
        Match              kube.*
        Host               ${ES_HOST}
        Port               ${ES_PORT}
        HTTP_User          ${ES_USER}
        HTTP_Passwd        ${ES_PASS}
        Logstash_Format    On
        Logstash_Prefix    kubernetes
        Retry_Limit        False
        Replace_Dots       On
        Suppress_Type_Name On

    [OUTPUT]
        Name    stdout
        Match   *
        Format  json
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On

    [PARSER]
        Name         json
        Format       json
        Time_Key     timestamp
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name         syslog
        Format       regex
        Regex        ^<(?<priority>[0-9]+)>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ \[]*)(?:\[(?<pid>[0-9]+)\])?: (?<message>.*)$
        Time_Key     time
        Time_Format  %b %d %H:%M:%S
```
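A parser regex like the syslog one above can be sanity-checked offline before deploying. A sketch using Python's `re` (Python named groups use `(?P<...>)` syntax instead of Fluent Bit's `(?<...>)`; the `[pid]` group is optional here since not every syslog line includes one):

```python
import re

SYSLOG_RE = re.compile(
    r'^<(?P<priority>[0-9]+)>(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) '
    r'(?P<host>[^ ]*) (?P<ident>[^ \[]*)(?:\[(?P<pid>[0-9]+)\])?: '
    r'(?P<message>.*)$'
)

# A line with a pid and one without (cron-style, double space for day padding)
m = SYSLOG_RE.match(
    "<34>Oct 11 22:14:15 myhost sshd[1234]: Failed password for invalid user")
m2 = SYSLOG_RE.match("<13>Oct  7 10:00:00 web01 cron: job started")

print(m.group("host"), m.group("pid"))  # → myhost 1234
```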
## Elasticsearch Index Management

### Index template

```shell
# Create an index template
curl -X PUT "localhost:9200/_index_template/efk-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["kubernetes-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "efk-policy",
      "index.lifecycle.rollover_alias": "kubernetes"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "log": { "type": "text", "analyzer": "standard" },
        "stream": { "type": "keyword" },
        "kubernetes": {
          "properties": {
            "pod_name": { "type": "keyword" },
            "namespace_name": { "type": "keyword" },
            "container_name": { "type": "keyword" },
            "labels": { "type": "object", "enabled": false }
          }
        }
      }
    }
  }
}'
```
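Which indices the template applies to is decided by its `index_patterns` globs. The matching can be previewed locally; a sketch using `fnmatch`, whose `*` wildcard behaves the same way for simple patterns like this one:

```python
from fnmatch import fnmatch

index_patterns = ["kubernetes-*"]  # as declared in the template above

def template_matches(index_name: str) -> bool:
    """True if any pattern in index_patterns covers the index name."""
    return any(fnmatch(index_name, p) for p in index_patterns)

print(template_matches("kubernetes-2024.05.01"))  # → True
print(template_matches("app-metrics-000001"))     # → False
```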
### ILM (Index Lifecycle Management)

```shell
curl -X PUT "localhost:9200/_ilm/policy/efk-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50gb"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
```
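Under this policy an index rolls over at 7 days or 50 GB, enters the warm phase 7 days after rollover, and is deleted at 30 days (the `min_age` values count from rollover). The resulting timeline can be sketched as:

```python
def ilm_phase(days_since_rollover: float) -> str:
    """Map an index's age since rollover onto the policy phases above:
    hot (min_age 0), warm (min_age 7d), delete (min_age 30d)."""
    if days_since_rollover >= 30:
        return "delete"
    if days_since_rollover >= 7:
        return "warm"
    return "hot"

print([ilm_phase(d) for d in (1, 10, 45)])  # → ['hot', 'warm', 'delete']
```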
## Kibana Configuration and Usage

### Creating a data view (index pattern)

1. Kibana → Stack Management → Data Views (called Index Patterns before 8.0) → Create data view
2. Enter `kubernetes-*` as the index pattern
3. Select `@timestamp` as the time field
4. Save

### Querying logs in Discover

| Feature | How |
|---|---|
| Full-text search | type keywords directly into the search bar |
| Field filter | `kubernetes.namespace_name:"production"` |
| Time range | time picker in the upper right |
| Field filtering | add filter conditions |
| Saved search | Save → enter a name and description |

### Dashboards

Commonly used visualization types:

- Pie chart: counts grouped by namespace/container
- Data table: top-N log sources by volume
- Tag cloud: high-frequency error keywords
- Markdown: free-form explanatory text
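The field filter shown in the Discover table above corresponds to a `term` query in Elasticsearch's query DSL. A sketch of the request body Discover effectively sends (structure only; field names follow the index template mapping above, and the time window is an illustrative addition):

```python
import json

def namespace_filter_query(namespace: str, minutes: int = 15) -> dict:
    """DSL equivalent of kubernetes.namespace_name:"production",
    restricted to a recent time window."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"kubernetes.namespace_name": namespace}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        }
    }

body = namespace_filter_query("production")
print(json.dumps(body["query"]["bool"]["filter"][0]))
# → {"term": {"kubernetes.namespace_name": "production"}}
```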
### Alerting rule

An example alert definition: search the last 5 minutes for ERROR entries every 5 minutes and fire when the count exceeds 100.

```json
{
  "name": "High Error Rate",
  "tags": ["logging", "critical"],
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["kubernetes-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                { "range": { "@timestamp": { "gte": "now-5m" } } },
                { "match": { "log": "ERROR" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total.value": {
        "gt": 100
      }
    }
  }
}
```
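The `condition` block compares the hit count returned by the search input against the threshold. Evaluated locally it amounts to the following sketch (the payload shape mirrors an Elasticsearch search response; the sample value is illustrative):

```python
def alert_fires(search_response: dict, threshold: int = 100) -> bool:
    """Mirror the compare condition ctx.payload.hits.total.value > 100."""
    return search_response["hits"]["total"]["value"] > threshold

payload = {"hits": {"total": {"value": 137, "relation": "eq"}}}
print(alert_fires(payload))  # → True
```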
## Fluentd Deployment (alternative to Fluent Bit)

Suitable when more complex data processing is required. Note that the stock `fluent/fluentd` image does not bundle the Elasticsearch output plugin; in practice a custom image (or one of the `fluentd-kubernetes-daemonset` images) with the plugin installed is used.

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  replicas: 2
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.16.2
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "efk-es-es-http.logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        resources:
          limits:
            memory: 512Mi
            cpu: 500m
          requests:
            memory: 256Mi
            cpu: 100m
        volumeMounts:
        - name: config
          mountPath: /etc/fluent/config.d
      volumes:
      - name: config
        configMap:
          name: fluentd-config
```
### Fluentd configuration

```text
# /etc/fluent/config.d/filter.conf
<filter kubernetes.**>
  @type kubernetes_metadata
  kubernetes_url "https://kubernetes.default.svc:443"
  verify_ssl false
  cache_size 1000
  keep_tag false
</filter>

<filter docker.**>
  @type parser
  key_name log
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%L
    time_type string
  </parse>
</filter>

# /etc/fluent/config.d/output.conf
<match kubernetes.**>
  @type elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  logstash_format true
  logstash_prefix kubernetes
  include_tag_key true
  # Elasticsearch 8 removed mapping types; suppress the type name
  suppress_type_name true
  flush_interval 5s
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.buffer
  buffer_queue_full_action block
  buffer_chunk_limit 2M
  buffer_queue_limit 256
  max_retry_wait 30
  disable_retry_limit
</match>
```
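The file buffer above accumulates records into chunks of up to `buffer_chunk_limit` (2 MB) and queues at most `buffer_queue_limit` (256) of them before `buffer_queue_full_action` kicks in, so the worst-case on-disk footprint per match block can be estimated as:

```python
def max_buffer_bytes(chunk_limit_mb: int = 2, queue_limit: int = 256) -> int:
    """Worst-case buffer footprint: chunk size × queue length."""
    return chunk_limit_mb * 1024**2 * queue_limit

print(max_buffer_bytes() // 1024**2)  # → 512 (MiB)
```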
## Log Security

### User authentication

```shell
# Create a read-only user (the built-in viewer role grants read-only
# access to data and Kibana)
curl -X POST "localhost:9200/_security/user/reader" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "password": "reader_password",
  "roles": ["viewer"],
  "enabled": true
}'
```
### Field-level security

A custom role can restrict not just which indices a user may read but which fields are visible, via `field_security`:

```shell
curl -X PUT "localhost:9200/_security/role/log-reader" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["kubernetes-*"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["@timestamp", "log", "stream", "kubernetes.*"]
      }
    }
  ]
}'
```
### Transport encryption

```yaml
# elasticsearch.yml
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: /usr/share/elasticsearch/config/certs/node.key
xpack.security.http.ssl.certificate: /usr/share/elasticsearch/config/certs/node.crt
xpack.security.http.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca.crt
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/certs/node.key
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/certs/node.crt
xpack.security.transport.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca.crt
```
## Performance Tuning

### Elasticsearch JVM tuning

```text
# Recommended heap size: 50% of physical RAM, capped at ~31GB to stay
# below the compressed-oops threshold (around 32GB)
-Xms16g
-Xmx16g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=75
```
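The sizing rule above ("half of RAM, capped below the compressed-oops threshold") reduces to a one-liner:

```python
def recommended_heap_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of physical RAM, capped below the ~32GB compressed-oops threshold."""
    return min(ram_gb // 2, cap_gb)

print([recommended_heap_gb(r) for r in (32, 64, 128)])  # → [16, 31, 31]
```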
### Fluent Bit performance tuning

```text
[SERVICE]
    # Flush more frequently; keep Fluent Bit's own logging quiet
    Flush        1
    Log_Level    error
    Parsers_File parsers.conf

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    DB                /var/log/flb.db
    # Larger in-memory buffer and bigger read chunks
    Mem_Buf_Limit     100MB
    Buffer_Chunk_Size 1MB
    Buffer_Max_Size   5MB
```
### Resource planning reference

| Cluster size | Daily log volume | ES nodes | Memory per node | Storage |
|---|---|---|---|---|
| Small (<10 nodes) | < 50 GB | 3 | 16 GB | 500 GB SSD |
| Medium (10-50 nodes) | 50-200 GB | 5 | 32 GB | 2 TB SSD |
| Large (>50 nodes) | > 200 GB | 10+ | 64 GB | 10 TB SSD |
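Required storage for a given retention window can be estimated with a back-of-the-envelope formula; the retention and overhead factors below are assumptions to tune for your cluster (compression, replica count, and ILM policy all shift the result substantially):

```python
def storage_gb(daily_gb: float, retention_days: int = 7,
               replicas: int = 1, overhead: float = 1.2) -> float:
    """Daily volume × retention × (primary + replicas) × indexing overhead."""
    return daily_gb * retention_days * (1 + replicas) * overhead

# 50 GB/day, 7-day retention, one replica, 20% indexing overhead
print(round(storage_gb(50)))  # → 840
```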
## Troubleshooting

### Fluent Bit is not collecting logs

```shell
# Check DaemonSet status
kubectl get ds -n logging

# Check Fluent Bit's own logs
kubectl logs -n logging -l app=fluent-bit --tail=100
```

Common causes:

1. Insufficient permissions → check the ServiceAccount and RBAC
2. ConfigMap mount failure → check the volumes configuration
3. Elasticsearch unreachable → check the ES service status

### Elasticsearch cluster RED / UNASSIGNED shards

```shell
# Shard allocation status
curl -X GET "localhost:9200/_cat/shards?v"

# Explain why shards are unassigned
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

Common remedies:

1. Disk space exhausted → add nodes or delete old indices
2. Replicas cannot be allocated → check node health
3. Manually re-allocate a shard:

```shell
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "kubernetes-000001",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}'
```

### Kibana cannot connect to Elasticsearch

```shell
# Retrieve the elastic user's password from the ECK-managed Secret
kubectl get secret -n logging efk-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d

# Verify connectivity (ECK enables TLS by default; -k skips verification
# of the self-signed certificate)
curl -k -u "elastic:<password>" "https://efk-es-es-http.logging:9200/_cluster/health?pretty"
```
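The kubectl pipeline above extracts the base64-encoded password from the Secret and decodes it; the same decoding in a script looks like this (the sample value is illustrative, not a real credential):

```python
import base64

def decode_secret_value(encoded: str) -> str:
    """Kubernetes stores Secret data base64-encoded; decode one value."""
    return base64.b64decode(encoded).decode("utf-8")

print(decode_secret_value("cGFzc3dvcmQxMjM="))  # → password123
```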
## Loki vs. EFK: Choosing Between Them

| Scenario | Recommendation |
|---|---|
| Kubernetes-native logging, cost-sensitive | Loki + Promtail |
| Frequent full-text search, complex queries | EFK |
| High throughput (>100 GB/day) | EFK (Fluentd) |
| Lightweight setup within an existing Prometheus ecosystem | Loki |
| Multi-tenant log isolation | EFK (X-Pack Security) |
| Unified platform for logs + metrics + traces | Grafana stack (Loki/Prometheus/Tempo) |