RAG: Retrieval-Augmented Generation

RAG (Retrieval-Augmented Generation) lets a large language model answer questions grounded in a private knowledge base, addressing stale model knowledge and missing domain expertise.

1. RAG Architecture

User question → Embedding model → Vector database search (Top-K) → Assemble prompt → LLM generates the answer
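
The retrieval step in this pipeline can be sketched without any vector database: embed, score by cosine similarity, take the Top-K. The 3-dimensional "embeddings" below are hand-made toy values standing in for a real embedding model, purely for illustration.

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k):
    # rank document indices by similarity to the query, highest first
    scored = sorted(enumerate(doc_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]

# toy 3-dim "embeddings" standing in for a real embedding model
docs = {0: [1.0, 0.0, 0.1], 1: [0.0, 1.0, 0.0], 2: [0.9, 0.1, 0.0]}
query = [1.0, 0.0, 0.0]
print(top_k(query, [docs[i] for i in range(3)], k=2))  # → [0, 2]
```

A vector database does exactly this, plus indexing (e.g. HNSW) so the scan is approximate rather than exhaustive.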

2. Choosing a Vector Database

Database   Characteristics                        Typical use case
Milvus     Open source, distributed, high-perf    Large-scale production
Qdrant     Written in Rust, efficient             Personal / small-to-mid scale
Chroma     Lightweight, Python-native             Rapid prototyping
Pinecone   Cloud-hosted, zero ops                 Enterprise SaaS

3. Deploying Milvus


# Standalone deployment (Docker Compose)
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker-compose up -d

# Verify the deployment is healthy
curl http://localhost:9091/healthz
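
Before any vectors can be inserted or searched, Milvus needs a collection whose dimension matches the embedding model (1024 for bge-large-zh-v1.5). A sketch assuming the standalone deployment above; the collection name `ops_docs` matches the retrieval example later. The import is deferred so the function can be defined even where pymilvus is not installed.

```python
def create_ops_docs_collection(uri="http://localhost:19530", dim=1024):
    # deferred import: pymilvus is only needed when this actually runs
    from pymilvus import MilvusClient

    client = MilvusClient(uri)
    client.create_collection(
        collection_name="ops_docs",
        dimension=dim,          # must match the embedding model's output size
        auto_id=True,           # let Milvus assign primary keys
        metric_type="COSINE",   # similarity metric used at search time
    )
    return client
```

The quick-setup form of `create_collection` names the vector field "vector" by default, which is what the search call in section 5 relies on.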

4. Embedding and Chunking


from sentence_transformers import SentenceTransformer

# Recommended model (strong performance on Chinese text)
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')

# Encode documents (a list of raw text strings) into vectors
vectors = model.encode(documents)

# Document chunking strategy
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_documents(text, chunk_size=500, chunk_overlap=50):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        # paragraph breaks first, then lines, then Chinese sentence punctuation
        separators=["\n\n", "\n", "。", "!", "?", " ", ""],
    )
    return splitter.split_text(text)
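
The idea behind the splitter can be shown without langchain: try the coarsest separator first and recurse on pieces that are still too long. A simplified, dependency-free sketch (no chunk overlap handling, unlike the real RecursiveCharacterTextSplitter):

```python
def recursive_split(text, separators, chunk_size):
    # base case: short enough, or no separators left to try
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        # this separator never occurs; fall through to the next, finer one
        return recursive_split(text, rest, chunk_size)
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, rest, chunk_size))
    return [c for c in chunks if c]

text = "Alerting rules live in rule files. " * 4
print(recursive_split(text, ["\n\n", ". ", " "], chunk_size=40))
# → 4 chunks, each within chunk_size
```

Trying paragraph breaks before sentence breaks is what keeps semantically related text together in one chunk.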

5. Retrieval and Generation


import os

from openai import OpenAI
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# 1. Retrieval
client = MilvusClient("http://localhost:19530")
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')

query = "How do I configure Prometheus alerting rules?"
query_vec = model.encode([query])

results = client.search(
    collection_name="ops_docs",
    anns_field="vector",
    data=query_vec,
    limit=5,
    output_fields=["text"],  # needed so each hit carries its source text
)

context = "\n".join([r["entity"]["text"] for r in results[0]])

# 2. Assemble prompt + generate
prompt = f'''Answer the question based on the operations docs below. If the docs contain no relevant information, reply "No relevant content found in the docs."

--- Docs start ---
{context}
--- Docs end ---

Question: {query}
Answer:'''

client_llm = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
response = client_llm.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.3,
)
print(response.choices[0].message.content)
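
With a large Top-K or long chunks, the assembled context can overflow the model's context window. A minimal character-budget guard (a hypothetical helper, not part of the example above; production systems count tokens, not characters):

```python
def fit_context(chunks, budget):
    # keep highest-ranked chunks first, stop before exceeding the budget
    # (separator newlines are not counted against the budget in this sketch)
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        kept.append(chunk)
        used += len(chunk)
    return "\n".join(kept)

hits = ["chunk-a" * 10, "chunk-b" * 10, "chunk-c" * 10]  # 70 chars each
print(len(fit_context(hits, budget=150)))  # two chunks fit: 70 + 1 + 70 = 141
```

Dropping the lowest-ranked hits first is the simplest policy; it relies on the vector search already having ordered results by relevance.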

6. Next Steps