Vector Indexing and Retrieval Tuning
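Approximate nearest neighbor (ANN) indexes such as Annoy, HNSW, and IVF cut retrieval cost from brute-force O(n) per query to sublinear time in practice. The engineering work is trading off P99 latency, recall@K, memory, and index build time. Below are a NumPy brute-force baseline and runnable skeletons for FAISS, Milvus (pymilvus 2.x), and Qdrant; the vector dimension and distance metric must match those of your embedding model.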
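The P99 latency mentioned above can be measured with nothing more than a timer and a percentile function. The sketch below is a minimal, dependency-free illustration; `fake_search` is a hypothetical stand-in for a real index call, and the nearest-rank percentile definition is one reasonable choice among several.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(search_fn, queries):
    """Call search_fn once per query and return the per-call latency in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Hypothetical stand-in for a real index lookup (assumption, not a library API).
def fake_search(q):
    return sorted(range(1000), key=lambda i: abs(i - q))[:10]


lat = measure_latencies_ms(fake_search, list(range(200)))
print(f"P50={percentile(lat, 50):.3f} ms  P99={percentile(lat, 99):.3f} ms")
```

In practice the query list should come from real traffic or a golden set, and the first few calls are often discarded as warm-up.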
Similarity and Normalization (NumPy)
import numpy as np


def l2_normalize(X: np.ndarray) -> np.ndarray:
    """L2-normalize each row; after normalization, inner product equals cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.maximum(norms, 1e-12)  # guard against division by zero
    return X / norms


def cosine_scores(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """query: (d,), corpus: (n, d) -> scores (n,)"""
    q = query / (np.linalg.norm(query) + 1e-12)
    c = l2_normalize(corpus)
    return c @ q


# Example
d = 128
rng = np.random.default_rng(0)
query = rng.standard_normal(d)
corpus = rng.standard_normal((10_000, d))
top_idx = np.argsort(-cosine_scores(query, corpus))[:5]
print("top-5 indices", top_idx)

FAISS: Getting Started with IVF and HNSW
import faiss
import numpy as np
rng = np.random.default_rng(42)
dim = 64
nb = 50_000  # database size
nq = 10
xb = rng.standard_normal((nb, dim)).astype("float32")
xq = rng.standard_normal((nq, dim)).astype("float32")
faiss.normalize_L2(xb)
faiss.normalize_L2(xq)
# --- IVF_FLAT: common for large datasets; nprobe tunes recall vs. latency ---
nlist = int(np.sqrt(nb))
quantizer = faiss.IndexFlatIP(dim)  # inner product; equals cosine once vectors are L2-normalized
index_ivf = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
index_ivf.train(xb)
index_ivf.add(xb)
index_ivf.nprobe = 16  # larger is more accurate but slower
D, I = index_ivf.search(xq, k=10)
print("IVF top-3 ids row0:", I[0, :3])
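# --- HNSW: graph index; query latency is usually lower ---
M = 32  # neighbors per node; IndexHNSWFlat takes M as its second positional argument
index_hnsw = faiss.IndexHNSWFlat(dim, M, faiss.METRIC_INNER_PRODUCT)
index_hnsw.hnsw.efConstruction = 128  # set before add(); used while the graph is built
index_hnsw.add(xb)
index_hnsw.hnsw.efSearch = 64  # query time; larger is more accurate but slower
D2, I2 = index_hnsw.search(xq, k=10)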
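print("HNSW top-3 ids row0:", I2[0, :3])

| Parameter | Effect (rule of thumb) |
|---|---|
| nlist (IVF) | Number of clusters (buckets); commonly sqrt(N) to 4*sqrt(N); too large slows training |
| nprobe | Buckets probed per query; higher raises both recall and latency |
| efConstruction / efSearch (HNSW) | Beam width for graph construction and for search; higher raises quality and build/query cost |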
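To see what a change in nprobe or efSearch actually buys, recall@K can be computed by comparing the ANN result ids with exact results from a brute-force search. The helper below is a minimal pure-Python sketch; the function name `recall_at_k` and the list-of-lists input shape are assumptions made for illustration. With FAISS, the id arrays returned by `search` could be converted with `.tolist()` before being passed in, using an exact index such as IndexFlatIP for the ground truth.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Mean fraction of the exact top-k ids that the ANN search also returned.

    approx_ids, exact_ids: one list of ids per query, best match first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not exact_ids or len(approx_ids) != len(exact_ids):
        raise ValueError("need one non-empty result list per query in both inputs")
    total = 0.0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        if not truth:
            raise ValueError("every query needs at least one ground-truth id")
        total += len(truth & set(approx[:k])) / len(truth)
    return total / len(exact_ids)


# Two queries: the first misses one of four true neighbors, the second finds all four.
exact = [[1, 2, 3, 4], [10, 11, 12, 13]]
approx = [[1, 2, 4, 9], [10, 12, 11, 13]]
print(recall_at_k(approx, exact, k=4))  # 0.875
```

Sweeping nprobe or efSearch and recording recall@K next to latency for each setting gives the curve from which an operating point is chosen.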
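Milvus (pymilvus 2.x skeleton)
The example below walks through: connect → create a collection → build an HNSW index → insert → search. Check field names and enum values against the documentation for the pymilvus version you installed with pip install pymilvus.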
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
connections.connect(alias="default", host="localhost", port="19530")
dim = 128
collection_name = "demo_chunks"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="doc_id", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields, description="chunk vectors")
coll = Collection(name=collection_name, schema=schema)
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200},
}
coll.create_index(field_name="embedding", index_params=index_params)
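import random

# 200 random vectors, 50 chunks per document
vectors = [[random.random() for _ in range(dim)] for _ in range(200)]
doc_ids = [f"doc-{i // 50}" for i in range(200)]
coll.insert([doc_ids, vectors])  # columns in schema order; the auto_id primary key is omitted
coll.flush()
coll.load()  # the collection must be loaded before it can be searched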
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
qvec = [[random.random() for _ in range(dim)]]
results = coll.search(
    data=qvec,
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["doc_id"],
)
for hit in results[0]:
    print(hit.id, hit.score, hit.entity.get("doc_id"))
connections.disconnect("default")

Tuning tip: M and efConstruction affect index build time and memory; ef (query time) affects latency and recall. After any change, rerun the regression against the same golden set (see Evaluation Sets and Online Regression).
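The golden-set regression described in the tip can be reduced to a small gate that compares metrics before and after a parameter change. The following is a sketch under stated assumptions: the metric names `recall_at_k` and `p99_ms`, the function `check_regression`, and the tolerance values are all hypothetical and should be replaced by whatever your evaluation pipeline reports.

```python
def check_regression(baseline, candidate, max_recall_drop=0.01, max_p99_increase_ms=5.0):
    """Compare two metric dicts measured on the same golden set.

    Returns a list of human-readable failures; an empty list means the change passes.
    """
    failures = []
    recall_drop = baseline["recall_at_k"] - candidate["recall_at_k"]
    if recall_drop > max_recall_drop:
        failures.append(f"recall@K dropped by {recall_drop:.3f}")
    p99_increase = candidate["p99_ms"] - baseline["p99_ms"]
    if p99_increase > max_p99_increase_ms:
        failures.append(f"P99 latency rose by {p99_increase:.1f} ms")
    return failures


# Illustrative numbers only: raising ef improved recall but cost too much latency.
baseline = {"recall_at_k": 0.95, "p99_ms": 20.0}
candidate = {"recall_at_k": 0.97, "p99_ms": 31.0}
failures = check_regression(baseline, candidate)
print(failures or "OK")
```

Keeping the golden set and the tolerances fixed across runs is what makes results from different parameter settings comparable.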
Qdrant (qdrant-client, local or remote)
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(url="http://localhost:6333")
collection = "demo_rag"
dim = 64
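# recreate_collection is deprecated in recent qdrant-client releases; drop and create explicitly
if client.collection_exists(collection):
    client.delete_collection(collection)
client.create_collection(
    collection_name=collection,
    vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
)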
# Placeholder vectors: all identical, so every hit ties on score; use real embeddings in practice
points = [
    PointStruct(id=i, vector=[0.1] * dim, payload={"doc_id": f"d{i // 10}", "text": f"chunk {i}"})
    for i in range(30)
]
client.upload_points(collection_name=collection, points=points)
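hits = client.query_points(
    collection_name=collection,
    query=[0.12] * dim,
    limit=5,
    with_payload=True,
).points
for h in hits:
    print(h.id, h.score, h.payload)

HNSW index parameters: m and ef_construct are set through hnsw_config when the collection is created, or changed later with update_collection or in the server configuration (create_payload_index builds indexes on payload fields, not on vectors). They belong to the same family as the Milvus and FAISS settings: larger values give better accuracy at the cost of more resources.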
Tuning Checklist
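| Symptom | Check first |
|---|---|
| Latency spikes | Whether nprobe/ef is too large, the network, and whether a single shard is a hotspot |
| Low recall | Raise the probe parameters, switch the embedding model, add hybrid retrieval (RAG Hybrid), add reranking |
| Memory blow-up | PQ/SQ quantization, dimensionality reduction (requires re-embedding), sharding, hot/cold collections |
| Scores not comparable | Whether the model was changed or normalization is inconsistent |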
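For the memory row, a back-of-the-envelope estimate often shows whether PQ/SQ is worth the recall cost before any index is built. The sketch below counts only the stored vector codes and assumes float32 for flat storage, 8-bit scalar quantization for SQ, and 8-bit codes per sub-vector for PQ. It ignores HNSW graph links, IVF centroids, ids, and payloads, so real usage will be higher. The function name and the `pq_m` parameter are hypothetical.

```python
def estimate_code_size_mib(n, dim, encoding="flat", pq_m=16):
    """Rough size in MiB of the stored vector codes only.

    flat: 4 bytes per dimension (float32)
    sq8:  1 byte per dimension (8-bit scalar quantization)
    pq:   pq_m bytes per vector (one 8-bit code per sub-vector)
    """
    if encoding == "flat":
        bytes_per_vector = 4 * dim
    elif encoding == "sq8":
        bytes_per_vector = dim
    elif encoding == "pq":
        if dim % pq_m != 0:
            raise ValueError("dim must be divisible by pq_m")
        bytes_per_vector = pq_m
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return n * bytes_per_vector / (1024 ** 2)


# Illustrative corpus: 10 million vectors of dimension 768.
for enc in ("flat", "sq8", "pq"):
    size = estimate_code_size_mib(10_000_000, 768, enc, pq_m=96)
    print(f"{enc:>4}: {size:,.0f} MiB")
```

The ratio between the rows matters more than the absolute numbers; any compression choice should be confirmed against recall on the golden set.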