Vector Indexing and Retrieval Tuning

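Approximate nearest neighbor (ANN) indexes such as Annoy, HNSW, and IVF cut retrieval cost from brute-force O(n) per query to sublinear time that is acceptable in practice. In engineering terms, you trade off P99 latency, recall@K, memory, and index build time. Below are runnable skeletons for a NumPy brute-force baseline, FAISS, Milvus (pymilvus 2.x), and Qdrant (the vector dimension and distance metric must match your embedding model).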
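
Because the dimension and the distance metric have to match the embedding model, it can help to validate vectors before they reach any index. The following is a minimal sketch; the helper name check_vectors and the tolerance are assumptions made for illustration, not part of any of the libraries below.

```python
import numpy as np


def check_vectors(X: np.ndarray, expected_dim: int, expect_unit_norm: bool, tol: float = 1e-3) -> list:
    """Return a list of human-readable problems; an empty list means the batch looks fine.

    Hypothetical helper: expected_dim and expect_unit_norm describe what your index was built for.
    """
    problems = []
    if X.ndim != 2 or X.shape[1] != expected_dim:
        problems.append(f"shape {X.shape} does not match expected dim {expected_dim}")
        return problems
    if not np.isfinite(X).all():
        problems.append("contains NaN or Inf")
    norms = np.linalg.norm(X, axis=1)
    if expect_unit_norm and np.any(np.abs(norms - 1.0) > tol):
        problems.append("rows are not L2-normalized; inner product will not equal cosine")
    return problems


rng = np.random.default_rng(0)
raw = rng.standard_normal((100, 128))
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

print(check_vectors(raw, 128, expect_unit_norm=True))   # reports the normalization problem
print(check_vectors(unit, 128, expect_unit_norm=True))  # no problems
print(check_vectors(unit, 64, expect_unit_norm=True))   # reports the dimension mismatch
```

Running a check like this on both the indexing path and the query path is one way to catch a silently swapped embedding model.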


Similarity and normalization (NumPy)

import numpy as np
 
def l2_normalize(X: np.ndarray) -> np.ndarray:
    """L2-normalize each row vector; after normalization the inner product equals cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.maximum(norms, 1e-12)
    return X / norms
 
 
def cosine_scores(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """query: (d,), corpus: (n, d) -> scores (n,)"""
    q = query / (np.linalg.norm(query) + 1e-12)
    c = l2_normalize(corpus)
    return c @ q
 
 
# Example
d = 128
rng = np.random.default_rng(0)
query = rng.standard_normal(d)
corpus = rng.standard_normal((10_000, d))
top_idx = np.argsort(-cosine_scores(query, corpus))[:5]
print("top-5 indices", top_idx)
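
The example above sorts all n scores to obtain five results. For a brute-force baseline over a large corpus, a partial selection is usually enough. The following is a minimal sketch using np.argpartition; the function name top_k_exact is an assumption made for illustration.

```python
import numpy as np


def top_k_exact(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k largest scores, ordered by descending score."""
    k = min(k, scores.shape[0])
    # argpartition places the k best entries first (in arbitrary order) in linear time
    part = np.argpartition(-scores, k - 1)[:k]
    # only those k entries are then fully sorted
    return part[np.argsort(-scores[part])]


rng = np.random.default_rng(0)
scores = rng.standard_normal(10_000)
top5 = top_k_exact(scores, 5)
print("top-5 indices", top5)
```

The result matches np.argsort(-scores)[:5] when there are no tied scores, so it can serve as the exact ground truth when measuring ANN recall.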

FAISS: getting started with IVF and HNSW

import faiss
import numpy as np
 
rng = np.random.default_rng(42)
dim = 64
nb = 50_000  # database size
nq = 10
xb = rng.standard_normal((nb, dim)).astype("float32")
xq = rng.standard_normal((nq, dim)).astype("float32")
faiss.normalize_L2(xb)
faiss.normalize_L2(xq)
 
# --- IVF_FLAT: common for large datasets; nprobe trades recall against latency ---
nlist = int(np.sqrt(nb))
quantizer = faiss.IndexFlatIP(dim)  # inner product; equals cosine once vectors are L2-normalized
index_ivf = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
index_ivf.train(xb)
index_ivf.add(xb)
index_ivf.nprobe = 16  # larger values: higher recall, slower queries
D, I = index_ivf.search(xq, k=10)
print("IVF top-3 ids row0:", I[0, :3])
 
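# --- HNSW: graph index; query latency is usually lower ---
# The constructor takes (d, M, metric); M=32 is an assumed example value.
index_hnsw = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)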
index_hnsw.hnsw.efConstruction = 128
index_hnsw.add(xb)
index_hnsw.hnsw.efSearch = 64  # query time; larger values: higher recall, slower queries
D2, I2 = index_hnsw.search(xq, k=10)
print("HNSW top-3 ids row0:", I2[0, :3])
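Parameter | Effect (rule of thumb)
nlist (IVF) | Number of buckets; commonly sqrt(N) to 4*sqrt(N); too large a value slows training
nprobe | Number of buckets probed per query; raising it increases both recall and latency
efConstruction / efSearch (HNSW) | Graph-build and search width; raising them increases quality along with build and query cost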
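
To make the nprobe trade-off concrete, the following is a toy IVF written in plain NumPy. It is a sketch for illustration, not FAISS's implementation: the helper names (recall_at_k, build_toy_ivf, toy_ivf_search), the few rounds of k-means-style clustering, and the data sizes are all assumptions.

```python
import numpy as np


def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Mean fraction of the exact top-k ids that also appear in the approximate top-k."""
    hits = [len(set(a.tolist()) & set(e.tolist())) for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits)) / exact_ids.shape[1]


def build_toy_ivf(xb: np.ndarray, nlist: int, n_iter: int = 5, seed: int = 0):
    """Cluster xb into nlist buckets (inner-product assignment) and build inverted lists."""
    rng = np.random.default_rng(seed)
    centroids = xb[rng.choice(len(xb), nlist, replace=False)].copy()
    for _ in range(n_iter):
        assign = np.argmax(xb @ centroids.T, axis=1)
        for c in range(nlist):
            members = xb[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    assign = np.argmax(xb @ centroids.T, axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists


def toy_ivf_search(xq, xb, centroids, lists, nprobe: int, k: int) -> np.ndarray:
    """Scan only the nprobe buckets whose centroids score highest for each query."""
    out = np.full((len(xq), k), -1, dtype=np.int64)  # -1 marks missing results
    for i, q in enumerate(xq):
        probe = np.argsort(-(centroids @ q))[:nprobe]
        cand = np.concatenate([lists[c] for c in probe])
        scores = xb[cand] @ q
        top = cand[np.argsort(-scores)[:k]]
        out[i, : len(top)] = top
    return out


rng = np.random.default_rng(0)
xb = rng.standard_normal((2000, 32))
xq = rng.standard_normal((20, 32))
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
xq /= np.linalg.norm(xq, axis=1, keepdims=True)

k, nlist = 10, 32
exact = np.argsort(-(xq @ xb.T), axis=1)[:, :k]  # brute-force ground truth
centroids, lists = build_toy_ivf(xb, nlist)

recalls = {}
for nprobe in (1, 4, 16, nlist):
    approx = toy_ivf_search(xq, xb, centroids, lists, nprobe, k)
    recalls[nprobe] = recall_at_k(approx, exact)
    print(f"nprobe={nprobe:2d}  recall@{k}={recalls[nprobe]:.3f}")
```

Probing every bucket (nprobe equal to nlist) scans the whole corpus, so recall reaches 1.0 at brute-force cost. The same recall_at_k function can be applied to the id array returned by a FAISS search, with ground truth taken from an exact IndexFlatIP search.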

Milvus (pymilvus 2.x skeleton)

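The following demonstrates: connect → create a collection → build an HNSW index → insert → search. Check field names and enum values against the documentation for the pymilvus version that pip install pymilvus gives you.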

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
 
connections.connect(alias="default", host="localhost", port="19530")
 
dim = 128
collection_name = "demo_chunks"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)
 
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="doc_id", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields, description="chunk vectors")
coll = Collection(name=collection_name, schema=schema)
 
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200},
}
coll.create_index(field_name="embedding", index_params=index_params)
 
import random
vectors = [[random.random() for _ in range(dim)] for _ in range(200)]
doc_ids = [f"doc-{i // 50}" for i in range(200)]
coll.insert([doc_ids, vectors])
coll.flush()
coll.load()
 
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
qvec = [[random.random() for _ in range(dim)]]
results = coll.search(
    data=qvec,
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["doc_id"],
)
for hit in results[0]:
    print(hit.id, hit.score, hit.entity.get("doc_id"))
 
connections.disconnect("default")

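Tuning tip: M and efConstruction affect index build time and memory; ef (query time) affects latency and recall. After any change, rerun a regression on the same golden set (see Evaluation Sets and Online Regression).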
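
When comparing query-time settings, latency is best read from percentiles rather than an average. The following is a minimal sketch of a timing harness; latency_percentiles is a hypothetical helper, and the NumPy brute-force lambda only stands in for a real search call such as the coll.search request above.

```python
import time

import numpy as np


def latency_percentiles(search_fn, queries, warmup: int = 3) -> dict:
    """Time search_fn on each query and return P50/P99 latency in milliseconds."""
    for q in queries[:warmup]:
        search_fn(q)  # warm caches before measuring
    samples_ms = []
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    p50, p99 = np.percentile(samples_ms, [50, 99])
    return {"p50_ms": float(p50), "p99_ms": float(p99), "n": len(samples_ms)}


# Stand-in workload (assumption): exact top-10 by inner product over random vectors.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((5_000, 64))
queries = rng.standard_normal((200, 64))

stats = latency_percentiles(lambda q: np.argsort(-(corpus @ q))[:10], queries)
print(stats)
```

Running the same harness with the same golden-set queries before and after a parameter change keeps the latency numbers comparable alongside recall.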


Qdrant (qdrant-client, local / remote)

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
 
client = QdrantClient(url="http://localhost:6333")
collection = "demo_rag"
dim = 64
 
client.recreate_collection(
    collection_name=collection,
    vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
)
 
points = [
    PointStruct(id=i, vector=[0.1] * dim, payload={"doc_id": f"d{i // 10}", "text": f"chunk {i}"})
    for i in range(30)
]
client.upload_points(collection_name=collection, points=points)
 
hits = client.query_points(
    collection_name=collection,
    query=[0.12] * dim,
    limit=5,
    with_payload=True,
).points
for h in hits:
    print(h.id, h.score, h.payload)

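HNSW index parameters (set through hnsw_config when creating the collection, or adjusted afterwards through update_collection or server-side configuration): m and ef_construct follow the same idea as in Milvus and FAISS. Larger values give higher accuracy at a higher resource cost.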


Tuning checklist

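Symptom | Check first
Latency spikes | Whether nprobe/ef is too large, the network, a single hot shard
Low recall | Increase the probe parameters (nprobe/ef), switch the embedding model, hybrid retrieval (RAG Hybrid), reranking
Memory blowup | PQ/SQ quantization, dimensionality reduction (requires re-embedding), sharding, hot/cold collections
Scores not comparable | Whether the model was changed, or normalization is not applied consistently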
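
For the memory row, a back-of-the-envelope estimate shows why quantization helps. The sketch below counts only the stored vector codes (4 bytes per dimension for float32, 1 byte per dimension for 8-bit scalar quantization, 1 byte per sub-quantizer for 8-bit PQ). It ignores ids, graph links, centroids, and codebooks; the function name and the pq_m value are assumptions made for illustration.

```python
def estimate_vector_memory_bytes(n: int, dim: int, pq_m: int = 16) -> dict:
    """Rough size of the stored vector codes only; real indexes add overhead on top."""
    return {
        "flat_float32": n * dim * 4,  # 4 bytes per dimension
        "sq8": n * dim,               # 1 byte per dimension
        "pq_8bit": n * pq_m,          # 1 byte per sub-quantizer
    }


est = estimate_vector_memory_bytes(n=10_000_000, dim=768, pq_m=96)
for name, size in est.items():
    print(f"{name:>12}: {size / 2**30:.2f} GiB")
```

For 10 million 768-dimensional vectors this gives about 30.7 GB for float32, 7.7 GB for SQ8, and under 1 GB for PQ with 96 sub-quantizers. Compression of this kind costs recall, so it should be validated against the golden set.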

Related documents