Elasticsearch

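Elasticsearch (ES) is a distributed search and analytics engine built on Lucene. It supports full-text search, structured queries, and aggregations, and it is the core component of the ELK Stack (Elasticsearch + Logstash + Kibana).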

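Core Concepts

ES concept	Relational database equivalent	Description
Index	Database / table	Where documents of one kind are stored
Document	Row	A unit of data in JSON format
Field	Column	A key-value pair in a document
Shard	No equivalent	The smallest unit of horizontal partitioning for an index
Replica	No equivalent	A copy of a shard that improves availability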

Distributed Architecture

ES cluster
  ├── Node 1 (Master)
  │     ├── Shard 0 (Primary)
  │     └── Shard 1 (Replica)
  ├── Node 2
  │     ├── Shard 1 (Primary)
  │     └── Shard 0 (Replica)
  └── Node 3
        └── Shard 2 (Primary)

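Master node: manages cluster metadata (index creation and deletion, nodes joining and leaving)
Data node: stores data and executes searches
Primary shard: the target of writes; the count is fixed when the index is created and cannot be changed in place (changing it requires the split or shrink API, or a reindex)
Replica shard: a copy of a primary shard; replicas also serve reads, so adding them scales read throughput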
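
The fixed primary shard count follows from how documents are routed. In simplified form, Elasticsearch picks the shard as hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. The sketch below is a toy model of that rule, not Elasticsearch code: String.hashCode() stands in for the real Murmur3 hash, and the class and method names are made up for illustration. It counts how many IDs would be looked up on a different shard if the shard count changed from 3 to 4.

```java
import java.util.List;

/** Toy model of document routing; String.hashCode() stands in for Murmur3 (assumption). */
final class ShardRoutingSketch {

    /** Simplified routing rule: shard = hash(routing) mod number_of_primary_shards. */
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    /** Counts the ids that would map to a different shard after changing the shard count. */
    static long movedDocs(List<String> ids, int oldShards, int newShards) {
        return ids.stream()
                .filter(id -> shardFor(id, oldShards) != shardFor(id, newShards))
                .count();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "4", "5", "6");
        for (String id : ids) {
            System.out.printf("id=%s: shard %d of 3, shard %d of 4%n",
                    id, shardFor(id, 3), shardFor(id, 4));
        }
        System.out.println("ids routed differently: " + movedDocs(ids, 3, 4));
    }
}
```

Any ID whose shard number changes would be searched for on a shard that does not hold it, which is why resizing goes through split, shrink, or reindex rather than a settings update.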

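Data Types

Type	Description
`text`	Full-text field; the value is analyzed into terms
`keyword`	Exact-match field; not analyzed (email addresses, tags, status values, and so on)
`integer` / `long` / `float`	Numeric values
`boolean`	Boolean
`date`	Date; supports multiple formats
`object`	JSON object; its inner fields are flattened into the parent document
`nested`	Array of objects, each indexed as a separate hidden document so it can be queried independently
`geo_point`	Geographic coordinate (latitude and longitude)
`dense_vector`	Dense vector, used for vector similarity search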
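
Similarity between dense_vector values is typically measured with cosine similarity, dot product, or L2 distance. As a rough illustration of what a cosine comparison computes, here is a minimal sketch in plain Java. It is not how Elasticsearch implements vector search, and the class name is hypothetical.

```java
/** Illustrative cosine similarity between two vectors of equal dimension. */
final class CosineSimilaritySketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // The result is NaN when either vector has zero magnitude.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        System.out.println(cosine(query, new float[] {2f, 0f})); // same direction
        System.out.println(cosine(query, new float[] {0f, 3f})); // orthogonal
    }
}
```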

Mapping

A mapping defines the field types of an index, much like a CREATE TABLE statement in SQL DDL.

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "content": { "type": "text", "analyzer": "ik_max_word" },
      "author":  { "type": "keyword" },
      "tags":    { "type": "keyword" },
      "view_count": { "type": "integer" },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

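text vs keyword: use text for fields that need analyzed full-text search; use keyword for exact filtering, aggregations, and sorting. A single field can be indexed both ways by declaring a multi-field (the `fields` mapping parameter).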
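
To make the difference concrete, the following sketch imitates what ends up in the index for each type. It is a simplification under stated assumptions: the tokenizer only lowercases and splits on non-alphanumeric characters, which roughly resembles the default standard analyzer on English text and says nothing about how IK segments Chinese. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy contrast between text and keyword indexing; the tokenizer is a rough stand-in. */
final class TextVsKeywordSketch {

    /** text: lowercase the value and split it into terms on non-letter, non-digit characters. */
    static List<String> indexAsText(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(term -> !term.isEmpty())
                .collect(Collectors.toList());
    }

    /** keyword: the whole value is stored as one unmodified term. */
    static List<String> indexAsKeyword(String value) {
        return List.of(value);
    }

    /** A term query looks up the exact term, with no analysis of the query string. */
    static boolean termQueryMatches(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String value = "Alice Smith";
        System.out.println(indexAsText(value));
        System.out.println(indexAsKeyword(value));
        System.out.println(termQueryMatches(indexAsText(value), "alice"));
        System.out.println(termQueryMatches(indexAsKeyword(value), "alice"));
        System.out.println(termQueryMatches(indexAsKeyword(value), "Alice Smith"));
    }
}
```

The keyword field only matches the complete original value, which is what filters, aggregations, and sorting need.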

CRUD Operations

// Index a document with an explicit ID
PUT /articles/_doc/1
{ "title": "Elasticsearch 入门", "author": "Alice", "view_count": 100 }

// Index a document with an auto-generated ID
POST /articles/_doc
{ "title": "深入 ES 分词", "author": "Bob" }

// Partial update
POST /articles/_update/1
{ "doc": { "view_count": 200 } }

// Scripted update (counter)
POST /articles/_update/1
{
  "script": {
    "source": "ctx._source.view_count += params.n",
    "params": { "n": 10 }
  }
}

// Get by ID
GET /articles/_doc/1

// Delete
DELETE /articles/_doc/1

Query DSL

Full-Text Search

GET /articles/_search
{
  "query": {
    "match": { "title": "Elasticsearch 入门" }
  }
}

Exact Match

// Single value
{ "query": { "term":  { "author": "Alice" } } }

// Multiple values (similar to SQL IN)
{ "query": { "terms": { "tags": ["elasticsearch", "kibana"] } } }

Range Query

{
  "query": {
    "range": {
      "view_count": { "gte": 100, "lte": 500 }
    }
  }
}

Bool Compound Query

{
  "query": {
    "bool": {
      "must":     [ { "match": { "title": "elasticsearch" } } ],
      "filter":   [ { "term": { "author": "Alice" } } ],
      "must_not": [ { "term": { "tags": "draft" } } ],
      "should":   [ { "term": { "tags": "热门" } } ]
    }
  }
}

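Clause	Description
`must`	Must match; contributes to the relevance score
`filter`	Must match; does not affect the score (results can be cached, so it performs well)
`must_not`	Must not match; does not affect the score
`should`	Optional when a must or filter clause is present, and a match adds to the score. With no must or filter clause, at least one should clause has to match (tunable with minimum_should_match)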
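
The clause semantics can be mimicked in a few lines of plain Java. This is only a toy model of the example query above: each matching scoring clause counts as 1.0 instead of a real relevance score, the tag "popular" stands in for the 热门 tag, and the class name and NO_MATCH sentinel are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

/** Toy evaluation of the example bool query; scores are 1.0 per matching scoring clause (assumption). */
final class BoolQuerySketch {

    static final double NO_MATCH = -1.0;

    static double score(String title, String author, Set<String> tags) {
        boolean must = title.toLowerCase(Locale.ROOT).contains("elasticsearch");
        boolean filter = author.equals("Alice");
        boolean mustNot = tags.contains("draft");
        boolean should = tags.contains("popular");

        // must and filter are both required; must_not excludes the document.
        if (!must || !filter || mustNot) {
            return NO_MATCH;
        }
        double score = 1.0;   // the must clause matched and is scored
        if (should) {
            score += 1.0;     // should is optional here and only boosts the score
        }
        return score;         // filter and must_not never change the score
    }

    public static void main(String[] args) {
        System.out.println(score("Elasticsearch basics", "Alice", Set.of("popular")));
        System.out.println(score("Elasticsearch basics", "Alice", Set.of()));
        System.out.println(score("Elasticsearch basics", "Alice", Set.of("draft")));
        System.out.println(score("Elasticsearch basics", "Bob", Set.of("popular")));
    }
}
```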

Pagination and Sorting

{
  "query": { "match_all": {} },
  "sort":  [ { "view_count": "desc" } ],
  "from":  0,
  "size":  20,
  "_source": ["title", "author", "view_count"]
}

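Deep pagination (a large from) performs poorly, and from + size cannot exceed index.max_result_window (10,000 by default). For deep pagination, search_after is the recommended alternative.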
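
The cost of a large from comes from the fact that every shard has to collect its own top from + size hits and hand them to the coordinating node, which merges them and then discards everything before from. The arithmetic can be sketched as follows; the class and method names are illustrative, and the 3 shards match the mapping example earlier in this page.

```java
/** Back-of-the-envelope cost of from/size paging; names are illustrative. */
final class DeepPagingCostSketch {

    /** Default value of the index.max_result_window setting. */
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Each shard must collect its own top (from + size) hits. */
    static long hitsPerShard(int from, int size) {
        return (long) from + size;
    }

    /** The coordinating node merges the candidates from every shard. */
    static long hitsMerged(int from, int size, int shards) {
        return hitsPerShard(from, size) * shards;
    }

    /** Requests beyond the result window are rejected. */
    static boolean withinResultWindow(int from, int size) {
        return hitsPerShard(from, size) <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(hitsMerged(0, 20, 3));        // first page
        System.out.println(hitsMerged(9_980, 20, 3));    // page 500 still returns only 20 hits
        System.out.println(withinResultWindow(9_990, 20));
    }
}
```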

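Deep Pagination with search_after

search_after needs a sort whose last key is a unique tiebreaker. The example uses _id for brevity, but the Elasticsearch documentation discourages sorting on _id; a keyword copy of the ID, or a point in time (PIT) with its implicit _shard_doc tiebreaker, is the safer choice.

// First page
{ "size": 20, "sort": [{ "created_at": "desc" }, { "_id": "asc" }] }

// Subsequent pages (pass the sort values of the last hit on the previous page)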
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
  "search_after": ["2024-01-15 10:30:00", "abc123"]
}
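
The paging loop behind search_after can be imitated on an in-memory list. The sketch below is a hedged illustration of the cursor idea only: it scans a pre-sorted list, whereas a real search seeks on each shard, and the Hit class and sample data are hypothetical. Two sample hits share a timestamp, which shows why the sort needs a tiebreaker.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory imitation of search_after paging; not how Elasticsearch executes it. */
final class SearchAfterSketch {

    /** One search hit, reduced to its two sort values. */
    static final class Hit {
        final String createdAt; // "yyyy-MM-dd HH:mm:ss", so string order equals time order
        final String id;

        Hit(String createdAt, String id) {
            this.createdAt = createdAt;
            this.id = id;
        }
    }

    /** Same order as the request: created_at descending, then id ascending as the tiebreaker. */
    static int compare(Hit a, Hit b) {
        int byDate = b.createdAt.compareTo(a.createdAt);
        return byDate != 0 ? byDate : a.id.compareTo(b.id);
    }

    /** Returns up to size (assumed positive) hits sorting strictly after the cursor; null means first page. */
    static List<Hit> page(List<Hit> sortedHits, Hit after, int size) {
        List<Hit> out = new ArrayList<>();
        for (Hit hit : sortedHits) {
            if (after != null && compare(hit, after) <= 0) {
                continue; // at or before the cursor, so it was already returned
            }
            out.add(hit);
            if (out.size() == size) {
                break;
            }
        }
        return out;
    }

    /** Hypothetical sample data, returned in the request's sort order. */
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>(List.of(
                new Hit("2024-01-14 09:00:00", "c"),
                new Hit("2024-01-15 10:30:00", "b"),
                new Hit("2024-01-15 10:30:00", "a"),
                new Hit("2024-01-13 08:00:00", "d"),
                new Hit("2024-01-12 07:00:00", "e")));
        hits.sort(SearchAfterSketch::compare);
        return hits;
    }

    public static void main(String[] args) {
        List<Hit> all = sampleHits();
        Hit cursor = null;
        List<Hit> current = page(all, cursor, 2);
        while (!current.isEmpty()) {
            for (Hit hit : current) {
                System.out.println(hit.createdAt + " " + hit.id);
            }
            cursor = current.get(current.size() - 1); // last hit's sort values become search_after
            current = page(all, cursor, 2);
        }
    }
}
```

Without the id tiebreaker, the cursor could not tell the two hits with the same timestamp apart, and one of them could be skipped or repeated across pages.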

Aggregations

Bucket Aggregation with a Nested Metric

{
  "size": 0,
  "aggs": {
    "by_author": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "avg_views": { "avg": { "field": "view_count" } }
      }
    }
  }
}
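
For readers more at home with Java collections, the request above computes roughly what a group-by with a per-group average computes. The following sketch is an analogy only: it ignores that a terms aggregation orders buckets by document count and keeps just the top `size` of them, and the Article class is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Stream analogy for a terms bucket with a nested avg metric. */
final class TermsAvgSketch {

    static final class Article {
        private final String author;
        private final int viewCount;

        Article(String author, int viewCount) {
            this.author = author;
            this.viewCount = viewCount;
        }

        String author() {
            return author;
        }

        int viewCount() {
            return viewCount;
        }
    }

    /** One bucket per author (like terms), each holding the mean view count (like avg). */
    static Map<String, Double> avgViewsByAuthor(List<Article> articles) {
        return articles.stream().collect(Collectors.groupingBy(
                Article::author,
                TreeMap::new,
                Collectors.averagingInt(Article::viewCount)));
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("Alice", 100),
                new Article("Alice", 200),
                new Article("Bob", 50));
        System.out.println(avgViewsByAuthor(articles));
    }
}
```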

Grouping by Time

{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

Metric Statistics

{
  "aggs": {
    "view_stats":     { "stats":       { "field": "view_count" } },
    "total_views":    { "sum":         { "field": "view_count" } },
    "unique_authors": { "cardinality": { "field": "author" } }
  }
}

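Analyzers

For Chinese text, the IK analyzer (a community plugin that is installed separately) is a common recommendation:

Mode	Description
`ik_max_word`	Finest-grained segmentation; suited to index time
`ik_smart`	Coarsest-grained segmentation; suited to search time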

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "Elasticsearch 分布式搜索引擎"
}

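Index Lifecycle Management (ILM)

Log-style indices commonly use ILM to move data through the hot, warm, cold, and delete phases automatically. Note that the freeze action used in the cold phase here is deprecated since 7.14 and is a no-op in 8.x.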

PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "7d" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
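
One detail of the policy above is easy to miss: once an index has rolled over, min_age for the later phases counts from the rollover time, not from index creation. The following sketch maps an index age to the phase this particular policy would target. It is a simplification with hypothetical names; real transitions also wait for ILM's periodic policy check and for the current phase's actions to finish.

```java
import java.time.Duration;

/** Maps time since rollover to the target phase of the logs_policy example. */
final class IlmPhaseSketch {

    static String phaseFor(Duration ageSinceRollover) {
        long days = ageSinceRollover.toDays();
        if (days >= 90) {
            return "delete";
        }
        if (days >= 30) {
            return "cold";
        }
        if (days >= 7) {
            return "warm";
        }
        return "hot";
    }

    public static void main(String[] args) {
        for (int days : new int[] {0, 7, 30, 90}) {
            System.out.println(days + "d: " + phaseFor(Duration.ofDays(days)));
        }
    }
}
```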

Spring Boot Integration

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

spring:
  elasticsearch:
    uris: http://localhost:9200
    username: elastic
    password: secret

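Entity mapping:

import java.time.LocalDateTime;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "articles")
public class ArticleDocument {

    @Id
    private String id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String title;

    @Field(type = FieldType.Keyword)
    private String author;

    // Map to the snake_case field name used in the index mapping.
    @Field(name = "view_count", type = FieldType.Integer)
    private Integer viewCount;

    // LocalDateTime carries no zone offset, so use the same custom pattern as the
    // index mapping rather than DateFormat.date_time, which expects an offset.
    @Field(name = "created_at", type = FieldType.Date, format = {}, pattern = "uuuu-MM-dd HH:mm:ss")
    private LocalDateTime createdAt;
}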

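Repository:

import java.util.List;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ArticleRepository
        extends ElasticsearchRepository<ArticleDocument, String> {
    // Derived query: Spring Data builds the query from the method name.
    List<ArticleDocument> findByAuthor(String author);
}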

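For complex queries, use ElasticsearchOperations:

import lombok.RequiredArgsConstructor;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.client.elc.NativeQuery;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.Query;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class ArticleSearchService {

    private final ElasticsearchOperations esOps;

    // Full-text match on title (scored), exact filter on author (not scored).
    public SearchHits<ArticleDocument> search(String keyword, String author) {
        Query query = NativeQuery.builder()
            .withQuery(q -> q.bool(b -> b
                .must(m -> m.match(t -> t.field("title").query(keyword)))
                .filter(f -> f.term(t -> t.field("author").value(author)))
            ))
            .withSort(Sort.by(Sort.Direction.DESC, "viewCount"))
            .withPageable(PageRequest.of(0, 20))
            .build();
        return esOps.search(query, ArticleDocument.class);
    }
}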
