Elasticsearch

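Elasticsearch (ES for short) is a distributed search and analytics engine built on Lucene. It supports full-text search, structured queries, and aggregations, and it is the core component of the ELK Stack (Elasticsearch + Logstash + Kibana).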


Core Concepts

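| ES concept | Relational counterpart | Description |
|---|---|---|
| Index | Database / table | Where documents of one kind are stored |
| Document | Row | A unit of data in JSON format |
| Field | Column | A key-value pair within a document |
| Shard | None | The smallest unit of horizontal partitioning of an index |
| Replica | None | A copy of a shard that improves availability |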

Distributed Architecture

ES cluster
  ├── Node 1 (Master)
  │     ├── Shard 0 (Primary)
  │     └── Shard 1 (Replica)
  ├── Node 2
  │     ├── Shard 1 (Primary)
  │     └── Shard 0 (Replica)
  └── Node 3
        └── Shard 2 (Primary)
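  • Master node: manages cluster metadata (index creation and deletion, nodes joining and leaving)
  • Data node: stores data and executes searches
  • Primary shard: the target of writes; the number of primary shards is set when the index is created and cannot be changed in place (changing it requires the split or shrink API, or a reindex into a new index)
  • Replica shard: a copy of a primary shard; it provides failover and also serves searches, so adding replicas scales read throughput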
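
The fixed primary shard count follows from how a document finds its shard: conceptually, Elasticsearch hashes the routing value (the document ID by default) and takes it modulo the number of primary shards. The sketch below is a simplified model of that idea, not the real implementation. `String.hashCode()` stands in for the Murmur3 hash Elasticsearch uses, and the class and method names are hypothetical.

```java
import java.util.List;

/** Simplified model of document-to-shard routing (not Elasticsearch's real hash). */
final class ShardRouting {

    /** Returns the primary shard index for a routing value (the document ID by default). */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // Assumption: String.hashCode() stands in for the Murmur3 hash used by Elasticsearch.
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        for (String id : List.of("1", "2", "3", "abc123")) {
            // The same ID can map to a different shard once the shard count changes,
            // so documents written earlier would be looked up in the wrong place.
            System.out.printf("id=%s -> shard %d of 3, shard %d of 4%n",
                    id, shardFor(id, 3), shardFor(id, 4));
        }
    }
}
```

In this model the ID "3" lands on shard 0 with three primary shards but on shard 3 with four, which is the kind of mismatch that makes an in-place resize impossible.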

Data Types

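| Type | Description |
|---|---|
| text | Full-text field; the value is analyzed (tokenized) |
| keyword | Exact-match field, not analyzed (emails, tags, status values, etc.) |
| integer / long / float | Numeric values |
| boolean | Boolean |
| date | Date; accepts multiple formats |
| object | JSON object (inner fields are flattened into the parent document) |
| nested | Array of objects in which each element is indexed as a separate hidden document |
| geo_point | Geographic coordinates (latitude and longitude) |
| dense_vector | Dense vector for vector similarity search |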

Mapping

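A mapping defines the field types of an index, much like a table DDL statement in a relational database. The ik_max_word and ik_smart analyzers used below come from the IK analysis plugin, which has to be installed separately.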

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "content": { "type": "text", "analyzer": "ik_max_word" },
      "author":  { "type": "keyword" },
      "tags":    { "type": "keyword" },
      "view_count": { "type": "integer" },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

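text vs keyword: use text for fields that need analyzed full-text search; use keyword for exact filtering, aggregations, and sorting. A single field can be indexed both ways by defining a multi-field.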
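
To make the difference concrete, here is a toy model of what each type puts into the index. It is only a sketch: a whitespace split plus lowercasing stands in for a real analyzer (IK or the standard analyzer behave differently), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy model of how text and keyword fields turn a value into index terms. */
final class FieldTerms {

    /** text: the value is analyzed; a whitespace split plus lowercasing stands in for an analyzer. */
    static List<String> textTerms(String value) {
        return Arrays.stream(value.trim().split("\\s+"))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** keyword: the whole value is kept as one unmodified term. */
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        String title = "Elasticsearch Quick Start";
        System.out.println(textTerms(title));    // [elasticsearch, quick, start]
        System.out.println(keywordTerms(title)); // [Elasticsearch Quick Start]
    }
}
```

A search for "quick" can therefore hit the text terms but not the keyword term, while an exact filter or a terms aggregation on the full title needs the keyword form. A multi-field indexes the same value both ways.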


CRUD Operations

// Index a document with an explicit ID
PUT /articles/_doc/1
{ "title": "Elasticsearch 入门", "author": "Alice", "view_count": 100 }

// Index a document with an auto-generated ID
POST /articles/_doc
{ "title": "深入 ES 分词", "author": "Bob" }

// Partial update
POST /articles/_update/1
{ "doc": { "view_count": 200 } }

// Scripted update (counter)
POST /articles/_update/1
{
  "script": {
    "source": "ctx._source.view_count += params.n",
    "params": { "n": 10 }
  }
}

// Get by ID
GET /articles/_doc/1

// Delete
DELETE /articles/_doc/1

Query DSL

Full-Text Search

GET /articles/_search
{
  "query": {
    "match": { "title": "Elasticsearch 入门" }
  }
}

Exact Match

// Single value
{ "query": { "term":  { "author": "Alice" } } }

// Multiple values (similar to SQL IN)
{ "query": { "terms": { "tags": ["elasticsearch", "kibana"] } } }

Range Query

{
  "query": {
    "range": {
      "view_count": { "gte": 100, "lte": 500 }
    }
  }
}

Bool Compound Query

{
  "query": {
    "bool": {
      "must":     [ { "match": { "title": "elasticsearch" } } ],
      "filter":   [ { "term": { "author": "Alice" } } ],
      "must_not": [ { "term": { "tags": "draft" } } ],
      "should":   [ { "term": { "tags": "热门" } } ]
    }
  }
}
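
| Clause | Description |
|---|---|
| must | Must match; contributes to the relevance score |
| filter | Must match; does not affect the score (results can be cached, so it is usually faster) |
| must_not | Must not match; does not affect the score |
| should | Optional match that raises the relevance score; at least one should clause has to match only when the query has no must or filter clause, or when minimum_should_match is set |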
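
The matching rules of the four clauses can be sketched as a small in-memory evaluator. This is a toy model under stated assumptions: documents are plain strings, each clause is a `Predicate`, the "score" is just a count of matching must and should clauses (real scoring uses BM25), and the class name is hypothetical. It follows the default rule that a should clause becomes mandatory only when there is no must or filter clause.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy evaluator for the matching rules of a bool query; scores are simplified to a clause count. */
final class BoolQuerySketch<T> {
    private final List<Predicate<T>> must, filter, mustNot, should;

    BoolQuerySketch(List<Predicate<T>> must, List<Predicate<T>> filter,
                    List<Predicate<T>> mustNot, List<Predicate<T>> should) {
        this.must = must;
        this.filter = filter;
        this.mustNot = mustNot;
        this.should = should;
    }

    /** Returns -1 when the document does not match, otherwise a simplified score. */
    int score(T doc) {
        boolean required = must.stream().allMatch(p -> p.test(doc))
                && filter.stream().allMatch(p -> p.test(doc))
                && mustNot.stream().noneMatch(p -> p.test(doc));
        if (!required) return -1;

        long shouldHits = should.stream().filter(p -> p.test(doc)).count();
        // Default rule: without any must/filter clause, at least one should clause has to match.
        boolean shouldIsMandatory = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
        if (shouldIsMandatory && shouldHits == 0) return -1;

        // Only must and should add to the score; filter and must_not never do.
        return (int) (must.size() + shouldHits);
    }

    public static void main(String[] args) {
        Predicate<String> es = d -> d.contains("elasticsearch");
        Predicate<String> draft = d -> d.contains("draft");
        Predicate<String> hot = d -> d.contains("hot");
        BoolQuerySketch<String> q =
                new BoolQuerySketch<>(List.of(es), List.of(), List.of(draft), List.of(hot));
        System.out.println(q.score("elasticsearch intro")); // 1
        System.out.println(q.score("elasticsearch hot"));   // 2
        System.out.println(q.score("elasticsearch draft")); // -1
    }
}
```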

Pagination and Sorting

{
  "query": { "match_all": {} },
  "sort":  [ { "view_count": "desc" } ],
  "from":  0,
  "size":  20,
  "_source": ["title", "author", "view_count"]
}

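Deep pagination (a large from) performs poorly: every shard has to collect from + size hits, and the coordinating node then merges all of them. By default, from + size cannot exceed 10,000 (index.max_result_window). Use search_after for deep paging instead.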
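
To put rough numbers on that cost, the following back-of-the-envelope sketch counts how many hits are collected for a single page. It is a simplified model, not a benchmark, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope cost of from/size paging (a simplified model, not a benchmark). */
final class DeepPagingCost {

    /** Each shard has to collect its own top (from + size) hits before the merge. */
    static long hitsPerShard(int from, int size) {
        return (long) from + size;
    }

    /** The coordinating node merges the candidates of every shard and keeps only `size` of them. */
    static long hitsMergedByCoordinator(int from, int size, int shards) {
        return hitsPerShard(from, size) * shards;
    }

    public static void main(String[] args) {
        // First page vs. page 500 of a 3-shard index, 20 hits per page.
        System.out.println(hitsMergedByCoordinator(0, 20, 3));    // 60
        System.out.println(hitsMergedByCoordinator(9980, 20, 3)); // 30000
    }
}
```

The page size stays at 20 in both cases, yet the work grows linearly with from.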

Deep Pagination with search_after

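// First page. The sort needs a unique tiebreaker; _id is used here for brevity, but recent
// versions disable sorting on _id by default, so a keyword copy of the ID or a
// point-in-time (PIT) search may be needed instead.
{ "size": 20, "sort": [{ "created_at": "desc" }, { "_id": "asc" }] }

// Subsequent pages: pass the sort values of the last hit on the previous page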
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
  "search_after": ["2024-01-15 10:30:00", "abc123"]
}
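
The idea behind search_after can be illustrated in memory: instead of skipping from rows, the next page resumes strictly after the sort values of the last hit. The sketch below is only an illustration of the cursor logic, not of how Elasticsearch executes it across shards. The `Hit` class and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory illustration of search_after: resume after the last hit's sort values. */
final class SearchAfterSketch {

    /** Minimal stand-in for a hit; createdAt is a sortable "yyyy-MM-dd HH:mm:ss" string. */
    static final class Hit {
        final String id;
        final String createdAt;

        Hit(String id, String createdAt) {
            this.id = id;
            this.createdAt = createdAt;
        }
    }

    // Same order as the request: created_at desc, then id asc as the unique tiebreaker.
    static final Comparator<Hit> ORDER =
            Comparator.comparing((Hit h) -> h.createdAt).reversed()
                    .thenComparing((Hit h) -> h.id);

    /** Returns the next page; pass null as `after` to get the first page. */
    static List<Hit> page(List<Hit> index, Hit after, int size) {
        List<Hit> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);
        List<Hit> out = new ArrayList<>();
        for (Hit h : sorted) {
            // Keep only hits that sort strictly after the cursor.
            if (after != null && ORDER.compare(h, after) <= 0) continue;
            out.add(h);
            if (out.size() == size) break;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> index = List.of(
                new Hit("a", "2024-01-15 10:30:00"),
                new Hit("b", "2024-01-15 10:30:00"),
                new Hit("c", "2024-01-16 08:00:00"),
                new Hit("d", "2024-01-14 09:00:00"));
        List<Hit> first = page(index, null, 2);
        List<Hit> second = page(index, first.get(first.size() - 1), 2);
        first.forEach(h -> System.out.println("page 1: " + h.id));
        second.forEach(h -> System.out.println("page 2: " + h.id));
    }
}
```

Hits "a" and "b" share a timestamp, which is why the tiebreaker matters: without it, the cursor could not tell which of the two has already been returned.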

Aggregations

Bucket Aggregation with a Nested Metric

{
  "size": 0,
  "aggs": {
    "by_author": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "avg_views": { "avg": { "field": "view_count" } }
      }
    }
  }
}
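
For readers more familiar with Java collections, the request above computes roughly what the following stream pipeline does: one bucket per author, and an average of view_count inside each bucket. This is an in-memory analogy only. It ignores that the terms aggregation orders buckets by document count and keeps just the top `size` of them, and the `Article` class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** What a terms bucket with a nested avg metric computes, expressed with Java streams. */
final class TermsAvgSketch {

    /** Minimal stand-in for an indexed article. */
    static final class Article {
        final String author;
        final int viewCount;

        Article(String author, int viewCount) {
            this.author = author;
            this.viewCount = viewCount;
        }
    }

    /** Groups articles by author (the buckets) and averages viewCount inside each bucket. */
    static Map<String, Double> avgViewsByAuthor(List<Article> articles) {
        return articles.stream().collect(Collectors.groupingBy(
                (Article a) -> a.author,
                TreeMap::new,
                Collectors.averagingInt((Article a) -> a.viewCount)));
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("Alice", 100),
                new Article("Alice", 200),
                new Article("Bob", 50));
        System.out.println(avgViewsByAuthor(articles)); // {Alice=150.0, Bob=50.0}
    }
}
```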

Grouping by Time

{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

Statistical Metrics

{
  "aggs": {
    "view_stats":     { "stats":       { "field": "view_count" } },
    "total_views":    { "sum":         { "field": "view_count" } },
    "unique_authors": { "cardinality": { "field": "author" } }
  }
}

Analyzers

For Chinese text, the IK analyzer is the recommended choice.

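| Mode | Description |
|---|---|
| ik_max_word | Finest-grained segmentation; suited to indexing |
| ik_smart | Coarsest-grained segmentation; suited to searching |

GET /_analyze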
{
  "analyzer": "ik_smart",
  "text": "Elasticsearch 分布式搜索引擎"
}

Index Lifecycle Management (ILM)

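Log-style indices commonly use ILM to move data through hot and cold tiers automatically. Note that the freeze action used in the cold phase below is deprecated in recent 7.x releases and is reported to have no effect in 8.x, so that phase may need a different action on newer clusters: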

PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "7d" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
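
When the hot phase uses rollover, the min_age of the later phases is measured from the time the index was rolled over, not from its creation. The toy function below walks through that timeline for the thresholds in the policy above. It is a reading aid under that assumption, not a reimplementation of ILM, and the class and method names are hypothetical.

```java
/** Toy timeline for the example policy: which phase an index is in, by age since rollover. */
final class IlmPhaseSketch {

    /** The thresholds mirror the min_age values of the example policy (in days). */
    static String phaseFor(long daysSinceRollover) {
        if (daysSinceRollover >= 90) return "delete";
        if (daysSinceRollover >= 30) return "cold";
        if (daysSinceRollover >= 7) return "warm";
        // Before rollover, and for the first 7 days after it, the index stays hot.
        return "hot";
    }

    public static void main(String[] args) {
        System.out.println(phaseFor(3));  // hot
        System.out.println(phaseFor(45)); // cold
    }
}
```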

Spring Boot Integration

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

spring:
  elasticsearch:
    uris: http://localhost:9200
    username: elastic
    password: secret

Entity mapping:

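import java.time.LocalDateTime;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "articles")
public class ArticleDocument {

    @Id
    private String id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String title;

    @Field(type = FieldType.Keyword)
    private String author;

    // Map to the snake_case field names used in the index mapping.
    @Field(name = "view_count", type = FieldType.Integer)
    private Integer viewCount;

    // Match the "yyyy-MM-dd HH:mm:ss" format of the mapping; DateFormat.date_time
    // expects a zone offset, which LocalDateTime does not carry.
    @Field(name = "created_at", type = FieldType.Date, format = {}, pattern = "uuuu-MM-dd HH:mm:ss")
    private LocalDateTime createdAt;
}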

Repository:

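import java.util.List;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ArticleRepository
        extends ElasticsearchRepository<ArticleDocument, String> {
    List<ArticleDocument> findByAuthor(String author);
}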

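For complex queries, use ElasticsearchOperations. The NativeQuery builder shown here belongs to the Java-client-based integration of newer Spring Data Elasticsearch versions (the default in 5.x, which ships with Spring Boot 3):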

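import lombok.RequiredArgsConstructor;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.client.elc.NativeQuery;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.Query;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class ArticleSearchService {

    private final ElasticsearchOperations esOps;

    public SearchHits<ArticleDocument> search(String keyword, String author) {
        Query query = NativeQuery.builder()
            // must scores the full-text match; filter restricts by author without scoring
            .withQuery(q -> q.bool(b -> b
                .must(m -> m.match(t -> t.field("title").query(keyword)))
                .filter(f -> f.term(t -> t.field("author").value(author)))
            ))
            .withSort(Sort.by(Sort.Direction.DESC, "viewCount"))
            .withPageable(PageRequest.of(0, 20))
            .build();
        return esOps.search(query, ArticleDocument.class);
    }
}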


Related

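  • Kibana: the visualization tool for ES (queries, monitoring, dashboards)
  • MySQL: structured data store; ES often serves as the search layer on top of MySQL
  • ClickHouse: another option for log analytics
  • Redis: caching layer that relieves ES of hot-query load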