搬砖小抄

使用投影(projections)来优化数据库查询

字数统计: 1.2k阅读时长: 5 min
2018/04/26 Share

Spring data 把数据库常见操作都封装得差不多了,甚至连分页都为不用自己写代码,确实大大提高了码砖的效率,对于一般的需求,往Reposistory里加几个接口就妥当了.那么反过来说,如果有’不一般’的需求呢?

这就是矛盾的地方,ORM用它自己的一套概念给程序员洗脑,让你感觉不到数据库的存在,但实际上要达到这个目标很难,总归要和数据库直接打交道.

我最近碰到的一个问题就是索引失效,这个问题一开始我还没察觉到,因为我已经提前根据查询语句创建了好了索引,但是当我观察数据库的性能报告却发现查询效率很低,把相应的语句弄出来执行EXPLAIN,居然是ALL.

背景

entity

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
@Data
@Entity
@Table(name = "t_clan_tracking")
public class ClanTracking {
@Id
@Column(name = "f_clan_tag")
private String clan;

@Column(name = "f_name",nullable = false)
private String name;

@Column(name = "f_score")
private int score;

@Column(name = "f_last_hit_time")
@Temporal(TemporalType.TIMESTAMP)
private Date lastHit;
}

repositiry

1
2
3
public interface ClanTrackingRepository extends JpaRepository<ClanTracking,String> {
ClanTracking findFirstByOrderByLastHitAscScoreDesc();
}

以上的代码来自shufork项目的coc-discovery模块,由于这个模块扮演着类似爬虫的角色(定期去抓数据),为了尽量抓取有意义的数据,因此需要简单记录一下抓取时间(lastHit)和对象权重(score).每一次需要抓数据的时候就从数据库里面按照抓取时间和权重值找一个目标出来,然后去抓取它的详细信息.总之,findFirstByOrderByLastHitAscScoreDesc这个查询的执行非常频繁.

使用(伪代码)

1
2
3
4
5
6
7
8
9
10
11
12
13
public ClanTracking retrieveOne(){
ClanTracking found = clanTrackingRepository.findFirstByOrderByLastHitAscScoreDesc();
if(found != null){
found.setLastHit(DateTimeUtil.utc().toDate());
clanTrackingRepository.save(found);
}
return found;
}

public void pull(){
ClanTracking o = retrieveOne();
log.debug("tag = {},name = {}",o.getClan(),o.getName());
}

为了便于理解,这里写了个伪代码pull()来表示抓数据的方法.

现象

程序跑起来后,打开MySql的性能报告看了一下,发现t_clan_tracking表存在明显的查询性能问题.

Executed (#) Errors (#) Warnings (#) Total Time Max Time Avg Time
17962 0 0 2700217844 2069254.11 150329.4

其对应的SQL是

1
2
3
4
5
6
7
8
9
10
11
SELECT `clantracki0_` . `f_clan_tag` AS `f_clan_t1_0_` ,
`clantracki0_` . `z_created_time` AS `z_create2_0_` ,
`clantracki0_` . `f_last_hit_time` AS `f_last_h3_0_` ,
`clantracki0_` . `z_modified_time` AS `z_modifi4_0_` ,
`clantracki0_` . `f_name` AS `f_name5_0_` ,
`clantracki0_` . `f_score` AS `f_score6_0_` ,
`clantracki0_` . `z_version` AS `z_versio7_0_`
FROM `t_clan_tracking` `clantracki0_`
ORDER BY `clantracki0_` . `f_last_hit_time` ASC ,
`clantracki0_` . `f_score` DESC
LIMIT ?

很显然它就是ClanTrackingRepositoryfindFirstByOrderByLastHitAscScoreDesc了.

检查一下索引:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
mysql> show index from t_clan_tracking \G;
*************************** 1. row ***************************
Table: t_clan_tracking
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: f_clan_tag
Collation: A
Cardinality: 105610
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 2. row ***************************
Table: t_clan_tracking
Non_unique: 1
Key_name: idx_top_last_hit_score
Seq_in_index: 1
Column_name: f_last_hit_time
Collation: A
Cardinality: 13886
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
*************************** 3. row ***************************
Table: t_clan_tracking
Non_unique: 1
Key_name: idx_top_last_hit_score
Seq_in_index: 2
Column_name: f_score
Collation: A
Cardinality: 21919
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
*************************** 4. row ***************************
Table: t_clan_tracking
Non_unique: 1
Key_name: idx_top_last_hit_score
Seq_in_index: 3
Column_name: f_clan_tag
Collation: A
Cardinality: 101087
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 5. row ***************************
Table: t_clan_tracking
Non_unique: 1
Key_name: idx_top_last_hit_score
Seq_in_index: 4
Column_name: f_name
Collation: A
Cardinality: 94932
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
5 rows in set (0.00 sec)

查询条件中的列是有索引的,顺序也正确,不过取出的列太多了,导致索引没有起作用.而上面的代码说明实际的业务逻辑需要用到的字段只有

  • 查询条件字段f_last_hit_timef_score
  • 查询结果字段f_clan_tagf_name

使用EXPLAIN验证一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> EXPLAIN SELECT *  FROM t_clan_tracking  ORDER BY f_last_hit_time ASC, f_score DESC  LIMIT 1 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_clan_tracking
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 105610
filtered: 100.00
Extra: Using filesort

Join Type 是ALL,MySql文档对于它的描述是

A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> EXPLAIN SELECT f_clan_tag,f_name FROM t_clan_tracking  ORDER BY f_last_hit_time ASC, f_score DESC LIMIT 1 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_clan_tracking
partitions: NULL
type: index
possible_keys: NULL
key: idx_top_last_hit_score
key_len: 2055
ref: NULL
rows: 105610
filtered: 100.00
Extra: Using index; Using filesort

知道原因就好对症下药了,查阅sping data的文档发现投影(prijections)操作就可以解决这个问题.

改造后的代码

entity

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@Data
@Entity
@Table(name = "t_clan_tracking")
public class ClanTracking {
@Id
@Column(name = "f_clan_tag")
private String clan;

@Column(name = "f_name",nullable = false)
private String name;

@Column(name = "f_score")
private int score;

@Column(name = "f_last_hit_time")
@Temporal(TemporalType.TIMESTAMP)
private Date lastHit;

public interface ClanTracker{
String getClan();
String getName();
}
}

repositiry

1
2
3
4
5
6
7
8
9
public interface ClanTrackingRepository extends JpaRepository<ClanTracking,String> {
ClanTracking findFirstByOrderByLastHitAscScoreDesc();

<T> T findFirstByOrderByLastHitAscScoreDesc(Class<T> type);

@Modifying
@Query("update ClanTracking u set u.lastHit = ?2 where u.clan = ?1")
void updateLastHit(String clan,Date lastHit);
}

使用(伪代码)

1
2
3
4
5
6
7
8
9
10
11
12
public ClanTracking.ClanTracker retrieveOne(){
ClanTracking.ClanTracker found = clanTrackingRepository.findFirstByOrderByLastHitAscScoreDesc(ClanTracking.ClanTracker.class);
if(found != null){
clanTrackingRepository.updateLastHit(found.getClan(),DateTimeUtil.utc().toDate());
}
return found;
}

public void pull(){
ClanTracking.ClanTracker o = retrieveOne();
log.debug("tag = {},name = {}",o.getClan(),o.getName());
}

参考资料

CATALOG
  1. 1. 背景
  2. 2. 现象
  3. 3. 改造后的代码
  4. 4. 参考资料