在mysql使用时到超大千万级数据量时我们大多会考虑到分表查询,分库查询,下面我们来介绍一下关于分表查询时的优化.
我们的项目中有好多不等于的情况,今天写这篇文章简单的分析一下怎么个优化法,这里的分表逻辑是根据t_group表的user_name组的个数来分的.
因为这种情况单独user_name字段上的索引就属于烂索引,起不了啥名明显的效果.
1、试验PROCEDURE,代码如下:
- DELIMITER $$
- DROP PROCEDURE `t_girl`.`sp_split_table`$$
- CREATE PROCEDURE `t_girl`.`sp_split_table`()
- BEGIN
- declare done int default 0;
- declare v_user_name varchar(20) default '';
- declare v_table_name varchar(64) default '';
- -- Get all users' name.
- declare cur1 cursor for select user_name from t_group group by user_name;
- -- Deal with error or warnings.
- declare continue handler for 1329 set done = 1;
- -- Open cursor.
- open cur1;
- while done <> 1
- do
- fetch cur1 into v_user_name;
- if not done then
- -- Get table name.
- set v_table_name = concat('t_group_',v_user_name);
- -- Create new extra table.
- set @stmt = concat('create table ',v_table_name,' like t_group');
- prepare s1 from @stmt;
- execute s1;
- drop prepare s1;
- -- Load data into it.
- set @stmt = concat('insert into ',v_table_name,' select * from t_group where user_name = ''',v_user_name,'''');
- prepare s1 from @stmt;
- execute s1;
- drop prepare s1;
- end if;
- end while;
- -- Close cursor.
- close cur1;
- -- Free variable from memory.
- set @stmt = NULL;
- END$$
- DELIMITER ;
2、试验表,我们用一个有一千万条记录的表来做测试,代码如下:
- mysql> select count(*) from t_group;
- +----------+
- | count(*) |
- +----------+
- | 10388608 |
- +----------+
- 1 row in set (0.00 sec)
表结构,代码如下:
- mysql> desc t_group;
- +-------------+------------------+------+-----+-------------------+----------------+
- | Field | Type | Null | Key | Default | Extra |
- +-------------+------------------+------+-----+-------------------+----------------+
- | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
- | money | decimal(10,2) | NO | | | |
- | user_name | varchar(20) | NO | MUL | | |
- | create_time | timestamp | NO | | CURRENT_TIMESTAMP | |
- +-------------+------------------+------+-----+-------------------+----------------+
- 4 rows in set (0.00 sec)
索引情况.
- mysql> show index from t_group;
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- | t_group | 0 | PRIMARY | 1 | id | A | 10388608 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_user_name | 1 | user_name | A | 8 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_combination1 | 1 | user_name | A | 8 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_combination1 | 2 | money | A | 3776 | NULL | NULL | | BTREE | |
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- 4 rows in set (0.00 sec) --Vevb.com
PS:idx_combination1 这个索引是必须的,因为要对user_name来GROUP BY,此时属于松散索引扫描,当然完了后你可以干掉她.
idx_user_name 这个索引是为了加快单独执行constant这种类型的查询.
我们要根据用户名来分表,代码如下:
- mysql> select user_name from t_group where 1 group by user_name;
- +-----------+
- | user_name |
- +-----------+
- | david |
- | leo |
- | livia |
- | lucy |
- | sarah |
- | simon |
- | sony |
- | sunny |
- +-----------+
- 8 rows in set (0.00 sec)
所以结果表应该是这样的,代码如下:
- mysql> show tables like 't_group_%';
- +------------------------------+
- | Tables_in_t_girl (t_group_%) |
- +------------------------------+
- | t_group_david |
- | t_group_leo |
- | t_group_livia |
- | t_group_lucy |
- | t_group_sarah |
- | t_group_simon |
- | t_group_sony |
- | t_group_sunny |
- +------------------------------+
- 8 rows in set (0.00 sec)
3、对比结果,代码如下:
- mysql> select count(*) from t_group where user_name = 'david';
- +----------+
- | count(*) |
- +----------+
- | 1298576 |
- +----------+
- 1 row in set (1.71 sec)
执行了将近2秒.
- mysql> select count(*) from t_group_david;
- +----------+
- | count(*) |
- +----------+
- | 1298576 |
- +----------+
- 1 row in set (0.00 sec)
几乎是瞬间的.
- mysql> select count(*) from t_group where user_name <> 'david';
- +----------+
- | count(*) |
- +----------+
- | 9090032 |
- +----------+
- 1 row in set (9.26 sec)
执行了将近10秒,可以想象,这个是实际的项目中是不能忍受的.
- mysql> select (select count(*) from t_group) - (select count(*) from t_group_david) as total;
- +---------+
- | total |
- +---------+
- | 9090032 |
- +---------+
- 1 row in set (0.00 sec)
几乎是瞬间的.
我们来看看聚集函数,对于原表的操作,代码如下:
- mysql> select min(money),max(money) from t_group where user_name = 'david';
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (0.00 sec)
最小,最大值都是FULL INDEX SCAN,所以是瞬间的.
- mysql> select sum(money),avg(money) from t_group where user_name = 'david';
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (2.15 sec)
其他聚集函数的结果就不是FULL INDEX SCAN了,耗时2.15秒,对于小表的操作,代码如下:
- mysql> select min(money),max(money) from t_group_david;
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (1.50 sec)
最大最小值完全是FULL TABLE SCAN,耗时1.50秒,不划算,以此看来,代码如下:
- mysql> select sum(money),avg(money) from t_group_david;
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (1.68 sec)
取得这两个结果也是花了快2秒,快了一点,我们来看看这个小表的结构,代码如下:
- mysql> desc t_group_david;
- +-------------+------------------+------+-----+-------------------+----------------+
- | Field | Type | Null | Key | Default | Extra |
- +-------------+------------------+------+-----+-------------------+----------------+
- | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
- | money | decimal(10,2) | NO | | | |
- | user_name | varchar(20) | NO | MUL | | |
- | create_time | timestamp | NO | | CURRENT_TIMESTAMP | |
- +-------------+------------------+------+-----+-------------------+----------------+
- 4 rows in set (0.00 sec)
明显的user_name属性是多余的,那么就干掉它,代码如下:
- mysql> alter table t_group_david drop user_name;
- Query OK, 1298576 rows affected (7.58 sec)
- Records: 1298576 Duplicates: 0 Warnings: 0
现在来重新对小表运行查询,代码如下:
- mysql> select min(money),max(money) from t_group_david;
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (0.00 sec)
此时是瞬间的.
- mysql> select sum(money),avg(money) from t_group_david;
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (0.94 sec)
这次算是控制在一秒以内了.
mysql> Aborted
小总结一下:分出的小表的属性尽量越少越好,大胆的去干吧.
新闻热点
疑难解答