在sql中我们经常会碰到有重复的一些数据,下面我来介绍在mysql中删除重复记录的多种方法,有需要的朋友可参考参考.
删除重复记录方法一:
1.新建一个临时表,代码如下:
create table tmp as select * from youtable group by name(name为不希望有重复的列)
2.删除原来的表,代码如下:drop table youtable
3.重命名表,代码如下:
alter table tmp rename youtable
但是这个方法有个问题,由临时表转变过来的最终表,其表结构会和原来的不一致,需要手工更改。这个问题,待解决。
删除重复记录方法二:
1.新建一个临时表,代码如下:
CREATE TABLE tmp AS SELECT * FROM youtable GROUP BY name(name为不希望有重复的列)
2.清空原来的表,代码如下:
TRUNCATE TABLE youtable
3.把临时表插入到youtable,代码如下:
INSERT INTO tablename SELECT * FROM temp
4.删除临时表,代码如下:
DROP TABLE temp
删除重复记录方法三:代码如下:
delete table where ID not in(select min(ID) from table group by name(name:重复的字段))
删除重复记录方法四:
具体实现如下:
- Table Create Table
- ------------ --------------------------------------------------------
- users_groups CREATE TABLE `users_groups` (
- `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
- `uid` int(11) NOT NULL,
- `gid` int(11) NOT NULL,
- PRIMARY KEY (`id`)
- ) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8
users_groups.txt内容:
- 1,11,502
- 2,107,502
- 3,100,503
- 4,110,501
- 5,112,501
- 6,104,502
- 7,100,502
- 8,100,501
- 9,102,501
- 10,104,502
- 11,100,502
- 12,100,501
- 13,102,501
- 14,110,501
- mysql> load data infile 'c:/users_groups.txt' into table users_groups fields
- terminated by ',' lines terminated by 'n';
- Query OK, 14 rows affected (0.05 sec)
- Records: 14 Deleted: 0 Skipped: 0 Warnings: 0
- mysql> select * from users_groups;
- query result(14 records)
- id uid gid
- 1 11 502
- 2 107 502
- 3 100 503
- 4 110 501
- 5 112 501
- 6 104 502
- 7 100 502
- 8 100 501
- 9 102 501
- 10 104 502
- 11 100 502
- 12 100 501
- 13 102 501
- 14 110 501
- 14 rows in set (0.00 sec)
根据一位兄弟的建议修改,代码如下:
- mysql> create temporary table tmp_wrap select * from users_groups group by uid having count(1) >= 1;
- Query OK, 7 rows affected (0.11 sec)
- Records: 7 Duplicates: 0 Warnings: 0
- mysql> truncate table users_groups;
- Query OK, 14 rows affected (0.03 sec)
- mysql> insert into users_groups select * from tmp_wrap;
- Query OK, 7 rows affected (0.03 sec)
- Records: 7 Duplicates: 0 Warnings: 0
- mysql> select * from users_groups;
- --Vevb.com
- query result(7 records)
- id uid gid
- 1 11 502
- 2 107 502
- 3 100 503
- 4 110 501
- 5 112 501
- 6 104 502
- 9 102 501
- mysql> drop table tmp_wrap;
- Query OK, 0 rows affected (0.05 sec)
2、还有一个很精简的办法.
查找重复的,并且除掉最小的那个,代码如下:
- delete users_groups as a from users_groups as a,
- (
- select *,min(id) from users_groups group by uid having count(1) > 1
- ) as b
- where a.uid = b.uid and a.id > b.id;
- (7 row(s)affected)
- (0 ms taken)
- query result(7 records)
- id uid gid
- 1 11 502
- 2 107 502
- 3 100 503
- 4 110 501
- 5 112 501
- 6 104 502
- 9 102 501
3、现在来看一下这两个办法的效率,运行一下以下SQL 语句,代码如下:
- create index f_uid on users_groups(uid);
- explain select * from users_groups group by uid having count(1) > 1 union all
- select * from users_groups group by uid having count(1) = 1;
- explain select * from users_groups as a,
- (
- select *,min(id) from users_groups group by uid having count(1) > 1
- ) as b
- where a.uid = b.uid and a.id > b.id;
- query result(3 records)
- id select_type table type possible_keys key key_len ref rows Extra
- 1 PRIMARY users_groups index (NULL) f_uid 4 (NULL) 14
- 2 UNION users_groups index (NULL) f_uid 4 (NULL) 14
- (NULL) UNION RESULT <union1,2> ALL (NULL) (NULL) (NULL) (NULL) (NULL)
- query result(3 records)
- id select_type table type possible_keys key key_len ref rows Extra
- 1 PRIMARY <derived2> ALL (NULL) (NULL) (NULL) (NULL) 4
- 1 PRIMARY a ref PRIMARY,f_uid f_uid 4 b.uid 1 Using where
- 2 DERIVED users_groups index (NULL) f_uid 4 (NULL) 14
很明显的第二个比第一个扫描的函数要少,当没有创建表或创建索引权限的时候,创建一个新表,然后将原表中不重复的数据插入新表,代码如下:
- mysql> create table demo_new as select * from demo group by site;
- Query OK, 3 rows affected (0.19 sec)
- Records: 3 Duplicates: 0 Warnings: 0
- mysql> show tables;
- +----------------+
- | Tables_in_test |
- +----------------+
- | demo |
- | demo_new |
- +----------------+
- 2 rows in set (0.00 sec)
- mysql> select * from demo order by id;
- +----+------------------------+
- | id | site |
- +----+------------------------+
- | 1 | http://www.Vevb.com |
- | 2 | http://Vevb.com |
- | 3 | http://www.Vevb.com |
- | 4 | http://www.Vevb.com |
- | 5 | http://www.Vevb.com |
- +----+------------------------+
- 5 rows in set (0.00 sec)
- mysql> select * from demo_new order by id;
- +----+------------------------+
- | id | site |
- +----+------------------------+
- | 1 | http://www.Vevb.com |
- | 2 | http://Vevb.com |
- | 3 | http://www.Vevb.com |
- +----+------------------------+
- 3 rows in set (0.00 sec)
然后将原表备份,将新表重命名为当前表,代码如下:
- mysql> rename table demo to demo_old, demo_new to demo;
- Query OK, 0 rows affected (0.04 sec)
- mysql> show tables;
- +----------------+
- | Tables_in_test |
- +----------------+
- | demo |
- | demo_old |
- +----------------+
- 2 rows in set (0.00 sec)
- mysql> select * from demo order by id;
- +----+------------------------+
- | id | site |
- +----+------------------------+
- | 1 | http://www.Vevb.com |
- | 2 | http://Vevb.com |
- | 3 | http://www.Vevb.com |
- +----+------------------------+
- 3 rows in set (0.00 sec)
新闻热点
疑难解答