错误现象:
服务器CPU增加、响应慢。
问题原因:
取回statspack报告一看,问题很快找到了,有一条语句的physical reads非常高,初步判断这条语句没中索引,是全表扫描。但奇怪的是,这条语句结构很简单,对一个表的查询,带一个查询条件。类似如下:
select a, b from ttt where b like 'aaa%'
ttt表是个大表,这样简单而且会导致全表扫描的语句没道理能轻易跑到生成库上去的啊。
查了一下,果然字段b上面也是有索引的,而且b的cardinality值很高,以上语句几乎肯定命中索引,在开发库上看了它的查询计划,确实没错,命中了索引,效率也很高,几乎没有物理读:
Execution Plan
----------------------------------------------------------
Plan hash value: 240739660
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 2 (0)| 00:00:01 |
| 1 | TABLE access BY INDEX ROWID| TTT | 2 | 60 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TTT_IDX | 2 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------
PRedicate Information (identified by operation id):
---------------------------------------------------
2 - access("B" LIKE 'aaa%')
filter("B" LIKE 'aaa%')
Statistics
----------------------------------------------------------
246 recursive calls
0 db block gets
57 consistent gets
4 physical reads
0 redo size
464 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed
嗯,难道是生产库上的分析数据不正确(采用的是CBO),于是让现场获取语句的查询计划,竟然和开发库结果是一样的!这就奇怪了,statspack报告上显示这条语句每次都几百万的物理读啊,对比一下刚刚从生产库拿到的报告,还是和前面一样。难道有什么东西改变了语句的查询计划?为了证实这个想法,决定用更高级别的statspack来获取该语句运行时的查询计划(具体方法可以参加另一篇文章《利用statspack来获取生成环境中top SQL及其执行计划》)。
先让现场产生2个级别为6的快照,从这两个快照生成的报告中找到了语句的hash值,然后生成语句的报告,果然和我们得到的查询计划不同,是全表扫描!
SQL Text
~~~~~~~~
select a, b from ttt where b like 'aaa%'
Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end
snapshots specified. The values for Rows, Bytes and Cost shown below are those
which existed at the time the first-ever snapshot captured this plan - these
values often change over time, and so may not be indicative of current values
-> Rows indicates Cardinality, PHV is Plan Hash Value
-> ordered by Plan Hash Value
--------------------------------------------------------------------------------
| Operation | PHV/Object Name | Rows | Bytes| Cost |
--------------------------------------------------------------------------------
|SELECT STATEMENT | 2 | 60 | | 56 |
| TABLE ACCESS FULL | | 2 | 60 | 56 |
--------------------------------------------------------------------------------
为什么会这样呢?
检查程序代码,结果在调用这条语句所在函数之前的语句中,我们发现了以下语句:
Execute Immediate ' alter session set nls_comp=LINGUISTIC'
Execute Immediate ' alter session set nls_sort=BINARY_CI'
嗯,这下我基本明白了。看到这两个会话参数的设置我知道程序员是想对查询条件不区分大小写。这是10gR2的新特性,我也曾经在我的blog上介绍过,但是,我也提到过,这会导致查询无法正确命中索引,这一点却被该程序员忽略了。可以看以下结果:
SQL> alter session set nls_sort='BINARY_CI';
Session altered.
SQL> alter session set nls_comp='LINGUISTIC';
Session altered.
SQL> set autot trace
SQL> select a, b from ttt where b like 'aaa%';
Execution Plan
----------------------------------------------------------
Plan hash value: 774701505
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 56 (2)| 00:00:01|
|* 1 | TABLE ACCESS FULL| TTT | 2 | 60 | 56 (2)| 00:00:01|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("B" LIKE 'aaa%')
Statistics
----------------------------------------------------------
364 recursive calls
0 db block gets
321 consistent gets
73 physical reads
0 redo size
560 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
5 rows processed
SQL> alter session set nls_sort='BINARY';
Session altered.
SQL> alter session set nls_comp='BINARY';
Session altered.
SQL> select a, b from ttt where b like 'aaa%';
Execution Plan
----------------------------------------------------------
Plan hash value: 240739660
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TTT | 2 | 60 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TTT_IDX | 2 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------
---------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("B" LIKE 'aaa%')
filter("B" LIKE 'aaa%')
Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
6 consistent gets
3 physical reads
0 redo size
464 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
这里关于这个问题再说多两句,当时我的文章中认为通过nls设置不区分大小写的实质是Oracle内部加了upper函数,这个结论应该是错的。实质上,此时在做精确匹配时应该是做了NLSSORT函数转换:
SQL> alter session set nls_sort='BINARY_CI';
Session altered.
SQL> alter session set nls_comp='LINGUISTIC';
Session altered.
SQL> select a, b from ttt where b = 'aaa';
Execution Plan
----------------------------------------------------------
Plan hash value: 774701505
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 57 (4)| 00:00:01|
|* 1 | TABLE ACCESS FULL| TTT | 2 | 60 | 57 (4)| 00:00:01|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NLSSORT("B",'nls_sort=''BINARY_CI''')=HEXTORAW('61616100')
)
Statistics
----------------------------------------------------------
217 recursive calls
0 db block gets
293 consistent gets
68 physical reads
0 redo size
461 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed
最终,对于这个问题的解决,还是通过传统的加upper函数来解决。
结论:
1、 不要轻易修改会话属性,对于一个具有独立属性的会话,将很难定位其中的问题;
2、 新特性固然有好处,但在使用之前,一定要先了解它的负面影响。
注意,本文目的是描述问题及其解决过程,所用数据都是在试验库上模拟出来的。
新闻热点
疑难解答