SQL Story Excerpts (3): Extensible Design

2024-07-21 02:09:03

Source: reposted

Contributed by: a reader

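Set-oriented structured design. Many people have heard of it, but very few can really put it to work. Here is a simple example:
Example 1-3: a simple table, orders, stores the order information of a shop: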
create table [dbo].[orders] (
[id] [int] identity (1, 1) not null ,
[customerid] [int] not null ,
[orderdate] [datetime] not null
) on [primary]
go
create clustered index [cu_inx_orderdate] on [dbo].[orders]([orderdate]) with fillfactor = 50 on [primary]
go
alter table [dbo].[orders] with nocheck add
constraint [pk_orders] primary key nonclustered([id])
on [primary]
go
The table currently holds the following data:
id customerid orderdate
----------- ----------- ------------------------------------------------------
1 1 1999-1-4
2 10 1999-3-5
3 22 1999-5-2
4 2 1999-6-7
5 2 2000-3-6
7 101 2001-5-3
8 10 2001-6-5
6 101 2002-4-2
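For readers who want to follow along, the rows above could be loaded with something like the script below. This is only a sketch: it assumes the orders table was created exactly as shown, and it turns identity_insert on so that the id values from the listing can be supplied explicitly. The dates are written as 'yyyymmdd' strings so that they do not depend on the server's date format setting.

```sql
-- Sketch: load the eight sample rows shown above into dbo.orders.
-- identity_insert lets us supply explicit values for the identity column.
set identity_insert dbo.orders on;

insert into dbo.orders (id, customerid, orderdate) values (1, 1,   '19990104');
insert into dbo.orders (id, customerid, orderdate) values (2, 10,  '19990305');
insert into dbo.orders (id, customerid, orderdate) values (3, 22,  '19990502');
insert into dbo.orders (id, customerid, orderdate) values (4, 2,   '19990607');
insert into dbo.orders (id, customerid, orderdate) values (5, 2,   '20000306');
insert into dbo.orders (id, customerid, orderdate) values (6, 101, '20020402');
insert into dbo.orders (id, customerid, orderdate) values (7, 101, '20010503');
insert into dbo.orders (id, customerid, orderdate) values (8, 10,  '20010605');

set identity_insert dbo.orders off;

-- Rows come back in orderdate order here only because of the explicit order by.
select id, customerid, orderdate from dbo.orders order by orderdate;
```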
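So how do we produce a report of yearly order counts for 1999-2002? (Only 8 orders in four years? I kept the data small to make the demonstration easier; it does not reflect a real shop :p) Here is the layout of the report we want. Readers may like to try writing the statement themselves first: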
customerid 1999 2000 2001 2002
-------------- ------ ------ ------ ------
1 1 0 0 0
2 1 1 0 0
10 1 0 1 0
22 1 0 0 0
101 0 0 1 1
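The most obvious idea is to build the report in the front end, using some other language. But it can also be done in SQL, and not necessarily with more complexity than you might expect: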
select customerid,
sum(case when year(isnull(orderdate, 0)) = 1999 then 1 else 0 end) as "1999",
sum(case when year(isnull(orderdate, 0)) = 2000 then 1 else 0 end) as "2000",
sum(case when year(isnull(orderdate, 0)) = 2001 then 1 else 0 end) as "2001",
sum(case when year(isnull(orderdate, 0)) = 2002 then 1 else 0 end) as "2002"
from orders
group by customerid
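I expect some readers will object that InterBase does not support CASE. Even so, I still recommend this form. It is elegant and concise; it is easy for us to read, and it is just as easy to write a program that generates it automatically. In fact, the CASE keyword is already part of the SQL standard (SQL-92), and the trend is clear: more and more database systems will support it.
So how was it arrived at? This is the reasoning I followed when designing the statement:
1. We need a report that unfolds along two axes at once: time and customer;
2. Vertically, we want one row of data per customer. That part is easy, and it fixes the basic skeleton of the statement: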
select customerid,
………………
from orders
group by customerid
If we did not need to break the counts down by year, the following statement would already give the result we want:
select customerid,
count(id) as orders_count
from orders
group by customerid
3. Take all orders as the universal set. The total number of elements in that set is counted with the following statement:
select count(id) from orders
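Horizontally, we define one column for each year's order count. Taking 1999 as an example, the number of elements in the subset of orders placed in 1999 is: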
select sum(case when year(isnull(orderdate, 0)) = 1999 then 1 else 0 end) as "1999"
from orders
The other years follow the same pattern, which gives the order count for every year:
select sum(case when year(isnull(orderdate, 0)) = 1999 then 1 else 0 end) as "1999",
sum(case when year(isnull(orderdate, 0)) = 2000 then 1 else 0 end) as "2000",
sum(case when year(isnull(orderdate, 0)) = 2001 then 1 else 0 end) as "2001",
sum(case when year(isnull(orderdate, 0)) = 2002 then 1 else 0 end) as "2002"
from orders
It returns the following result:
1999 2000 2001 2002
----------- ----------- ----------- -----------
4 1 2 1

(1 row(s) affected)
4. After allowing for the "peculiar" NULL values of relational databases, we combine steps 2 and 3 to get the final statement:
select customerid,
sum(case when year(isnull(orderdate, 0)) = 1999 then 1 else 0 end) as "1999",
sum(case when year(isnull(orderdate, 0)) = 2000 then 1 else 0 end) as "2000",
sum(case when year(isnull(orderdate, 0)) = 2001 then 1 else 0 end) as "2001",
sum(case when year(isnull(orderdate, 0)) = 2002 then 1 else 0 end) as "2002"
from orders
group by customerid
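The structure of this report is now clear, and it is highly extensible. For example, if next year we need the figures for 2003, we just follow the same pattern and add one more column,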
sum(case when year(isnull(orderdate, 0)) = 2003 then 1 else 0 end) as "2003"
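at the end. It is simply the subset of the universal set that falls in 2003. Also, not every database has the isnull function that is used here to test for NULL. That is not a problem: just add one more branch to the case expression,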
when orderdate is null then 0
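and you are done. Building on this idea, we can easily write a stored procedure that generates a complete yearly report given only a start year and an end year. Because all of the computation runs on the server and is completed in the same pass that retrieves the data, it is typically faster than a report assembled on the client. It also transfers less data, which reduces the load on the network.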
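A stored procedure of that kind might look roughly like the sketch below. The procedure name usp_yearly_order_report and its parameter names are made up for illustration; the original text does not give an implementation. It builds one sum(case ...) column per year in a loop and runs the resulting statement with sp_executesql. Since the year values are integers, nothing user-supplied is concatenated as free text.

```sql
-- Sketch only: usp_yearly_order_report, @first_year and @last_year are
-- hypothetical names, not taken from the original article.
create procedure dbo.usp_yearly_order_report
    @first_year int,
    @last_year  int
as
begin
    set nocount on;

    declare @sql nvarchar(4000);   -- room for a few dozen year columns
    declare @y   int;
    declare @ys  nvarchar(4);      -- assumes four-digit years

    set @sql = N'select customerid';
    set @y   = @first_year;

    -- Append one column per year, following the pattern of the hand-written query.
    while @y <= @last_year
    begin
        set @ys  = cast(@y as nvarchar(4));
        set @sql = @sql
                 + N', sum(case when year(isnull(orderdate, 0)) = ' + @ys
                 + N' then 1 else 0 end) as [' + @ys + N']';
        set @y   = @y + 1;
    end

    set @sql = @sql + N' from dbo.orders group by customerid order by customerid';

    exec sp_executesql @sql;
end
go

-- Usage: with the sample data, this should reproduce the 1999-2002 report.
exec dbo.usp_yearly_order_report @first_year = 1999, @last_year = 2002;
```

Building the statement as a string is a design choice forced by the fact that the column list depends on the parameters. The nvarchar(4000) buffer limits how many years one call can cover, and if @first_year is greater than @last_year the loop adds no columns and the result contains only customerid.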
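There is a similar example in Inside SQL Server 6.5. The author's statement is more complicated than mine, though: in his example, FROM selects from a derived table produced by a subquery, which has long puzzled me. Perhaps version 6.5 of MS SQL Server did not yet support my form, or perhaps his form performs better. The author does not say, and I have never had the chance to work with MS SQL Server 6.5.
For InterBase, I have not yet found a sufficiently elegant statement to generate this report, mainly because InterBase does not support CASE. However, if you are not too demanding about performance and elegance, the following statement does the same job as the SQL Server version above: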
select o.customerid,
(select count(i.id)
from orders i
where (i.customerid = o.customerid)
and (extract(year from i.orderdate) = 1999))
as count_1999,
(select count(i.id)
from orders i
where (i.customerid = o.customerid)
and (extract(year from i.orderdate) = 2000))
as count_2000,
(select count(i.id)
from orders i
where (i.customerid = o.customerid)
and (extract(year from i.orderdate) = 2001))
as count_2001,
(select count(i.id)
from orders i
where (i.customerid = o.customerid)
and (extract(year from i.orderdate) = 2002))
as count_2002
from orders o
group by o.customerid
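Following the SQL Server version, we now have an InterBase version of the yearly report. The difference is that it uses correlated subqueries to compute the counts, so it is less efficient (fortunately, you probably do not need your yearly report refreshed in real time). Still, because it rests on the same set-oriented design, we have at least preserved its extensibility. Clearly, though, each additional year column costs far more in the subquery version than in the CASE version. If speed matters a great deal, you are better off writing a separate client-side program to generate the report.
InterBase users will find plenty to be unhappy about in this example: no auto-increment identity columns, no clustered indexes, no CASE, no... More annoying still, the open-source release of this database system ships without an ODBC or ADO driver. Having obtained a free database system, are we supposed to spend tens of dollars on an ODBC driver for it?
That said, InterBase is gaining support from the open-source community, and Borland is also providing open interfaces to InterBase through its dbExpress and InterClient technologies (for now, the dbExpress driver is essentially available only in Borland's expensive Enterprise edition development tools :( ). As long as every InterBase programmer and user contributes to this software that belongs to all of us, its future remains bright.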

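The set-oriented design method suits only particular goals and is not a general-purpose software design method, yet it cannot be explained in a few words either. In later chapters we will keep putting it into practice, and a dedicated chapter will discuss it. By then our sample database will be more complete, and I may give a more practical yearly order report. For now, a brief summary:
1. Define the structure of the result set we want to produce;
2. Identify the data source of the result set and define the universal set;
3. Define the range of values in the result set and define the subsets to take;
4. Carry out the operation.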
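As one possible reading of these four steps, the yearly report statement from Example 1-3 can be annotated as follows. The mapping of steps to clauses is an interpretation added here, not something the original text spells out.

```sql
-- Step 1: result structure is one row per customer and one column per year.
select customerid,
       -- Step 3: each column counts a subset of the universal set,
       --         here the orders whose year matches the column.
       sum(case when year(isnull(orderdate, 0)) = 1999 then 1 else 0 end) as "1999",
       sum(case when year(isnull(orderdate, 0)) = 2000 then 1 else 0 end) as "2000",
       sum(case when year(isnull(orderdate, 0)) = 2001 then 1 else 0 end) as "2001",
       sum(case when year(isnull(orderdate, 0)) = 2002 then 1 else 0 end) as "2002"
-- Step 2: the data source, and therefore the universal set, is every row of orders.
from orders
-- Step 4: carry out the operation, aggregating once per customer.
group by customerid;
```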
