
Heaps of data: tables without clustered indexes

2024-07-21 02:07:15
Source: reposted
Contributor: site user



If you create a table on Adaptive Server, but do not create a clustered index, the table is stored as a heap. The data rows are not stored in any particular order. This section describes how select, insert, delete, and update operations perform on heaps when there is no "useful" index to aid in retrieving data.

The phrase "no useful index" is important in describing the optimizer's decision to perform a table scan. Sometimes, an index exists on the columns named in a where clause, but the optimizer determines that it would be more costly to use the index than to perform a table scan.

Other chapters in this book describe how the optimizer costs queries using indexes and how you can get more information about why the optimizer makes these choices.

Table scans are always used when you select all rows in a table. The only exception is when the query includes only columns that are keys in a nonclustered index.

For more information, see "Index covering".

The following sections describe how Adaptive Server locates rows when a table has no useful index.
Lock schemes and differences between heaps
The data pages in an allpages-locked table are linked into a doubly-linked list of pages by pointers on each page. Pages in data-only-locked tables are not linked into a page chain.

In an allpages-locked table, each page stores a pointer to the next page in the chain and to the previous page in the chain. When new pages need to be inserted, the pointers on the two adjacent pages change to point to the new page. When Adaptive Server scans an allpages-locked table, it reads the pages in order, following these page pointers.
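
The pointer handling in a page chain can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the Page class and the link_after and scan functions are hypothetical names introduced for illustration, and they do not reflect Adaptive Server's on-disk page format.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a data page in an allpages-locked table."""
    page_no: int
    rows: list = field(default_factory=list)
    prev: Optional["Page"] = field(default=None, repr=False, compare=False)
    next: Optional["Page"] = field(default=None, repr=False, compare=False)


def link_after(existing: Page, new: Page) -> None:
    """Insert `new` into the chain directly after `existing`.

    Only the two adjacent pages have their pointers changed.
    """
    new.prev, new.next = existing, existing.next
    if existing.next is not None:
        existing.next.prev = new
    existing.next = new


def scan(first: Page) -> Iterator[str]:
    """Read rows in chain order by following the next-page pointers."""
    page = first
    while page is not None:
        yield from page.rows
        page = page.next


p1, p2 = Page(1, ["a"]), Page(2, ["c"])
link_after(p1, p2)
link_after(p1, Page(3, ["b"]))   # page 3 is linked between pages 1 and 2
chain_rows = list(scan(p1))
```

In this model the scan visits pages 1, 3, and 2 in that order, because scan order is determined by the pointers, not by page numbers.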

Pages are also doubly-linked at each index level of allpages-locked tables, and at the leaf level of indexes on data-only-locked tables. If an allpages-locked table is partitioned, there is one page chain for each partition.

Another difference between allpages-locked tables and data-only-locked tables is that data-only-locked tables use fixed row IDs. This means that row IDs (a combination of the page number and the row number on the page) do not change in a data-only-locked table during normal query processing.

Row IDs change only when one of the operations that require data-row copying is performed, for example, during reorg rebuild or while creating a clustered index.

For information on how fixed row IDs affect heap operations, see "Deleting from a data-only locked heap table" and "Data-only-locked heap tables".
Select operations on heaps
When you issue a select query on a heap, and there is no useful nonclustered index, Adaptive Server must scan every data page in the table to find every row that satisfies the conditions in the query. There may be one row, many rows, or no rows that match.
Allpages-locked heap tables
For allpages-locked tables, Adaptive Server reads the value of the first column in sysindexes for the table (sysindexes.first, which points to the first data page), reads that page into cache, and follows the next-page pointers until it reaches the last page of the table.
Data-only locked heap tables
Since the pages of data-only-locked tables are not linked in a page chain, a select query on a heap table uses the table's OAM and the allocation pages to locate all the rows in the table. The OAM page points to the allocation pages, which point to the extents and pages for the table.
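
The two levels of indirection in an OAM-driven scan can be sketched as nested lookups. The dictionaries below are a hypothetical in-memory model chosen for illustration; the table name orders, the page numbers, and the scan_via_oam function are assumptions, not Adaptive Server structures.

```python
# Hypothetical model: allocation page number -> extents it tracks.
allocation_pages = {
    0: {"extents": [{"owner": "orders", "pages": [8, 9, 10]},
                    {"owner": "other", "pages": [16, 17]}]},
    256: {"extents": [{"owner": "orders", "pages": [264, 265]}]},
}

# Hypothetical OAM: table name -> allocation pages that hold its extents.
oam = {"orders": [0, 256]}

# Page number -> rows stored on that page (empty pages are still visited).
data_pages = {8: ["r1"], 9: [], 10: ["r2", "r3"], 264: ["r4"], 265: []}


def scan_via_oam(table):
    """Visit every page of `table` without following any page chain."""
    for alloc_no in oam[table]:                      # OAM -> allocation pages
        for extent in allocation_pages[alloc_no]["extents"]:
            if extent["owner"] != table:             # extent belongs elsewhere
                continue
            for page_no in extent["pages"]:          # extent -> data pages
                yield from data_pages.get(page_no, [])


oam_rows = list(scan_via_oam("orders"))
```

The scan touches every page that the allocation pages attribute to the table, including empty ones, which is why no page-to-page pointers are needed.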
Inserting data into an allpages-locked heap table
When you insert data into an allpages-locked heap table, the data row is always added to the last page of the table. If there is no clustered index on a table, and the table is not partitioned, the sysindexes.root entry for the heap table stores a pointer to the last page of the heap to locate the page where the data needs to be inserted.

If the last page is full, a new page is allocated in the current extent and linked onto the chain. If the extent is full, Adaptive Server looks for empty pages on other extents being used by the table. If no pages are available, a new extent is allocated to the table.
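
The order in which space is sought for a new last page can be sketched as follows. This is a toy model, assuming two rows per page and two pages per extent so that the example stays short; the ApHeap class and its attribute names are hypothetical, with the last attribute playing the role that sysindexes.root plays for a real heap.

```python
ROWS_PER_PAGE = 2      # assumption: tiny pages for illustration only
PAGES_PER_EXTENT = 2   # assumption: tiny extents for illustration only


class ApHeap:
    """Toy allpages-locked heap: every insert goes to the last page."""

    def __init__(self):
        self.extents = [[[]]]   # extents -> pages -> rows
        self.last = (0, 0)      # (extent, page) of the current last page
        self.events = []        # records how each new page was obtained

    def _new_page(self, extent_no, row, event):
        self.extents[extent_no].append([row])
        self.last = (extent_no, len(self.extents[extent_no]) - 1)
        self.events.append(event)

    def insert(self, row):
        e, p = self.last
        if len(self.extents[e][p]) < ROWS_PER_PAGE:
            self.extents[e][p].append(row)           # room on the last page
            return
        if len(self.extents[e]) < PAGES_PER_EXTENT:
            self._new_page(e, row, "page from current extent")
            return
        for other, pages in enumerate(self.extents):
            if len(pages) < PAGES_PER_EXTENT:        # free page elsewhere
                self._new_page(other, row, "page from another extent")
                return
        self.extents.append([])                      # nothing free anywhere
        self._new_page(len(self.extents) - 1, row, "new extent")


heap = ApHeap()
for r in range(5):
    heap.insert(r)
```

With these sizes, the third row forces a new page in the current extent and the fifth row forces a new extent, so heap.events records both steps in that order.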
Conflicts during heap inserts
One of the severe performance limits on heap tables that use allpages locking is that the last page must be locked when the row is added, and that lock is held until the transaction completes. If many users are trying to insert into an allpages-locked heap table at the same time, each insert must wait for the preceding transaction to complete.

This problem of last-page conflicts on heaps applies to:
Single row inserts using insert

Multiple row inserts using select into or insert...select, or several insert statements in a batch

Bulk copy into the table



Some workarounds for last-page conflicts on heaps include:
Switching to datapages or datarows locking

Creating a clustered index that directs the inserts to different pages

Partitioning the table, which creates multiple insert points for the table, giving you multiple "last pages" in an allpages-locked table



Other guidelines that apply to all transactions where there may be lock conflicts include:
Keeping transactions short

Avoiding network activity and user interaction whenever possible, once a transaction acquires locks


Inserting data into a data-only-locked heap table
When users insert data into a data-only-locked heap table, Adaptive Server tracks page numbers where the inserts have recently occurred, and keeps the page number as a hint for future tasks that need space. Subsequent inserts to the table are directed to one of these pages. If the page is full, Adaptive Server allocates a new page and replaces the old hint with the new page number.

Blocking while many users are simultaneously inserting data is much less likely to occur during inserts to data-only-locked heap tables. When blocking occurs, Adaptive Server allocates a small number of empty pages and directs new inserts to those pages, using these newly allocated pages as hints.

For datarows-locked tables, blocking occurs only while the actual changes to the data page are being written; although row locks are held for the duration of the transaction, other rows can be inserted on the page. The row-level locks allow multiple transactions to hold locks on the page.

There may be slight blocking on data-only-locked tables, because Adaptive Server allows a small amount of blocking after many pages have just been allocated, so that the newly allocated pages are filled before additional pages are allocated.
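
The hint mechanism can be modeled in a few lines. This sketch makes several assumptions that the text does not specify: pages hold two rows, a caller picks which hint to use through a hypothetical hint_slot argument, and blocking is signaled by calling a hypothetical on_blocking method. It is an illustration of the idea, not of Adaptive Server's internal bookkeeping.

```python
ROWS_PER_PAGE = 2  # assumption: tiny pages keep the example short


class DolHeap:
    """Toy data-only-locked heap that remembers insert hints."""

    def __init__(self):
        self.pages = {1: []}    # page number -> rows
        self.hints = [1]        # pages where inserts recently occurred
        self.next_page_no = 2

    def _allocate(self):
        page_no = self.next_page_no
        self.next_page_no += 1
        self.pages[page_no] = []
        return page_no

    def insert(self, row, hint_slot=0):
        page_no = self.hints[hint_slot]
        if len(self.pages[page_no]) >= ROWS_PER_PAGE:
            page_no = self._allocate()
            self.hints[hint_slot] = page_no   # new page replaces the old hint
        self.pages[page_no].append(row)
        return page_no

    def on_blocking(self, extra_pages=2):
        """Allocate a few empty pages and publish them as additional hints."""
        self.hints.extend(self._allocate() for _ in range(extra_pages))


dol = DolHeap()
used = [dol.insert(r) for r in ("a", "b", "c")]
dol.on_blocking()
```

After the third insert the hint has moved from page 1 to page 2, and the simulated blocking event adds two more pages that later inserts can be directed to.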
If conflicts occur during heap inserts
Conflicts during inserts to heap tables are greatly reduced for data-only-locked tables, but can still take place. If these conflicts slow inserts, some workarounds can be used, including:
Switching to datarows locking, if the table uses datapages locking

Using a clustered index to spread data inserts

Partitioning the table, which provides additional hints and allows new pages to be allocated on each partition when blocking takes place


Deleting data from a heap table
When you delete rows from a heap table, and there is no useful index, Adaptive Server scans the data rows in the table to find the rows to delete. It has no way of knowing how many rows match the conditions in the query without examining every row.
Deleting from an allpages-locked heap table
When a data row is deleted from a page in an allpages-locked table, the rows that follow it on the page move up so that the data on the page remains contiguous.
Deleting from a data-only locked heap table
When you delete rows from a data-only-locked heap table, a table scan is required if there is no useful index. The OAM and allocation pages are used to locate the pages.

The space on the page is not recovered immediately. Rows in data-only-locked tables must maintain fixed row IDs, and need to be reinserted in the same place if the transaction is rolled back.

After a delete transaction completes, one of the following processes shifts rows on the page to make the space usage contiguous:
The housekeeper process

An insert that needs to find space on the page

The reorg reclaim_space command
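
Deferred space reclamation with fixed row numbers can be sketched with a page that keeps a separate row-offset table. The DolPage class below is a hypothetical simplification: data stands for the physical row area, offsets maps each row number to a position in that area, and compact stands for whichever of the processes listed above performs the shift.

```python
class DolPage:
    """Toy data-only-locked page: row numbers stay fixed, data can shift."""

    def __init__(self, rows):
        self.data = list(rows)                  # physical order on the page
        self.offsets = list(range(len(rows)))   # row number -> index in data

    def delete(self, row_no):
        # The space is left in place so a rollback could restore the row.
        self.data[self.offsets[row_no]] = None
        self.offsets[row_no] = None

    def compact(self):
        """Shift surviving rows together; row numbers do not change."""
        new_data = []
        for row_no, idx in enumerate(self.offsets):
            if idx is not None:
                self.offsets[row_no] = len(new_data)
                new_data.append(self.data[idx])
        self.data = new_data

    def fetch(self, row_no):
        idx = self.offsets[row_no]
        return None if idx is None else self.data[idx]


page = DolPage(["a", "b", "c"])
page.delete(1)
before = list(page.data)     # the hole left by the deleted row is still there
page.compact()
after = list(page.data)      # the hole is gone, row 2 is still row 2
```

Row number 2 returns the same data before and after compaction, which is the property that lets indexes keep pointing at the original row IDs.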


Deleting the last row on a page
If you delete the last row on a page, the page is deallocated. If other pages on the extent are still in use by the table, the page can be used again by the table when a page is needed.

If all other pages on the extent are empty, the entire extent is deallocated. It can be allocated to other objects in the database. The first data page for a table or an index is never deallocated.
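
The deallocation rules can be expressed as a short decision function. This is a sketch over a hypothetical nested-dictionary model (extent number, then page number, then rows); delete_row and the returned status strings are names invented for this example.

```python
def delete_row(extents, extent_no, page_no, row, first_page):
    """Delete `row` and apply the page and extent deallocation rules.

    extents    -- {extent_no: {page_no: [rows]}}, allocated pages only
    first_page -- (extent_no, page_no) of the table's first data page
    """
    page = extents[extent_no][page_no]
    page.remove(row)
    if page or (extent_no, page_no) == first_page:
        return "page kept"                 # rows remain, or it is the first page
    del extents[extent_no][page_no]        # last row gone: free the page
    if extents[extent_no]:
        return "page deallocated"          # extent still has pages in use
    del extents[extent_no]                 # no pages left: free the extent
    return "extent deallocated"


extents = {0: {8: ["a"], 9: ["b"]}, 1: {16: ["c"]}}
first = (0, 8)
results = [
    delete_row(extents, 1, 16, "c", first),
    delete_row(extents, 0, 9, "b", first),
    delete_row(extents, 0, 8, "a", first),
]
```

In this run the three deletes exercise all three outcomes: an extent is freed, a single page is freed, and the first data page is kept even though it is empty.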
Updating data on a heap table
Like other operations on heaps, an update that has no useful index on the columns in the where clause performs a table scan to locate the rows that need to be changed.
Allpages-locked heap tables
Updates on allpages-locked heap tables can be performed in several ways:
If the length of the row does not change, the updated row replaces the existing row, and no data moves on the page.

If the length of the row changes, and there is enough free space on the page, the row remains in the same place on the page, but other rows move up or down to keep the rows contiguous on the page.

The row offset pointers at the end of the page are adjusted to point to the changed row locations.

If the row does not fit on the page, the row is deleted from its current page, and the "new" row is inserted on the last page of the table.

This type of update can cause a conflict on the last page of the heap, just as inserts do. If there are any nonclustered indexes on the table, all index references to the row need to be updated.
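
The three update cases above can be sketched as a single function that decides where the changed row goes. The model is deliberately simple and rests on assumptions: rows are strings whose length stands in for row size, PAGE_BYTES is an arbitrary small capacity, the last page is assumed to have room, and update_row is a hypothetical name.

```python
PAGE_BYTES = 10  # assumption: tiny page capacity for illustration


def update_row(pages, page_no, slot, new_row):
    """Apply an update to a toy allpages-locked heap.

    pages -- {page_no: [rows]}; the highest page number is the last page.
    """
    page = pages[page_no]
    old_row = page[slot]
    if len(new_row) == len(old_row):
        page[slot] = new_row               # same length: nothing else moves
        return "in place"
    used_after = sum(len(r) for r in page) - len(old_row) + len(new_row)
    if used_after <= PAGE_BYTES:
        page[slot] = new_row               # neighbors shift to stay contiguous
        return "same page, rows shifted"
    del page[slot]                         # no room: delete, then insert
    pages[max(pages)].append(new_row)      # the new row goes to the last page
    return "moved to last page"


pages = {1: ["aaaa", "bbbb"], 2: ["cc"]}
outcomes = [
    update_row(pages, 1, 0, "xxxx"),       # length unchanged
    update_row(pages, 1, 0, "xxxxxx"),     # longer, still fits on page 1
    update_row(pages, 1, 1, "bbbbbbb"),    # no longer fits on page 1
]
```

Only the third case touches the last page, which is why it competes with inserts for the same lock and requires index entries for the row to be changed.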


Data-only-locked heap tables
One of the requirements for data-only-locked tables is that the row ID of a data row never changes (except during intentional rebuilds of the table). Therefore, updates to data-only-locked tables can be performed by the first two methods described above, as long as the row fits on the page.

But when a row in a data-only-locked table is updated so that it no longer fits on the page, a process called row forwarding performs the following steps:
The row is inserted onto a different page, and

A pointer to the row ID on the new page is stored in the original location for the row.



Indexes do not need to be modified when rows are forwarded. All indexes still point to the original row ID.

If the row needs to be forwarded a second time, the original location is updated to point to the new page, so the forwarded row is never more than one hop away from its original location.

Row forwarding increases concurrency during update operations because indexes do not have to be updated. It can slow data retrieval, however, because a task needs to read the page at the original location and then read the page where the forwarded data is stored.
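
Row forwarding and its one-hop guarantee can be sketched with a dictionary keyed by row ID. The DolTable class, the ("row", ...) and ("fwd", ...) tags, and the page numbers are hypothetical choices made for this illustration; the index dictionary stands in for any index that stores row IDs.

```python
class DolTable:
    """Toy data-only-locked table keyed by row ID (page number, row number)."""

    def __init__(self):
        self.slots = {}   # row ID -> ("row", data) or ("fwd", target row ID)

    def forward(self, home_rid, new_rid, data):
        """Move the row's data to `new_rid`, leaving a pointer at `home_rid`."""
        kind, value = self.slots[home_rid]
        if kind == "fwd":
            del self.slots[value]          # drop the earlier forwarded copy
        self.slots[new_rid] = ("row", data)
        self.slots[home_rid] = ("fwd", new_rid)   # always repoint the home slot

    def fetch(self, rid):
        """Return (data, number of page reads needed)."""
        kind, value = self.slots[rid]
        reads = 1
        if kind == "fwd":
            kind, value = self.slots[value]
            reads += 1
        return value, reads


table = DolTable()
table.slots[(1, 0)] = ("row", "v1")
index = {"key": (1, 0)}                    # the index stores the original row ID

table.forward((1, 0), (5, 0), "v2-longer")
first_fetch = table.fetch(index["key"])
table.forward((1, 0), (9, 0), "v3-even-longer")
second_fetch = table.fetch(index["key"])
```

Both fetches cost two reads, even after the second forwarding, and the index entry is never modified, which mirrors the trade-off described above between update concurrency and retrieval cost.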

Forwarded rows can be cleared from a table using the reorg command (for example, reorg forwarded_rows).

For more information on updates, see "How update operations are performed".