October 07, 2005

SQL Performance Tuning using Indexes

1. Useful Index Queries

Just like the reader searching for a word in a book, an index helps when you are
looking for a specific record or set of records with a WHERE clause. This includes
queries looking for a range of values, queries designed to match a specific value,
and queries performing a join on two tables. Since index entries are stored in
sorted order, indexes also help when processing ORDER BY clauses. Grouping records
with a GROUP BY clause will often require sorting.

2. Index Drawbacks

Indexes are a performance drag when the time comes to modify records. Any time
a query modifies the data in a table the indexes on the data must change also.
Static systems, where databases are used heavily for reporting, can afford more
indexes to support the read only queries. A database with a heavy number of transactions
to modify data will need fewer indexes to allow for higher throughput. Indexes
also use disk space. The exact size will depends on the number of records in the
table as well as the number and size of the columns in the index. Generally this
is not a major concern as disk space is easy to trade for better performance.

3. Building The Best Index

Short Keys

Larger index keys will cause the database to perform more disk reads, which
limits throughput. Secondly, since index entries are often involved in comparisons,
smaller entries are easier to compare. A single integer column makes the absolute
best index key because an integer is small and easy for the database to compare.
Character strings, on the other hand, require a character-by-character comparison
and attention to collation settings.


Distinct Keys

The most effective indexes are the indexes with a small percentage of duplicated
values. An index with a high percentage of unique values is a selective index.
Obviously, a unique index is highly selective since there are no duplicate entries.

As an analogy, think of a phone book for a town where almost everyone has the
last name of Smith. A phone book in this town is not very useful if sorted in
order of last name, because you can only discount a small number of records
when you are looking for a Smith.


Covering Queries

Indexes generally contain only the data values for the columns they index and
a pointer back to the row with the rest of the data. Generally the database
will have to follow pointers from an index back to a row to gather all the information
required for a query. However, if the index contains all of he columns needed
for a query, the database can save a disk read by not returning to the table
for more information. The database could use just the index entries to satisfy
the following query.

Eg: SELECT Count(*), UnitPrice FROM Products GROUP BY UnitPrice (UnitPrice column
indexed)

We call these types of queries covered queries, because all of the columns requested
in the output are covered by a single index. For your most crucial queries,
you might consider creating a covering index to give the query the best performance
possible. Such an index would probably be a composite index (using more than
one column), which appears to go against our first guideline of keeping index
entries as short as possible. Obviously this is another tradeoff you can only
evaluate with performance testing and monitoring.


Clustered Indexes

Many databases have one special index per table where all of the data from a
row exists in the index. SQL Server calls this index a clustered index. As a
general rule of thumb, every non-trivial table should have a clustered index.
If you only create one index for a table, make the index a clustered index.
In SQL Server, creating a primary key will automatically create a clustered
index (if none exists) using the primary key column as the index key. Clustered
indexes are the most effective indexes (when used, they always cover a query).
When choosing the column or columns for a clustered index, be careful to choose
a column with static data. If you modify a record and change the value of a
column in a clustered index, the database might need to move the index entry
(to keep the entries in sorted order). Remember, index entries for a clustered
index contain all of the column values. For this reason, clustered indexes are
often found on primary or foreign key columns. Key values will rarely, if ever,
change.

No comments:

Post a Comment