Microsoft SQL Server reporting database configuration

Posted at — Sep 23, 2010

A check-list for using Microsoft SQL Server as a reporting database. This is a ‘be aware’ list for myself in first place, since I forget some details as time passes.

Overall database settings

Use simple recovery mode<

Set up simple recovery mode for database to minimize performance issues with transaction log records. Usually you can use daily backups with DWH, since data is loaded daily. In more complex situations you can use ETL-based ‘checkpoints’, but in general, daily backups are more than enough.

Separate log and data io channels

Use decent I\O systems and separate data and log I\O channels. This is obvious, just reiterating.

RAID 1+0 seems to be preferable choice of disk configuration for DWH loads. See this entry, this one or test it yourself )

2008 r2

Use SQL Server 2008r2 Enterprise if you have a choice. 2008 adds table\index compression (which is great) and get’s partitioning to work way better in queries. And I’ve 20% performance speed-up against 2005 on the same query set (with same execution plans and that surprised me).

Auto shrink Off

Set Auto shrink off, it creates unpredictable I\O strain. Shrink database daily, just after backup in the same maintenance task.

Table configuration

identity columns as primary keys<

Use dentity columns as table primary keys. And using this column as clustered index will help I\O in most cases.

Set up foreign keys

Again, obvious, but I’ve seen this violated more than once. Use foreign keys for fact-dimension table reference. This will help optimizer a lot.

You can disable these keys to speed up inserts, but in 90% cases it’s not needed.

Table partitioning

Always use partitioned tables (even fake, with 1 partition) to:

achieve partition elimination during query execution. Especially in Ms SQL 2008
use partition switching for data load (check this article as well)

Table compression in 2008

SQL Server 2008 introduces table compression which helps to reduce I\O (but increases CPU load), which is very good for reporting tasks. Use it for fact tables.

ETL

Partition switching as default load strategy

More benefits of partition switching for loading:

while you’re loading data in ’new_data’ partition, base table is accessible and consistent
after you’ve prepared ’new_data’ partition you can check it’s data for quality (planning to write about later)

You can implement switching by creating another temporary partition, which I like to call ’last load’. So you switch:

current -> last_load * new_data -> current in one transaction

After that you have ’last_load’ to:

switch back in emergency
check data from current\previous load for some more quality checks

Filtered indexes for calculations

Another 2008-only feature is called filtered indexes. These indexes can be essentially smaller than common ones if you know for sure what region of data you’ll be accessing. And this turns out to be very efficient for SQL code you’re using for different calculations. Since I mostly work with Oracle Data Integrator as an ETL tool and it’s an SQL-generating machine: I get up to 10 times calculation speed up by building small, calculation-specific filtered indexes.

Tablock hint

All inserts should be done with TABLOCK hint, which speeds up insert by acquiring a full table lock during the whole transaction. Since in DWH main inserts are performed by ETL there should be no conflicts. TABLOCK also speeds up logging during bcp.

Query tuning

Indexes

Really a vast topic, but to sum it up ) :

use covering indexes. It seems better to have a ‘wide index’ than a number of indexes on all dimensions and to apply index intersection.
columns in indexes should be ordered by selectivity

Be very-very careful, ‘your mileage may vary’:

creating a huge number of large indexes may actually slow system down due to I\O overload
you should try to create indexes that can help a number of reports\query instead of tuning a single request

See Books session below, skip through at least one of them before starting it out wild.

Filtered indexes

As with ETL calculations — filtered indexes may be used for tuning specific queries.

Index compression

Indexes can be compressed in the same way as tables (which is obvious given that a table is a clustered index) and this speeds queries by reducing IO

Books to read

I’d recommend following:

Professional SQL Server 2005 Performance Tuning
SQL Server Execution Plans
SQL Server 2008 Query Performance Tuning Distilled (Expert’s Voice in SQL Server)
And if you have time and iron will : Relational Database Index Design and the Optimizers

Applied Dimensionality