Readme_Data Warehouse Tuning Techniques Sample
11/05/2008 21:36:06
This sample works only with SQL Server 2005 and SQL Server 2008. It will not work with any version of SQL Server earlier than SQL Server 2005.
This sample demonstrates techniques for tuning data warehouse queries to improve query performance. In most cases, the query optimizer creates high-quality query plans. However, for specific query patterns, you can improve the query plan by manually tuning the query. Here, we present three such scenarios and their tuning techniques.
The run times cited below were recorded by using an AMD Athlon 64 X2 Dual-Core 4200 CPU with 4 GB of RAM, running Windows Server 2008 and SQL Server 2008. The queries were run from SQL Server Management Studio.
Languages
Transact-SQL
Prerequisites
Before you run this sample, install the following software.
- SQL Server 2008 and the following components:
  - The Database Engine
  - SQL Server Management Studio
- Database Engine samples for SQL Server 2008. The default installation directory is C:\Program Files\Microsoft SQL Server\100\Samples\Engine\Query Processing\Data Warehouse Tuning Techniques.
- AdventureWorksDW2008 sample database for SQL Server 2008. For example, if you are using the x64 architecture, you can download SQL2008.AdventureWorksAllDatabases.x64.msi to install AdventureWorksDW2008. For more information about installing the samples, see "Considerations for Installing SQL Server Samples and Sample Databases" in SQL Server 2008 Books Online.
Running the Sample
Prepare the Database
The examples included with the Data Warehouse Tuning sample are designed to operate on a modified version of AdventureWorksDW2008. The fact tables in the original sample database are small compared with typical data warehouse sizes. Perform the following step one time to create the new FactInternetSalesBig fact table, which is approximately 1000 times larger than FactInternetSales.
- Run Prepare.sql. Depending on your machine's performance, this might take about an hour or more (the run time on the test machine was 54 minutes). The extended database needs approximately 25 GB of drive space.
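After Prepare.sql finishes, a quick sanity check can confirm that the new table exists and is roughly the expected size. The following is a minimal sketch, not part of the sample scripts; it assumes both tables live in the dbo schema, and the column aliases are hypothetical.

```sql
USE AdventureWorksDW2008;
GO

-- Compare row counts; the big table should hold roughly 1000 times
-- as many rows as the original fact table.
SELECT
    (SELECT COUNT_BIG(*) FROM dbo.FactInternetSales)    AS OriginalRows,
    (SELECT COUNT_BIG(*) FROM dbo.FactInternetSalesBig) AS BigRows;

-- Report the space reserved by the new fact table.
EXEC sp_spaceused N'dbo.FactInternetSalesBig';
```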
Example 1: Tune Join Into Semi-Join
This example tunes a query by converting the original join into an explicit semi-join using an EXISTS subquery. Note that the DatesRange common expression returns a single row and uses the fact table only to check whether sales exist for specific dates. A join is not needed to achieve the correct query results; a semi-join is sufficient. However, the query optimizer does not replace this join with a semi-join because the replacement does not apply in all scenarios. A join can return the same date row many times, once per matching fact row, whereas a semi-join returns it at most once. Aggregates whose result does not depend on the number of matching rows (for example, MIN and MAX) allow the replacement, and aggregates that are sensitive to duplicates (for example, COUNT) do not.
Using EXISTS avoids the expensive inner join, so that for each date row, only the existence of a single fact row is checked. To demonstrate this, follow these steps:
- Open SemiJoin.sql in SQL Server Management Studio.
- Execute Original Query. Notice the time it takes to execute the query. On the test machine, the query took 2 minutes, 8 seconds.
- Execute Tuned Query. Notice the time it takes to execute the query. The execution time should be significantly less than that of the original query. On the test machine, the query took 1 minute, 1 second.
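The shape of this rewrite can be sketched as follows. This is not the content of SemiJoin.sql; it is a minimal illustration that assumes FactInternetSalesBig keeps the OrderDateKey column of FactInternetSales and that DimDate exposes DateKey and FullDateAlternateKey as in AdventureWorksDW2008. The column aliases are hypothetical. SET STATISTICS TIME ON is one way to see elapsed times in the Messages tab of SQL Server Management Studio.

```sql
USE AdventureWorksDW2008;
GO
SET STATISTICS TIME ON;

-- Join form: every date row is joined to all of its matching fact rows
-- before MIN and MAX are computed.
WITH DatesRange AS (
    SELECT MIN(d.FullDateAlternateKey) AS FirstSaleDate,
           MAX(d.FullDateAlternateKey) AS LastSaleDate
    FROM dbo.DimDate AS d
    JOIN dbo.FactInternetSalesBig AS f
        ON f.OrderDateKey = d.DateKey
)
SELECT FirstSaleDate, LastSaleDate
FROM DatesRange;

-- Semi-join form: for each date row, EXISTS only checks whether at least
-- one matching fact row exists. MIN and MAX ignore duplicates, so the
-- result is the same as that of the join form.
WITH DatesRange AS (
    SELECT MIN(d.FullDateAlternateKey) AS FirstSaleDate,
           MAX(d.FullDateAlternateKey) AS LastSaleDate
    FROM dbo.DimDate AS d
    WHERE EXISTS (SELECT *
                  FROM dbo.FactInternetSalesBig AS f
                  WHERE f.OrderDateKey = d.DateKey)
)
SELECT FirstSaleDate, LastSaleDate
FROM DatesRange;

SET STATISTICS TIME OFF;
```

If the aggregate were COUNT(*) instead, the two forms would return different values, which is why this rewrite has to be applied by hand.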
Example 2: Split Distinct Aggregates
The Original Query of this sample generates a query plan with a multi-consumer spool, shared by the aggregate function and by the distinct operator. A spool is expensive and can be avoided by breaking up the query into a plan with two SELECT statements. This results in serial GROUP BY operators and avoids the spool and join. To demonstrate this, follow these steps:
- Open CountDistinct.sql in SQL Server Management Studio.
- Execute Original Query. Notice the time it takes to execute the query. On the test machine, the query took 3 minutes, 5 seconds.
- Execute Tuned Query. Notice the time it takes to execute the query. The execution time should be significantly less than that of the original query. On the test machine, the query took 1 minute, 18 seconds.
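One way to picture the split is shown below. This is a sketch of the general pattern, not the content of CountDistinct.sql; it assumes FactInternetSalesBig keeps the ProductKey, CustomerKey, and SalesAmount columns of FactInternetSales, and the aliases are hypothetical.

```sql
USE AdventureWorksDW2008;
GO

-- Single-statement form: a regular aggregate and a distinct aggregate
-- are computed over the same input in one GROUP BY.
SELECT f.ProductKey,
       SUM(f.SalesAmount)            AS TotalSales,
       COUNT(DISTINCT f.CustomerKey) AS DistinctCustomers
FROM dbo.FactInternetSalesBig AS f
GROUP BY f.ProductKey;

-- Split form: the inner SELECT groups by (ProductKey, CustomerKey), so
-- each customer appears once per product. The outer SELECT then groups
-- by ProductKey, sums the partial sums, and counts the customers.
SELECT t.ProductKey,
       SUM(t.SalesPerCustomer) AS TotalSales,
       COUNT(t.CustomerKey)    AS DistinctCustomers
FROM (SELECT ProductKey,
             CustomerKey,
             SUM(SalesAmount) AS SalesPerCustomer
      FROM dbo.FactInternetSalesBig
      GROUP BY ProductKey, CustomerKey) AS t
GROUP BY t.ProductKey;
```

The two forms return the same rows because the inner grouping already removes duplicate customers per product, which lets a plain COUNT stand in for COUNT(DISTINCT).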
Example 3: Reuse Common Expressions
The original query of this sample generates a query plan with equivalent sub-branches due to the reuse of common expressions. To avoid such duplicate plan branches, a temporary table can be used to store the intermediate result of the common expression. To demonstrate this, follow these steps:
- Open CommonExpression.sql in SQL Server Management Studio.
- Execute Original Query. Notice the time it takes to execute the query. On the test machine, the query took 2 minutes, 10 seconds.
- Execute Tuned Query. Notice the time it takes to execute the query. The execution time should be significantly less than that of the original query. On the test machine, the query took 1 minute, 18 seconds.
© 2008 Microsoft Corporation. All rights reserved.