Monday, March 17, 2008

Important and advanced concepts in SQL Server 2005

This article discusses some of the most important and advanced concepts in SQL Server and how one can improve the performance of SQL Server using the best practices.

CLR integration
Any code that runs under the Common Language Runtime (CLR) hood is managed code. The CLR is the core of the .NET environment providing all necessary services for the execution of the managed code. SQL Server 2005 has a tight integration with the CLR which enables the developer to create stored procedures, triggers, user defined functions, aggregates and user defined types using the managed code.

T-SQL suits best when you need to perform little procedural logic and access data from the server. If your data needs to undergo complex logic then the better option is using managed code. When working on data intensive operations, working in T-SQL would be an easy approach, but T-SQL lacks the ease of programming. You might end up in lot of lines of coding in trying to simulate operations that are specific to character, string operations, arrays, collections, bit shifting and so forth. While working with mathematical operations and regular expressions, you might need a language that provides an easy, clean yet powerful way of handling such operations. If you encounter situation where you need to perform such operations using T-SQL, then it is really going to be annoying.

Integrating DML operations with managed code also helps bifurcate logic into classes and namespaces, which is somewhat similar to what we have schemas in the database. Now saying this, it should be understood that integrating CLR into SQL Server does not replace the business tier of your application. The benefits of integrating the CLR with SQL Server include:

The T-SQL statements that you execute actually run on the server end. But at times when you want to distribute the load between the client and server, you could go with the managed code. So using managed code you could perform critical logic operations in client side so that the server could be busy only with data intensive operations.
The fact that SQL Server provides you extended stored procedures to avail certain system related functions from your T-SQL code, but at the same time you may have to compromise with the integrity of the server. When it comes to managed code, it provides type safety, effective memory management and better synchronization of services which is integrated tightly with the CLR and, hence, the SQL Server 2005. So this means that integrating CLR with SQL Server provides a scalable and safer means for accomplishing tasks which are tougher or almost impossible using T-SQL.
.NET Framework provides a rich support for handling XML based operations from managed code; although realizing the fact the SQL Server supports XML based operations, you could perform such operations using .NET with little effort when compared to using T-SQL scripts.
Nested transactions in T-SQL have limitations when dealing with look back connections, whereas this could be better achieved using managed code by setting the attribute "enlist=false" in the connection string.
When working T-SQL you may not be able to fetch rows which form the middle of the operation from a result set until the execution gets finished. This is termed as pipelining of results which could be achieved with CLR integration.
If you could check your database configuration you could notice that the CLR Integration is turned off by default. Enabling or disabling of CLR integration could be done by setting the "clr enabled" option to 1 or 0. Once the CLE integration is disabled, all the executing CLR procedures are unloaded across all application domains. To turn it on you need to use the following.

Listing 1
EXEC sp_configure 'clr enabled', 1;

The RECONFIGURE statement ensures that you need not restart the server with the change in the configuration. But if one among the several configuration options fails, none of the configured values would take effect.

But to configure the above you require ALTER SETTINGS server level permission to enable it. Again, for achieving that you need to be a member of serveradmin and sysadmin roles.

Using RECONFIGURE ensures that the update in the configuration does not need server restart. The configured values do not take effect in case you reconfigure several options in one time and one of the option fail.

Best Practices to Improve Performance
Performance factor is a measure of the amount of response time that you get upon any operation that you perform against the server. Modern databases are designed in a way that they would not halt the business with increasing load. But, the performance factor of the database in an enterprise project is usually given a low priority in the initial stages of design. Poor database design may lead to slow running transactions, excessive blocking, poor resource balancing and so forth which could cost excess amount of time and money to maintain.

So why do we need to care about performance any way? Better performance provides faster transactions and good scalability. This would cause more batch processing jobs to be done in less time with a low down time. Increased performance would help to gain better response time for the users and provide faster services even on increased load operations. Performance factor should be considered from the day we start designing our database. As the complexity of the design increases, it becomes harder and harder to pull out the design issues in order to get a better performance.

There are many techniques where we could monitor and improve the performance, but we shall limit it now to certain tips that will help us to fine tune the database.

Row versioning-based isolation levels
SQL Server 2005 introduces a feature called Row Level Versioning which allows effective management of concurrent access to the data while maintaining the consistency of data. Usually an isolation level decides an extent on how the modified data is isolated from other. RLV benefits when data is accessed across isolation levels because they help to eliminate the read operation locks, thus improving the read concurrency. The fact is that the read operations do not require shared locks on the data when it is running on isolation levels with RLV, this eventually does not block other requests accessing the same data, and that is how the locking resources are minimized. On the other hand, when it comes to write operations, two write requests cannot modify the same data at the same time.

Triggers working on INSERT and DELETE operations change the versions of the rows. So triggers that modify data will benefit from RLV. The rows of the result set is versioned, when an INSERT, DELETE or UPDATE statement is issued prior the data was accessed using the SELECT statement.

Transactions greatly affect the data when you perform CRUD operations. Transactions may also be executed in batch with many requests operating on a single or a row set. So when a transaction modifies a row value the previous committed row value is stored as version in tempdb.

By setting the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION options to ON, logical copies are made on the modified data by the transactions, and a transaction sequence number is assigned to every transaction that operates on the data using Row Level Versioning. The transaction sequence number is automatically incremented each time the BEGIN TRANSACTION statement is executed.

So the changes to the row are marked with transaction sequence numbers. These TSN's are linked with the newer rows that reside in the current database. Now, the TSN's are monitored periodically and numbers with least use are deleted from time to time by the database. So it is up to the database which actually decides how long the row versions have to be stored in the tempdb database.

Now, the transaction sequence numbers are tracked periodically and transaction sequence numbers with least use are deleted from time to time. As a matter of fact, the read operations do not require shared locks on the data when it is running on isolation levels with Row Level Versioning, which eventually does not block other readers or writers accessing the same data, as a result the locking resources are minimized. On the other hand, when it comes to write operations, two writers cannot modify the same data at the same time.

The READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION options should be turned on in order that the transaction isolation levels such as READ_COMMITTED and SNAPSHOT make use of the RLV system. The read committed isolation level supports distributive transactions unlike the snapshot which does not. The temporary database "tempdb" is extensively used by the SQL Server to store its temporary result sets. All the versions are stored in the tempdb database. Once the database has exceeded its maximum space utilization the update operations stops generating versions. The applications that leverage the row committed level transactions does not even need to be re-factored for enabling the RLV and also it consumes less storage space of the tempdb database, and for these reasons the read committed isolation level is preferred over the snapshot isolation.

Row Level Versioning help in situations where an applications lot of insert and update operations on the data and at the same time a bunch of reports are accessing in parallel. It could also prove beneficial if your server is experiencing relatively high deadlocks. Also for systems performing mathematical computation, they require accurate precision and RLV gives a greater amount of accuracy for such kind of operations.

Error handling
Error handling was a pretty tough job in the earlier versions of SQL Server. Developers had to perform lot of conditional checks for the error code returned after each INSERT, UPDATE or DELETE operations. Developers must check the @@ERROR attribute each time they might see the possibility of an error based operations. Error messages could be generated either from the SQL Server or thrown explicitly by the user. Let us first see how developers usually perform the error handling in SQL Server 2000. We shall have a stored procedure for our demonstration. Let us use AdventureWorks database for our demonstration purpose.

Listing 2
INSERT INTO [HumanResources].[Employee] ([NationalIDNumber], [ContactID],
[LoginID], [ManagerID], [Title], [BirthDate], [MaritalStatus], [Gender],
[HireDate], [SalariedFlag], [VacationHours], [SickLeaveHours], [CurrentFlag],
[rowguid], [ModifiedDate])
VALUES ('1441784507', 120439, 'adventure-works\guy43', 13456, 'Production
Technician - WC60', '19720515', 'M', 'M', '19960731', 0, 21, 30, 1,
'AAE1D04A-C237-4974-B4D5-735247737718', '20040731')
INSERT INTO [HumanResources].[EmployeeAddress] ([EmployeeID], [AddressID],
[rowguid], [ModifiedDate])
VALUES (@EmpID, 61, '77253AEF-8883-4E76-97AA-7B7DAC21A2CD',
'20041013 11:15:06.967')

When we try to execute this procedure we get the output as:

Listing 3
Msg 547, Level 16, State 0, Procedure ErrorHandlerDemo, Line 11The INSERT statement conflicted with the FOREIGN KEY constraint "FK_Employee_Contact_ContactID." The conflict occurred in database "AdventureWorks" table "Person.Contact" column "ContactID."

The statement has been terminated.

As you could see, the error message has Msg, Level of severity, State and Line. The “Msg” holds the error number generated for the message, in this case 547. All the error messages are defined in the table called sys.messages. If you are going for custom error handling then you can utilize the sp_addmessage system procedure to implement new error messages.

Next, you have the severity “Level” of the message. Severity codes lie in the range of 0 to 25. Any error message above 20 will terminate the connection. Error messages from 17 to 20 specify a resource problem, from 11 to 16 specify error messages in the T-SQL scripts, and error messages below 11 specify warnings.

Next we have the "State" of the error message. This is an arbitrary integer range falling in between 0 to 127. This provides information on the source that has issued the error message. However, there is not much documentation disclosed by Microsoft on this.

Next is the "Line" number which tells us the line where the error has occurred in the procedure or T-SQL batch. And the last one is the message itself.

The understanding and implementation of the error handling in the earlier versions of SQL Server 2005 was fair enough, but was with a lot of housekeeping activity. SQL Server 2005 provides flexible means to handle error handling mechanism by using the TRY and CATCH blocks.

The syntax looks like:

Listing 4
--perform insert update and delete statements
END CATCHSo as you could see from the above script, the error handling mechanism is simplified. When an error occurs, the statement is terminated from the current execution point and it enters the catch block. The functions following the PRINT statements are built-in functions that provide information on the error message. You could also embed the code from start to end in a stored procedure and call it wherever you need. You can also log the error messages in a table for debugging purpose. AdventureWorks database handles error handling in similar manner; you could find procedures uspLogError and uspPrintError which do the job.

You could also use the RAISERROR to define you own custom error messages. The RAISERROR may take a system error code or a user defined error code which would be eventually fired by the server to the connected application or within the try..catch block. The example of using RAISERROR is:

Listing 5
-- Message text.
'A serious error has terminated the program. Error message is %s, Error code is %d.',
-- Severity
-- State
-- Argument 1.
'...statement conflicted...',
-- Argument 2.
52000); The output of this RAISERROR would look like:

A serious error has terminated the program. Error message is ...statement conflicted..., Error code is 52000.

Next time when you are working with T-SQL code, you need not really worry about implementing numerous checks for errors. The TRY..CATCH feature helps with a better approach for error handling which would minimize the size of your code, thus improving readability.

Efficient Concurrency Control
Concurrency could be defined as an ability of multiple sessions to access a shared data at the same time. Concurrency comes in to picture when a request is trying to read data and the process prevents the other requests to change the same data or vice versa.

The RLV discussed above allows concurrent access automatically with no additional application control to avail this feature. Now any relational database could support multiple and simultaneous connections to the database. The job of handling concurrencies between the requests is usually handled by the server. SQL Server internally takes care of the blocking issues between two or more processes. But sometimes it may be necessary to take over some part of the control over the concurrent access to maintain the balance between data consistency and concurrency.

There are two kinds of concurrency control: Optimistic concurrency control and pessimistic concurrent control. SQL Server has a pessimistic concurrency model by default. So by default, other transactions could not read the data until the current session commits, which in this case is a writer block. Locking could prove a good choice for many of today’s database systems, but it may also introduce blocking issues. If the results are based on only the committed data then the only option left is to wait until changes are committed.

To put it in a straight forward manner, a pessimistic concurrency control the system is pessimistic. It assumes that a conflict will arise when a read operation is requested over the data modification of other users. So in this case locks are imposed to ensure that the access to the data is blocked which is being used by other session.

But in the case of optimistic concurrency, it works with an assumption that any request could modify data which is being currently read by another request. This is where the row level versioning is being used which checks the state before accessing the modified data.

Best Practices to handle Queries
One of the common mistakes developers often perform is to execute T-SQL statements directly from their application. Worse, the performance may get degraded if the combinations of operators, such as LIKE and NOT LIKE, are used with the statements. It is always a good practice to use stored procedures rather than stuffing queries in your application or a web page. Stored procedures help improve the performance since they are precompiled.

Use minimal string operations as are often costly, so do not use them in a JOIN condition. Using implicit or explicit functions in the where clause might impact your server. Also, using complex business logic in triggers is yet another performance issue. When you are working with transactions, always use isolation levels. Proper utilization of isolation levels will help reduce locking and also avoid dirty reads and writes.

If possible, avoid using CURSOR's. The other way around is to use temporary tables with WHILE statements and break complex queries into many temporary tables and later joining them. Also when you are working with large tables, select only those rows or columns that are needed in the result set. Unnecessary inclusion on columns and rows will congest the network traffic which is, again, a bottleneck for performance.

Index Considerations
Create indexes only when they are really required because SQL Server needs to arrange and maintain records for each index that you define. To make sure that you are creating for the right purpose, you can create indexes on columns which are used in WHERE condition and ORDER BY, GROUP BY and DISTINCT clauses. Indexes which are not used may cause extra overhead. Also, it is always recommended to have smaller cluster indexes and moreover define a data range for the cluster indexes that you maintain. Once you define a column as foreign key, it is a good practice to create an index on it. You can also use the index tuning wizard to check the index performance and be sure to remove unused indexes.

Best Practices in handling Transactions
It is advisable not to have transactions that run for a long time. If you need to access large data in a transaction that needs to be sent to the client, then you can have that operation at the end of the transaction. Transactions that require user input to commit are also a degrade factor, ensure that explicit transactions are wither committed or rollback at some point of time. Also, you could find a boost in the performance if the resources are accessed in the same order. Proper utilization of isolation levels helps minimize locking.

Efficient Design Considerations
The way you design your database impacts greatly on the performance of SQL Server. When you are working with tables, always use proper data types for the columns. If your data has very large chunks of characters then you can go with text data type. Check if proper primary and foreign key relationships are defined across various tables. Make a practice of normalizing your database first and then work around de-normalizing for improving any performance. You may use indexed views for de-normalization purpose. Analysis job usually takes more of the system resources, so it is recommended to use separate servers for Analysis and Transaction processing jobs.

Best practices in using Stored Procedures
Do not use prefix in your stored procedures, i.e., do not prefix them with sp_. Microsoft ships system procedures which are prefixed with sp_. So if you prefix your procedures with sp_, SQL Server will first search in the master database for the procedure and then in your application database. Again, this is a bottleneck.

Always use exception handling if you are working with transaction based procedures. Proper error handling ensures security and provides a better approach of what to do when an unexpected error occurs.

If you do not want your client application to check the rows affected for an operation, then it is advisable to use SET NOCOUNT ON in your stored procedures. Not using this would send the number of rows affected to the client application or ADO/ADO.NET. The client application would further work on this result through the command or connection objects. This could cause extra overhead on both the client and server.

Efficient Execution Considerations
When trying to execute your T-SQL code, do not perform index or table scans. You can have index seek instead of index scan. Also, try to evaluate hash joins, filters, sort conditions and bookmarks. A better decision could be made on the execution strategy by observing the execution plan. When working with dynamic SQL, things may not work as anticipated during execution. Although it is not a great idea having your queries dynamically built, in some cases they help reduce the code when the SQL expression needs to be built upon many decisions. In such cases you can always use sp_executesql. Also, while working with stored procedures it is advised not to mix up the DML and DDL statements.

Best practices in deployment
Set your database size initially instead of allowing it to grow automatically. To minimize disk reads and writes, you may create the log file and tempdb into separate devices from the data. You can utilize RAID configuration with multiple disk controllers if the database performs large data warehouse operations. Have an optimal memory for your server and perform index fragmentation as and when needed. You can go with the automatic database shrink option to manage unwanted space. However, it is recommended that you use default server configuration for your application.

Some of the common mistakes that are usually noticed and should be avoided include:

Usage of GUIDs at times wherever not necessary and using GUIDs as primary key.
Having a clustered index GUID primary key makes a row bigger and it also degrades the performance as it would make every non-clustered index bigger.
Not being rational on the usage of data types, for example usage of larger data types where smaller data types would suffice the purpose.
No proper attention to remove missing or unused indexes. Poor decision making capabilities to identify the usage of clustered and non-clustered indexes.
Un-utilized columns and rows which may cause excessive data density.
Execution plan is ignored while writing queries. Each query adds up to extra time for processing.
This article looked at some of the advanced concepts of SQL Server, such as CLR integration, Row versioning-based isolation levels, TRY...CATCH Error Handling, Concurrency Control and the techniques that can be followed to improve database performance.

No comments: