Data Warehouse & Business Intelligence: T-SQL Data Types

All the data values of a column must be of the same data type. (The only exception specifies the values of the SQL_VARIANT data type.) Transact-SQL uses different data types, which can be categorized as follows:

Numeric data types
Character data types
Temporal (date and/or time) data types
Miscellaneous data types
DECIMAL with VARDECIMAL storage type

The following sections describe all these categories.

Numeric Data Types

Numeric data types are used to represent numbers. The following table shows the list of all numeric data types:

Data Type	Explanation
INTEGER	Represents integer values that can be stored in 4 bytes. The range of values is –2,147,483,648 to 2,147,483,647. INT is the short form for INTEGER.
SMALLINT	Represents integer values that can be stored in 2 bytes. The range of values is –32768 to 32767.
TINYINT	Represents nonnegative integer values that can be stored in 1 byte. The range of values is 0 to 255.
BIGINT	Represents integer values that can be stored in 8 bytes. The range of values is –2^63 to 2^63 – 1.
DECIMAL(p,[s])	Describes fixed-point values. The argument p (precision) specifies the total number of digits with assumed decimal point s (scale) digits from the right. DECIMAL values are stored, depending on the value of p, in 5 to 17 bytes. DEC is the short form for DECIMAL.
NUMERIC(p,[s])	Synonym for DECIMAL.
REAL	Used for floating-point values. The range of positive values is approximately 2.23E – 308 to 1.79E +308, and the range of negative values is approximately –1.18E – 38 to –1.18E + 38 (the value zero can also be stored).
FLOAT[(p)]	Represents floating-point values, like REAL. p defines the precision with p < 25 as single precision (4 byte) and p >= 25 as double precision (8 byte).
MONEY	Used for representing monetary values. MONEY values correspond to 8-byte DECIMAL values and are rounded to four digits after the decimal point.
SMALLMONEY	Corresponds to the data type MONEY but is stored in 4 bytes.

Character Data Types

There are two general forms of character data types. They can be strings of single-byte characters or strings of Unicode characters. (Unicode uses several bytes to specify one character.) Further, strings can have fixed or variable length. The following character data types are used:

Data Type	Explanation
CHAR[(n)]	Represents a fixed-length string of single-byte characters, where n is the number of characters inside the string. The maximum value of n is 8000. CHARACTER(n) is an additional equivalent form for CHAR(n). If n is omitted, the length of the string is assumed to be 1.
VARCHAR[(n)]	Describes a variable-length string of single-byte characters (0 < n ≤ 8000). In contrast to the CHAR data type, the values for the VARCHAR data type are stored in their actual length. This data type has two synonyms: CHAR VARYING and CHARACTER VARYING.
NCHAR[(n)]	Stores fixed-length strings of Unicode characters. The main difference between the CHAR and NCHAR data types is that each character of the NCHAR data type is stored in 2 bytes, while each character of the CHAR data type uses 1 byte of storage space. Therefore, the maximum number of characters in a column of the NCHAR data type is 4000.
NVARCHAR[(n)]	Stores variable-length strings of Unicode characters. The main difference between the VARCHAR and the NVARCHAR data types is that each NVARCHAR character is stored in 2 bytes, while each VARCHAR character uses 1 byte of storage space. The maximum number of characters in a column of the NVARCHAR data type is 4000.

Note: The VARCHAR data type is identical to the CHAR data type except for one difference: if the content of a CHAR(n) string is shorter than n characters, the rest of the string is padded with blanks. (A value of the VARCHAR data type is always stored in its actual length.)

Temporal Data Types

Transact-SQL supports the following temporal data types:

DATETIME
SMALLDATETIME
DATE
TIME
DATETIME2
DATETIMEOFFSET

The DATETIME and SMALLDATETIME data types specify a date and time, with each value being stored as an integer value in 4 bytes or 2 bytes, respectively. Values of DATETIME and SMALLDATETIME are stored internally as two separate numeric values. The date value of DATETIME is stored in the range 01/01/1753 to 12/31/9999. The analog value of SMALLDATETIME is stored in the range 01/01/1900 to 06/06/2079. The time component is stored in the second 4-byte (or 2-byte for SMALLDATETIME) field as the number of three-hundredths of a second (DATETIME) or minutes (SMALLDATETIME) that have passed since midnight.

The use of DATETIME and SMALLDATETIME is rather inconvenient if you want to store only the date part or time part. For this reason, SQL Server 2008 introduces the new data types DATE and TIME, which store just the DATE or TIME component of a DATETIME, respectively. The DATE data type is stored in 3 bytes and has the range 01/01/0001 to 12/31/9999. The TIME data type is stored in 3–5 bytes and has an accuracy of 100 nanoseconds (ns).

The DATETIME2 data type is also a new data type that stores high-precision date and time data. The data type can be defined for variable lengths depending on the requirement. (The storage size is 6–8 bytes). The accuracy of the time part is 100 ns. This data type isn't aware of Daylight Saving Time.

All the temporal data types described thus far don't have support for the time zone. The new data type called DATETIMEOFFSET has the time zone offset portion. For this reason, it is stored in 6–8 bytes. (All other properties of this data type are analogous to the corresponding properties of DATETIME2.)

The date value in Transact-SQL is by default specified as a string in a format like 'mmm dd yyyy' (e.g., 'Jan 10 1993') inside two single quotation marks or double quotation marks. (Note that the relative order of month, day, and year can be controlled by the SET DATEFORMAT statement. Additionally, the system recognizes numeric month values with delimiters of / or –.) Similarly, the time value is specified in the format 'hh:mm' and Database Engine uses 24-hour time (23:24, for instance).

Note: Transact-SQL supports a variety of input formats for DATETIME values. As you already know, both objects are identified separately; thus, date and time values can be specified in any order or alone. If one of the values is omitted, the system uses the default values. (The default value for time is 12:00 AM.)

Examples 4.4 and 4.5 show the different ways, how date or time values can be written using the different formats.

Example 4.4
The following date descriptions can be used:

'28/5/1959' (with SET DATEFORMAT dmy)
'May 28, 1959'
'1959 MAY 28'

Example 4.5
The following time descriptions can be used:

'8:45 AM'
'4 pm'

Miscellaneous Data Types

Transact-SQL supports several data types that do not belong to any of the data type groups described previously:

Binary data types
BIT
Large object data types
CURSOR (discussed in Chapter 8)
UNIQUEIDENTIFIER
SQL_VARIANT
TABLE (discussed in Chapter 8)
XML (discussed in Chapter 28)
Spatial (e.g., GEOGRAPHY and GEOMETRY) data types (discussed in Chapter 29 )
HIERARCHYID
TIMESTAMP data type
User-defined data types (discussed in Chapter 5)

The following sections describe each of these data types (other than those designated as being discussed in another chapter).

Binary and BIT Data Types

BINARY and VARBINARY are the two binary data types. They describe data objects being represented in the internal format of the system. They are used to store bit strings. For this reason, the values are entered using hexadecimal numbers.

The values of the BIT data type are stored in a single bit. Therefore, up to 8 bit columns are stored in 1 byte. The following table summarizes the properties of these data types:

Data Type	Explanation
BINARY[(n)]	Specifies a bit string of fixed length with exactly n bytes (0 < n ≤ 8000).
VARBINARY[(n)]	Specifies a bit string of variable length with up to n bytes (0 < n ≤ 8000).
BIT	Used for specifying the Boolean data type with three possible values: FALSE, TRUE, and NULL.

Large Object Data Types

Large objects (LOBs) are data objects with the maximum length of 2GB. These objects are generally used to store large text data and to load modules and audio/video files. Transact-SQL supports two different ways to specify and access LOBs:

Use the data types VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX)
Use the so-called text/image data type

The following subsections describe the two forms of LOBs.

The MAX Specifier Starting with SQL Server 2005, you can use the same programming model to access values of standard data types and LOBs. In other words, you can use convenient system functions and string operators to work with LOBs.

Database Engine uses the MAX specifier with the data types VARCHAR, NVARCHAR, and VARBINARY to define variable-length columns. When you use MAX by default (instead of an explicit value), the system analyzes the length of the particular string and decides whether to store the string as a convenient value or as a LOB. The MAX specifier indicates that the size of column values can reach the maximum LOB size of the current system. (In a future version of SQL Server, it is possible that MAX will have a higher maximum value for strings.)

Although the database system decides how a LOB will be stored, you can override this default specification using the sp_tableoption system procedure with the LARGE_ VALUE_TYPES_OUT_OF_ROW option. If the option's value is set to 1, the data in columns declared using the MAX specifier will be stored separately from all other data. If this option is set to 0, Database Engine stores all values for the row size < 8060 bytes as regular row data.

In SQL Server 2008, you can apply the new FILESTREAM attribute to a VARBINARY(MAX) column to store large binary data directly in an NTFS file system. The main advantage of this attribute is that the size of the corresponding LOB is limited only by the file system volume size.

TEXT, NTEXT, and IMAGE Data Types The data types TEXT, NTEXT, and IMAGE constitute the so-called text/image data types. Data objects of the type IMAGE can contain any kind of data (load modules, audio/video), while data objects of the data types TEXT and NTEXT can contain any text data (that is, printable data).

The text/image data types are stored by default separately from all other values of a database using a B-tree structure that points to the fragments of that data. (A B-tree structure is a treelike data structure in which all of the bottom nodes are the same number of levels away from the root of the tree.) For columns of a text /image data type, Database Engine stores a 16-byte pointer in the data row that specifies where the data can be found.

If the amount of text/image data is less than 32KB, the pointer points to the root node of the B-tree structure, which is 84 bytes long. The root node points to the physical blocks of the data. If the amount of the data is greater than 32KB, Database Engine builds intermediate nodes between the data blocks and the root node.

For each table that contains more than one column with such data, all values of the columns are stored together. However, one physical page can hold only text/image data from a single table.

Although text/image data is stored separately from all other data, you can modify this using the sp_tableoption system procedure with the TEXT_IN_ROW option. Using this option, you can specify the maximum number of bytes, which are stored together with the regular data.

The text/image data types discussed this far have several limitations. You can't use them as local variables (in stored procedures or in groups of Transact-SQL statements). Also, they can't be a part of an index or can't be used in the following clauses of the SELECT statement: WHERE, ORDER BY, and GROUP BY. The biggest problem concerning all text/image data types is that you have to use special operators (READTEXT, WRITETEXT, and UPDATETEXT) to work with such data.

Note: The text/image data types are marked as a deprecated feature and will be removed in a future version of Database Engine. Use VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) instead.

UNIQUEIDENTIFIER Data Type

As its name implies, a value of the UNIQUEIDENTIFIER data type is a unique identification number stored as a 16-byte binary string. This data type is closely related to the globally unique identifier (GUID), which guarantees uniqueness worldwide. Hence, using this data type, you can uniquely identify data and objects in distributed systems.

The initialization of a column or a variable of the UNIQUEIDENTIFIER type can be provided using the functions NEWID and NEWSEQUENTIALID, as well as with a string constant written in a special form using hexadecimal digits and hyphens. (The functions NEWID and NEWSEQUENTIALID are described in the section "System Functions" later in this chapter.)

A column of the UNIQUEIDENTIFIER data type can be referenced using the keyword ROWGUIDCOL in a query to specify that the column contains ID values. (This keyword does not generate any values.) A table can have several columns of the UNIQUEIDENTIFIER type, but only one of them can have the ROWGUIDCOL keyword.

SQL_VARIANT Data Type

The SQL_VARIANT data type can be used to store values of various data types at the same time, such as numeric values, strings, and date values. (The only types of values that cannot be stored are TIMESTAMP values.) Each value of an SQL_VARIANT column has two parts: the data value and the information that describes the value. (This information contains all properties of the actual data type of the value, such as length, scale, and precision.)

Transact-SQL supports the SQL_VARIANT_PROPERTY function, which displays the attached information for each value of an SQL_VARIANT column. For the use of the SQL_VARIANT data type, see Example 5.5 in Chapter 5.

Note: Declare a column of a table using the SQL_VARIANT data type only if it is really necessary. A column should have this data type if its values may be of different types or if it is not possible to determine the type of a column during the database design process.

HIERARCHYID Data Type

The HIERARCHYID data type is used to store an entire hierarchy. It is implemented as a Common Language Runtime (CLR) user-defined type that comprises several system functions for creating and operating on hierarchy nodes. The following functions, among others, belong to the methods of this data type: GetAncestor(), GetDescendant(), Read(), and Write(). (The detailed description of this data type is outside the scope of this book.)

TIMESTAMP Data Type

The TIMESTAMP data type specifies a column being defined as VARBINARY(8) or BINARY(8), depending on nullability of the column. The system maintains a current value (not a date or time) for each database, which it increments whenever any row with a TIMESTAMP column is inserted or updated and sets the TIMESTAMP column to that value. Thus, TIMESTAMP columns can be used to determine the relative time when rows were last changed. (ROWVERSION is a synonym for TIMESTAMP.)

Note: The value stored in a TIMESTAMP column isn't important by itself. This column is usually used to detect whether a specific row has been changed since the last time it was accessed.

DECIMAL with VARDECIMAL Storage Format

The DECIMAL data type is typically stored on the disk as fixed-length data. Since SQL Server 2005 SP2, this data type can be stored as a variable-length column by using the new storage format called VARDECIMAL. Using VARDECIMAL, you can significantly reduce the storage space for a DECIMAL column in which values have significant difference in their lengths.

Note: VARDECIMAL is a storage format and not a data type.

The VARDECIMAL storage format is helpful when you have to specify the largest possible value for a field in which the stored values usually are much smaller. Table 4-1 shows this.

Note: The VARDECIMAL storage format works the same way for the DECIMAL data type as the VARCHAR data type works for alphanumerical data.

Precision	No. of Bytes: VARDECIMAL	No. of Bytes: Fixed Length
0 or NULL	2	5
1	4	5
20	12	13
30	16	17
38	20	17

Table 4–1 Number of Bytes for Storing VARDECIMAL and Fixed Length

To enable the VARDECIMAL storage format, you first have to enable it for the database and then enable it for the particular table of that database. The sp_db_ vardecimal_storage_format system procedure is used for the first step, as Example 4.6 shows.

Example 4.6
EXEC sp_db_vardecimal_storage_format 'sample', 'ON';

The VARDECIMAL STORAGE FORMAT option of the sp_table option system procedure is used to turn on this storage for the table. Example 4.7 turns on the VARDECIMAL storage format for the project table.

Example 4.7
EXEC sp_tableoption 'project', 'vardecimal storage format', 1

As you already know, the main reason to use the VARDECIMAL storage format is to reduce the storage size of the data. If you want to test how much storage space could be gained by using this storage format, use the dynamic management view calledsys.sp_estimated_rowsize_reduction_for_vardecimal. This dynamic management view gives you a detailed estimate for the particular table.

Data Warehouse & Business Intelligence

Labels

Friday, June 1, 2012

T-SQL Data Types

No comments:

Post a Comment

தினம் ஒரு திருக்குறள்