Data Warehouse & Business Intelligence: February 2013

Wednesday, February 27, 2013

How to Find Tables With Primary and Foreign Key Constraints in a Database

Tables with a foreign key constraint

USE EDW; --Database name
GO
SELECT f.name AS ForeignKey,
OBJECT_NAME(f.parent_object_id) AS TableName,
COL_NAME(fc.parent_object_id,
fc.parent_column_id) AS ColumnName,
OBJECT_NAME (f.referenced_object_id) AS ReferenceTableName,
COL_NAME(fc.referenced_object_id,
fc.referenced_column_id) AS ReferenceColumnName
FROM sys.foreign_keys AS f
INNER JOIN sys.foreign_key_columns AS fc
ON f.OBJECT_ID = fc.constraint_object_id

Tables with a primary key constraint

USE EDW; --Database name
GO
SELECT i.name AS IndexName,
OBJECT_NAME(ic.OBJECT_ID) AS TableName,
COL_NAME(ic.OBJECT_ID,ic.column_id) AS ColumnName
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1
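
As a complement to the query above, it can also be useful to list the tables that have no primary key at all. The following is a minimal sketch, assuming the built-in OBJECTPROPERTY function with the 'TableHasPrimaryKey' property; run it in the database you want to inspect.

```sql
-- List user tables that do not have a PRIMARY KEY constraint.
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
       t.name AS TableName
FROM sys.tables AS t
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
ORDER BY SchemaName, TableName;
```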

Find a table in a database

Below is the code to find a table in a database

USE XXXX --Database name

SELECT * FROM sys.Tables
WHERE name LIKE '%Address%'

Below is the code to find a table in all databases using a stored procedure

USE Master
GO
CREATE PROCEDURE usp_FindTableNameInAllDatabase
@TableName VARCHAR(256)
AS
DECLARE @DBName VARCHAR(256)
DECLARE @varSQL VARCHAR(MAX) -- large enough that long database or table names are not truncated
DECLARE @getDBName CURSOR
SET @getDBName = CURSOR FOR
SELECT name
FROM sys.databases
WHERE state_desc = 'ONLINE' -- skip databases that USE cannot open
CREATE TABLE #TmpTable (DBName VARCHAR(256),
SchemaName VARCHAR(256),
TableName VARCHAR(256))
OPEN @getDBName
FETCH NEXT
FROM @getDBName INTO @DBName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @varSQL = 'USE ' + QUOTENAME(@DBName) + ';
INSERT INTO #TmpTable
SELECT ''' + REPLACE(@DBName, '''', '''''') + ''' AS DBName,
SCHEMA_NAME(schema_id) AS SchemaName,
name AS TableName
FROM sys.tables
WHERE name LIKE ''%' + REPLACE(@TableName, '''', '''''') + '%'''
EXEC (@varSQL)
FETCH NEXT
FROM @getDBName INTO @DBName
END
CLOSE @getDBName
DEALLOCATE @getDBName
SELECT *
FROM #TmpTable
DROP TABLE #TmpTable

To run the above stored procedure:
EXEC usp_FindTableNameInAllDatabase 'xxxTablenamexxx' --Table name

If Exists Logic

If exists logic for a Database

IF EXISTS(select * from sys.databases where name='yourDBname')

If exists logic for a table

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DimPACodes]') AND type in (N'U'))
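
The check above is only a condition; it needs a statement to act on. Here is a minimal sketch of a typical drop-and-recreate pattern, using a hypothetical table dbo.StageExample whose name and columns are placeholders and do not come from the original post.

```sql
-- Drop the (hypothetical) staging table only if it already exists, then recreate it.
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[StageExample]') AND type IN (N'U'))
    DROP TABLE [dbo].[StageExample];

CREATE TABLE [dbo].[StageExample] (
    Id   INT NOT NULL,
    Code VARCHAR(20) NULL
);
```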

If exists logic for a synonym

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N's_DimPACodes') AND type in (N'SN'))

If exists logic for a View

IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'vw_DimProgram') AND type in (N'V'))

If exists logic for a column

IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Dimprogram' and COLUMN_NAME = 'IAPKey')
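
A column check like this is usually paired with an ALTER TABLE statement. The sketch below assumes the table lives in the dbo schema and that the new column is a nullable INT; both are assumptions for illustration, so adjust them to the real table.

```sql
-- Add the column only when it is not already present.
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_SCHEMA = 'dbo'        -- assumed schema
                 AND TABLE_NAME = 'Dimprogram'
                 AND COLUMN_NAME = 'IAPKey')
BEGIN
    ALTER TABLE dbo.Dimprogram ADD IAPKey INT NULL; -- assumed data type
END
```

Filtering on TABLE_SCHEMA as well as TABLE_NAME avoids a false match when tables with the same name exist in more than one schema.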

If exists logic for a Schema

IF NOT EXISTS(SELECT * FROM sys.schemas where name = N'Test')

EXEC ('CREATE SCHEMA [Test] AUTHORIZATION [dbo]')

(OR)

IF NOT EXISTS(SELECT * FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'Test')

EXEC ('CREATE SCHEMA [Test] Authorization [dbo]')

Interpreting type codes in sys.objects in SQL Server

AF = Aggregate function (CLR)
C = CHECK constraint
D = DEFAULT (constraint or stand-alone)
F = FOREIGN KEY constraint
FN = SQL scalar function
FS = Assembly (CLR) scalar-function
FT = Assembly (CLR) table-valued function
IF = SQL inline table-valued function
IT = Internal table
P = SQL Stored Procedure
PC = Assembly (CLR) stored-procedure
PG = Plan guide
PK = PRIMARY KEY constraint
R = Rule (old-style, stand-alone)
RF = Replication-filter-procedure
S = System base table
SN = Synonym
SQ = Service queue
TA = Assembly (CLR) DML trigger
TF = SQL table-valued-function
TR = SQL DML trigger
TT = Table type
U = Table (user-defined)
UQ = UNIQUE constraint
V = View
X = Extended stored procedure
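
To see which of these type codes actually occur in a given database, a short query along the following lines may help. It is a sketch that relies only on the type and type_desc columns of sys.objects.

```sql
-- Count the objects in the current database by type code.
SELECT o.type,
       o.type_desc,
       COUNT(*) AS ObjectCount
FROM sys.objects AS o
GROUP BY o.type, o.type_desc
ORDER BY ObjectCount DESC;
```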

To search a string in every column of every table in a specific database

The stored procedure below searches for a string in every character column of every table. Create it in the database that you want to search.

CREATE PROCEDURE FindMyData_String
@DataToFind NVARCHAR(4000),
@ExactMatch BIT = 0
AS
SET NOCOUNT ON

DECLARE @Temp TABLE(RowId INT IDENTITY(1,1), SchemaName sysname, TableName sysname, ColumnName SysName, DataType VARCHAR(100), DataFound BIT)

INSERT INTO @Temp(TableName,SchemaName, ColumnName, DataType)
SELECT C.Table_Name,C.TABLE_SCHEMA, C.Column_Name, C.Data_Type
FROM Information_Schema.Columns AS C
INNER Join Information_Schema.Tables AS T
ON C.Table_Name = T.Table_Name
AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
WHERE Table_Type = 'Base Table'
And Data_Type In ('ntext','text','nvarchar','nchar','varchar','char')

DECLARE @i INT
DECLARE @MAX INT
DECLARE @TableName sysname
DECLARE @ColumnName sysname
DECLARE @SchemaName sysname
DECLARE @SQL NVARCHAR(4000)
DECLARE @PARAMETERS NVARCHAR(4000)
DECLARE @DataExists BIT
DECLARE @SQLTemplate NVARCHAR(4000)

-- Single quotes in the search string are doubled so they cannot break the generated SQL.
SELECT @SQLTemplate = CASE WHEN @ExactMatch = 1
THEN 'If Exists(Select *
From ReplaceTableName
Where Convert(nVarChar(4000), [ReplaceColumnName])
= N''' + REPLACE(@DataToFind, '''', '''''') + '''
)
Set @DataExists = 1
Else
Set @DataExists = 0'
ELSE 'If Exists(Select *
From ReplaceTableName
Where Convert(nVarChar(4000), [ReplaceColumnName])
Like N''%' + REPLACE(@DataToFind, '''', '''''') + '%''
)
Set @DataExists = 1
Else
Set @DataExists = 0'
END,
@PARAMETERS = '@DataExists Bit OUTPUT',
@i = 1

SELECT @i = 1, @MAX = MAX(RowId)
FROM @Temp

WHILE @i <= @MAX
BEGIN
SELECT @SQL = REPLACE(REPLACE(@SQLTemplate, 'ReplaceTableName', QUOTENAME(SchemaName) + '.' + QUOTENAME(TableName)), 'ReplaceColumnName', ColumnName)
FROM @Temp
WHERE RowId = @i

PRINT @SQL
EXEC SP_EXECUTESQL @SQL, @PARAMETERS, @DataExists = @DataExists OUTPUT

IF @DataExists =1
UPDATE @Temp SET DataFound = 1 WHERE RowId = @i

SET @i = @i + 1
END

SELECT SchemaName,TableName, ColumnName
FROM @Temp
WHERE DataFound = 1
GO

To run it just do this:
EXEC FindMyData_String 'yahoo', 0 -- 'yahoo' is the string to search for; 0 requests a partial (LIKE) match

Execution of job through command mode

Sometimes SSIS packages run successfully through the Integration Services server but fail when run as a SQL Agent job.

This is one of the most common issues when packages are deployed on a 64-bit system. It occurs when a package is created with the Run64BitRuntime property set to False.

To overcome the 32-bit/64-bit environment issue, we can execute the job in command-line mode.

Use the command below in the job step, specifying the control flow package name:

"C:\Program Files\Microsoft SQL Server\110\DTS\Binn\dtexec.exe" /FILE "\"M:/SSIS/MDR2.0/Imports/Imports20_ControlFlow.dtsx\"" /CHECKPOINTING OFF /REPORTING E

Use the command below in the job step, specifying a particular package name:

"C:\Program Files\Microsoft SQL Server\110\DTS\Binn\dtexec.exe" /FILE "M:/SSIS/MDR2.0/EDW/EDW20_OfferedItemCBD.dtsx" /CHECKPOINTING OFF /REPORTING E

Index optimization command

sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[IndexOptimize] @Databases='ListSelect20',@FragmentationHigh='INDEX_REBUILD_OFFLINE',@FragmentationMedium='INDEX_REBUILD_OFFLINE',@FragmentationLevel1 = 1,@FragmentationLevel2 = 2,@PageCountLevel = 1,@SortInTempdb = 'Y'" -b

Script to check the last run status of job

The SQL script below retrieves the last run outcome of a job's steps and stops the job if any step failed.

DECLARE @last_run_outcome INT
SET @last_run_outcome = (
SELECT MAX(sjs.last_run_outcome)
FROM msdb.dbo.sysjobs_view sjv
INNER JOIN msdb.dbo.sysjobsteps sjs ON sjs.job_id = sjv.job_id
WHERE name in ('MDR20')
AND last_run_outcome = 0)

--SELECT @last_run_outcome

IF @last_run_outcome = 0
EXEC msdb.dbo.sp_stop_job N'MDR20'

The last_run_outcome values have the following meanings:

0 = Failed
1 = Succeeded
2 = Retry
3 = Canceled
5 = Unknown
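
To read these codes without looking them up each time, the outcome can be decoded in the query itself. The sketch below reuses the job name 'MDR20' from the script above and adds the value 2 (Retry), which msdb documents for job steps; the 'Other' label is only a fallback added for illustration.

```sql
-- Show the last outcome of every step of the job in readable form.
SELECT sjv.name AS JobName,
       sjs.step_id,
       sjs.step_name,
       CASE sjs.last_run_outcome
            WHEN 0 THEN 'Failed'
            WHEN 1 THEN 'Succeeded'
            WHEN 2 THEN 'Retry'
            WHEN 3 THEN 'Canceled'
            WHEN 5 THEN 'Unknown'
            ELSE 'Other'
       END AS LastRunOutcome
FROM msdb.dbo.sysjobs_view AS sjv
INNER JOIN msdb.dbo.sysjobsteps AS sjs ON sjs.job_id = sjv.job_id
WHERE sjv.name = 'MDR20'
ORDER BY sjs.step_id;
```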

Tuesday, February 5, 2013

Database

A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images.

In computing, databases are sometimes classified according to their organizational approach. The most prevalent approach is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.

Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in large mainframe systems, but are also present in smaller distributed workstation and mid-range systems such as the AS/400 and on personal computers. SQL (Structured Query Language) is a standard language for making interactive queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates.

Relational database

A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables.

The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.

A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid.
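
The order entry example can be sketched in T-SQL roughly as follows. All table, column, and view names here are hypothetical and chosen only to mirror the description; the cutoff date is an arbitrary example.

```sql
-- Customer table: one row per customer.
CREATE TABLE dbo.Customer (
    CustomerId INT NOT NULL PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,
    Address    VARCHAR(200) NULL,
    Phone      VARCHAR(30)  NULL
);

-- Order table: each order points back to the customer who placed it.
CREATE TABLE dbo.CustomerOrder (
    OrderId    INT NOT NULL PRIMARY KEY,
    CustomerId INT NOT NULL REFERENCES dbo.Customer (CustomerId),
    Product    VARCHAR(100) NOT NULL,
    OrderDate  DATE NOT NULL,
    SalesPrice DECIMAL(10, 2) NOT NULL
);
GO

-- A branch manager's view: customers who bought something after a given date.
CREATE VIEW dbo.vw_RecentCustomers
AS
SELECT DISTINCT c.CustomerId, c.Name
FROM dbo.Customer AS c
INNER JOIN dbo.CustomerOrder AS o ON o.CustomerId = c.CustomerId
WHERE o.OrderDate > '20130101'; -- example cutoff date
GO
```

Both the view and any finance report draw on the same two tables, which is the point of the paragraph above: the data is stored once and reassembled as needed.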

When creating a relational database, you can define the domain of possible values in a data column and further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten customer names, while one particular table is constrained to accept only three of those names.
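
In SQL Server, one common way to express such a restriction is a CHECK constraint. The sketch below uses a hypothetical table and made-up status values to show a column whose allowed values are limited to a short list.

```sql
-- Hypothetical table: Status may hold only three values, Quantity must be positive.
CREATE TABLE dbo.OrderStatusExample (
    OrderId  INT NOT NULL PRIMARY KEY,
    Status   VARCHAR(10) NOT NULL
        CONSTRAINT CK_OrderStatusExample_Status
        CHECK (Status IN ('Open', 'Shipped', 'Closed')),
    Quantity INT NOT NULL
        CONSTRAINT CK_OrderStatusExample_Quantity
        CHECK (Quantity > 0)
);

-- This insert is rejected because 'Lost' is outside the allowed set of values.
INSERT INTO dbo.OrderStatusExample (OrderId, Status, Quantity)
VALUES (1, 'Lost', 5);
```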

The definition of a relational database results in a table of metadata or formal descriptions of the tables, columns, domains, and constraints.
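
In SQL Server this metadata can be queried like any other table. As a small sketch, the standard INFORMATION_SCHEMA.COLUMNS view lists every column together with its table and data type.

```sql
-- List every column in the current database with its table and data type.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME,
       DATA_TYPE,
       IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```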

Multidimensional database (MDB)

A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications. Multidimensional databases are frequently created using input from existing relational databases. Whereas a relational database is typically accessed using a Structured Query Language (SQL) query, a multidimensional database allows a user to ask questions like "How many Aptivas have been sold in Nebraska so far this year?" and similar questions related to summarizing business operations and trends. An OLAP application that accesses data from a multidimensional database is known as a MOLAP (multidimensional OLAP) application.

A multidimensional database - or a multidimensional database management system (MDDBMS) - implies the ability to rapidly process the data in the database so that answers can be generated quickly. A number of vendors provide products that use multidimensional databases. Approaches to how data is stored and the user interface vary.

Conceptually, a multidimensional database uses the idea of a data cube to represent the dimensions of data available to a user. For example, "sales" could be viewed in the dimensions of product model, geography, time, or some additional dimension. In this case, "sales" is known as the measure attribute of the data cube and the other dimensions are seen as feature attributes. Additionally, a database creator can define hierarchies and levels within a dimension (for example, state and city levels within a regional hierarchy).
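
The data cube idea can be approximated in a relational engine with GROUP BY CUBE, which computes the measure for every combination of the listed dimensions. This is only a relational sketch with made-up sample rows, not how a MOLAP product stores its data; the table variable and column names are hypothetical.

```sql
-- Made-up sample data: "sales" is the measure, the other columns are dimensions.
DECLARE @Sales TABLE (
    ProductModel VARCHAR(50),
    StateName    VARCHAR(50),
    SalesYear    INT,
    SalesAmount  DECIMAL(12, 2)
);

INSERT INTO @Sales (ProductModel, StateName, SalesYear, SalesAmount)
VALUES ('Aptiva',     'Nebraska', 2013, 1200.00),
       ('Aptiva',     'Iowa',     2013,  900.00),
       ('OtherModel', 'Nebraska', 2013, 2500.00);

-- One result row per combination of dimensions; NULL marks "all values" of a dimension.
SELECT ProductModel,
       StateName,
       SalesYear,
       SUM(SalesAmount) AS TotalSales
FROM @Sales
GROUP BY CUBE (ProductModel, StateName, SalesYear);
```

The row with ProductModel = 'Aptiva', StateName = 'Nebraska', and SalesYear = 2013 answers a question of the kind quoted above, here as a sales amount.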

Data Warehouse

Data warehousing combines data from multiple, usually varied, sources into one comprehensive and easily manipulated database. Common ways of accessing a data warehouse include queries, analysis, and reporting. Because data warehousing creates one database in the end, the number of sources can be as large as you like, provided that the system can handle the volume. The final result, however, is homogeneous data, which can be more easily manipulated.

Data warehousing is commonly used by companies to analyze trends over time. In other words, companies may very well use data warehousing to view day-to-day operations, but its primary function is facilitating strategic planning resulting from long-term data overviews. From such overviews, business models, forecasts, and other reports and projections can be made. Routinely, because the data stored in data warehouses is intended to provide more overview-like reporting, the data is read-only. If you want to update the data stored via data warehousing, you'll need to build a new query when you're done.

This is not to say that data warehousing involves data that is never updated. On the contrary, the data stored in data warehouses is updated all the time. It's the reporting and the analysis that take more of a long-term view.

Data warehousing is not the be-all and end-all for storing all of a company's data. Rather, data warehousing is used to house the necessary data for specific analysis. More comprehensive data storage requires different capacities that are more static and less easily manipulated than those used for data warehousing.

Data warehousing is typically used by larger companies analyzing larger sets of data for enterprise purposes. Smaller companies wishing to analyze just one subject, for example, usually access data marts, which are much more specific and targeted in their storage and reporting. Data warehousing often includes smaller amounts of data grouped into data marts. In this way, a larger company might have at its disposal both data warehousing and data marts, allowing users to choose the source and functionality depending on current needs.

Enterprise Data Warehouse Design

Seven Principles for Enterprise Data Warehouse Design

Organizational Consensus

From the outset of the data warehousing effort, there should be a consensus-building process that helps guide the planning, design and implementation process. If your knowledge workers and managers see the DW as an unnecessary intrusion - or worse, a threatening intrusion - into their jobs, they won't like it and won't use it.

Make every effort to gain acceptance for, and minimize resistance to, the DW. If you involve the stakeholders early in the process, they're much more likely to embrace the DW, use it and, hopefully, champion it to the rest of the company.

Data Integrity

The brass ring of data warehousing - of any business intelligence (BI) project - is a single version of the truth about organizational data. The path to this brass ring begins with achieving data integrity in your DW.

Therefore, any design for your DW should begin by minimizing the chances for data replication and inconsistency. It should also promote data integration and standardization. Any reasonable methodology you choose to achieve data integrity should work, as long as you implement the methodology effectively with the end result in mind.

Implementation Efficiency

To help meet the needs of your company as early as possible and minimize project costs, the DW design should be straightforward and efficient to implement. This is truly a fundamental design issue. You can design a technically elegant DW, but if that design is difficult to understand or implement or doesn't meet user needs, your DW project will be mired in difficulty and cost overruns almost from the start.

Opt for simplicity in your design plans and choose (to the most practical extent) function over beautiful form. This choice will help you stay within budgetary constraints, and it will go a long way toward meeting user needs effectively.

User Friendliness

User friendliness and ease of use issues, though they are addressed by the technical people, are really business issues. Why? Because, again, if the end business users don't like the DW or if they find it difficult to use, they won't use it, and all your work will be for naught.

To help achieve a user-friendly design, the DW should leverage a common front-end across the company - based on user roles and security levels, of course. It should also be intuitive enough to have a minimal learning curve for most users. Of course, there will be exceptions, but your rule of thumb should be that even the least technical users will find the interface reasonably intuitive.

Operational Efficiency

This principle is really a corollary to the principle of implementation efficiency. Once implemented, the data warehouse should be easy to support and facilitate rapid responses to business change requests. Errors and exceptions should also be easy to remedy, and support costs should be moderate over the life of the DW.

The reason I say that this principle is a corollary to the implementation efficiency principle is that operational efficiency can be achieved only with a DW design that is easy to implement and maintain. Again, a technically elegant solution might be beautiful, but a practical, easy-to-maintain solution can yield better results in the long run.

Scalability

Scalability is often a big problem with DW design. The solution is to build in scalability from the start. Choose toolsets and platforms that support future expansions of data volumes and types as well as changing business requirements. It's also a good idea to look at toolsets and platforms that support integration of, and reporting on, unstructured content and document repositories.

Compliance with IT Standards

Perhaps the most important IT principle to keep in mind is to not reinvent the wheel when you build your DW. That is, the toolsets and platforms you choose to implement your DW should conform to and leverage existing IT standards.
