Thursday, February 27, 2014

SQL Server Integration Services, Data Type Mapping

SQL Server Integration Services (SSIS) utilize several type systems – some depending on the different data providers supported by the SSIS, some depending on the environment where the service is used, etc. When designing SSIS packages, especially in heterogeneous environments, where different data management platforms and tools are used, appropriate data type mapping is highly critical – just think of quantities, precision and scale. This article should provide you with some of the basics of how data types of the several type systems supported by SSIS can be used when integrating data from diverse source and type systems. The article also mentions some of the type-related shortcomings of SSIS and tries to help you correct work around them.



The SQL Server Integration Services Data Type Map

The following table provides a mapping of all data types used in SQL Server Integration Services; the mapping is represented from the SQL Server perspective. Data types that are, in fact, supported by other type systems represented in the table, but which do not map to a corresponding SQL Server data type, are not shown here.

Obviously, when I say all, I actually mean all, except the type system used by the SQL Server Native Client ODBC provider; I plan to cover that in a later post.

Also note that, rather than providing a data type compatibility matrix (which, to be honest, could have been even more useful), I've instead tried to provide a simplified, one-to-one data type compatibility map, that you can use to safely cast a value using a data type of one type system to the nearest data type in another type system.

Data Type
SQL Server SSIS
Variables
SSIS
Pipeline Buffer
OLE DB ADO.NET
bigint Int64 DT_I8 LARGE_INTEGER Int64
binary Object* DT_BYTES n/a Binary
bit Boolean DT_BOOL VARIANT_BOOL Boolean
char String DT_STR VARCHAR StringFixedLength
date Object* DT_DBDATE DBDATE Date
datetime DateTime DT_DBTIMESTAMP DATE** DateTime
datetime2 Object* DT_DBTIMESTAMP2 DBTIME2 DateTime2
datetimeoffset Object* DT_DBTIMESTAMPOFFSET DBTIMESTAMPOFFSET DateTimeOffset
decimal Object***
(< SQL 2012)
Decimal
(>= SQL 2012)
DT_NUMERIC NUMERIC Decimal
float Double DT_R8 FLOAT Double
image Object* DT_IMAGE n/a Binary
int Int32 DT_I4 LONG Int32
money Object* DT_CY
(OLE DB)
DT_NUMERIC
(ADO.NET)
CURRENCY Currency
nchar String DT_WSTR NVARCHAR StringFixedLength
ntext String DT_NTEXT n/a String
numeric Object***
(< SQL 2012)
Decimal
(>= SQL 2012)
DT_NUMERIC NUMERIC Decimal
nvarchar String DT_WSTR NVARCHAR String
nvarchar(max) Object DT_NTEXT n/a n/a
real Single DT_R4 FLOAT, DOUBLE Single
rowversion Object* DT_BYTES n/a Binary
smalldatetime DateTime DT_DBTIMESTAMP DATE** DateTime
smallint Int16 DT_I2 SHORT Int16
smallmoney Object* DT_CY
(OLE DB)
DT_NUMERIC
(ADO.NET)
CURRENCY Currency
sql_variant Object* DT_WSTR****
(OLE DB)
DT_NTEXT****
(ADO.NET)
**** Object****
table Object* n/a *****
text Object* DT_TEXT n/a n/a
time Object* DT_DBTIME2 DBTIME2 Time
timestamp Object* DT_BYTES n/a Binary
tinyint Byte DT_UI1 BYTE Byte
uniqueidentifier String******
(OLE DB)
Object******
(ADO.NET)
DT_GUID GUID Guid
varbinary Object* DT_BYTES n/a Binary
varbinary(max) Object* DT_IMAGE n/a Binary
varchar String DT_STR VARCHAR String
varchar(max) Object* DT_TEXT n/a n/a
xml Object* DT_NTEXT *****


Type Systems

The following type systems are supported by SSIS:

  • SQL Server Data Types – these are the essential data types supported by SQL Server; also known as built-in or system data types. Custom, CLR-based data types, are not discussed in this article, mostly because they are specific to SQL Server, specific to a particular version of SQL Server, or could be represented using system data types (e.g. by character or binary data types, or by XML);
  • SSIS Variable Data Types – the data types of the underlying type system used by the SSIS service are exposed to the programming environment as .NET Framework data types. However, be aware that not all .NET Framework data types can be used by the SSIS variables. You can find a list of supported SSIS variable data types in the TypeCode Enumeration article on MSDN;
  • SSIS Pipeline Buffer Data Types – the data types used by the pipeline buffer, the essential element of the Data Flow task, are exposed using a dedicated type system. This type system is different from the one used by the SSIS variables. As long as the Data Flow definition metadata corresponds to the metadata of the data sources involved in the Data Flow, you should not experience any problems with this particular type system, regardless of the data providers used by the Data Source or the Data Destination components. For additional information on SSIS pipeline buffer data types consult the Integration Services Data Types article on MSDN;
  • OLE DB and ADO.NET Data Types – the data providers use their own type systems, which are exposed to the SSIS programming interface. These are also different from the rest of the type systems. When referring to these two data providers in this article, I specifically refer to their use in the Execute SQL Task. You can find more information on OLE DB data types and data type conversions on MSDN, beginning with Data Types in OLE DB (OLE DB). For more information about ADO.NET data types start with Data Type Mappings in ADO.NET.



The Base of All Bases

In the .NET Framework all data types are derived from Object. Therefore, all .NET data types are convertible to Object. In other words: wherever .NET data types are used and no other more appropriate data type is available, Object can be used instead.

SSIS variables utilize a subset of .NET data types, including Object; if the most appropriate data type is not available for your SSIS variable, use Object.

Why not use Object for just any SSIS variable? Think of debugging and logging. While most other data types implement at least one standard accessor that allows you to view or display the actual value, or to write it to a log, with Object you would have to add your own programmatic logic to convert the values to their actual type before they could be displayed or written to the log.



Exceptional Cases


DATE

Even though OLE DB documentation – for instance, the Data Type Support for OLE DB Date/Time Improvements article on MSDN (also available for SQL Server versions 2008 and 2008 R2) – suggests that the OLE DB DBTIMESTAMP be used for SQL Server DATETIME or SMALLDATETIME values, this data type does not seem to be supported in the Execute SQL Task. Attempts to pass SQL Server DATETIME or SMALLDATETIME values to (or from) an OLE DB DBTIMESTAMP parameter will result in the following error:

Executing the query "..." failed with the following error: "Invalid time format". 
Possible failure reasons: Problems with the query, "ResultSet" property not set 
correctly, parameters not set correctly, or connection not established correctly.

Alternatively, you can use the OLE DB DATE data type for DATETIME or SMALLDATETIME values. Be warned, though, that the precision of DATE is one second, which means that DATETIME values might be truncated. On the other hand, SMALLDATETIME values, whose precision is one minute, will not be affected.

ADO.NET is a more appropriate option in such cases, as it does not suffer from these limitations.




DECIMAL

Prior to SSIS 2012, DECIMAL was not a supported SSIS variable data type. Various workarounds have so far been proposed by other users, but to me the only sensible option for SSIS variables holding decimal values is the use of the Object data type; the values will be implicitly converted to the appropriate destination data type – for instance, when passed to an Execute SQL Task parameter. Starting with SQL Server 2012, DECIMAL is (finally) available for use with SSIS variables; so, at least this particular workaround is now a thing of the past.

The Decimal data type is available in OLE DB as well as ADO.NET data providers; however, only ADO.NET actually supports it in SSIS. The following exception is raised by SSIS when trying to use the DECIMAL data type in an Execute SQL Task using OLE DB (regardless of the fact that DECIMAL is listed as a supported data type in the Execute SQL Task editor in SSDT or BIDS):

The type is not supported.DBTYPE_DECIMAL

The OLE DB NUMERICAL data type is available and compatible; its precision and scale match the ones used for SQL Server DECIMAL (and NUMERIC) data types.




UNIQUEIDENTIFIER

Even though Guid is a native .NET data type, it is not part of the SSIS variable type system. You can create UNIQUEIDENTIFIER or Guid values using SSIS (e.g. using the Script Task or the Execute SQL Task), but in order to pass them to other SSIS components, you must resort to a "trick". For instance, to create a Guid value in a Script Task and then assign it to a SSIS variable of type String, use the following assignment:

Dts.Variables["my_guid_string_variable"].Value
 = "{" + Guid.NewGuid().ToString() + "}";

A Guid value cast to String and formatted this way can then be passed as a parameter to an Execute SQL Task, or to the SSIS Pipeline Buffer (e.g. in a Derived Column data flow component) – mapped to a Guid parameter, or a DT_GUID column. It will be implicitly cast to the appropriate type.

When UNIQUEIDENTIFIER values are returned from the Execute SQL Task, their type depends on the data provider used by the task. Attempts to assign a Guid value returned from ADO.NET to a String SSIS variable, will result in the usual type mismatch error:

The type of the value being assigned to variable "User::string_guid_variable" 
differs from the current variable type. Variables may not change type during 
execution. Variable types are strict, except for variables of type Object.

Again, you can prevent this by using Object as the SSIS variable data type.




SQL_VARIANT

It would be wrong to say that the SQL_VARIANT data type is not supported by the Execute SQL Task. On the other hand, you should be very careful with this; any OLE DB type, which is compatible with a SQL Server data type, which is in turn compatible with the SQL_VARIANT data type, can be used in the Execute SQL Task. However, having to know the type in advance contradicts the principal purpose of SQL_VARIANT – its ability to support a variety of data types in one variable, or in the same column. I would advise against using the OLE DB provider with the Execute SQL Task if you have to deal with SQL_VARIANT.

There is no equivalent data type for SQL_VARIANT in the type system used by the SSIS Pipeline Buffer. If you're using an OLE DB data source and destination, the Data Flow designer will automatically use the DT_WSTR type for SQL_VARIANT columns with the following warning:

The output "OLE DB Source Output" references an external data type that cannot 
be mapped to a Data Flow task data type. The Data Flow task data type DT_WSTR 
will be used instead.

Also note that the size of the column in the data flow metadata will be set by the editor based on the sample of rows provided at design time. At run time, you might encounter truncation errors, should the size of the actual data exceed the one set at design time.

If you're using an ADO.NET source, the Data Flow designer will automatically use the DT_NTEXT data type for SQL_VARIANT columns with the following warning:

The data type "System.Object" found on column "sql_variant_column" is not 
supported for the component "ADO NET Source" (201). This column will be 
converted to DT_NTEXT.

Personally, I would advise against using SQL_VARIANT columns in data flows at all, unless your own tests conclusively show that the data in your environment is read from the source and written to the destination correctly.




TABLE and XML

Even though the SQL Server TABLE type is supported by the .NET Framework, OLE DB, and ADO.NET – that is, via Table-valued Parameters – it is not supported as a data type in SSIS. Typically, in SSIS, in-flight set-oriented processing is performed using the Data Flow task; of course, if you can stage your data, and are not required to perform all data processing in a single data flow, you can move some of that set-oriented logic outside the SSIS process (for instance, updates or merges can be performed using the UPDATE and/or MERGE statements, executed from SSIS by using the Execute SQL Task).

XML is a complex native data type in SQL Server, and it can also be represented by a string. Unfortunately, in SSIS, things are not quite as simple as that. The .NET Framework implements several types based on the W3C XML Recommendation, ADO.NET also supports XML natively, and OLE DB supports the use of XML data – to some extent. I believe XML deserves special attention in SSIS, and I've covered it in more detail in my earlier post on SSIS and complex parameters.



Large Object Data

Large-object data types, such as VARCHAR(MAX), NVARCHAR(MAX) OR VARBINARY(MAX), are not fully supported by the Execute SQL Task, by neither the OLE DB, nor the ADO.NET, providers. By using the ADO.NET provider it is possible to pass IMAGE, NTEXT or VARBINARY(MAX) data to and from the Execute SQL Task, but neither VARCHAR(MAX), nor NVARCHAR(MAX), parameters are supported. It is, however, possible to work around these limitations by using more elaborate techniques, which I feel deserve a dedicated article, so keep watching this blog for more information on that particular subject.



Additional Notes

I have been meaning to write this article for years; mainly because I needed it (and still do), but also because I couldn't find a single online resource that would cover, in one place, all of the various type systems used in SSIS.

In the beginning, simply locating appropriate documentation quickly proved to be a daunting task; I've also tried to utilize online resources as much as possible – because, after all, this article too will be available online. I also wanted to test each combination, as life with IT has taught me not to take any statement, even if it's part of vendor documentation, for granted. Therefore, I created an SSIS solution in versions 2008 and 2012, which I have used to verify the compatibility of the different type systems. I've tried to be as careful as possible in my experimentation, making sure that I detected as many problems as possible before making any conclusions; however, I still feel there could be things that I might have overlooked.

Therefore, dear reader, if you happen to find any flaws in this (and any other of my texts dealing with data types), please, let me know.



ML

Wednesday, March 20, 2013

SQL Server 2012 FileTables, Text Files and Full-text Search

In July 2012, fellow SolidQ Mentor and Microsoft MVP, Davide Mauri (@mauridb), discovered an unexpected behavior of the SQL Server 2012 Full-text Search (FTS), and/or Statistical Semantic Search (SSS), while indexing documents placed in a SQL Server 2012 FileTable as TXT files. Instead of the language set in the definition of the full-text index (see CREATE FULLTEXT INDEX in SQL Server Books Online for details) the language determined by the Windows System Locale was used for full-text indexing.

In March 2013 the alleged bug was resolved by Microsoft as "Won't Fix", with the following explanation:

Posted by Microsoft on 6.3.2013 at 8:22
This is by design. In the attached script the Text IFilter would emit the System Locale for the work chunks. And we go by whatever the IFilter emits. The full-text index language will be used when we have a plain text column without the doctype column.

Regardless of the language set in the full-text index definition, the language emitted by the IFilter is used – both for the full-text index as well as for the statistical semantic indexes. Because TXT files do not store any additional metadata, such as the language used in the text, the IFilter resorts to the System Locale in order to "determine" the language. This means that if the language used in the files is different from the one used as the System Locale, the resulting full-text index might not contain correct, or complete, data, and if the language of the System Locale is not supported by the Statistical Semantic Search then the statistical semantic indexes will also not be populated. In other words: the actual language must be determined by other means.



The Solution

To allow multi-lingual documents, or simply documents written in a language different from the one set by the System Locale, that are stored in a FileTable, to be indexed in accordance with the language(s) used in them, these documents need to be stored using a format that supports additional metadata, such as the language of the text, which the corresponding IFilter must be able to retrieve during the full-text index population. For instance, Microsoft Word files, PDF files, or even XML files, allow you to set the language of either the entire document, or of its individual parts. The corresponding IFilters will respect these settings, prompting the use of appropriate word breakers and stemmers for FTS/SSS processing.

In the above Connect item, Davide used a Transact-SQL script to create a couple of text files, insert them into a FileTable, and then index them using both the "regular" full-text indexing as well as the statistical semantic indexing. The files contained English text, the language set for the full-text index was also English; however, in his environment the System Locale was set to Italian. The latter was then actually used for full-text index population, which is not what Davide (or me, for that matter) would have expected.

In the amended version of Davide's script I have used the XML format to store the documents as files, and inside the XML I specified the correct language of each document by using the xml:lang attribute. Download the script (right-click, then Save target as...) and try it out yourself.



ML

Tuesday, March 19, 2013

SSIS 2012 Bug with Windows Group Permissions

In October 2012 a bug was discovered in SQL Server 2012 Integration Services, specifically in the SSISDB Catalog, where permissions assigned to an SSISDB user mapped to a Windows Group login are not determined correctly at run time.

The SSISDB Catalog security model extends the native SQL Server security model to allow the permissions to be managed at various SSISDB object levels (e.g. folders, projects, packages, etc.). This extension is implemented inside the SSISDB database and is not fully integrated with the SQL Server security model. As a result, the actual SSISDB object permissions are determined per user at run time.

Fellow Microsoft MVP Phil Brammer (@PhilBrammer) pinpointed the source of the problem to an SSISDB catalog view, which he proposed be changed accordingly. As you can observe in the Connect item, the solution is very simple, but I am sure you can agree that implementing it would improve the usability of the SSISDB Catalog quite significantly. The item is still active, and if you are already using the new SSIS project deployment model, or are planning to use it at any time in the future, I urge you to vote on Connect for the problem to be corrected.



Is There a Workaround?

Until Microsoft resolves the issue, you can work around it by using the following approach, which I also described on Connect:

  1. Create a new database role (for instance, named ssis_user) in the SSISDB database.
  2. Add the login, based on the Windows NT group that you want to assign the permissions to, to this newly created database role.
  3. Assign the appropriate permissions to the ssis_user SSISDB database role by using the SSISDB DCL procedures.

For instance, you can use the following SSMS Transact-SQL template to create the database role:

use SSISDB;
go

create role ssis_user
 authorization dbo;

alter role ssis_user
 add member <database_principal, sysname, Database principal>;
go

In SSMS, use the Ctrl + Shift + M keyboard shortcut to complete the script. You can find more information about SSMS Transact-SQL Templates in SQL Server Books Online.



ML