Wednesday, March 20, 2013

SQL Server 2012 FileTables, Text Files and Full-text Search

In July 2012, fellow SolidQ Mentor and Microsoft MVP, Davide Mauri (@mauridb), discovered an unexpected behavior of the SQL Server 2012 Full-text Search (FTS), and/or Statistical Semantic Search (SSS), while indexing documents placed in a SQL Server 2012 FileTable as TXT files. Instead of the language set in the definition of the full-text index (see CREATE FULLTEXT INDEX in SQL Server Books Online for details) the language determined by the Windows System Locale was used for full-text indexing.

In March 2013 the alleged bug was resolved by Microsoft as "Won't Fix", with the following explanation:

Posted by Microsoft on 6.3.2013 at 8:22
This is by design. In the attached script the Text IFilter would emit the System Locale for the work chunks. And we go by whatever the IFilter emits. The full-text index language will be used when we have a plain text column without the doctype column.

Regardless of the language set in the full-text index definition, the language emitted by the IFilter is used – both for the full-text index as well as for the statistical semantic indexes. Because TXT files do not store any additional metadata, such as the language used in the text, the IFilter resorts to the System Locale in order to "determine" the language. This means that if the language used in the files is different from the one used as the System Locale, the resulting full-text index might not contain correct, or complete, data, and if the language of the System Locale is not supported by the Statistical Semantic Search then the statistical semantic indexes will also not be populated. In other words: the actual language must be determined by other means.



The Solution

To allow multi-lingual documents, or simply documents written in a language different from the one set by the System Locale, that are stored in a FileTable, to be indexed in accordance with the language(s) used in them, these documents need to be stored using a format that supports additional metadata, such as the language of the text, which the corresponding IFilter must be able to retrieve during the full-text index population. For instance, Microsoft Word files, PDF files, or even XML files, allow you to set the language of either the entire document, or of its individual parts. The corresponding IFilters will respect these settings, prompting the use of appropriate word breakers and stemmers for FTS/SSS processing.

In the above Connect item, Davide used a Transact-SQL script to create a couple of text files, insert them into a FileTable, and then index them using both the "regular" full-text indexing as well as the statistical semantic indexing. The files contained English text, the language set for the full-text index was also English; however, in his environment the System Locale was set to Italian. The latter was then actually used for full-text index population, which is not what Davide (or me, for that matter) would have expected.

In the amended version of Davide's script I have used the XML format to store the documents as files, and inside the XML I specified the correct language of each document by using the xml:lang attribute. Download the script (right-click, then Save target as...) and try it out yourself.



ML

Tuesday, March 19, 2013

SSIS 2012 Bug with Windows Group Permissions

In October 2012 a bug was discovered in SQL Server 2012 Integration Services, specifically in the SSISDB Catalog, where permissions assigned to an SSISDB user mapped to a Windows Group login are not determined correctly at run time.

The SSISDB Catalog security model extends the native SQL Server security model to allow the permissions to be managed at various SSISDB object levels (e.g. folders, projects, packages, etc.). This extension is implemented inside the SSISDB database and is not fully integrated with the SQL Server security model. As a result, the actual SSISDB object permissions are determined per user at run time.

Fellow Microsoft MVP Phil Brammer (@PhilBrammer) pinpointed the source of the problem to an SSISDB catalog view, which he proposed be changed accordingly. As you can observe in the Connect item, the solution is very simple, but I am sure you can agree that implementing it would improve the usability of the SSISDB Catalog quite significantly. The item is still active, and if you are already using the new SSIS project deployment model, or are planning to use it at any time in the future, I urge you to vote on Connect for the problem to be corrected.



Is There a Workaround?

Until Microsoft resolves the issue, you can work around it by using the following approach, which I also described on Connect:

  1. Create a new database role (for instance, named ssis_user) in the SSISDB database.
  2. Add the login, based on the Windows NT group that you want to assign the permissions to, to this newly created database role.
  3. Assign the appropriate permissions to the ssis_user SSISDB database role by using the SSISDB DCL procedures.

For instance, you can use the following SSMS Transact-SQL template to create the database role:

use SSISDB;
go

create role ssis_user
 authorization dbo;

alter role ssis_user
 add member <database_principal, sysname, Database principal>;
go

In SSMS, use the Ctrl + Shift + M keyboard shortcut to complete the script. You can find more information about SSMS Transact-SQL Templates in SQL Server Books Online.



ML