本文转自:
Introduction
With the 2008 release, SQL Server Integration Services (SSIS) continues its advance in the enterprise data integration arena. Integration Services offers an entire architecture that combines the required elements for building solutions that provide high scalability and performance.
A well-architected data extraction, transformation and load (ETL) system should be able to respond to changes in the environment or to other external factors, ideally without editing the source code. Typical examples of these changing factors are:
- Differences between development and deployment environments. Typically, you develop SSIS packages on a client machine and then deploy them to a server. These two environments are not necessarily identical, as they may use different server names, connection strings, folders, and so forth.
- Conditional logic that needs to be controlled externally. For example, a package that performs either a full or incremental processing of a SQL Server Analysis Services cube based on an unscheduled event or upon user request.
- Execution of parameterized scripts or tasks. For example, the WHERE clause in a SQL statement is based on date range that is driven by variables.
Unfortunately, many times data integration practitioners fail to bring such flexibility into their solutions, affecting the reliability and the cost of maintenance and deployment of the system.
The good news is that SSIS comes with a range of options that help you when building solutions that respond favorably to these scenarios. These options are:
- The /SET option of the DTExec command prompt utility.
- Property expressions.
- Package Configurations.
In essence, these three methods let you modify the values of package properties (like connection strings, variable values, network drive paths, etc) each time you run the package, without the need to edit the package in Business Intelligence Development Studio (BIDS). Since the package configuration is the most versatile, but perhaps the most complex of these methods, the remainder of this article focuses on explaining the main concepts and considerations for its implementation.
Package Configurations: What They Are and What They Aren’t
When you add configurations to a package, you are basically exposing package properties and allowing them to be updated with a new value that comes from a file, table, environment variable, registry key or parent package variable. Package configurations are disabled by default, and have to be enabled on each package in which you want to use them. You can enable package configurations by choosing the Package Configurations option on the SSIS menu as shown in Figure 1.
Figure 1. The SSIS menu and the Package Configurations option.
After the Package Configuration Organizer is visible, you have to check the box next to the Enable Package Configurations option. See Figure 2 for more details.
Figure 2. The Package Configurations Organizer.
From the Package Configurations Organizer, you can launch the Package Configuration Wizard to create and edit configurations; you can also remove and set the order in which the configurations should be applied. The functionality of the Package Configuration Wizard is covered in the following section.
Package configurations are available in Integration Services in both SQL Server 2005 and 2008. While they offer identical options, they differ in the order in which configurations are applied to the package at execution time.
In SQL Server 2005 Integration Services, all package configurations, except Parent Package Variable configurations, are applied before applying the options specified with the DTExec command. Parent Package Variables are applied after applying DTExec options.
In SQL Server 2008 Integration Services, package configurations are applied twice, before and after applying the options of the DTExec command prompt utility. This should be seen as an improvement, since you can now use the /SET, /CONF or /CONN options of the DTExec utility to alter the original definition of the configurations. For example, you can now use the /CONN option to alter the connection manager being used in SQL Server configurations - something that was not possible with the 2005 version.
Package configurations can fail when they are being applied.When that happens warning messages are generated, the values available inside of the package (design time values) are used, and the package is executed. Including logging capabilities in your packages is a good way to capture failures in package configurations.
You can save some development time by using package templates. If you anticipate creating a large number of packages that use a common set of package configurations, you can simply include the required package configurations in a template, and then use the template when creating new packages.
The data flow task in SSIS does not allow changes to the pipeline structure at run time (number of columns, column names, and data types), and package configurations should not be used to attempt such changes in dataflow pipeline metadata, since it would cause validation errors in the package.
Types of Package Configurations
There are five different types of configurations you can use in SSIS, and while all of them serve the same purpose of updating the value of package properties, their actual behavior and implementation differ from each other. The package configuration types are:
- XML configuration file
- Environment variable
- Registry entry
- Parent package variable
- SQL Server
Each configuration type, except for the Environment variable, provides a direct and indirect method to provide the location of the configuration information.
XML File Configurations
With this type of configuration, the configuration value and the path of the property being configured are saved in a XML file. Here is a sample of a very simple XML configuration file:
-- - - Run time Value from config file - direct Method
An XML configuration file has two parts. The header contains metadata about the file itself, like creator, the name and ID of the package that was used when creating the file, and the creation date and time. The configuration section is where the path to the properties being updated and the configuration values to be used are stored. In the example above, there is only one configuration entry, whose Path attribute points to the Value property of a variable declared at the package level called [User:: ConfigTarget]:
Path="\Package.Variables[User::ConfigTarget].Properties[Value]"
And the value of the variable will be updated using the string contained in the ConfiguredValue element:
Run time Value from config file - direct Method
The good news is that you do not need to create the XML file manually, since the Package Configuration Wizard can create it for you.
Let’s walk through an example to demonstrate how you can use package configurations to change the value of variable.
First, create a new package, add a variable of string type called ConfigTargetObject and assign the string “Development time Value” as its value as shown in Figure 3.
Figure 3. Creating a new package variable.
Then, enable package configurations by going to the SSIS menu, and choosing the Package Configurations option. Then check the Enable Package Configurations option after the Package Configurations Organizer opens, and click the Add button to open the Package Configuration Wizard as shown in figure 4.
Figure 4. The Select Configuration Type page of the Package Configuration Wizard.
On the first page of the wizard, you can choose the configuration type, which in this case is an XML configuration file. Then you have two options for providing the location of the configuration file.
- The first option is known as the direct method, where the path to the file is stored inside of the package definition. This option works fine if you don’t have to deploy the package to different machines, or if the path provided is valid on every machine the package will be deploy to.
- The second method is called indirect, where you can pick the name of a Windows environment variable that would hold the path to the configuration file. This method is ideal when you have to deploy packages to different servers, and when the location of the XML file is subject to change, since you would only need to change the value of the environment variable where the package is deployed. Notice that the environment variable must exist prior to creating the package configuration, and in some instances you may have to close and re-open the Business Intelligence Development Studio (BIDS) session in order to get the name of the environment variable in the dropdown list.
For this example we have chosen the direct method.
In the next page of the wizard, look for the variable name we created at the beginning of this example, and check the box next to the Value property as illustrated in Figure 5.
Figure 5. The Select Properties to Export page of the Package Configuration Wizard.
On this screen, you have the ability to select multiple properties. This is because XML configuration files can contain multiple configuration entries. Click the next button one more time to see a summary of the configuration entry and provide a name as in Figure 6.
Figure 6. The Completing the Wizard page of the Package Configuration Wizard.
When you press the Finish button, the configuration file is created and you should be able to see a new entry in the package configuration organizer as showed in Figure 7. Notice that the file is created by exporting the values inside of the package at the time the wizard starts. This means that you still have to edit the XML file and adjust the configuration values according to your needs. In this example, we decide to provide “Run-time value” as new value for the variable being configured. For that we open the file in a text editor and change the line:
Development-time Value
To:
Run-time value from config file
Figure 7. The Package Configurations Organizer after creating an XML configuration file.
Now that you have seen how to create an XML file configuration, let’s list some important aspects that will help you in getting the most out of this type of configurations:
- Consider using the indirect method whenever possible, as you won’t need to edit the package if the location or name of the configuration file change. With this method, the path and file name are stored in an environment variable.
- If you find that creating environment variables is too intrusive for your scenario, consider using the direct method. But be aware that changes to the file location may trigger changes to the package, or force you to usether options, such asthe /CONF switch on DTExec command line.
- You can override the configuration file path and name from the DTExec utility command line by using the /CONFIGURATION option. This option only affects the package being called from the command line; it does not affect packages executed via an Execute package task from a parent package.
- You can configure multiple properties inside of a single XML file.
- Multiple packages can use the same XML configuration file.
- The package configuration wizard can create the XML file only when the direct method is used. If you decide to use the indirect method, consider creating the configuration by using the direct method first, and once the file is created, change the method to the indirect one.
- If you use the direct method and provide an existing configuration file, the wizard prompts you to either reuse the configuration information in the file, or to override it with new configuration information from the current package. A message similar to the one in figure 8 is displayed.Make sure you understand this behavior, as choosing the wrong option could change the content of the configuration file and produce unexpected execution results.
Figure 8. The prompt seen when reusing an existing configuration file.
- If you chose the indirect method, the wizard will not display the ‘Select properties to Export’ page. The wizard assumes that the configuration file already exists and contains all required configuration information (target properties and values).
- When creating the file, the wizard does not include any sensitive information (suchas passwords in connection strings) in the configuration file, and you have to edit the configuration file to add such information manually..
- Consider the adoption of strong security policies to restrict access to the configuration files and their content, as they may include highly sensitive data.
- The Package Configurations Organizer does not show any value for Target Object and Target Property columns for XML file based configurations, as the file may contain multiple configurations.
Environment Variable Configurations
With this type of package configuration, you have to create an environment variable for each package property you intend to update, and place the configuration value as the value of the environment variable. As you can see in Figure 9, the indirect method is not available when use this type of configuration; which it would not make too much sense as the indirect method is based on an environment variables.
Figure 9. Creating a new Environment variable configuration.
In the first page of the wizard, you have to choose the environment variable to be used from a drop down list. Then, the next page, shown in Figure 10, lets you choose which object property is going to be affected by the configuration you are creating. Notice that selecting multiple properties is disallowed for this type of configuration.
Figure 10. Setting a single value for an environment variable configuration.
Now, let’s review some considerations you should keep in mind when using this type of package configurations:
- The environment variable(s) must exist on all machines where the package is going to be deployed.This is because the name of the environment variable is stored inside of the package.
- You have to create an environment variable for each configuration value that you need. This is afactor to consider when setting multiple configurations is required. This type of package configuration works better when combined with SQL Server configuration or XML file types.
- The configuration wizard does not create the environment variables, and they are visible to the wizard only if they were created prior to opening the current BIDS session.
- You can use either system or user environment variables, but user environment variables are not visible to other users; hence, the package has to be executed by the user that owns the environment variable in order for it to work.
- Multiple package configurations from multiple packages can reference the same system environment variable.
- The environment variable configuration type is different than the indirect method available with other configurations types (which are based on environment variables).
Registry Entry Configurations
This configuration type lets you store configuration values in Windows registry entries in a similar fashion than environment variable configurations do. Figure 11 shows the first page of the configuration wizard when you select registry entry as the type to be used.
Figure 11. Creating a new Registry entry configuration.
After you select a registry key configuration type from the dropdown list, you have to choose the method to be used. The first option is the direct method, where the wizard expects a valid registry key name that exists under the Windows registry HKEY_CURRENT_USER key. The second option is the indirect method, where you provide the name of an environment variable that in turns contains the registry key name to be used by the configuration. The indirect method gives you the flexibility to change the name of the registry key or to pointto a different one by updating the environment variable value.
Let’s see a couple of examples of the value expected by the package configuration wizard in the registry entry field. If you want to use a registry key that exists directly under HKEY_CURRENT_USER, as in Figure 12, the expected value is:
SSISPkgConfig
Figure 12. A sample registry entry configuration.
If you create a registry key to be used by the configuration that is not directly under HKEY_CURRENT_USER key, as shown in Figure 13, then the wizard expects this value:
SSISPkgConfig\config1
Figure 13. A sample of a nested registry entry configuration.
In the next page of the wizard, select the property object you want to update trough the configuration, as shown in figure 14.
Figure 14. Setting a single value for a registry entry configuration.
There are a few other things you need to keep in mind when using this type of configurations:
- The wizard does not create or modify the registry keys; the registry keys must be manually created and edited.
- The registry keys must exist, either directly or indirectly, under HKEY_CURRENT_USER, and the configuration value has to be store in an entry called ‘value’.
- Registry keys under HKEY_CURRENT_USER are visible only to the user that created them, hence, the package can apply the configurations only when it is executed using the credentials of that user. If you execute the package via SQL Server Agent, a workaround is to set up a proxy account in SQL Server Agent. Make sure that you consider all the implications of this behavior, as this may complicate your deployment scenarios.
- You have to use one registry key per configuration value. If you are configuring multiple properties, consider using SQL Server or XML file configurations instead.
- You do not need to close and reopen the BIDS session each time you create or make changes to the registry keys being used by the configuration. This is an advantage over environment variable configuration type.
- You need access to modify the Windows registry settings on each machine were you intend to deploy the package. This makes this configuration type risky.
- Be extremely careful when using the registry editor, as any error could cause serious problems to the system. Always create a backup of the registry before making changes.
Parent Package Variable Configurations
When you execute a package (the child) from another package (the parent) via the Execute Package task, you can use Parent Package Variable configurations in the child package to pass variable values from the parent.
Regardless of its name, this configuration type has to be set up in the child package. In the Package Configuration Wizard, in the child package, you have to specify the name of the variable (that exists in the parent) that holds the desired configuration value, as shown in Figure 15.
Figure 15. Creating a new Parent package variable configuration.
Notice that the child package is unaware of the existence of the parent package, and the name of the variable that you enter is not validated when you create the configuration. When using the direct method, you have to type the variable name exactly as it appears in the parent package. Alternatively, you can select an environment variable that contains the name of the parent package variable, thus adding the flexibility the indirect method offers.
The next page of the wizard allows you to select the property to be updated, in the same way as when using registry entry or environment variable configurations.
Finally, let’s review some considerations and facts that are relevant when working with parent package variable configurations:
- By design, parent package variables are always applied last, regardless of where you place them in the configurations organizer. This is not necessarily a bad thing, but because of this, this type of configuration may not suitable in every scenario.
- This type of configuration is applied only when the child package is executed via the Execute Package task.
- SSIS variable names are case-sensitive. Always make sure that both spelling and casing of the variable name in the configuration entry are identical to the ones in the parent package. For example, “Myvariable” and “MyVariable” are considered different variables.
- Standalone execution and unit testing of child packages using this type of configuration may be challenging when the child package does not run in the context of the parent. A simple but effective workaround is to always apply this type of configurations to variables in the child package, and then use expressions in the objects that need to be updated (connection managers, server names, and so forth) based on those variables. This way, if you need to run the child package without the parent, you can manually edit the variables in the child package in BIDS, or via the /SET option of the DTExec utility.
- Each package configuration entry can only affect one property in the package being configured. Only XML file and SQL Server based package configurations have the ability to hold multiple configuration property/value pairs.
SQL Server Configurations
This configuration type offers almost the same level of flexibility and functionality as XML configuration files, with the difference that configuration information is stored in a SQL Server table. The table can be created in any database that is accessible by the package at execution time. You can use the Package Configuration Wizard to create the table. This is the default structure of the table:
CREATE TABLE [dbo].[SSIS Configurations](ConfigurationFilter NVARCHAR(255) NOT NULL,ConfiguredValue NVARCHAR(255) NULL,PackagePath NVARCHAR(255) NOT NULL,ConfiguredValueType NVARCHAR(20) NOT NULL)
These fields are used as follows:
- ConfigurationFilter: This field is used by SSIS to indentify a set of property/values pairs that are part of the same configuration entry in the Package Configurations Organizer.
- ConfigurationValue: It stores the value that is used to update the package property specified in PackagePath column.
- PackagePath: The path that point to the property being configured.
- ConfiguredValueType: the SSIS data type of the property being configured.
When you create SQL Server package configurations, you first have to choose which method you would use to provide the connection information to the configuration table, as the Figure 16 shows.
Figure 16. Creating a new SQL Server configuration.
With the direct method, the connection information, configuration table and filter are stored inside of the package. The indirect method instead allows storing that information in an environment variable. Notice that both methods use an SSIS connection manager, and its connection string is hard-coded inside of the package. Therefore, if you use this configuration type, it is a good practice to ensure that the connection string in this connection manager can be updated from an external source. A common approach is to use a separate configuration (XML, Registry Key or environment variable) to update the connection manager when required.
The next page in the configuration wizard lets you choose the set of properties to be targeted by the configuration being created. As you can see in figure 17, selecting multiple properties is allowed.
Figure 17. Setting multiple values for SQL Server configurations.
Now, let’s go through some important considerations that will help you to understand this configuration type better, and to avoid common implementation issues:
- SQL Server configurations are very flexible, as you can store multiple configuration properties/values in the same table.
- Multiple packages can use the same configuration table and filter.
- You can have a configuration filter for each entry in the configuration table, or you can use the same filter to group a set of entries. If you use the second approach, make sure that all properties under the selected filter exist in the package being configured. Otherwise warning messages are generated for those properties in the configuration filter that do not exist in the target package.
- You can use existing database security and backup policies to protect the configuration table contents.
- A drawback of this configuration type is that it relies on an SSIS connection manager to get access to the configuration table, but does not have built-in support for updating its connection string. In other words, the connection string that points to the configuration table is hard-coded inside of the package, and you have to take extra steps in order to update it (for example, through an extra package configuration placed at the top of the Package Configurations Organizer, or in SQL Server 2008, by using the /CONN option of DTExec).
- The /CONN option of DTExec utility can be used to override the connection string used by the configuration only in SSIS 2008. This is because SQL Server 2008 Integration Services reloads configurations after applying DTExec command line options.
- The configuration wizard can create the table and insert the required rows when they do not exist, but only if the direct method is chosen. The wizard never includes any sensitive data in the ConfiguredValue column of the table, and you would need to manually update the table to add the required sensitive data.
- If you use the direct method, and provide an existing configuration filter, the wizard will prompt you to either reuse the configuration information in the table or to override it with the property values from the current package. Make sure that you understand this behavior, since choosing the wrong option could change the contents of the configuration table and produce unexpected execution results.
- When you use the indirect method, the environment variable must contain the name of the configuration connection manager, the name of the configuration table and the configuration filter. The syntax should be:
“ConfigurationManagerName”;”Schema.ConfigurationTableName”;’ConfigurationFilter”
As with the direct method, you have to account for extra logic if you need to modify the connection string inside of “ConfigurationManagerName”
Conclusion
Package configuration is the natural way to parameterize Integration Services packages and to put your ETL solution in a better position to seamlessly respond to possible changes in the environment. With five types and two methods available, package configuration is a sophisticated mechanism that can be combined in a number of ways, and the time invested in understanding its behavior and the options available is well worth it.
About the author. Rafael Salas is a Senior Consultant at Mariner, a BI focus consulting firm, where he specializes in helping organizations to improve performance through Business Intelligence and Data Warehousing solutions. He has been a SQL Server evangelist since he started using the 2005 CTP. He is a SQL Server MVP, MCTS, and an active member of the user communities, where he provides guidance on the use of the SQL Server tools.