Monday, May 28, 2012

Internal Identifiers (IDs) in odi

Internal Identifiers (IDs)

Before performing export and import operations, it is essential to understand in detail the concept of internal identifiers (ID) in Oracle Data Integrator.
To ensure object uniqueness across several work repositories, ODI uses a specific mechanism to generate unique IDs for objects (such as technologies, data servers, Models, Projects, Integration Interfaces, KMs, etc.). Every object in Oracle Data Integrator is identified by an internal ID. The internal ID appears on the Version tab of each object.
ODI Master and Work Repositories are identified by their unique internal IDs. This repository ID, a number of up to three digits, must be unique across all work repositories of an ODI installation and is used to compute the internal IDs of the objects created in that repository.
The internal ID of an object is calculated by appending the value of the RepositoryID to an automatically incremented number: <UniqueNumber><RepositoryID>
If the Repository ID is shorter than 3 digits, it is padded with leading zeros. For example, if a repository has the ID 5, possible internal IDs of the objects in this repository could be: 1005, 2005, 3005, ..., 1234567005. Note that all objects created within the same repository share the same last three digits, in this example 005.
This internal ID is unique for the object type within the repository, and also unique across repositories, because it embeds the repository's unique ID. This mechanism makes it possible to:
  • Avoid any ID conflicts when exporting and importing from one repository to another
  • Understand the provenance of every object simply by looking at its internal ID: the last three digits always refer to the repository where it was created.
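Because the repository ID is padded to three digits, the composition rule is equivalent to multiplying the automatically incremented number by 1000 and adding the repository ID. A minimal Oracle SQL sketch, purely illustrative (the numbers are made up), showing how an internal ID can be composed and decomposed:

-- Compose and decompose an internal ID for a repository whose ID is 5 ("005")
SELECT 1234 * 1000 + 5        AS internal_id,     -- 1234005
       MOD(1234005, 1000)     AS repository_id,   -- 5, i.e. the last three digits
       TRUNC(1234005 / 1000)  AS unique_number    -- 1234
FROM dual;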

Introduction to Oracle Data Quality Products

Introduction to Oracle Data Quality Products

Oracle Data Profiling and Oracle Data Quality for Data Integrator (also referred to as Oracle Data Quality Products) extend the inline Data Quality features of Oracle Data Integrator to provide more advanced data governance capabilities.
A complete Data Quality system includes data profiling, integrity and quality:
  • Profiling enables data investigation and quality assessment. It gives business users a clear picture of their data quality challenges and lets them monitor and track the quality of their data over time. Profiling is handled by Oracle Data Profiling, which allows business users to assess the quality of their data through metrics, to discover or infer rules based on this data, and to monitor the evolution of data quality over time.
  • Integrity control is essential in ensuring the overall consistency of the data in your information system's applications. Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, and so forth. Oracle Data Integrator provides a built-in working environment to detect these constraint violations and store them for recycling or reporting purposes. Static and Flow checks in Oracle Data Integrator are integrity checks.
  • Quality includes integrity and extends to more complex quality processing. A rule-based engine applies data quality standards as part of an integration process to cleanse, standardize, enrich, match and de-duplicate any type of data, including names and addresses. Oracle Data Quality for Data Integrator places data quality as well as name and address cleansing at the heart of the enterprise integration strategy.

Web Services in Oracle Data Integrator

A simple example brings together the Data Services, the Run-Time Web Services (Public Web Service and Agent Web Service) and the OdiInvokeWebService tool.
The Data Services and Run-Time Web Services components are invoked by a third-party application, whereas the OdiInvokeWebService tool invokes a third-party Web service:
  • The Data Services provide access to data in datastores (both source and target datastores), as well as to changes trapped by the Changed Data Capture framework. These web services are generated by Oracle Data Integrator and deployed in a Java EE application server.
  • The Public Web Service connects to the repository to retrieve a list of context and scenarios. This web service is deployed in a Java EE application server.
  • The Agent Web Service commands the Oracle Data Integrator Agent to start and monitor a scenario and to restart a session. Note that this web service is built into the Java EE or Standalone Agent.
  • The OdiInvokeWebService tool is used in a package and invokes a specific operation on a port of the third-party Web service, for example to trigger a BPEL process.
Oracle Data Integrator Run-Time Web services and Data Services are two different types of Web services. Oracle Data Integrator Run-Time Web services enable you to access the Oracle Data Integrator features through Web services, whereas the Data Services are generated by Oracle Data Integrator to give you access to your data through Web services.

Define the Source Datastores and Lookups in odi

Define the Source Datastores and Lookups

The source datastores contain data used to load the target datastore. Two types of datastores can be used as an interface source: datastores from the models and temporary datastores that are the target of an interface.
When using a temporary datastore that is the target of another interface as a source or as a lookup table, you can choose:
  • To use a persistent temporary datastore: You will run a first interface creating and loading the temporary datastore, and then a second interface sourcing from it. In this case, you would typically sequence the two interfaces in a package.
  • Not to use a persistent datastore: The second interface generates a sub-select corresponding to the loading of the temporary datastore. This option is not always available as it requires all datastores of the source interface to belong to the same data server (for example, the source interface must not have any source sets). You activate this option by selecting Use Temporary Interface as Derived Table on the source. Note the following when using a temporary interface as derived table:
    • The generated sub-select syntax can be either a standard sub-select syntax (default behavior) or the customized syntax from the IKM used in the first interface.
    • All IKM commands are ignored, except the one for which the Use current command for Derived Table sub-select statement option is selected. As a consequence of this limitation, temporary index management, for example, is not supported.
The source datastores of an interface can be filtered during the loading process and must be put in relation through joins. Joins and filters can be automatically copied from the model definitions and can also be defined for the interface.
A lookup is a datastore (from a model or the target datastore of an interface) - called the lookup table - associated with a source datastore - the driving table - via a join expression, from which data can be fetched and used in mappings.
The lookup data is used in the mapping expressions. Lookup tables are added with the Lookup Wizard. Depending on the database, two syntaxes can be used for a lookup:
  • SQL Left-Outer Join in the FROM clause: The lookup is processed as a regular source and a left-outer join expression is generated to associate it with its driving table.
  • SQL expression in the SELECT clause: The lookup is performed within the select clause that fetches the data from the lookup table. This second syntax may sometimes be more efficient for small lookup tables.
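For illustration, assuming a hypothetical ORDERS driving table and a CUSTOMER lookup table (the names are invented; the actual SQL is generated by the knowledge modules), the two syntaxes above would look roughly like this:

-- Syntax 1: lookup as a left-outer join in the FROM clause
SELECT ORD.ORDER_ID,
       ORD.CUST_ID,
       CUS.CUST_NAME                         -- fetched from the lookup table
FROM   ORDERS ORD
LEFT OUTER JOIN CUSTOMER CUS
       ON ORD.CUST_ID = CUS.CUST_ID;         -- lookup join expression

-- Syntax 2: lookup as an expression in the SELECT clause
SELECT ORD.ORDER_ID,
       ORD.CUST_ID,
       (SELECT CUS.CUST_NAME
        FROM   CUSTOMER CUS
        WHERE  CUS.CUST_ID = ORD.CUST_ID)    -- scalar subquery per driving row
FROM   ORDERS ORD;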

Define the Datasets in odi

Define the Datasets

A dataset represents the data flow coming from a group of datastores. Several datasets can be merged into the interface target datastore using set-based operators such as Union and Intersect. The support for datasets as well as the set-based operators supported depend on the capabilities of the staging area's technology.
You can add, remove, and order the datasets of an interface and define the operators between them in the DataSets Configuration dialog. Note that the set-based operators are always executed on the staging area.
When designing the integration interface, the mappings for each dataset must be consistent; this means that each dataset must have the same number of target columns mapped.
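Conceptually, each dataset contributes one SELECT statement and the set-based operator combines them on the staging area. A hedged sketch with invented source tables, both datasets mapping the same target columns:

-- Dataset 1: customers coming from the web channel
SELECT WEB.CUST_ID, WEB.CUST_NAME, 'WEB' AS CHANNEL
FROM   SRC_WEB_CUSTOMER WEB
UNION                                        -- set-based operator between the datasets
-- Dataset 2: customers coming from the store channel
SELECT STO.CUST_ID, STO.CUST_NAME, 'STORE' AS CHANNEL
FROM   SRC_STORE_CUSTOMER STO;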
To create a new dataset:
  1. In the Source Diagram toolbar, click Add/Remove DataSet... to display the DataSet Configuration dialog.
  2. Click Add New DataSet... A new line is added for the new dataset at the bottom of the list.
  3. In the DataSet Name field, give the name of the new dataset. This name will be displayed in the dataset tab.
  4. In the Operator field, select the set-based operator for your dataset. Repeat steps 2 to 4 if you wish to add more datasets.
  5. Click Close.
To arrange the order of the datasets:
  1. Select a dataset in the DataSet Configuration dialog.
  2. Click the Up and Down arrows to move the dataset up or down in the list.
To delete a dataset:
  1. Select a dataset in the DataSet Configuration dialog.
  2. Click Delete.

Temporary Target Datastore in odi

Temporary Target Datastore

To add a temporary target datastore:
  1. In the Target Datastore panel, select the title of the target datastore <Temporary Target Datastore> to display the Property Inspector for the target datastore.
  2. On the Diagram Property tab of the Property Inspector, type in a Name for this datastore.
  3. Select the Context for this datastore if you want to target this datastore in a predefined context. By default, the datastore is targeted on the context into which the interface is executed. This is an optional step.
  4. Specify the Temporary Datastore Location. Select Work Schema or Data Schema if you wish to create the temporary datastore in the work or data schema of the physical schema that will act as the staging area. The temporary target datastore will be created only if you activate the IKM option CREATE_TARG_TABLE when defining the flow.

  5. Go to the Overview tab and select the logical schema into which this temporary target datastore is created.
The temporary target datastore is created without columns. They must be added to define its structure.
To add a column to a temporary target datastore:
  1. In the Target Datastore panel, right-click the title bar that shows the name of the target datastore.
  2. Select Add Column.
  3. A new empty column appears in the Target Datastore panel. Select this new column.
  4. In the Diagram Property tab of the Target Mapping Property Inspector, give the new column definition in the Target Column field group. You must define the column Name, Datatype, Length and Scale.
To delete a column from a temporary target datastore:
  1. In the Target Datastore panel, right-click the column to be deleted.
  2. Select Delete.
To add one or several columns from a source datastore to a temporary target datastore:
  1. Add the source datastore as described in "Define the Source Datastores and Lookups".
  2. In the Source Diagram, select the source datastore columns you wish to add.
  3. Right-click and select Add Column to Target Table.
  4. The columns are added to the target datastore. Data types are set automatically.
To add all of the columns from a source datastore to a temporary target datastore:
  1. Add the source datastore.
  2. In the Source Diagram, select the title of the entity representing the source datastore.
  3. Right-click and select Add to Target.
  4. The columns are added to the Target Datastore. Data types are set automatically.

Permanent Target Datastore in odi

Permanent Target Datastore

To insert the permanent target datastore in an interface:
  1. In the Designer Navigator, expand the Models tree and expand the model or sub-model containing the datastore to be inserted as the target.
  2. Select this datastore, then drag it into the Target Datastore panel. The target datastore appears.
  3. In the Property Inspector, select the Context for this datastore if you want to target this datastore in a fixed context. By default, the datastore is targeted on the context into which the interface is executed. This is an optional step.
  4. If you want to target a specific partition of this target datastore, select in the Property Inspector the partition or sub-partition defined for this datastore from the list. This is an optional step.
Once you have defined your target datastore you may wish to view its data.
To display the data of the permanent target datastore of an interface:
  1. Right-click the title of the target datastore in the Target Datastore panel.
  2. Select Data...
The Data Editor containing the data of the target datastore appears. Data in a temporary target datastore cannot be displayed since this datastore is created by the interface.

Components of an Integration Interface in odi

Components of an Integration Interface

An integration interface is made up of and defined by the following components:
  • Target Datastore
    The target datastore is the element that will be loaded by the interface. This datastore may be permanent (defined in a model) or temporary (created by the interface).
  • Datasets
    One target is loaded with data coming from several datasets. Set-based operators (Union, Intersect, etc.) are used to merge the different datasets into the target datastore.
    Each Dataset corresponds to one diagram of source datastores and the mappings used to load the target datastore from these source datastores.
  • Diagram of Source Datastores
    A diagram of sources is made of source datastores - possibly filtered - related using joins. The source diagram also includes lookups to fetch additional information for loading the target.
    Two types of objects can be used as a source of an interface: datastores from the models and interfaces. If an interface is used, its target datastore (temporary or not) will be taken as a source.
    The source datastores of an interface can be filtered during the loading process, and must be put in relation through joins. Joins and filters are either copied from the models or can be defined for the interface. Joins and filters are implemented in the form of SQL expressions.
  • Mapping
    A mapping defines the transformations performed on one or several source columns to load one target column. These transformations are implemented in the form of SQL expressions. Each target column has one mapping per dataset. If a mapping is executed on the target, the same mapping applies for all datasets.
  • Staging Area
    The staging area is a logical schema into which some of the transformations (joins, filters and mappings) take place. By default, it is the same schema as the target's logical schema.
    The staging area can be located elsewhere (including on one of the sources). This is the case when the target's logical schema is not suitable for this role, for example when the target is a file datastore, since the file technology has no transformation capability.
    Mappings can be executed either on the source, target or staging area. Filters and joins can be executed either on the source or staging area.
  • Flow
    The flow describes how the data flows between the sources, the staging area if it is different from the target, and the target as well as where joins and filters take place. The flow also includes the loading and integration methods used by this interface. These are selected by choosing Loading and Integration Knowledge Modules (LKM, IKM).
  • Control
    An interface implements two points of control. Flow control checks the flow of data before it is integrated into the target, while post-integration control performs a static check on the target table at the end of the interface. The check strategy for Flow and Post-Integration Control is defined by a Check Knowledge Module (CKM).

Running the Package in oracle data integrator

Running the Package

To run a Package:
  1. In the Project tree in Designer Navigator, select the Package you want to execute.
  2. Right-click and select Execute.
  3. In the Execution window, select the execution parameters:
    • Select the Context into which the Package will be executed.
    • Select the Logical Agent that will run the step.
  4. Click OK.
  5. The Session Started Window appears.
  6. Click OK.
You can review the Package execution in the Operator Navigator.

Defining the Sequence in oracle data integrator

Defining the Sequence
To define the first step of the Package:
  1. In the Package toolbar tab, select the Free Choice tool.
  2. Select the step to set as the first one in the diagram.
  3. Right-click and then select First Step.
The first step symbol appears on the step's icon.
To define the next step upon success:
  1. In the Package toolbar tab, select the Next Step on Success tool.
  2. Select one step in the diagram.
  3. Keep the mouse button pressed and move the cursor to the icon of the step that must follow in case of a success, then release the mouse button.
  4. Repeat this operation to link all your steps in a success path sequence. This sequence should start from the step defined as the First Step.
Green arrows representing the success path appear between the steps, with an ok label on each arrow. In the case of an evaluate variable step, the label is true.
To define the next step upon failure:
  1. In the Package toolbar tab, select the Next Step on Failure tool.
  2. Select one step in the diagram.
  3. Keep the mouse button pressed and move the cursor to the icon of the step that must follow in case of a failure, then release the mouse button.
  4. Repeat this operation to link steps according to your workflow logic.
Red arrows representing the failure path appear between the steps, with a ko label on each arrow. In the case of an evaluate variable step, the arrow is green and the label is false.
To define the end of the Package upon failure:
By default, a step that is linked to no other step after a success or failure condition will terminate the Package when this success or failure condition is met. You can change this behavior by editing the step.
  1. In the Package toolbar tab, select the Free Choice tool.
  2. Select the step to edit.
  3. In the properties panel, select the Advanced tab.
  4. Select End in Processing after failure or Processing after success. The links after the step disappear from the diagram.
  5. You can optionally set a Number of attempts and a Time between attempts for the step to be retried a number of times, with an interval between the retries.

Running a Step in odi

Running a Step

To run a step:
  1. In the Package toolbar tab, select the Free Choice tool.
  2. Select the step to run in the diagram.
  3. Right-click and then select Execute Step.
  4. In the Execution window, select the execution parameters:
    • Select the Context into which the step must be executed.
    • Select the Logical Agent that will run the step.
  5. Click OK.
  6. The Session Started Window appears.
  7. Click OK.
You can review the step execution in the Operator Navigator.

Editing a Step's Linked Object in odi

Editing a Step's Linked Object

The step's linked object is the interface, procedure, variable, and so forth from which the step is created. You can edit this object from the Package diagram.
To edit a step's linked object:
  1. In the Package toolbar tab, select the Free Choice tool.
  2. Select the step to edit in the diagram.
  3. Right-click and then select Edit Linked Object.
The Editor for the linked object opens.

Reverse-Engineering a Model in oracle data integrator

Reverse-Engineering a Model

To insert a reverse-engineering step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the model to reverse-engineer from the Models tree.
  3. Drag and drop this model into the diagram.
  4. In the General tab of the properties panel, fill in the Step Name field. Select Model Reverse in the step type.
  5. From the File menu, click Save.

Journalizing a Model or a Datastore in oracle data integrator

Journalizing a Model or a Datastore

To insert a journalizing step:
Note:
It is necessary to define the JKM in the model to perform the journalizing operations.
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the model or datastore to journalize from the Models tree.
  3. Drag and drop this model or datastore into the diagram.
  4. In the General tab of the properties panel, fill in the Step Name field. Select Journalizing Model or Journalizing Datastore in the step type.
  5. Set the journalizing options.
  6. From the File menu, click Save.

Checking a Model, Sub-Model or Datastore in oracle data integrator

Checking a Model, Sub-Model or Datastore

To insert a check step in a Package:
Note:
It is necessary to define the CKM in the model to perform this static check.
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the model, sub-model or datastore to check from the Models tree.
  3. Drag and drop this model, sub-model or datastore into the diagram.
  4. In the General tab of the properties panel, fill in the Step Name field. Select Model, Datastore or Sub-Model Check in the step type.
  5. Select Delete Error from the Checked Tables if you want this static check to remove erroneous rows from the tables checked in this process.
  6. From the File menu, click Save.

Executing a Procedure in oracle data integrator

Executing a Procedure

To insert a Procedure step:
  1. Open the Package editor and go to the Diagram tab.
  2. Select the procedure to add to the Package from the Designer Navigator Projects tree.
  3. Drag and drop the procedure into the diagram. A procedure step appears.
  4. Click the step icon in the diagram. The properties panel opens.
  5. In the General tab, edit the Step Name field.
  6. In the Options tab, set the procedure options if needed.
  7. From the File menu, click Save.

Executing an Interface in oracle data integrator

Executing an Interface

To insert a Flow (Interface) step:
  1. Open the Package editor and go to the Diagram tab.
  2. Select the interface to add to the Package from the Designer Navigator Projects tree.
  3. Drag and drop the interface into the diagram. A flow Step appears.
  4. Click the step icon in the diagram. The properties panel opens.
  5. In the General tab, edit the Step Name field.
  6. From the File menu, click Save.

ODI variable types and Variable Steps

Variable Steps

There are different variable steps within Oracle Data Integrator:
  • Declare Variable: When a variable is used in a Package (or in elements of the topology which are used in the Package), it is strongly recommended that you insert a Declare Variable step in the Package. This step explicitly declares the variable in the Package.
  • Refresh Variable: This step refreshes the variable by running the query specified in the variable definition.
  • Set Variable: There are two functions for this step:
    • Assign sets the current value of a variable.
    • Increment increases or decreases a numeric value by the specified amount.
  • Evaluate Variable: This step compares the value of the variable with a given value according to an operator. If the condition is met, then the evaluation step is true, otherwise it is false. This step allows for branching in Packages.
Declaring a Variable
To insert a Declare Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Declare Variable in the Step Type.
  6. From the File menu, click Save.
Refreshing a Variable
To insert a Refresh Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Refresh Variable in the Step Type.
  6. From the File menu, click Save.
Setting a Variable
To insert a Set Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Set Variable in the Step Type.
  6. Select Assign or Increment depending on the operation you want to perform on the variable value.
  7. Type in the Value field the value to set or the increment. This value may be another variable.
  8. From the File menu, click Save.
Evaluating a Variable
To insert an Evaluate Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Evaluate Variable in the Step Type.
  6. Select the Operator used to compare the variable value.
  7. Type in the Value field the value to compare with your variable. This value may be another variable.
    Note:
    You can specify a list of values in the Value field. When using the IN operator, use the semicolon character (;) to separate the values of a list.
  8. From the File menu, click Save.

Saturday, May 26, 2012

Working with Integration Interfaces

Working with Integration Interfaces

 Components of an Integration Interface

An integration interface is made up of and defined by the following components:
  • Target Datastore
    The target datastore is the element that will be loaded by the interface. This datastore may be permanent (defined in a model) or temporary (created by the interface).
  • Datasets
    One target is loaded with data coming from several datasets. Set-based operators (Union, Intersect, etc.) are used to merge the different datasets into the target datastore.
    Each Dataset corresponds to one diagram of source datastores and the mappings used to load the target datastore from these source datastores.
  • Diagram of Source Datastores
    A diagram of sources is made of source datastores - possibly filtered - related using joins. The source diagram also includes lookups to fetch additional information for loading the target.
    Two types of objects can be used as a source of an interface: datastores from the models and interfaces. If an interface is used, its target datastore (temporary or not) will be taken as a source.
    The source datastores of an interface can be filtered during the loading process, and must be put in relation through joins. Joins and filters are either copied from the models or can be defined for the interface. Joins and filters are implemented in the form of SQL expressions.
  • Mapping
    A mapping defines the transformations performed on one or several source columns to load one target column. These transformations are implemented in the form of SQL expressions. Each target column has one mapping per dataset. If a mapping is executed on the target, the same mapping applies for all datasets.
  • Staging Area
    The staging area is a logical schema into which some of the transformations (joins, filters and mappings) take place. By default, it is the same schema as the target's logical schema.
    The staging area can be located elsewhere (including on one of the sources). This is the case when the target's logical schema is not suitable for this role, for example when the target is a file datastore, since the file technology has no transformation capability.
    Mappings can be executed either on the source, target or staging area. Filters and joins can be executed either on the source or staging area.
  • Flow
    The flow describes how the data flows between the sources, the staging area if it is different from the target, and the target as well as where joins and filters take place. The flow also includes the loading and integration methods used by this interface. These are selected by choosing Loading and Integration Knowledge Modules (LKM, IKM).
  • Control
    An interface implements two points of control. Flow control checks the flow of data before it is integrated into the target, while post-integration control performs a static check on the target table at the end of the interface. The check strategy for Flow and Post-Integration Control is defined by a Check Knowledge Module (CKM).
Interfaces use the following components, which should be created before the interface:
  • Datastores that will be used as sources and target of the loading process must be populated into the data models.
  • The correct physical and logical schemas along with their mapping in the interface's execution context must be defined prior to creating the interface, if the staging area is to be defined in a schema different than any of the sources or the target.
  • Knowledge Modules (IKM, LKM, CKM) that will be selected in the flow must be imported into the project.
  • Variables, Sequences and User Functions that will be used in the mapping, filter or join expressions must be created in the project.

11.2 Introduction to the Interface Editor

The Interface Editor provides a single environment for designing integration interfaces; it enables you to create and edit integration interfaces.
Figure 11-1 Interface Editor

 

Working with Packages

Working with Packages

Introduction to Packages

The Package is the largest unit of execution in Oracle Data Integrator. A Package is made up of a sequence of steps organized into an execution diagram.
Each step can either succeed or fail its execution. Depending on the execution result (success or failure), a step can branch to another step.

Introduction to Steps


The following types of steps can be used in a Package:
  • Flow (Interface): Executes an interface. See Section 10.3.1.1, "Executing an Interface".
  • Procedure: Executes a procedure. See Section 10.3.1.2, "Executing a Procedure".
  • Variable: Declares, sets, refreshes or evaluates the value of a variable. See Section 10.3.1.3, "Variable Steps".
  • Oracle Data Integrator Tools: These tools, available in the Toolbox, give access to all Oracle Data Integrator API commands or perform operating system calls. See Section 10.3.1.4, "Adding Oracle Data Integrator Tool Steps".
  • Models, Sub-models, and Datastores: Performs journalizing, static check or reverse-engineering operations on these objects. See Section 10.3.1.5, "Model, Sub-Models and Datastore Related Steps".
Figure 10-1 Sample Package
For example, the "Load Customers and Invoice" Package example shown in Figure 10-1 performs the following actions:
  1. Execute procedure "System Backup" that runs some backup operations.
  2. Execute interface "Customer Group" that loads the customer group datastore.
  3. Execute interface "Customer" that loads the customer datastore.
  4. Execute interface "Product" that loads the product datastore.
  5. Refresh variable "Last Invoice ID" step to set the value of this variable for use later in the Package.
  6. Execute interface "Invoice Header" that load the invoice header datastore.
  7. Execute interface "Invoice Lines" that load the invoices datastore.
  8. If any of the steps above fails, then the Package runs the "Send Alert" step that sends an email to the administrator using an Oracle Data Integrator tool.

 

Introduction to the Package editor

The Package editor provides a single environment for designing Packages. Figure 10-2 gives an overview of the Package editor.
Figure 10-2 Package editor


Creating a new Package

To create a new Package:
  1. In the Project tree in Designer Navigator, click the Packages node in the folder where you want to create the Package.
  2. Right-click and select New Package.
  3. Type in the Name of the Package.
  4. Go to the Diagram tab.
  5. Add steps as described in Section 10.3, "Working with Steps"
  6. From the File menu, click Save.

10.3 Working with Steps

Packages are an organized sequence of steps. Designing a Package consists mainly in working with the steps of this Package.

10.3.1 Adding a Step

Adding a step depends on the nature of the step being inserted. See Table 10-1 for more information on the different types of steps. The procedures for adding the different types of steps are given below.

10.3.1.1 Executing an Interface

To insert a Flow (Interface) step:
  1. Open the Package editor and go to the Diagram tab.
  2. Select the interface to add to the Package from the Designer Navigator Projects tree.
  3. Drag and drop the interface into the diagram. A flow Step appears.
  4. Click the step icon in the diagram. The properties panel opens.
  5. In the General tab, edit the Step Name field.
  6. From the File menu, click Save.

10.3.1.2 Executing a Procedure

To insert a Procedure step:
  1. Open the Package editor and go to the Diagram tab.
  2. Select the procedure to add to the Package from the Designer Navigator Projects tree.
  3. Drag and drop the procedure into the diagram. A procedure step appears.
  4. Click the step icon in the diagram. The properties panel opens.
  5. In the General tab, edit the Step Name field.
  6. In the Options tab, set the procedure options if needed.
  7. From the File menu, click Save.

10.3.1.3 Variable Steps

There are different variable steps within Oracle Data Integrator:
  • Declare Variable: When a variable is used in a Package (or in elements of the topology which are used in the Package), it is strongly recommended that you insert a Declare Variable step in the Package. This step explicitly declares the variable in the Package.
  • Refresh Variable: This step refreshes the variable by running the query specified in the variable definition.
  • Set Variable: There are two functions for this step:
    • Assign sets the current value of a variable.
    • Increment increases or decreases a numeric value by the specified amount.
  • Evaluate Variable: This step compares the value of the variable with a given value according to an operator. If the condition is met, then the evaluation step is true, otherwise it is false. This step allows for branching in Packages.
Declaring a Variable
To insert a Declare Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Declare Variable in the Step Type.
  6. From the File menu, click Save.
Refreshing a Variable
To insert a Refresh Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Refresh Variable in the Step Type.
  6. From the File menu, click Save.
Setting a Variable
To insert a Set Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Set Variable in the Step Type.
  6. Select Assign or Increment depending on the operation you want to perform on the variable value.
  7. Type in the Value field the value to set or the increment. This value may be another variable.
  8. From the File menu, click Save.
Evaluating a Variable
To insert an Evaluate Variable step:
  1. Open the Package editor and go to the Diagram tab.
  2. In Designer Navigator, select the variable to add to the Package from the Projects tree for a project variable or from the Others tree for a global variable.
  3. Drag and drop the variable into the diagram. A variable step appears.
  4. Click on the step icon in the diagram. The properties panel opens.
  5. In the General tab, fill in the Step Name field. Select Evaluate Variable in the Step Type.
  6. Select the Operator used to compare the variable value.
  7. Type in the Value field the value to compare with your variable. This value may be another variable.
    Note:
    You can specify a list of values in the Value field. When using the IN operator, use the semicolon character (;) to separate the values of a list.
  8. From the File menu, click Save.

Oracle Data Integrator Project Components

Oracle Data Integrator Project Components

The following components are stored in a project. They appear in the Project accordion in the Designer Navigator, under the project's node.

Folder
Folders are components that help organize the work in a project. Sub-folders can be inserted into folders. Folders contain Packages, Interfaces and Procedures.

Packages
The package is the largest unit of execution in Oracle Data Integrator. A package is a workflow, made up of a sequence of steps organized into an execution diagram. Packages assemble and reference other components from a project, such as interfaces, procedures or variables.

Interface
An interface is a reusable dataflow. It is a set of declarative rules that describe the loading of a datastore or a temporary target structure from one or more source datastores.

Procedure
A Procedure is a reusable component that groups a sequence of operations that do not fit in the interface concept.
Examples of procedures:
  • wait and unzip a file
  • send a batch of files via FTP
  • receive emails
  • purge a database
Variable
A variable's value is stored in Oracle Data Integrator. This value may change during the execution.

Sequence
A sequence is a variable that is automatically incremented when used. Between two uses, the value is persistent.

User Functions
User functions enable you to define customized functions or "function aliases", for which you define technology-dependent implementations. They can be used in interfaces and procedures.
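Variables, sequences and user functions are typically referenced inside mapping, filter or join expressions. The snippets below are only an illustration: the DEMO project code, object names and columns are invented, and the #/: prefixes and _NEXTVAL suffix follow the usual ODI reference syntax as I recall it:

-- Variable reference in a filter (value refreshed or set earlier in a package)
SRC_INVOICE.INVOICE_DATE > #DEMO.LAST_LOAD_DATE

-- Sequence reference in a mapping (next value generated when the row is processed)
:DEMO.CUSTOMER_SK_NEXTVAL

-- User function reference in a mapping (resolved to its technology-dependent implementation)
NULLVALUE(SRC_INVOICE.AMOUNT, 0)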


Knowledge Modules
Oracle Data Integrator uses Knowledge Modules at several points of a project design. A Knowledge Module is a code template related to a given technology that provides a specific function (loading data, reverse-engineering, journalizing).
Marker
Components of a project may be flagged in order to reflect a methodology or organization. Flags are defined using markers. These markers are organized into groups, and can be applied to most objects in a project.

When a package, interface, procedure or variable component is finished, it is compiled in a scenario. A scenario is the execution unit for production. Scenarios can be scheduled for automated execution.

Global Components

Global components are similar to project objects. The main difference is their scope: they have a global scope and can be used in any project. Global objects include Variables, Sequences, Markers and User Functions.

Project Life Cycle

The project life cycle depends on the methods and organization of your development team. The following steps must be considered as guidelines for creating, working with and maintaining an integration project.
  1. Create a new project and import Knowledge Modules for this project.
  2. Define the project organization and practices using folders, markers and documentation.
  3. Create reusable components: interfaces, procedures, variables, sequences. Perform unitary tests.
  4. Assemble these components into packages. Perform integration tests.
  5. Release the work in scenarios.

Introduction to the Oracle Data Integrator Topology

Introduction to the Oracle Data Integrator Topology

 

  1, "Physical Architecture"

 2, "Contexts"

3, "Logical Architecture"

4, "Agents"

5, "Languages"

6, "Repositories"
 

 

Physical Architecture

The physical architecture defines the different elements of the information system, as well as their characteristics taken into account by Oracle Data Integrator. Each type of database (Oracle, DB2, etc.), file format (XML, Flat File), or application software is represented in Oracle Data Integrator by a technology.
A technology handles formatted data. Therefore, each technology is associated with one or more data types that allow Oracle Data Integrator to generate data handling scripts.
The physical components that store and expose structured data are defined as data servers. A data server is always linked to a single technology. A data server stores information according to a specific technical logic which is declared into physical schemas attached to this data server. Every database server, JMS message file, group of flat files, and so forth, that is used in Oracle Data Integrator, must be declared as a data server. Every schema, database, JMS Topic, etc., used in Oracle Data Integrator, must be declared as a physical schema.
Finally, the physical architecture includes the definition of the Physical Agents. These are the Java software components that run Oracle Data Integrator jobs.


Contexts

Contexts bring together components of the physical architecture (the real Architecture) of the information system with components of the Oracle Data Integrator logical architecture (the Architecture on which the user works).
For example, contexts may correspond to different execution environments (Development, Test and Production) or different execution locations (Boston Site, New York Site, and so forth) where similar physical resources exist.
Note that during installation the default GLOBAL context is created.



Logical Architecture

The logical architecture allows a user to identify as a single Logical Schema a group of similar physical schemas - that is, schemas containing datastores that are structurally identical - but located in different physical locations. Logical schemas, like their physical counterparts, are attached to a technology.
Contexts allow logical schemas to be resolved into physical schemas. In a given context, one logical schema resolves to a single physical schema.
For example, the Oracle logical schema Accounting may correspond to two Oracle physical schemas:
  • Accounting Sample used in the Development context
  • Accounting Corporate used in the Production context
These two physical schemas are structurally identical (they contain accounting data), but are located in different physical locations. These locations are two different Oracle schemas (Physical Schemas), possibly located on two different Oracle instances (Data Servers).
All the components developed in Oracle Data Integrator are designed on top of the logical architecture. For example, a data model is always attached to a logical schema, and data flows are defined with this model. By specifying a context at run-time, the model's logical schema resolves to a single physical schema, and the data contained in this schema on the data server can be accessed by the integration processes.



Agents

Oracle Data Integrator run-time Agents orchestrate the execution of jobs. These agents are Java components.
The run-time agent functions as a listener and a scheduler agent. The agent executes jobs on demand (model reverses, packages, scenarios, interfaces, and so forth), for example when the job is manually launched from a user interface or from a command line. The agent is also able to start the execution of scenarios according to a schedule defined in Oracle Data Integrator.
Third party scheduling systems can also trigger executions on the agent. See Section 20.9.2, "Scheduling a Scenario or a Load Plan with an External Scheduler" for more information.
Typical projects require a single agent in production; however, Section 4.3.3, "Load Balancing Agents" describes how to set up several load-balanced agents.

Agent Lifecycle
The lifecycle of an agent is as follows:
  1. When the agent starts it connects to the master repository.
  2. Through the master repository it connects to any work repository attached to the Master repository and performs the following tasks at startup:
    • Clean stale sessions in each work repository. These are the sessions left incorrectly in a running state after an agent or repository crash.
    • Retrieve its list of scheduled scenarios in each work repository, and compute its schedule.
  3. The agent starts listening on its port.
    • When an execution request arrives on the agent, the agent acknowledges this request and starts the session.
    • The agent launches sessions according to its schedule.
    • The agent is also able to process other administrative requests in order to update its schedule, stop a session, respond to a ping or clean stale sessions. The standalone agent can also process a stop signal to terminate its lifecycle.


Standalone and Java EE Agents
The Oracle Data Integrator agent exists in two flavors: the standalone agent and the Java EE agent.
A standalone agent runs in a separate Java Virtual Machine (JVM) process. It connects to the work repository and to the source and target data servers via JDBC. Standalone agents can be installed on any server with a Java Virtual Machine installed. This type of agent is more appropriate when you need to use a resource that is local to one of your data servers (for example, the file system or a loader utility installed with the database instance), and you do not want to install a Java EE application server on this machine.
A Java EE agent is deployed as a web application in a Java EE application server (for example Oracle WebLogic Server). The Java EE agent can benefit from all the features of the application server (for example, JDBC data sources or clustering for Oracle WebLogic Server). This type of agent is more appropriate when there is a need for centralizing the deployment and management of all applications in an enterprise application server, or when you have requirements for high availability.
It is possible to mix standalone and Java EE agents in a single distributed environment.

Physical and Logical Agents
A physical agent corresponds to a single standalone agent or a Java EE agent. A physical agent should have a unique name in the Topology.
Similarly to schemas, physical agents having an identical role in different environments can be grouped under the same logical agent. A logical agent is related to physical agents through contexts. When starting an execution, you indicate the logical agent and the context. Oracle Data Integrator will translate this information into a single physical agent that will receive the execution request.






 

Slowly Changing Dimensions in odi

Slowly Changing Dimensions
Type 2 Slowly Changing Dimension (SCD) is a strategy used for loading data warehouses. It is often used for loading dimension tables, in order to keep track of changes on specific columns. A typical slowly changing dimension table would contain the following columns:
  • A surrogate key. This is usually a numeric column containing an automatically-generated number (using an identity column, a rank function or a sequence).
  • A natural key. This is the list of columns that represent the primary key of the operational system.
  • Columns that must be overwritten on change.
  • Columns that require a new row to be added on change.
  • A starting timestamp column indicating when the record was created in the data warehouse
  • An ending timestamp column indicating when the record became obsolete (closing date)
  • A current record flag indicating whether the record is the actual one (1) or an old one (0)
The following example illustrates the Slowly Changing Dimension behavior.
In the operational system, a product is defined by its ID that acts as a primary key. Every product has a name, a size, a supplier and a family. In the Data Warehouse a new version of this product is stored whenever the supplier or the family is updated in the operational system.
Figure 6-1 Type 2 Slow Changing Dimensions Example
In this example, the product dimension is first initialized in the Data Warehouse on March 12, 2006. All the records are inserted and are assigned a calculated surrogate key as well as a fake ending date set to January 1, 2400. As these records represent the current state of the operational system, their current record flag is set to 1. After the first load, the following changes happen in the operational system:
  1. The supplier is updated for product P1
  2. The family is updated for product P2
  3. The name is updated for product P3
  4. Product P5 is added
These updates have the following impact on the data warehouse dimension:
  • The update of the supplier of P1 is translated into the creation of a new current record (Surrogate Key 5) and the closing of the previous record (Surrogate Key 1)
  • The update of the family of P2 is translated into the creation of a new current record (Surrogate Key 6) and the closing of the previous record (Surrogate Key 2)
  • The update of the name of P3 simply updates the target record with Surrogate Key 3
  • The new product P5 is translated into the creation of a new current record (Surrogate Key 7).
To create a Knowledge Module that implements this behavior, it is necessary to know which columns act as a surrogate key, a natural key, a start date and so forth. Oracle Data Integrator stores this information in the Slowly Changing Dimension Behavior field in the Description tab of every column in the model.
When populating such a datastore in an interface, the IKM has access to this metadata using the SCD_xx selectors on the getColList() substitution method.
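For example, an IKM can retrieve just the natural key columns or just the starting-timestamp column with these selectors. A hedged sketch of such substitution calls (the SQL around them is omitted; selector names follow the SCD_xx convention mentioned above):

-- Comma-separated list of the columns flagged as the natural key
<%=odiRef.getColList("", "[COL_NAME]", ", ", "", "SCD_NK")%>

-- Column flagged as the starting timestamp
<%=odiRef.getColList("", "[COL_NAME]", ", ", "", "SCD_START")%>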
The way Oracle Data Integrator implements Type 2 Slowly Changing Dimensions is described below:
  1. Drop (if it exists) and create the integration table in the staging area.
  2. Insert the flow data in the integration table using only mappings that apply to the natural key, overwrite on change and add row on change columns. Set the starting timestamp to the current date and the ending timestamp to a constant.
  3. Recycle previous rejected records
  4. Call the CKM to perform a data quality check on the flow
  5. Flag the records in the integration table to 'U' when the natural key and the add row on change columns have not changed compared to the current records of the target.
  6. Update the target with the columns flagged overwrite on change by using the integration table content filtered on the 'U' flag.
  7. Close old records - those for which the natural key exists in the integration table, and set their current record flag to 0 and their ending timestamp to the current date
  8. Insert the new changing records with their current record flag set to 1
  9. Drop the integration table.
Again, this approach can be adapted. There may be some cases where the SQL produced requires further tuning and optimization.
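As a rough illustration of steps 7 and 8 above, the closing and insertion could be written along these lines. This is a hand-written sketch, not the code generated by the IKM: the table, column and flag names (including IND_UPDATE) are invented for the product example:

-- Step 7 (sketch): close the current records whose natural key has a new version in the flow
UPDATE TRG_PRODUCT T
SET    T.CURRENT_FLAG = 0,
       T.ENDING_DATE  = SYSDATE
WHERE  T.CURRENT_FLAG = 1
AND    EXISTS (SELECT 1
               FROM   I$_TRG_PRODUCT I
               WHERE  I.PRODUCT_ID = T.PRODUCT_ID     -- natural key
               AND    I.IND_UPDATE <> 'U');           -- rows not flagged for simple update

-- Step 8 (sketch): insert the new current records
INSERT INTO TRG_PRODUCT
       (PRODUCT_SK, PRODUCT_ID, PRODUCT_NAME, SUPPLIER, FAMILY,
        STARTING_DATE, ENDING_DATE, CURRENT_FLAG)
SELECT TRG_PRODUCT_SEQ.NEXTVAL, I.PRODUCT_ID, I.PRODUCT_NAME, I.SUPPLIER, I.FAMILY,
       SYSDATE, TO_DATE('01-01-2400', 'DD-MM-YYYY'), 1
FROM   I$_TRG_PRODUCT I
WHERE  I.IND_UPDATE <> 'U';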

Strategies with Staging Area on the Target

Strategies with Staging Area on the Target

These strategies are used when the staging area schema is located in the same data server as the target table schema. In this configuration, complex integration strategies can take place.
6.1.2.1.1 Append
This strategy simply inserts the incoming data flow into the target datastore, possibly deleting the content of the target beforehand.
This integration strategy includes the following steps:
  1. Delete (or truncate) all records from the target table. This step usually depends on a KM option.
  2. Transform and insert data from sources located on the same server and from loading tables in the staging area. When dealing with remote source data, LKMs will have already prepared loading tables. Sources on the same server can be read directly. The integration operation will be a direct INSERT/SELECT statement, with all the transformations performed on the staging area in the SELECT clause and all the transformations on the target in the INSERT clause.
  3. Commit the Transaction. The operations performed on the target should be done within a transaction and committed after they are all complete. Note that committing is typically triggered by a KM option called COMMIT.
The same integration strategy can be obtained by using the Control Append strategy and not choosing to activate flow control.
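A minimal sketch of the statements such an Append flow produces, with invented table and column names (the real code is generated by the IKM from the interface metadata, and the delete step is controlled by a KM option):

-- Step 1 (optional): empty the target
DELETE FROM TRG_CUSTOMER;

-- Step 2: direct INSERT/SELECT from same-server sources and C$ loading tables
INSERT INTO TRG_CUSTOMER (CUST_ID, CUST_NAME, CITY)
SELECT SRC.CUST_ID,
       UPPER(SRC.CUST_NAME),                  -- transformation executed on the staging area
       GEO.CITY
FROM   SRC_CUSTOMER SRC
JOIN   C$_0_GEOGRAPHY GEO ON GEO.CUST_ID = SRC.CUST_ID;

-- Step 3: commit the transaction
COMMIT;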
6.1.2.1.2 Control Append
In the Append strategy, flow data is simply inserted in the target table without any flow control. This approach can be improved by adding extra steps that will store the flow data in an integration table ("I$"), then call the CKM to isolate erroneous records in the error table ("E$").
This integration strategy includes the following steps:
  1. Drop (if it exists) and create the integration table in the staging area. This is created with the same columns as the target table so that it can be passed to the CKM for flow control.
  2. Insert data in the integration table from the sources and loading tables using a single INSERT/SELECT statement similar to the one loading the target in the Append strategy.
  3. Call the CKM for flow control. The CKM will evaluate every constraint defined for the target table on the integration table data. It will create an error table and insert the erroneous records into this table. It will also remove erroneous records from the integration table.
    After the CKM completes, the integration table will only contain valid records. Inserting them in the target table can then be done safely.
  4. Remove all records from the target table. This step can be made dependent on an option value set by the designer of the interface.
  5. Append the records from the integration table to the target table in a single INSERT/SELECT statement.
  6. Commit the transaction.
  7. Drop the temporary integration table.
Error Recycling
In some cases, it is useful to recycle errors from previous runs so that they are added to the flow and applied again to the target. This method can be useful for example when receiving daily sales transactions that reference product IDs that may not exist. Suppose that a sales record is rejected in the error table because the referenced product ID does not exist in the product table. This happens during the first run of the interface. In the meantime the missing product ID is created by the data administrator. Therefore the rejected record becomes valid and should be re-applied to the target during the next execution of the interface.
This mechanism is implemented in the IKMs by an extra task that inserts all the rejected records of the previous executions of this interface from the error table into the integration table. This operation is made prior to calling the CKM to check the data quality, and is conditioned by a KM option usually called RECYCLE_ERRORS.
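A hedged sketch of such a recycling task (the E$/I$ table names, columns and the ORIGIN filter are invented for illustration):

-- Executed only when the RECYCLE_ERRORS option is selected
INSERT INTO I$_TRG_SALES (SALE_ID, PRODUCT_ID, AMOUNT)
SELECT E.SALE_ID, E.PRODUCT_ID, E.AMOUNT
FROM   E$_TRG_SALES E
WHERE  E.ORIGIN = 'INT_LOAD_SALES';           -- records rejected by previous runs of this interface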
6.1.2.1.3 Incremental Update
The Incremental Update strategy is used to integrate data in the target table by comparing the records of the flow with existing records in the target according to a set of columns called the "update key". Records that have the same update key are updated when their associated data is not the same. Those that don't yet exist in the target are inserted. This strategy is often used for dimension tables when there is no need to keep track of the records that have changed.
The challenge with such IKMs is to use set-oriented SQL based programming to perform all operations rather than using a row-by-row approach that often leads to performance issues. The most common method to build such strategies often relies on the integration table ("I$") which stores the transformed source sets. This method is described below:
  1. Drop (if it exists) and create the integration table in the staging area. This is created with the same columns as the target table so that it can be passed to the CKM for flow control. It also contains an IND_UPDATE column that is used to flag the records that should be inserted ("I") and those that should be updated ("U").
  2. Transform and insert data in the integration table from the sources and loading tables using a single INSERT/SELECT statement. The IND_UPDATE column is set by default to "I".
  3. Recycle the rejected records from the previous run to the integration table if the RECYCLE_ERROR KM option is selected.
  4. Call the CKM for flow control. The CKM will evaluate every constraint defined for the target table on the integration table data. It will create an error table and insert the erroneous records into this table. It will also remove erroneous records from the integration table.
  5. Update the integration table to set the IND_UPDATE flag to "U" for all the records that have the same update key values as the target ones. Therefore, records that already exist in the target will have a "U" flag. This step is usually an UPDATE/SELECT statement.
  6. Update the integration table again to set the IND_UPDATE column to "N" for all records that are already flagged as "U" and for which the column values are exactly the same as the target ones. As these flow records match exactly the target records, they don't need to be used to update the target data.
    After this step, the integration table is ready for applying the changes to the target as it contains records that are flagged:
    • "I": these records should be inserted into the target.
    • "U": these records should be used to update the target.
    • "N": these records already exist in the target and should be ignored.
  7. Update the target with records from the integration table that are flagged "U". Note that the update statement is typically executed prior to the INSERT statement to minimize the volume of data manipulated.
  8. Insert records in the integration table that are flagged "I" into the target.
  9. Commit the transaction.
  10. Drop the temporary integration table.
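Steps 5 to 8 can be illustrated with a minimal, Oracle-flavored SQL sketch, assuming a hypothetical TRG_CUSTOMER target with update key CUST_ID, updatable columns CUST_NAME and CUST_CITY, and an I$_TRG_CUSTOMER integration table carrying the IND_UPDATE flag (all names are illustrative):

  -- 5. Flag rows that already exist in the target
  UPDATE I$_TRG_CUSTOMER I
  SET    IND_UPDATE = 'U'
  WHERE  EXISTS (SELECT 1 FROM TRG_CUSTOMER T WHERE T.CUST_ID = I.CUST_ID);

  -- 6. Neutralize rows whose values match the target exactly
  --    (null-safe comparison omitted here; see "Comparing Nulls" below)
  UPDATE I$_TRG_CUSTOMER I
  SET    IND_UPDATE = 'N'
  WHERE  IND_UPDATE = 'U'
  AND    EXISTS (SELECT 1 FROM TRG_CUSTOMER T
                 WHERE  T.CUST_ID   = I.CUST_ID
                 AND    T.CUST_NAME = I.CUST_NAME
                 AND    T.CUST_CITY = I.CUST_CITY);

  -- 7. Apply the updates first ...
  UPDATE TRG_CUSTOMER T
  SET    (CUST_NAME, CUST_CITY) =
         (SELECT I.CUST_NAME, I.CUST_CITY
          FROM   I$_TRG_CUSTOMER I
          WHERE  I.CUST_ID = T.CUST_ID AND I.IND_UPDATE = 'U')
  WHERE  EXISTS (SELECT 1 FROM I$_TRG_CUSTOMER I
                 WHERE  I.CUST_ID = T.CUST_ID AND I.IND_UPDATE = 'U');

  -- 8. ... then insert the new rows
  INSERT INTO TRG_CUSTOMER (CUST_ID, CUST_NAME, CUST_CITY)
  SELECT CUST_ID, CUST_NAME, CUST_CITY
  FROM   I$_TRG_CUSTOMER
  WHERE  IND_UPDATE = 'I';

  COMMIT;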
Optimization
This approach can be optimized depending on the underlying database. The following examples illustrate such optimizations:
  • With Teradata, it may be more efficient to use a left outer join between the flow data and the target table to populate the integration table with the IND_UPDATE column already set properly.
  • With Oracle, it may be more efficient in some cases to use a MERGE INTO statement on the target table instead of an UPDATE then INSERT.
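For example, on Oracle the UPDATE-then-INSERT pair of the previous sketch could be replaced by a single MERGE statement (same hypothetical tables and columns as above):

  MERGE INTO TRG_CUSTOMER T
  USING (SELECT CUST_ID, CUST_NAME, CUST_CITY
         FROM   I$_TRG_CUSTOMER
         WHERE  IND_UPDATE <> 'N') I   -- rows flagged 'N' are unchanged and excluded
  ON    (T.CUST_ID = I.CUST_ID)
  WHEN MATCHED THEN
    UPDATE SET T.CUST_NAME = I.CUST_NAME,
               T.CUST_CITY = I.CUST_CITY
  WHEN NOT MATCHED THEN
    INSERT (CUST_ID, CUST_NAME, CUST_CITY)
    VALUES (I.CUST_ID, I.CUST_NAME, I.CUST_CITY);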
Update Key
The update key should always be unique. In most cases, the primary key will be used as an update key. The primary key cannot be used, however, when it is automatically calculated using an increment such as an identity column, a rank function, or a sequence. In this case an update key based on columns present in the source must be used.
Comparing Nulls
When comparing data values to determine what should not be updated, the join between the integration table and the target table is expressed on each column as follows:
<target_table>.ColumnN = <integration_table>.ColumnN or (<target_table>.ColumnN is null and <integration_table>.ColumnN is null)
This is done to allow comparison between null values, so that a null value matches another null value. A more elegant way of writing it would be to use the coalesce function. Therefore the WHERE predicate could be written this way:
<%=odiRef.getColList("","coalesce(" + odiRef.getTable("L", "INT_NAME", "A") + ".[COL_NAME], 0) = coalesce(T.[COL_NAME], 0)", " \nand\t", "", "((UPD and !TRG) and !UK) ")%>
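For two hypothetical numeric columns CUST_RANK and CUST_SCORE of the I$_TRG_CUSTOMER table used earlier, this template would expand to a predicate resembling the following (the 0 default shown in the template is only safe for numeric columns that never legitimately contain 0; a type-appropriate sentinel value must be chosen otherwise):

  coalesce(I$_TRG_CUSTOMER.CUST_RANK, 0) = coalesce(T.CUST_RANK, 0)
  and coalesce(I$_TRG_CUSTOMER.CUST_SCORE, 0) = coalesce(T.CUST_SCORE, 0)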
Column-Level Insert/Update Behavior
Columns updated by the UPDATE statement are not the same as the ones used in the INSERT statement. The UPDATE statement uses selector "UPD and not UK" to filter only mappings marked as "Update" in the interface and that do not belong to the update key. The INSERT statement uses selector "INS" to retrieve mappings marked as "insert" in the interface.
Transaction
It is important that the UPDATE and the INSERT statements on the target belong to the same transaction. Should any of them fail, no data will be inserted or updated in the target.

Integration Process Overview in odi

Integration Process Overview

The integration process depends strongly on the strategy being used.
The following elements are used in the integration process:
  • An integration table (also known as the flow table) is sometimes needed to stage data after all staging area transformations are made. This integration table is named after the target table, prefixed with I$. It is the image of the target table with extra fields required for the strategy to be implemented. The data in this table is flagged, transformed or checked before being integrated into the target table.
  • The source and/or loading tables (created by the LKM). The integration process loads data from these tables into the integration table or directly into the target tables.
  • Check Knowledge Module. The IKM may initiate a flow check phase to check the data in the integration table against some of the constraints of the target table. Invalid data is removed from the integration table (removed from the flow).
  • Interface metadata, such as Insert, Update, UD1, etc., or model metadata such as the Slowly Changing Dimension behavior, are used at the integration phase to parameterize the column-level behavior in the integration strategies.
A typical integration process works in the following way:
  1. Create a temporary integration table if needed. This table typically includes extra fields, for example an update flag taking the value I or U to identify which rows are to be inserted or updated.
  2. Load data from the source and loading tables into this integration table, executing the transformations (joins, filters, mappings) specified on the staging area.
  3. Perform some transformation on the integration table to implement the integration strategy. For example, compare the content of the integration table with the target table to set the update flag.
  4. Load data from the integration table into the target table.

Integration Process in odi

Integration Process

An integration process is always needed in an interface. This process integrates data from the source or loading tables into the target datastore, using a temporary integration table.
An integration process uses an integration strategy which defines the steps required in the integration process. Example of integration strategies are:
  • Append: Optionally delete all records in the target datastore and insert all the flow into the target.
  • Control Append: Optionally delete all records in the target datastore and insert all the flow into the target. This strategy includes an optional flow control.
  • Incremental Update: Optionally delete all records in the target datastore. Identify new and existing records by comparing the flow with the target, then insert new records and update existing records in the target. This strategy includes an optional flow control.
  • Slowly Changing Dimension: Implement a Type 2 Slowly Changing Dimension, identifying the fields that require a simple update of the target record when they change and the fields that require the previous record state to be historized.

Loading Process and Loading Table Structure in odi

Loading Process

A loading process is required when source data needs to be loaded into the staging area. This loading is needed when some transformations take place in the staging area and the source schema is not located on the same server as the staging area. The staging area is the target of the loading phase.

Loading Process Overview

A typical loading process works in the following way:
  1. A temporary loading table is dropped (if it exists) and then created in the staging area.
  2. Data is loaded from the source into this loading table using a loading method.
    Actions 1 and 2 are repeated for all the source data that needs to be moved to the staging area.
    The data is used in the integration phase to load the integration table.
  3. After the integration phase, before the interface completes, the temporary loading table is dropped.



Loading Table Structure

The loading process creates a loading table in the staging area. This loading table is typically prefixed with C$.
A loading table represents a source set and not a source datastore. There is no direct mapping between the source datastores and the loading tables. Source sets appear in the Flow tab of the interface editor.
The following cases illustrate the notion of source set:
  • If a source CUSTOMER table has only two columns, CUST_NAME and CUST_ID, used in mappings and joins on the staging area, then the loading table will only contain an image of these two columns. Columns not needed for the rest of the integration flow do not appear in the loading table.
  • If a CUSTOMER table is filtered on CUST_AGE on the source, and CUST_AGE is not used afterwards, then the loading table will not include CUST_AGE. The loading process will apply the filter in the source data server, and the loading table will contain only the filtered records.
  • If two tables, CUSTOMER and SALES_REPS, are combined using a join on the source and the resulting source set is used in transformations taking place in the staging area, the loading table will contain the combined columns of CUSTOMER and SALES_REPS.
  • If all the columns of a source datastore are mapped and this datastore is not joined on the source, then the source set is the whole source datastore. In that case, the loading table is the exact image of the source datastore. This is the case for source technologies with no transformation capabilities, such as File.
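As a hedged illustration of the first case, assuming an Oracle staging area and a source CUSTOMER table with many columns of which only CUST_ID and CUST_NAME are used on the staging area, the loading table could look like this (the C$_0 prefix and datatypes are indicative only):

  -- The loading table is an image of the source set, not of the source datastore:
  -- only the columns needed by the rest of the flow are carried over
  CREATE TABLE C$_0CUSTOMER
  (
    CUST_ID    NUMBER,
    CUST_NAME  VARCHAR2(100)
  );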

Loading Using the Agent in odi

Loading Using the Agent

The run-time agent is able to read a result set using JDBC on a source server and write this result set using JDBC to the loading table in the staging area. To use this method, the knowledge module needs to include a command with a SELECT on the source with a corresponding INSERT on the target.
This method may not be suited for large volumes as data is read row-by-row in arrays, using the array fetch feature, and written row-by-row, using the batch update feature.
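A hedged sketch of such a command pair, reusing the hypothetical CUSTOMER source and C$_0CUSTOMER loading table from the previous section (a real LKM builds the column lists and bind variables with substitution methods; the names below are illustrative):

  -- Command on source: executed on the source data server, returns a result set
  SELECT CUST_ID, CUST_NAME
  FROM   CUSTOMER
  WHERE  CUST_AGE > 21;

  -- Command on target: executed on the staging area; the agent binds each
  -- fetched row into the INSERT and sends the rows in batches
  INSERT INTO C$_0CUSTOMER (CUST_ID, CUST_NAME)
  VALUES (:CUST_ID, :CUST_NAME);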

ODI Loading Using RDBMS-Specific Strategies

Loading Using RDBMS-Specific Strategies

Certain RDBMSs have a mechanism for transferring data across servers. For example:
  • Oracle: database links
  • Microsoft SQL Server: linked servers
  • IBM DB2 400: DRDA file transfer
Other databases implement specific mechanisms for loading files into a table, such as Oracle's External Table feature.
These loading strategies are implemented in specific KMs that create the appropriate objects (views, dblinks, etc.) and implement the appropriate commands for using these features.
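As a hedged example of the Oracle database link approach, assuming a database link named SRC_DB pointing to the source server and the C$_0CUSTOMER loading table used above (names are illustrative):

  -- The remote table is read directly through the database link,
  -- so the data does not flow through the run-time agent
  INSERT INTO C$_0CUSTOMER (CUST_ID, CUST_NAME)
  SELECT CUST_ID, CUST_NAME
  FROM   CUSTOMER@SRC_DB;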

ODI Loading Using Unload/Load

Loading Using Unload/Load

When the source result set is on a remote database server, an alternate solution to using the agent to transfer the data is to unload it to a file and then load that file into the staging area.
This is usually the most efficient method when dealing with large volumes across heterogeneous technologies. For example, you can unload data from a Microsoft SQL Server source using bcp and load this data into an Oracle staging area using SQL*Loader.
The steps of LKMs that follow this strategy are often as follows:
  1. Drop and create the loading table in the staging area
  2. Unload the data from the source to a temporary flat file using either a source database unload utility (such as Microsoft SQL Server bcp or DB2 unload) or the built-in OdiSqlUnload tool.
  3. Generate the script required by the loading utility to load the temporary file to the loading table.
  4. Execute the appropriate operating system command to start the load and check its return code.
  5. Possibly analyze any log files produced by the utility for error handling.
  6. Drop the loading table once the integration KM has terminated, and drop the temporary file.
When using an unload/load strategy, data needs to be staged twice: once in the temporary file and a second time in the loading table, resulting in extra disk space usage and potential efficiency issues. A more efficient alternative would be to use pipelines between the "unload" and the "load" utility. Unfortunately, not all the operating systems support file-based pipelines (FIFOs).

ODI Loading File Using Loaders

Loading File Using Loaders

When the interface contains a flat file as a source, you may want to use a strategy that leverages the most efficient loading utility available for the staging area technology, rather than the standard LKM File to SQL that uses the ODI built-in driver for files. Most RDBMSs have fast loading utilities to load flat files into tables, such as Oracle's SQL*Loader, Microsoft SQL Server bcp, Teradata FastLoad or MultiLoad.
Such an LKM will load the source file into the staging area, and all transformations will take place in the staging area afterwards.
A typical LKM using a loading utility will include the following sequence of steps:
  1. Drop and create the loading table in the staging area
  2. Generate the script required by the loading utility to load the file to the loading table.
  3. Execute the appropriate operating system command to start the load and check its return code.
  4. Possibly analyze any log files produced by the utility for error handling.
  5. Drop the loading table once the integration phase has completed.

ODI User Interfaces

User Interfaces

Administrators, Developers and Operators use the Oracle Data Integrator Studio to access the repositories. This Fusion Client Platform (FCP) based UI is used for administering the infrastructure (security and topology), reverse-engineering the metadata, developing projects, scheduling, operating and monitoring executions.
Business users (as well as developers, administrators and operators) can have read access to the repository and perform topology configuration and production operations through a web-based UI called Oracle Data Integrator Console. This web application can be deployed in a Java EE application server such as Oracle WebLogic.
ODI Studio provides four Navigators for managing the different aspects and steps of an ODI integration project:
Topology Navigator
Topology Navigator is used to manage the data describing the information system's physical and logical architecture. Through Topology Navigator you can manage the topology of your information system, the technologies and their datatypes, the data servers linked to these technologies and the schemas they contain, the contexts, the languages and the agents, as well as the repositories. The site, machine, and data server descriptions will enable Oracle Data Integrator to execute the same interfaces in different environments.
Designer Navigator
Designer Navigator is used to design data integrity checks and to build transformations such as for example:
  • Automatic reverse-engineering of existing applications or databases
  • Graphical development and maintenance of transformation and integration interfaces
  • Visualization of data flows in the interfaces
  • Automatic documentation generation
  • Customization of the generated code
The main objects you handle through Designer Navigator are Models and Projects.
Operator Navigator
Operator Navigator is the production management and monitoring tool. It is designed for IT production operators. Through Operator Navigator, you can manage your interface executions in the sessions, as well as the scenarios in production.
Security Navigator
Security Navigator is the tool for managing the security information in Oracle Data Integrator. Through Security Navigator you can create users and profiles and assign user rights for methods (edit, delete, etc) on generic objects (data server, datatypes, etc), and fine-tune these rights on the object instances (Server 1, Server 2, etc).