Informatica All Objects Overview
Informatica is a powerful Extraction,
Transformation, and Loading tool and is been deployed for data warehouse
development in the Business Intelligence Team. Informatica comes with the
following clients to perform various tasks.
Designer? – used to develop
transformations/mappings
Workflow Manager? / Workflow
Monitor replace the Server Manager - used to create sessions / workflows/
worklets to run, schedule, and monitor mappings for data movement
Repository Manager? – used to
maintain folders, users, permissions, locks, and repositories.
Integration Services? – the “workhorse” of
the domain. Informatica Server is the component responsible for the actual work
of moving data according to the mappings developed and placed into operation.
It contains several distinct parts such as the Load Manager, Data
Transformation Manager, Reader and Writer.
Repository Services?- Informatica
client tools and Informatica Server connect to the repository database
over the network through the Repository Server.
Dimension?:A dimension is an organized hierarchy of categories, known as
levels, that describes data in data warehouse fact tables
The various types of dimensions are :
1) Shared and Private Dimensions: Describes the basic differences between shared and private
dimensions and their uses
2) Regular Dimensions: Provides information about regular dimensions and their
variations
3) Parent-Child Dimensions: Describes the creation of parent-child dimensions and
identifies their advantages and restrictions
4) Data Mining Dimensions: Describes the creation of data mining dimensions and
identifies advantages and restrictions to their use
5) Virtual Dimensions:Describes the creation of virtual dimensions and their
advantages and restrictions
6) Dependent Dimensions: Describes the creation of dependent dimensions and
identifies their advantages and restrictions
7) Write-Enabled Dimensions: Describes the creation of write-enabled dimensions and
identifies their advantages and restrictions
Source System?
A database, application, file, or other
storage facility from which the data in a data warehouse is derived.
The definition of the relationship and data
flow between source and target objects.
Data that describes data and other structures,
such as objects, business rules, and processes. For example, the schema design
of a data warehouse is typically stored in a repository as meta data, which is
used to generate scripts used to build and populate the data warehouse. A
repository contains meta data.
A place where data is processed before
entering the warehouse.
The process of resolving inconsistencies and
fixing the anomalies in source data, typically as part of the ETL process.
The process of manipulating data. Any
manipulation beyond copying is a transformation. Examples include cleansing,
aggregating, and integrating data from multiple sources.
The process of moving copied or transformed
data from a source to a data warehouse.
A database, application, file, or other
storage facility to which the “transformed source data” is loaded in a data
warehouse.
System Variables:?
$$$SessStartTime returns the initial system
date value on the machine hosting the Integration Service when the server
initializes a session. $$$SessStartTime returns the session start time as a
string value. The format of the string depends on the database you are using.
Session:? A session is a set of instructions that tells
informatica Server how to move data from sources to targets.
WorkFlow?: A workflow is a set of instructions that
tells Informatica Server how to execute tasks such as sessions, email
notifications and commands. In a workflow multiple sessions can be included to
run in parallel or sequential manner.
Source Definition?: The Source Definition is used to logically
represent database table or Flat files.
Target Definition?: The Target Definition is used to logically
represent a database table or file in the Data Warehouse / Data Mart.
Aggregator?: The Aggregator transformation is used to
perform Aggregate calculations on group basis.
Expression:? The Expression transformation is used to
perform the arithmetic calculation on row by row basis and also used to convert
string to integer vis and concatenate two columns.
Filter?: The Filter transformation is used to filter
the data based on single condition and pass through next transformation.
Router?: The router transformation is used to route
the data based on multiple conditions and pass through next transformations.
It has three groups
1) Input group
2) User defined group
3) Default group
Joiner: The Joiner transformation is used to join two
sources residing in different databases or different locations like flat file
and oracle sources or two relational tables existing in different databases.
Source Qualifier: The Source Qualifier transformation is used
to describe in SQL the method by which data is to be retrieved from a source
application system and also
used to
join two relational sources residing in same databases.
Parameters and
Variables?
Parameter file it will supply the values to
session level variables and mapping level variables.
Variables are of two
types:?
·
Session level variables
·
Mapping level variables
Session level variables are of four types:
$DBConnection_Source
$DBConnection_Target
$InputFile
$OutputFile
Mapping level variables are of two types:
Variable
Parameter
What is the difference between mapping level
and session level variables?
Mapping level variables always starts with $$.
A session level variable always starts with $.
Flat File?
Flat file is a collection of data in a file in
the specific format.
Informatica can support two types of files
·
Delimiter
·
Fixed Width
In delimiter we need to specify the separator.
In fixed width we need to known about the
format first. Means how many character to read for particular column.
In delimiter also it is necessary to know
about the structure of the delimiter. Because to know about the headers.
If the file contains the header then in
definition we need to skip the first row.
List file:
If you want to process multiple files with
same structure. We don’t need multiple mapping and multiple sessions.
We can use one mapping one session using list
file option.
First we need to create the list file for all
the files. Then we can use this file in the main mapping.
Informatica
Transformations?
Mapping: Mapping is the Informatica Object which
contains set of transformations including source and target. Its look like
pipeline.
Mapplet:?
Mapplet is a set of reusable transformations.
We can use this mapplet in any mapping within the Folder.
A mapplet can be active or passive depending
on the transformations in the mapplet. Active mapplets contain one or more
active transformations. Passive mapplets contain only passive transformations.
When you add transformations to a mapplet,
keep the following restrictions in mind:
If you use a Sequence Generator
transformation, you must use a reusable Sequence Generator transformation.
If you use a Stored Procedure transformation,
you must configure the Stored Procedure Type to be Normal.
You cannot include the following objects in a
mapplet:
Normalizer transformations
COBOL sources
XML Source Qualifier transformations
XML sources
Target definitions
Other mapplets?
The mapplet contains Input transformations
and/or source definitions with at least one port connected to a transformation
in the mapplet.
The mapplet contains at least one Output
transformation with at least one port connected to a transformation in the mapplet.?
Input Transformation: Input transformations
are used to create a logical interface to a mapplet in order to allow data to
pass into the mapplet.
Output Transformation: Output transformations
are used to create a logical interface from a mapplet in order to allow data to
pass out of a mapplet.
Types of facts?
There are three types of facts:
Additive: Additive facts are facts that can be summed
up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be
summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be
summed up for any of the dimensions present in the fact table.
What is Factless Fact
Table?
Factless fact table captures the many-to-many
relationships between dimensions, but contains no numeric or textual facts.
They are often used to record events or coverage information.
Common examples of factless fact tables
include:
Identifying product promotion events (to
determine promoted products that didn’t sell)
Tracking student attendance or registration
events
Tracking insurance-related accident events
Fact Table?
A Fact Table in a dimensional model consists
of one or more numeric facts of importance to a business. Examples of facts are as follows:
·
the number of products sold
·
the value of products sold
·
the number of products produced
the number of service calls received
Fact ?
A "fact" is a numeric value that a
business wishes to count or sum. A
"dimension" is essentially an entry point for getting at the facts.
Dimensions are things of interest to the business.
A set of level properties that describe a
specific aspect of a business, used for analyzing the factual measures.
snowflake
schema?
Unlike Star-Schema, Snowflake schema contain normalized dimension tables in
a tree like structure with many nesting levels.
Snowflake schema is easier to maintain but
queries require more joins
star schema?
Star schema is a data warehouse schema where
there is only one "fact table"
and many denormalized dimension tables.
Fact table contains primary keys from all the
dimension tables and other numeric columns columns of additive, numeric facts.
Schema?
Graphical Representation of the data
structure. First Phase in implementation of Universe or Data Warehouse
DataMart?
Datamart is usually sponsored at the
department level and developed with a specific
details or subject in mind, a Data Mart is a subset of data warehouse with a focused
objective.
Hierarchy?
A logical structure that uses ordered levels
as a means of organizing data. A hierarchy can be used to define data
aggregation; for example, in a time dimension, a hierarchy might be used to
aggregate data from the Month level to the Quarter level, from the Quarter
level to the Year level. A hierarchy can also be used to define a navigational
drill path, regardless of whether the levels in the hierarchy represent
aggregated totals or not.
Level?
A position in a hierarchy. For example, a time
dimension might have a hierarchy that represents data at the Month, Quarter,
and Year levels.
No comments:
Post a Comment