Thursday, July 3, 2014

Informatica All Objects Overview

 Informatica All Objects Overview 
Informatica is a powerful Extraction, Transformation, and Loading tool and is been deployed for data warehouse development in the Business Intelligence Team. Informatica comes with the following clients to perform various tasks.
       Designer? – used to develop transformations/mappings
        Workflow Manager? / Workflow Monitor replace the Server Manager - used to create sessions / workflows/ worklets to run, schedule, and monitor mappings for data movement
        Repository Manager? – used to maintain folders, users, permissions, locks, and repositories.
        Integration Services? – the “workhorse” of the domain. Informatica Server is the component responsible for the actual work of moving data according to the mappings developed and placed into operation. It contains several distinct parts such as the Load Manager, Data Transformation Manager, Reader and Writer.
        Repository Services?-  Informatica client tools and  Informatica Server connect to the repository database over the network through the Repository Server.
Dimension?:A dimension is an organized hierarchy of categories, known as levels, that describes data in data warehouse fact tables

The various types of dimensions are :
1) Shared and Private Dimensions: Describes the basic differences between shared and private dimensions and their uses
2) Regular Dimensions: Provides information about regular dimensions and their variations
3) Parent-Child Dimensions: Describes the creation of parent-child dimensions and identifies their advantages and restrictions
4) Data Mining Dimensions: Describes the creation of data mining dimensions and identifies advantages and restrictions to their use
5) Virtual Dimensions:Describes the creation of virtual dimensions and their advantages and restrictions
6) Dependent Dimensions: Describes the creation of dependent dimensions and identifies their advantages and restrictions
7) Write-Enabled Dimensions: Describes the creation of write-enabled dimensions and identifies their advantages and restrictions
Source System?
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping?
The definition of the relationship and data flow between source and target objects.
Meta data?
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as meta data, which is used to generate scripts used to build and populate the data warehouse. A repository contains meta data.
Staging Area?
A place where data is processed before entering the warehouse.
Cleansing?
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation?
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation?
The process of moving copied or transformed data from a source to a data warehouse.
Target System?
A database, application, file, or other storage facility to which the “transformed source data” is loaded in a data warehouse.

System Variables:?
$$$SessStartTime returns the initial system date value on the machine hosting the Integration Service when the server initializes a session. $$$SessStartTime returns the session start time as a string value. The format of the string depends on the database you are using.

Session:? A session is a set of instructions that tells informatica Server how to move data from sources to targets.

WorkFlow?: A workflow is a set of instructions that tells Informatica Server how to execute tasks such as sessions, email notifications and commands. In a workflow multiple sessions can be included to run in parallel or sequential manner.
Source Definition?: The Source Definition is used to logically represent database table or Flat files.
Target Definition?: The Target Definition is used to logically represent a database table or file in the Data Warehouse / Data Mart.
Aggregator?: The Aggregator transformation is used to perform Aggregate calculations on group basis.
Expression:? The Expression transformation is used to perform the arithmetic calculation on row by row basis and also used to convert string to integer vis and concatenate two columns.
Filter?: The Filter transformation is used to filter the data based on single condition and pass through next transformation.

Router?: The router transformation is used to route the data based on multiple conditions and pass through next transformations.
It has three groups
1) Input group
2) User defined group
3) Default group
Joiner: The Joiner transformation is used to join two sources residing in different databases or different locations like flat file and oracle sources or two relational tables existing in different databases.
Source Qualifier: The Source Qualifier transformation is used to describe in SQL the method by which data is to be retrieved from a source application system and also
 used to join two relational sources residing in same databases.
Parameters and Variables?
Parameter file it will supply the values to session level variables and mapping level variables.
Variables are of two types:?
·        Session level variables
·        Mapping level variables
Session level variables are of four types:
        $DBConnection_Source
        $DBConnection_Target
        $InputFile
        $OutputFile
Mapping level variables are of two types:
        Variable
        Parameter
What is the difference between mapping level and session level variables?
Mapping level variables always starts with $$.
A session level variable always starts with $.
Flat File?
Flat file is a collection of data in a file in the specific format.
Informatica can support two types of files
·        Delimiter
·        Fixed Width
In delimiter we need to specify the separator.
In fixed width we need to known about the format first. Means how many character to read for particular column.
In delimiter also it is necessary to know about the structure of the delimiter. Because to know about the headers.
If the file contains the header then in definition we need to skip the first row.
List file:
If you want to process multiple files with same structure. We don’t need multiple mapping and multiple sessions.
We can use one mapping one session using list file option.
First we need to create the list file for all the files. Then we can use this file in the main mapping.
Informatica Transformations?
Mapping: Mapping is the Informatica Object which contains set of transformations including source and target. Its look like pipeline.
Mapplet:?
Mapplet is a set of reusable transformations. We can use this mapplet in any mapping within the Folder.
A mapplet can be active or passive depending on the transformations in the mapplet. Active mapplets contain one or more active transformations. Passive mapplets contain only passive transformations.
When you add transformations to a mapplet, keep the following restrictions in mind:
If you use a Sequence Generator transformation, you must use a reusable Sequence Generator transformation.
If you use a Stored Procedure transformation, you must configure the Stored Procedure Type to be Normal.
You cannot include the following objects in a mapplet:
Normalizer  transformations
COBOL sources
XML Source Qualifier transformations
XML sources
Target definitions
Other mapplets?
The mapplet contains Input transformations and/or source definitions with at least one port connected to a transformation in the mapplet.
The mapplet contains at least one Output transformation with at least one port connected to a transformation in the mapplet.?
Input Transformation: Input transformations are used to create a logical interface to a mapplet in order to allow data to pass into the mapplet.
Output Transformation: Output transformations are used to create a logical interface from a mapplet in order to allow data to pass out of a mapplet.
Types of facts?
There are three types of facts:
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
What is Factless Fact Table?
Factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information.
Common examples of factless fact tables include:
Identifying product promotion events (to determine promoted products that didn’t sell)
Tracking student attendance or registration events
Tracking insurance-related accident events
Fact Table?
A Fact Table in a dimensional model consists of one or more numeric facts of importance to a business.  Examples of facts are as follows:
·         the number of products sold
·         the value of products sold
·         the number of products produced
the number of service calls received
 Fact ?
A "fact" is a numeric value that a business wishes to count or sum.  A "dimension" is essentially an entry point for getting at the facts. Dimensions are things of interest to the business.
A set of level properties that describe a specific aspect of a business, used for analyzing the factual measures.
snowflake schema?
Unlike Star-Schema, Snowflake  schema contain normalized dimension tables in a tree like structure with many nesting levels.
Snowflake schema is easier to maintain but queries require more joins
star schema?
Star schema is a data warehouse schema where there is only one  "fact table" and many denormalized dimension tables.
Fact table contains primary keys from all the dimension tables and other numeric columns columns of additive, numeric facts.
Schema?
Graphical Representation of the data structure. First Phase in implementation of Universe or Data Warehouse
DataMart?
Datamart is usually sponsored at the department level and developed with a specific  details or subject in mind, a Data Mart is a  subset of data warehouse with a focused objective.
Hierarchy?
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.
Level?
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.





No comments:

Post a Comment