Friday, March 29, 2013

Bitmap Index Overview


Bitmap Index

Overview 

A bitmap index is an indexing method that can provide both performance benefits and storage savings. Bitmap indexes are particularly useful for data warehousing environments because data is usually updated less frequently and ad hoc queries are more common.
While B-tree indexes are the most effective method for “high-cardinality” data (data where there are many possible values, such as customer_name or phone_number), bit-mapped indexes are best for “low-cardinality” data (such as a column to indicate a person’s gender, which contains only two possible values: MALE and FEMALE).

Building a bitmap index

A portion of a sample company’s customer data is shown below:
customer#     marital_status     region     gender     income_level
101           single             east       male       bracket_1
102           married            central    female     bracket_4
103           married            west       female     bracket_2
104           divorced           west       male       bracket_4
105           single             central    female     bracket_2
106           married            central    female     bracket_3
Since MARITAL_STATUS, REGION, GENDER, and INCOME_LEVEL are all “low-cardinality” columns (there are only three possible marital statuses, three possible regions, two possible genders, and four possible income levels), it is appropriate to create bitmap indexes on these columns. A bitmap index should not be created on CUSTOMER#, because this is a “high-cardinality” column; in fact, it is the primary key of the table, so every value is unique. Instead, an ordinary, unique B-tree index should be created on this column to provide the most efficient representation and retrieval.
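As a minimal sketch (the index names here are made up, and the table is assumed to be called CUSTOMER), the corresponding DDL might look like this:
CREATE UNIQUE INDEX customer_pk_ix ON customer (customer#);

CREATE BITMAP INDEX customer_marital_bix ON customer (marital_status);
CREATE BITMAP INDEX customer_region_bix  ON customer (region);
CREATE BITMAP INDEX customer_gender_bix  ON customer (gender);
CREATE BITMAP INDEX customer_income_bix  ON customer (income_level);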
The bitmap index for the REGION column consists of three separate bit-maps, one for each region. These bit-maps are shown below:
region='east'   region='central'   region='west'
      1               0                  0
      0               1                  0
      0               0                  1
      0               0                  1
      0               1                  0
      0               1                  0
Each entry (or “bit”) in the bitmap corresponds to a single row of the CUSTOMER table. The value of each bit depends upon the values of the corresponding row in the table. For example, the bitmap REGION=‘east’ contains a one as its first bit; this is due to the fact that the REGION is ‘east’ in the first row of the CUSTOMER table. The bitmap REGION=‘east’ has a zero for its other bits because none of the other rows of the CUSTOMER table contain ‘east’ as their value for REGION. With this scheme, it is impossible for a single row to have a bit set in more than one bitmap for that column.

Using a bitmap index

Suppose you want to look at the demographic trends of the company’s customers. A sample question might be: “How many of our married customers live in the central or west regions?” which could be coded in an SQL query as:
select count(*) from customer 
 where marital_status = 'married' 
   and region in ('central','west');
Bitmap indexes can be used to process this query. One bitmap is used to evaluate each predicate, and the bitmaps are logically combined to evaluate the AND’s and the OR’s. For example:
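Using the six sample rows shown earlier, the evaluation can be illustrated as follows. The two region bitmaps are OR'ed together, and the result is AND'ed with the MARITAL_STATUS='married' bitmap:
status='married'   'central' OR 'west'   AND of the two
      0                    0                   0
      1                    1                   1
      1                    1                   1
      0                    1                   0
      0                    1                   0
      1                    1                   1
Counting the 1's in the final bitmap gives 3, i.e. three married customers (102, 103, and 106) live in the central or west regions.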

By counting the number of 1's in the resulting bitmap above, the result of this query can be computed directly. If you further wanted to know the specific customers that satisfied the criteria, rather than simply the number of customers, the resulting bitmap would be used to access the table. In this manner, bitmap indexes are said to cooperate with each other: the bitmaps for multiple columns can be AND'ed or OR'ed together to produce the one bitmap that satisfies the query, and all of this happens before the table is even accessed. B-tree indexes do not cooperate in this way; only one B-tree index can be used for such a query, even if multiple B-tree indexes exist.

When to use a bitmap index

There are three things to consider when choosing an index method: 
  • performance
  • storage
  • maintainability
The advantages of using bitmap indexes are their performance impact for certain queries and their relatively small storage requirements. Note, however, that bitmap indexes are not applicable to every query and, like B-tree indexes, they can impact the performance of insert, update, and delete statements. When you modify a table that has a bitmap index, entire ranges of the bitmap must be locked while they are updated. This can lead to poor performance in situations where the table has a high degree of modification concurrency. 

Performance Considerations

Bitmap indexes can provide very impressive performance improvements. Execution times of certain queries may improve by several orders of magnitude. The queries that benefit the most from bitmap indexes have the following characteristics:
  • the WHERE-clause contains multiple predicates on low-cardinality columns
  • the individual predicates on these low-cardinality columns select a large number of rows
  • bitmap indexes have been created on some or all of these low-cardinality columns
  • the tables being queried contain many rows
An advantage of bitmap indexes is that multiple bitmap indexes can be used to evaluate the conditions on a single table. Thus, bitmap indexes are very useful for complex ad hoc queries that contain lengthy WHERE-clauses.

Storage considerations

Bitmap indexes incur only a small storage cost and can provide significant storage savings over B-tree indexes. A bitmap index can require 100 times less space than a B-tree index on the same low-cardinality column.
Important Note: 
A strict comparison of the relative sizes of B-tree and bitmap indexes is not an accurate measure for selecting bitmap over B-tree indexes. Because of the performance characteristics of bitmap indexes and B-tree indexes, you should continue to maintain B-tree indexes on your high-cardinality data. Bitmap indexes should be considered primarily for your low-cardinality data.
With bitmap indexes, the problems associated with multiple-column B-tree indexes are solved because bitmap indexes can be efficiently combined during query execution; they cooperate with each other. Note that while bitmap indexes may not be quite as efficient during execution as the equivalent concatenated B-tree indexes, the space savings provided by bitmap indexes can often more than justify their use.
The net storage savings will depend upon a database’s current usage of B-tree indexes:
  • A database that relies on single-column B-tree indexes on high-cardinality columns will not observe significant space savings (but should see significant performance increases). 
     
  • A database containing a significant number of concatenated B-tree indexes could reduce its index storage usage by 50% or more, while maintaining similar performance characteristics.
     
  • A database that lacks concatenated B-tree indexes because of storage constraints will be able to use bitmap indexes and increase performance with minimal storage costs.
Bitmap indexes are best for read-only or light OLTP environments. Because there is no effective method for locking a single bit, row-level locking is not available for bitmap indexes. Instead, locking for bitmap indexes is effectively at the block level which can impact heavy OLTP environments. Note also that like other types of indexes, updating bitmap indexes is a costly operation.
Though bitmap indexes are not appropriate for databases with a heavy load of insert, update, and delete operations, their effectiveness in a data warehousing environment is not diminished. In such environments, data is usually maintained via bulk inserts and updates. For these bulk operations, rebuilding or refreshing bitmap indexes is an efficient operation. The storage savings and performance gains provided by bitmap indexes can provide tremendous benefits to data warehouse users.

Performance/storage characteristics

In preliminary testing of bitmap indexes, certain queries ran up to 100 times faster. The bitmap indexes on low cardinality columns were also about ten times smaller than B-tree indexes on the same columns. In these tests, the queries containing multiple predicates on low-cardinality data experienced the most significant speedups. Queries that did not conform to this characteristic were not assisted by bitmap indexes.

Example CREATE Syntax

CREATE BITMAP INDEX myindex ON mytable(col1);
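Once created, the index can be checked in the data dictionary; for example, using the MYTABLE/MYINDEX names from the statement above:
select index_name, index_type, table_name
  from user_indexes
 where table_name = 'MYTABLE';
A bitmap index is reported with an INDEX_TYPE of 'BITMAP', whereas a conventional B-tree index shows 'NORMAL'.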

Example scenarios

The following sample queries on the CUSTOMERS table demonstrate the variety of query-processing techniques that are necessary for optimal performance.
  Example #1:   Single predicate on a low-cardinality attribute.
select * from customers 
 where gender = 'male';
Best approach: parallel table scan
This query will return approximately 50% of the data. Since we will be accessing such a large number of rows, it is more efficient to scan the entire table rather than use either bitmap indexes or B-tree indexes. To minimize elapsed time, the Server should execute this scan in parallel.
  Example #2:   Single predicate on a high-cardinality attribute.
select * from customers 
 where customer# = 101;
Best approach: conventional unique index
This query will retrieve at most one record from the CUSTOMERS table. A B-tree index or hash cluster index is always appropriate for retrieving a small number of records based upon criteria on the indexed columns.
  Example #3:   Multiple predicates on low-cardinality attributes
select * from customers 
 where gender = 'male' 
   and region in ('central','west') 
   and marital_status in ('married', 'divorced');
Best approach: bit-mapped index
Though each individual predicate specifies a large number of rows, the combination of all three predicates will return a relatively small number of rows. In this scenario, bitmap indexes provide substantial performance benefits.
  Example #4:   Multiple predicates on both high-cardinality and low-cardinality attributes.
select * from customers 
 where gender = 'male' 
   and customer# < 100;
Best approach: B-tree index on CUSTOMER#
This query returns a small number of rows because of the highly selective predicate on CUSTOMER#. It is more efficient to use a B-tree index on CUSTOMER# than to use a bitmap index on GENDER.
In each of the previous examples, the Oracle cost-based optimizer transparently determines the most efficient query-processing technique.

A special note on cardinality

As has been noted, bitmap indexes work best with low cardinality. The most common examples of low-cardinality columns are gender columns (Male/Female) and truth columns (True/False). In some situations, these columns should not even be candidates for any index, let alone bitmap indexes. If you have a company database with a table named EMPLOYEES, it might have a column called GENDER. If your company is representative of the general population, then approximately half of your employees will be male and the other half female. If your query looks for all of the female employees in the table, roughly half of the rows would be returned, and when you return that many rows a full table scan is faster than using any index. So while bitmap indexes should be used for low-cardinality columns, there is a danger in columns whose cardinality is too low.
That being said, bitmap indexes, even on such low-cardinality columns, can still help with queries. As shown above, bitmap indexes can cooperate with each other. Consider a query such as the following:
select * from customers 
 where gender = 'male' 
   and region in ('central','west');
This query can use the bitmap index on the GENDER column to filter out approximately half of the rows. The bitmap index on the REGION column can be used to further filter the rows down. In cooperating together, bitmap indexes achieve their greatest benefits.

Bitmap Join Indexes


Bitmap Join Indexes

Bitmap indexes were introduced back in Oracle 7.3. With Oracle 9i, we can also create bitmap join indexes. A bitmap join index represents the join of columns in two or more tables. With a bitmap join index, the value of a column in one table (generally a dimension table) is stored with the associated ROWIDs of the matching rows in the other table the index is defined on. This provides fast join access between the tables, provided the query uses the columns of the bitmap join index. In a data-warehouse environment, a bitmap join index can be a more efficient way of accessing data than a materialized view join. When using a bitmap join index in a warehouse or EIS environment, you would create the join as an equi-inner join between the primary key column(s) of the dimension tables and the foreign key column(s) of the fact table.
You create a bitmap join index using the CREATE BITMAP INDEX command. Here is an example of the creation of a bitmap join index:
CREATE BITMAP INDEX my_bitmap_index
ON fact_table(dimension_table.col_2)
FROM dimension_table, fact_table
WHERE dimension_table.col1=fact_table.col1;
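With this index in place, a query such as the following (reusing the hypothetical table and column names from the CREATE statement above, with 'some_value' as a placeholder) can typically be answered from the bitmap join index on FACT_TABLE without having to join back to DIMENSION_TABLE:
select count(*)
  from fact_table f, dimension_table d
 where f.col1 = d.col1
   and d.col_2 = 'some_value';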
There are a few restrictions on bitmap join indexes that you should be aware of. These include the following:
  • The bitmap join index is built on a single table. In the previous example, the bitmap join index is built on the FACT_TABLE table.
     
  • Oracle will allow only one of the tables of a bitmap join index to be updated, inserted, or deleted from at a time.
     
  • You cannot join the same table to itself, that is, no table can appear twice in the FROM clause.
     
  • A bitmap join index cannot be created on an index-organized table or a temporary table.
     
  • Every column in the bitmap join index must be present in one of the associated dimension tables. The joins in the bitmap index must form either a star or snowflake schema.
     
  • Either primary key columns or unique constraints must be created on the columns that will be join columns in the bitmap join index.
     
  • All the primary key columns of the dimension table must be part of the join criteria of the bitmap join index.
     
  • All restrictions on normal bitmap indexes apply to bitmap join indexes.

SCD Type 2 error: ODI-1228:ORA-01747: invalid user.table.column, table.column, or column specification

ODI-1228: Caused By: java.sql.SQLSyntaxErrorException: ORA-01747: invalid user.table.column, table.column, or column specification


This is a common issue.

It usually shows up when the SCD behavior has not been set correctly on the target datastore's columns: if no column is flagged with "Overwrite on Change", the update step generated by the SCD IKM has nothing to set and raises this error. Either enable "Overwrite on Change" on the appropriate columns, or go to the SCD IKM and select the Ignore Errors option on the Update Existing Rows step.

Oracle Job Schedulers examples

Oracle Job Schedulers examples

The following code examples show how the overloads of the CREATE_JOB procedure are used; they rely on objects assumed to have been defined previously (the my_job_proc procedure, the test_plsql_block_prog program, and the test_hourly_schedule schedule).
BEGIN
  -- Job defined entirely by the CREATE JOB procedure.
  DBMS_SCHEDULER.create_job (
    job_name        => 'test_full_job_definition',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN my_job_proc(''CREATE_PROGRAM (BLOCK)''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'freq=hourly; byminute=0',
    end_date        => NULL,
    enabled         => TRUE,
    comments        => 'Job defined entirely by the CREATE JOB procedure.');
END;
/
BEGIN
  -- Job defined by an existing program and schedule.
  DBMS_SCHEDULER.create_job (
    job_name      => 'test_prog_sched_job_definition',
    program_name  => 'test_plsql_block_prog',
    schedule_name => 'test_hourly_schedule',
    enabled       => TRUE,
    comments      => 'Job defined by an existing program and schedule.');
END;
/
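Once created, the jobs can be checked in the data dictionary and dropped when no longer needed; a minimal sketch using the job names from the examples above:
SELECT job_name, enabled, state
  FROM user_scheduler_jobs
 WHERE job_name IN ('TEST_FULL_JOB_DEFINITION', 'TEST_PROG_SCHED_JOB_DEFINITION');

BEGIN
  -- remove a job once it is no longer required
  DBMS_SCHEDULER.drop_job (job_name => 'test_full_job_definition');
END;
/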

Get a list of MViews depending on table using DBMS_MVIEW.GET_MV_DEPENDENCIES

DECLARE
   v_output   VARCHAR2 (2000);
   v_owner    VARCHAR2 (30);
   v_table    VARCHAR2 (30);
BEGIN
   v_owner := UPPER ('&&ENTER_OWNER_NAME');
   v_table := UPPER ('&&ENTER_TABLE_NAME');
   DBMS_MVIEW.GET_MV_DEPENDENCIES (v_owner || '.' || v_table, v_output);
   DBMS_OUTPUT.put_line (CHR (13));
   DBMS_OUTPUT.put_line
      ('Materialized Views Dependent on table      &&ENTER_OWNER_NAME..&&ENTER_TABLE_NAME'
      );
   DBMS_OUTPUT.put_line (CHR (13));
   DBMS_OUTPUT.put_line (v_output);
END;
/
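Note that the output is written with DBMS_OUTPUT, so enable server output in your client before running the block, for example in SQL*Plus:
SET SERVEROUTPUT ON
The block will then prompt for ENTER_OWNER_NAME and ENTER_TABLE_NAME through the && substitution variables.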

Thursday, March 28, 2013

B-Tree Index


B-Tree  Index

A B-tree is a special data structure format for an index that allows rapid access to the data in the index. One of the properties of this data structure is that the index is always balanced: each node at the lowest level is equidistant from the topmost node, or root node, of the tree, and each side of the index has the same number of nodes. The nodes at the lowest level are known as leaf nodes; all other nodes are known as branch nodes. Branches point to other branches or to leaf nodes. Leaf nodes store value-rowid pairs, that is, the values of the indexed columns and the rowid that points to a distinct row having those values. The actual distribution depends on the number of data values in each range, with the overall goal of reducing the number of levels that must be traversed to reach a specific value. The advantages of a B-tree structure are:
  • All leaf blocks are of the same depth (number of values).
     
  • The height of the B-tree, or the total number of nodes from the root to any leaf, is typically pretty small. In some cases, the root node is the only leaf node and the height is 1. As the table gets more rows inserted into it, the index must grow to accommodate this. But even in tables with over 1 million rows, the B-tree index typically has a height of 3. In the very largest of tables, the height may only be 4. This means that for even the largest of tables, it only takes 4 blocks to find the rowid of the row you are looking for. This is extremely efficient. The actual height of a given index can be checked with the query shown after this list.
     
  • In the case of randomly entered data, the B-tree stays balanced automatically. In fact, the B-tree stays balanced no matter what data is entered into it.
     
  • All blocks of a B-tree index are three-quarters full (on the average), allowing insertion without rebuild.
     
  • B-trees provide excellent performance for all types of selects.
     
  • Insert, update, and deletes tend to be efficient in a B-tree structure.
     
  • B-tree performance stays optimal even when tables vary from small to large.
  • Even in OLAP/data warehouse databases, B-tree indexes are still used, typically on high-cardinality columns.
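As a quick way to check the height discussion above, the data dictionary records the number of branch levels for each B-tree index once statistics have been gathered (BLEVEL is the number of branch levels, so the total height is BLEVEL + 1). Replace MY_INDEX_NAME with the index you want to inspect:
select index_name, blevel, leaf_blocks
  from user_indexes
 where index_name = 'MY_INDEX_NAME';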

World of ETL Tools Available ... sooooooomany etl tools.....

I found the below list of available ETL tools on Google. So many ETL tools.....
World of ETL Tools Available
ETL Product Name / Vendor (each product name below is followed by its vendor)
ActaWorks
Acta Technologies
Amadea
ISoft
ASG-XPATH
Allen Systems Group
AT Sigma W-Import
Advanced Technologies
AutoImport
White Crane Systems
Automatic Data Warehouse Builder
Gilbert Babin
Blue Data Miner
Blue Data
Catalyst
Synectics Solutions
CDB/Superload
CDB Software
Cerebellum Portal Integrator
Cerebellum Software
Checkmate
BitbyBit International Ltd.
Chyfo
Ispirer Systems
CMS TextMap
Cornerstone Management Systems
Compleo
Symtrax
Content Connect
One Page
Convert /IDMS-DB, Convert/VSAM
Forecross Corporation
Conversions Plus
DataViz
Copy Manager
Information Builders, Inc.
CoSORT
Innovative Routines International, Inc.
CrossXpress
Cross Access Corporation
Cubeware Importer
CubeWare
Cyklop
Tokab Software AB
Data Cycle
APE Software Components S.L.
Data Exchange
XSB
Data EXTRactor
DogHouse Enterprises
Data Flow Manager
Peter's Software
Data Junction, Content Extractor
Data Junction
Data Manager
Joe Spanicek
Data Mapper
Applied Database Technology
Data Migration Tools
Friedman & Associates
Data Migrator for SAP, PeopleSoft
Information Builders, Inc.
Data Propagation System
Treehouse Software
Data Warehouse Tools
Javacorporate
Data3
Inform Information Systems
DataBlaster 2
Bus-Tech, Inc.
DataBrix Data Manager
Lakeview Technology
DataConvert
Metadata Information Partners
DataDigger
Donnell Systems
DataExchanger SRV
CrossDataBase Technology
Datagration
Paladyne
DataImport
Spalding Software
DataLoad
Software Technologies Corporation
DataManager
Joe Spanicek
DataMIG
Dulcian, Inc.
DataMiner
Placer Group
DataMirror Constellar Hub
DataMirror Corporation
DataMirror Transformation Server
DataMirror Corporation
DataProF
IT Consultancy Group BV
DataPropagator
IBM
DataProvider
Order Software Company
DataPump for SAP R/3
Transcope AG
DataStage XE
Ascential Software
DataSuite
Pathlight Data Systems
Datawhere
Miab Systems Ltd.
DataX
Data Migrators
DataXPress
EPIQ Systems
DB/Access
Datastructure
DBMS/Copy
Conceptual Software, Inc.
DBridge
Software Conversion House
DEAP I
DEAP Systems
DecisionBase
Computer Associates
DecisionStream
Cognos
DECISIVE Advantage
InfoSAGE, Inc.
Departmental Suite 2000
Analytical Tools Inc.
DETAIL
Striva Technology
Distribution Agent for MVS
Sybase
DocuAnalyzer
Mobius Management
DQtransform
Metagon Technologies
DT/Studio
Embarcadero Technologies
DTS
Microsoft
e-Sense Gather
Vigil Technologies
e-zMigrate
e-zdata.net
eIntegration Suite
Taviz Technology
Environment Manager
WhiteLight Technology
ETI Extract
Evolutionary Technologies, Inc.
ETL Engine
FireSprout
ETL Manager
iWay Software
eWorker Portal, eWorker Legacy
entrinsic.com
EZ-Pickin's
ExcelSystems
FastCopy
SoftLink
File-AID/Express
CompuWare
FileSpeed
Computer Network Technology
Formware
Captiva Software
FOXTROT
EnableSoft, Inc.
Fusion FTMS
Proginet
Gate/1
Selesta
Génio
Hummingbird Communications Ltd.
Gladstone Conversion Package
Gladstone Computer Services
GoHunter
Gordian Data
Graphical Performance Series
Vanguard Solutions
Harvester
Object Technology UK
HIREL
SWS Software Services
iManageData
BioComp Systems
iMergence
iMergence Technologies
InfluX
Network Software Associates, Inc.
InfoLink/400
Batcom
InfoManager
InfoManager Oy
InfoRefiner, InfoTransport, InfoHub, InfoPump
Computer Associates
Information Discovery Platform
Cymfony
Information Logistics Network
D2K
InformEnt
Fiserv
InfoScanner
WisoSoftCom
InScribe
Critical Path
InTouch/2000
Blue Isle Software, Inc.
ISIE
Artaud, Courthéoux & Associés
John Henry
Acme Software
KM.Studio
Knowmadic
LiveTransfer
Intellicorp
LOADPLUS
BMC Software
Mainframe Data Engine
Flatiron Solutions
Manheim
PowerShift
Mercator
TSI International
Meta Integration Works
Meta Integration Technologies
MetaSuite
Minerva Softcare
MetaTrans
Metagenix
MineWorks/400
Computer Professional Systems
MinePoint
MinePoint
MITS
Management Information Tools
Monarch
Datawatch Corporation
Mozart
Magma Solutions
mpower
Ab Initio
MRE
SolutionsIQ
NatQuery
NatWorks, Inc
netConvert
The Workstation Group, Ltd.
NGS-IQ
New Generation Software
NSX Data Stager
NSX Software
ODBCFace
System Tech Consulting
OLAP Data Migrator
Legacy to Web Solutions
OmniReplicator
Lakeview Technology
OpalisRendezVous
Opalis
Open Exchange
IST
OpenMigrator
PrismTech
OpenWizard Professional
OpenData Systems
OptiLoad
Leveraged Solutions, Inc.
Oracle Warehouse Builder
Oracle Corporation
Orchestrate
Torrent Systems Inc.
Outbound
Firesign Computer Company
Parse-O-Matic
Pinnacle Software
ParseRat
Guy Software
pcMainframe
cfSOFTWARE
PinnPoint Plus
Pinnacle Decision Systems
PL/Loader
Hanlon Consulting
PointOut
mSE GmbH
Power*Loader Suite
SQL Power Group
PowerDesigner WarehouseArchitect
Powersoft
PowerMart
Informatica
PowerStage
Sybase
Rapid Data
Open Universal Software
Relational DataBridge
Liant Software Corporation
Relational Tools
Princeton Softech
ReTarGet
Tominy
Rodin
Coglin Mill Pty Ltd.
Roll-Up
Ironbridge Software
Sagent Solution
Sagent Technology, Inc.
SAS/Warehouse Adminstrator
SAS Institute
Schemer Advanced
Appligator.com
Scribe Integrate
Scribe Software Corporation
Scriptoria
Bunker Hill
SERdistiller
SER Solutions
Signiant
Signiant
SpeedLoader
Benchmark Consulting
SPINA PRO
Diagnos
SRTransport
Schema Research Corp.
StarQuest Data Replicator
StarQuest Software
StarTools
StarQuest
Stat/Transfer
Circle Systems
Strategy
SPSS
Sunopsis (ODI 10G & 11G)
Sunopsis (from 2006 Oracle corporation)
SyncSort Unix
Syncsort
TableTrans
PPD Informatics
Text Agent
Tasc, Inc.
TextPipe
Crystal Software Australia
TextProc2000
LVRA
Textractor
Textkernel
Tilion
Tilion
Transporter Fountain
Digital Fountain
TransportIT
Computer Associates
ViewShark
infoShark
Vignette Business Integration Studio
Vignette
Visual Warehouse
IBM
Volantia
Volantia
vTag Web
Connotate Technologies
Waha
Beacon Information Technology
Warehouse
Taurus Software
Warehouse Executive
Ardent Software
Warehouse Plus
eNVy Systems
Warehouse Workbench
systemfabrik
Web Automation
webMethods
Web Data Kit
LOTONtech
Web Mining
Blossom Software
Web Replicator
Media Consulting
WebQL
Caesius Software
WhizBang! Extraction Library
WhizBang! Labs
Wizport
Turning Point
Xentis
GrayMatter Software Corporation

 Source...
http://etutorials.org/SQL/oracle+dba+guide+to+data+warehousing+and+star+schemas/Chapter+6.+Loading+the+Warehouse/What+About+ETL+Tools/


Thanks.

HOW TO Kill The Running Job & session in Oracle


1. Check the Running Job (from here you will get the SID that is running the job)

SQL> SELECT * FROM DBA_JOBS_RUNNING;

2. Mark the Job as Broken/Offline

BEGIN
  SYS.DBMS_JOB.BROKEN(job#, TRUE);
END;
/

SQL>
BEGIN
  SYS.DBMS_IJOB.BROKEN(136451, TRUE);
END;
/

3. Kill the Oracle Session

SELECT sid, serial# FROM v$session WHERE sid IN (SELECT sid FROM dba_jobs_running);

SQL> ALTER SYSTEM KILL SESSION 'sid,serial#';

4. Kill the O/S Process ID (PID)

SELECT p.spid FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND s.sid = :sid;
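The SPID returned is the operating-system process ID of the session's server process. Assuming a dedicated server process on Unix/Linux, it can then be killed at the OS level on the database server, for example:

kill -9 <spid>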
 
5. Check if the Job is Still Running

SQL> SELECT * FROM DBA_JOBS_RUNNING;

6. Determine the Current Number of Job Queue Processes

SQL> col value for a10
SQL> select name,value from v$parameter where name = 'job_queue_processes';

7. Alter the Job Queue to Zero

SQL> ALTER SYSTEM SET job_queue_processes = 0;

This will bring down all of the job queue processes.

8. Validate that No Processes are Using the Job Queue

SQL> SELECT * FROM DBA_JOBS_RUNNING;

9. Mark the DBMS_JOB as Not Broken

BEGIN SYS.DBMS_IJOB.BROKEN(job#,FALSE); END;

10. Alter the Job Queue to Original Value

SQL> ALTER SYSTEM SET job_queue_processes = original_value;


Source.
http://levicorp.com/2009/05/22/how-to-kill-the-running-job/

Thanks.

What is a Fact? And what are the Types of Facts?

Types of Dimensions in Dimensional Modeling




What is a Factless Fact Table? When do we use it?

Difference between Datamart and Data Warehouse

Dimensional Hierarchy example

Dimensional Hierarchy example  Data Warehouse Concepts



Galaxy Schema Data Warehouse Concepts

Galaxy Schema Data Warehouse Concepts



Snowflake Schema Data Warehouse Concepts

Snowflake Schema Data Warehouse Concepts

Star Schema Data Warehouse Concepts

Star Schema Data Warehouse Concepts


Differences between Star Schema and Snowflake Schema Data Warehouse Concepts

Differences between Star Schema and Snowflake Schema Data Warehouse Concepts



SCD - Slowly Changing Dimension - Type 1, Type 2, Type 3 Data Warehouse Concepts


SCD - Slowly Changing Dimension - Type 1, Type 2, Type 3 Data Warehouse Concepts

If you are implementing this in ODI, there is a separate IKM for SCD Type 2:

Oracle IKM Slowly Changing Dimensions

First set the datastore's OLAP Type to Slowly Changing Dimension (DataStore => OLAP Type => Slowly Changing Dimension), and then set one of the following SCD behaviors on each column (a sketch of a typical SCD Type 2 table layout follows this list):

1) Surrogate Key or Natural Key
2) Overwrite on Change (no history)
3) Add Row on Change (history kept as a new record)
4) Starting Timestamp (record start date)
5) Ending Timestamp (record end date)
6) Current Record Flag (record status Active or Inactive: 1 = Active, 0 = Inactive)
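As an illustration only (the table and column names here are hypothetical), a dimension table carrying these SCD Type 2 attributes might look like this:
CREATE TABLE customer_dim (
  customer_sk     NUMBER        NOT NULL,  -- surrogate key
  customer_id     NUMBER        NOT NULL,  -- natural key from the source system
  customer_name   VARCHAR2(100),           -- overwrite on change (no history)
  marital_status  VARCHAR2(20),            -- add row on change (history kept)
  start_ts        DATE          NOT NULL,  -- starting timestamp
  end_ts          DATE,                    -- ending timestamp
  current_flag    NUMBER(1)     NOT NULL,  -- record status flag: 1 = active, 0 = inactive
  CONSTRAINT customer_dim_pk PRIMARY KEY (customer_sk)
);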

Top-down versus bottom-up design methodologies in Datawarehousing


 

Bottom-up design

Ralph Kimball, a well-known author on data warehousing, is a proponent of an approach to data warehouse design which he describes as bottom-up.
In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note that in Kimball methodology, the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.
Data marts contain, primarily, dimensions and facts. Facts can contain either atomic data or, if necessary, summarized data. The single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture". The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.
The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "Drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.
Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".
Some consider it an advantage of the Kimball method that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including the bus) and consistent data marts, rather than one big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method lends itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the "Sales" department, by building a Sales data mart. Upon completion of the Sales data mart, the business might then decide to expand the warehousing activities into, say, the "Production" department, resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrable is that they share the same bus; that is, the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and the individual data marts link to that information from the bus. Note that this does not require 100% awareness from the onset of the data warehousing effort; no master plan is required upfront. The Sales data mart is good as it is (assuming that the bus is complete), and the Production data mart can be constructed virtually independently of the Sales data mart (but not independently of the bus).
If integration via the bus is achieved, the data warehouse, through its two data marts, will not only be able to deliver the specific information that the individual data marts are designed to do, in this example either "Sales" or "Production" information, but can deliver integrated Sales-Production information, which, often, is of critical business value.

 

Top-down design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision, the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities.
Inmon states that the data warehouse is:
Subject-oriented
The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Non-volatile
Data in the data warehouse are never over-written or deleted — once committed, the data are static, read-only, and retained for future reporting.
Integrated
The data warehouse contains data from most or all of an organization's operational systems and these data are made consistent.
Time-variant
Unlike an operational system, where the stored data contains only the current value, the data in the data warehouse is kept over time so that changes and trends can be tracked and reported.
The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage to the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.

Hybrid design

Data warehouse (DW) solutions often resemble the hub and spokes architecture. Legacy systems feeding the DW/BI solution often include customer relationship management (CRM) and enterprise resource planning solutions (ERP), generating large amounts of data. To consolidate these various data models, and facilitate the extract transform load (ETL) process, DW solutions often make use of an operational data store (ODS). The information from the ODS is then parsed into the actual DW. To reduce data redundancy, larger systems will often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW solution.
It is important to note that the DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solution from a BI point of view. The hybrid architecture allows a DW to be replaced with a master data management solution where operational (not static) information could reside.
The Data Vault Modeling components follow hub and spokes architecture. This modeling style is a hybrid design, consisting of the best practices from both third normal form and star schema. The Data Vault model is not a true third normal form, and breaks some of the rules that 3NF dictates be followed. It is, however, a top-down architecture with a bottom-up design. The Data Vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible; when built, it still requires the use of a data mart or star-schema-based release area for business purposes.





Physical Data Model - Design Step -3

This is the third step in designing a data warehouse.

Physical Data Model is the actual model that will be created in the database to store the data. It is the most detailed data model in data warehouse modeling. It includes:
  1. Table names
  2. All column names of each table, along with data type and size
  3. Primary keys and foreign keys of each table
  4. Constraints

The physical data model can be converted to actual SQL DDL statements using various tools; ERwin is a well-known tool for this.
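As a small, purely illustrative sketch (all names are hypothetical), the physical model of a simple dimension and fact table could translate into DDL such as:
CREATE TABLE region_dim (
  region_id    NUMBER       NOT NULL,
  region_name  VARCHAR2(30) NOT NULL,
  CONSTRAINT region_dim_pk PRIMARY KEY (region_id)
);

CREATE TABLE sales_fact (
  sale_id    NUMBER       NOT NULL,
  region_id  NUMBER       NOT NULL,
  sale_date  DATE         NOT NULL,
  amount     NUMBER(12,2) NOT NULL,
  CONSTRAINT sales_fact_pk PRIMARY KEY (sale_id),
  CONSTRAINT sales_fact_region_fk FOREIGN KEY (region_id)
    REFERENCES region_dim (region_id)
);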

Logical Data Model - Design Step -2

This is the second step in designing a data warehouse.

A good logical data model is very important in a data warehouse implementation. The logical data model has to be detailed (though some might not agree), as it represents the entire business in one view and shows the relationships between business entities. The logical model should include the following to make it detailed and self-explanatory:
  1. All entities to be included in the data warehouse
  2. All possible attributes of each entity
  3. Primary keys of each entity (natural keys as well as surrogate keys)
  4. Relationships between all entities
Characteristics of Data Warehouse Logical Data Model
  1. It has all the entities that will be used in the data warehouse
  2. It shows all possible attributes of all entities
  3. It depicts the relationships between all entities

Conceptual Data Model - Design Step 1

This is the first step in designing a data warehouse.

The conceptual data model is the first step in data warehouse design. In a conceptual data model, very high-level relationships between dimension and fact tables are depicted. A conceptual data model does not necessarily include keys or attributes of tables. It gives a very high-level idea of the proposed data warehouse design, including the possible fact and dimension tables, and it is the stepping stone to designing the logical data model of the data warehouse.


Characteristics of Data Warehouse Conceptual Data Model
  1. It shows only high-level relationships between tables.
  2. It does not show primary keys or column names.
  3. It is the stepping stone to the logical data model.

LINUX and Shell Scripting


1) How do you find out what’s your shell?
2) What’s the command to find out today’s date?
3) What’s the command to find out users on the system?
4) How do you find out the current directory you’re in?
5) How do you remove a file?
6) How do you find out your own username?
7) How do you search for a string inside a given file?
8) What are PIDs?
9) How do you list currently running processes?
10) How do you stop a process?
11) How do you stop all the processes, except the shell window?
12) What’s the conditional statement in shell scripting?
13) How do you do number comparison in shell scripts?
14) How do you test for file properties in shell scripts?
15) How do you edit a file in shell scripts?

SQL * LOADER


1) What is SQL*Loader and why do we use it?
2) What are the files we use in SQL*Loader?
3) What is the difference between the bad file, log file, and discard file?
4) How to Insert two tables using one data file?
5) What is FILLER and SKIP Option in SQL * LOADER?
6) What are the options available in SQL * LOADER?
7) How to insert data in one table using two data files?
8) Is SQL*Loader auto-commit or not?
Oracle Tuning:
1) What are the tuning methodologies you are using?
2) What is SQL TRACE and how do you enable SQL TRACE?
3) What is an explain plan? How do you read it?
4) What does a trace file contain?
5) Where can we find a trace file, and what is the trace file extension?
6) What is TKPROF?
7) How do you reduce cost in cost-based tuning, and what steps do you follow?

PLSQL Interview Questions


1) What is a Cursor and Cursor Types?
2) What is FOR UPDATE and WHERE CURRENT Clauses?
3) Differences between Procedure and Function?
4) Can we create Package Specification without package Body?
5) What is Forward Declaration in packages?
6) What are the packages you are using?
7) What are the main advantages of Packages?
8) What is the use of NOCOPY?
9) Can we use OUT and INOUT Parameter in Functions?
10) What is BULK COLLECT and BULK BIND?
11) What are the cursor attributes?
12) What is SAVE EXCEPTIONS?
13) What is Exception and types of Exceptions?
14) What is Raise_application_error?
15) What is PRAGMA EXCEPTION_INIT?
16) What is PRAGMA Autonomous transaction?
17) What is Global Temporary Table?
18) What are the types of Collections and Differences between all these?
19) What is an INSTEAD OF trigger?
20) What is Mutating Table Error?
21) How to debug a procedure?
22) What are the differences between BINARY_INTEGER and PLS_INTEGER?
23) Can we use TCL (COMMIT, ROLLBACK) in triggers?
24) What are the types of Triggers?
25) What are the types of System Triggers?