Friday, March 29, 2013

Bitmap Index Overview


Bitmap Index

Overview 

A bitmap index is an indexing method that can provide both performance benefits and storage savings. Bitmap indexes are particularly useful for data warehousing environments because data is usually updated less frequently and ad hoc queries are more common.
While B-tree indexes are the most effective method for “high-cardinality” data (data where there are many possible values, such as customer_name or phone_number), bit-mapped indexes are best for “low-cardinality” data (such as a column to indicate a person’s gender, which contains only two possible values: MALE and FEMALE).

Building a bitmap index

A portion of a sample company’s customer data is shown below:
customer#     marital_status     region     gender     income_level
101           single             east       male       bracket_1
102           married            central    female     bracket_4
103           married            west       female     bracket_2
104           divorced           west       male       bracket_4
105           single             central    female     bracket_2
106           married            central    female     bracket_3
Since MARITAL_STATUS, REGION, GENDER, and INCOME_LEVEL are all “low-cardinality” columns (there are only three possible marital status’s, three possible regions, two possible genders, and four possible income levels), it’s appropriate to create bitmap indexes on these columns. A bitmap index should not be created on CUSTOMER#, because this is a “high-cardinality” column, in fact it is the primary key of this table which means each value is unique. Instead, an ordinary, unique B-tree index should be created on this column in order to provide the most efficient representation and retrieval.
The bitmap index for the REGION column consists of three separate bit-maps, one for each region. These bit-maps are shown below:
region='east'   region='central'   region='west'
      1               0                  0
      0               1                  0
      0               0                  1
      0               0                  1
      0               1                  0
      0               1                  0
Each entry (or “bit”) in the bitmap corresponds to a single row of the CUSTOMER table. The value of each bit depends upon the values of the corresponding row in the table. For example, the bitmap REGION=‘east’ contains a one as its first bit; this is due to the fact that the REGION is ‘east’ in the first row of the CUSTOMER table. The bitmap REGION=‘east’ has a zero for its other bits because none of the other rows of the CUSTOMER table contain ‘east’ as their value for REGION. With this scheme, it is impossible for a single row to have a bit set in more than one bitmap for that column.

Using a bitmap index

Suppose you want to look at the demographic trends of the company’s customers. A sample question might be: “How many of our married customers live in the central or west regions?” which could be coded in an SQL query as:
select count(*) from customer 
 where marital_status = 'married' 
   and region in ('central','west');
Bitmap indexes can be used to process this query. One bitmap is used to evaluate each predicate, and the bitmaps are logically combined to evaluate the AND’s and the OR’s. For example:

By counting the number of 1’s in the resulting bitmap above, the result of this query can be easily computed. If you further wanted to know the specific customers that satisfied the criteria, instead of simply the number of customers, the resulting bitmap would be used to access the table. In this manner, bitmap indexes are said to cooperate with each other. The bitmap indexes for multiple columns can be AND’ed or OR’ed together to determine the one bitmap that will satisfy the query. All of this is done before the table is even accessed. B-tree indexes do not cooperate with each other. Multiple bitmap indexes can be used to satisfy this query. Only one B-tree index can be used for this query, even if multiple B-tree indexes exist. 

When to use a bitmap index

There are three things to consider when choosing an index method: 
  • performance
  • storage
  • maintainability
The advantage for using bitmap indexes are their performance impact for certain queries and their relatively small storage requirements. Note however that bitmap indexes are not applicable to every query and bitmap indexes, like B-tree indexes, can impact the performance of insert, update, and delete statements. When you modify table that has a bitmap index, the table must be locked to update the bitmaps. This can lead to poor performance in situations where the table has a high degree of modification concurrency. 

Performance Considerations

Bitmap indexes can provide very impressive performance improvements. Execution times of certain queries may improve by several orders of magnitude. The queries that benefit the most from bitmap indexes have the following characteristics:
  • the WHERE-clause contains multiple predicates on low-cardinality columns
  • the individual predicates on these low-cardinality columns select a large number of rows
  • bitmap indexes have been created on some or all of these low-cardinality columns
  • the tables being queried contain many rows
An advantage of bitmap indexes is that multiple bitmap indexes can be used to evaluate the conditions on a single table. Thus, bitmap indexes are very useful for complex ad hoc queries that contain lengthy WHERE-clauses.

Storage considerations

Bitmap indexes incur a small storage cost and have a significant storage savings over B-tree indexes. A bitmap index can require 100 times less space than a B-tree index for a low-cardinality column.
Important Note: 
A strict comparison of the relative sizes of B-tree and bitmap indexes is not an accurate measure for selecting bitmap over B-tree indexes. Because of the performance characteristics of bitmap indexes and B-tree indexes, you should continue to maintain B-tree indexes on your high-cardinality data. Bitmap indexes should be considered primarily for your low-cardinality data.
With bitmap indexes, the problems associated with multiple-column B-tree indexes is solved because bitmap indexes can be efficiently combined during query execution; they cooperate with each other. Note that while the bitmap indexes may not be quite as efficient during execution as the appropriate concatenated B-tree indexes, the space savings provided by bitmap indexes can often more than justify their utilization.
The net storage savings will depend upon a database’s current usage of B-tree indexes:
  • A database that relies on single-column B-tree indexes on high-cardinality columns will not observe significant space savings (but should see significant performance increases). 
     
  • A database containing a significant number of concatenated B-tree indexes could reduce its index storage usage by 50% or more, while maintaining similar performance characteristics.
     
  • A database that lacks concatenated B-tree indexes because of storage constraints will be able to use bitmap indexes and increase performance with minimal storage costs.
Bitmap indexes are best for read-only or light OLTP environments. Because there is no effective method for locking a single bit, row-level locking is not available for bitmap indexes. Instead, locking for bitmap indexes is effectively at the block level which can impact heavy OLTP environments. Note also that like other types of indexes, updating bitmap indexes is a costly operation.
Though bitmap indexes are not appropriate for databases with a heavy load of insert, update, and delete operations, their effectiveness in a data warehousing environment is not diminished. In such environments, data is usually maintained via bulk inserts and updates. For these bulk operations, rebuilding or refreshing bitmap indexes is an efficient operation. The storage savings and performance gains provided by bitmap indexes can provide tremendous benefits to data warehouse users.

Performance/storage characteristics

In preliminary testing of bitmap indexes, certain queries ran up to 100 times faster. The bitmap indexes on low cardinality columns were also about ten times smaller than B-tree indexes on the same columns. In these tests, the queries containing multiple predicates on low-cardinality data experienced the most significant speedups. Queries that did not conform to this characteristic were not assisted by bitmap indexes.

Example CREATE Syntax

CREATE BITMAP INDEX myindex ON mytable(col1);

Example scenarios

The following sample queries on the CUSTOMERS table demonstrate the variety of query-processing techniques that are necessary for optimal performance.
  Example #1:   Single predicate on a low-cardinality attribute.
select * from customers 
 where gender = 'male';
Best approach: parallel table scan
This query will return approximately 50% of the data. Since we will be accessing such a large number of rows, it is more efficient to scan the entire table rather than use either bitmap indexes or B-tree indexes. To minimize elapsed time, the Server should execute this scan in parallel.
  Example #2:   Single predicate on a high-cardinality attribute.
select * from customers 
 where customer# = 101;
Best approach: conventional unique index
This query will retrieve at most one record from the employee table. A B-tree index or hash cluster index is always appropriate for retrieving a small number of records based upon criteria on the indexed columns.
  Example #3:   Multiple predicates on low-cardinality attributes
select * from customers 
 where gender = 'male' 
   and region in ('central','west') 
   and marital_status in ('married', 'divorced');
Best approach: bit-mapped index
Though each individual predicate specifies a large number of rows, the combination of all three predicates will return a relatively small number of rows. In this scenario, bitmap indexes provide substantial performance benefits.
  Example #4:   Multiple predicates on both high-cardinality and low-cardinality attributes.
select * from customers 
 where gender = 'male' 
   and customer# < 100;
Best approach: B-tree index on CUSTOMER#
This query returns a small number of rows because of the highly selective predicate on CUSTOMER#. It is more efficient to use a B-tree index on CUSTOMER# than to use a bitmap index on GENDER.
In each of the previous examples, the Oracle cost-based optimizer transparently determines the most efficient query-processing technique.

A special note on cardinality

As has been noted, bitmap indexes work best with low cardinality. The most often examples of columns with low cardinality are gender columns (Male/Female) and truth columns (True/False). In some situtations, these columns should not even be candidates for any index, let alone bitmap indexes. If you have a company database and there is a table named EMPLOYEES, then you might have a column called GENDER. If your company is representative of the general population distribution, then approximately have of your company will be male and the other half will be female. If your query is looking for all of the female employees in your table, then approximately half of the rows of data in that table would be returned. When you return that many rows from a table, a full table scan would be faster than using any index. So while bitmap indexes should be used for low cardinality columns, there is a danger in columns that have too low of a cardinality.
That being said, bitmap indexes, even on such low cardinality columns, can still help with queries. As has been shown above, bitmap indexes can cooperate with each other. When you issue a query such as the follows:
select * from customers 
 where gender = 'male' 
   and region in ('central','west');
This query can use the bitmap index on the GENDER column to filter out approximately half of the rows. The bitmap index on the REGION column can be used to further filter the rows down. In cooperating together, bitmap indexes achieve their greatest benefits.

No comments:

Post a Comment