Data Warehousing Interview Questions 2019
1. What is Data warehouse?
A data warehouse is an electronic storage of an Organization’s historical data for Data Analytics, such as reporting, analysis and other knowledge discovery activities.
Other than Data Analytics, a data warehouse can also be used for data integration, master data management etc.
2. Compare Database & Data warehouse
3. What is the purpose of cluster analysis in Data Warehousing?
Purpose of cluster analysis:
- Scalability
- Ability to deal with different kinds of attributes
- Discovery of clusters with attribute shape
- High dimensionality
- Ability to deal with noisy
- Interpretability
4. What are the benefits of data warehouse?
A data warehouse helps to integrate data (see Data integration) and store them historically so that we can analyze different aspects of the business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.
5. What is Virtual Data Warehousing?
A virtual data warehouse provides a collective view of the completed data. A virtual data warehouse has no historical data. It can be considered as a logical data model of the containing metadata.
Virtual data warehousing is a ‘de facto’ information system strategy for supporting analytical decision making. It is one of the best ways for translating raw data and presenting it in the form that can be used by decision makers. It provides a semantic map – which allows the end user for viewing as virtualized.
6. What is the difference between OLTP and OLAP?
OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.
OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.
7. What is data mart?
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in a data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
8. What is dimensional modeling?
Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualify the data. The goal of the Dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is often used in many enterprise-level data warehouses.
9. What is Start-schema?
This schema is used in data warehouse models where one centralized fact table references a number of dimension tables so as the keys (primary key) from all the dimension tables flow into the fact table (as a foreign key) where measures are stored. This entity-relationship diagram looks like a star, hence the name.
Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys from customer, product and time dimension tables will flow into the fact table.
10. Why is the chameleon method used in data warehousing?
Chameleon is a hierarchical clustering algorithm that overcomes the limitations of the existing models and the methods present in the data warehousing. This method operates on the sparse graph having nodes: that represent the data items, and edges: representing the weights of the data items.
This representation allows a large dataset to be created and operated successfully. The method finds the clusters that are used in the dataset using two-phase algorithms.
- The first phase consists of the graph partitioning that allows the clustering of the data items into a large number of sub-clusters.
- Second phase uses an agglomerative hierarchical clustering algorithm to search for the clusters that are genuine and can be combined with the sub-clusters that are produced.
11. What is active data warehousing?
- An active data warehouse represents a single state of the business. Active data warehousing considers the analytic perspectives of customers and SUPPLIERS. It helps to deliver the updated data through reports.
- A form of a repository of captured transactional data is known as ‘active data warehousing’. Using this concept, trends and patterns are found to be used for future decision making. The active data warehouse has a feature which can integrate the changes of data while scheduled cycles refresh. Enterprises utilize an active data warehouse in drawing the company’s image in a statistical manner.
12. Which one is faster, Multidimensional OLAP or Relational OLAP?
Multidimensional OLAP is faster than Relational OLAP.
MOLAP: Multi-dimensional OLAP
Data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats (one example is PowerOLAP’s .olp file). MOLAP products can be compatible with Excel, which can make data interactions easy to learn.
ROLAP: Relational OLAP
ROLAP products access a relational database by using SQL (structured query language), which is the standard language that is used to define and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a mid-tier server, which accepts requests from clients, translates them into SQL statements, and passes them on to the RDBMS.
13. What is VLDB?
A very large database, or VLDB, is a database that contains an extremely large number of tuples (database rows) or occupies an extremely large physical file system storage space. A one terabyte database would normally be a VLDB.
14. List the type of OLAP Servers
- Relational OLAP
- Multidimensional OLAP
- Hybrid OLAP
- Specialized SQL Servers
15. What is data aggregation?
Data aggregation is the broad definition for any process that enables information gathering expression in a summary form, for statistical analysis.
Criteria | Database | Data Warehouse |
---|---|---|
Type of Data | Relational or Object-oriented data | Large volume with multiple data types |
Data Operations | Transaction processing | Data modeling and analysis |
Dimensions of data | Two dimensional | Multi-dimensional |
Data design | ER based and application-oriented | Star/Snowflake schema and subject-oriented |
Size of data | Small (in GB) | Large (in TB) |
Functionality | High availability & performance | High flexibility and user autonomy |