Basic Data Warehouse Concepts: Questions and Answers for Beginners in #Data-Science

Hello everyone, I am back with basic data warehouse concepts for beginners in #Data-Science, covering important interview questions and their answers, as follows:
What is a data warehouse?
A data warehouse is an electronic store of an organization's historical data for the purpose of analysis and reporting. According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile, integrated and time-variant.
Explanatory Note:
Non-volatile means that data, once loaded into the warehouse, is not deleted or overwritten later. Time-variant means that data is stored with reference to time, so the warehouse keeps a history of how values changed rather than only the latest state.
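As a rough illustration of these two properties, here is a small sketch in plain Python. The price_history list, the load_snapshot and price_on helpers, and the prices and dates are all made up for this example and do not come from any particular warehouse tool. The point is only that rows are appended with a date instead of being overwritten, so older values stay available for analysis.

```python
from datetime import date

# Hypothetical warehouse table: rows are only appended, never updated or deleted.
price_history = []

def load_snapshot(product, price, as_of):
    """Append a dated row instead of overwriting the previous value."""
    price_history.append({"product": product, "price": price, "as_of": as_of})

def price_on(product, day):
    """Return the latest price recorded on or before the given day, or None."""
    rows = [r for r in price_history if r["product"] == product and r["as_of"] <= day]
    return max(rows, key=lambda r: r["as_of"])["price"] if rows else None

load_snapshot("Rice", 40.0, date(2023, 4, 1))
load_snapshot("Rice", 42.5, date(2023, 5, 1))  # price changed; the old row is kept

print(price_on("Rice", date(2023, 4, 15)))  # price as it was in mid April
print(price_on("Rice", date(2023, 5, 15)))  # price after the change
```

Because nothing is deleted, the same table can answer "what was the price then?" as well as "what is the price now?".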
What are the benefits of a data warehouse?
Historical data stored in a data warehouse helps to analyze different aspects of the business, including performance analysis, trend analysis and trend prediction, which ultimately increases the efficiency of business processes.
Why is a data warehouse used?
A data warehouse facilitates reporting on the key performance indicators (KPIs) of different business processes. It can also be used for data mining, which helps with trend prediction, forecasting, pattern recognition etc.
What is the difference between OLTP and OLAP?
OLTP (online transaction processing) is the transaction system that collects business data, whereas OLAP (online analytical processing) is the reporting and analysis system built on that data.
OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.
Explanatory Note:
In a department store, when we pay at the check-out counter, the salesperson keys all the data into a "Point-Of-Sale" machine. That data is transaction data and the related system is an OLTP system. On the other hand, the manager of the store might want to view a report on out-of-stock materials so that he can place purchase orders for them. Such a report will come out of an OLAP system.
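To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The pos_sales table, its columns and the sample rows are assumptions made up for illustration, and a single in-memory database stands in for both systems only to keep the example short. In practice the OLTP and OLAP sides usually live in separate systems.

```python
import sqlite3

# One in-memory database stands in for both systems (an assumption for brevity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pos_sales (sale_id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)"
)

# OLTP-style work: many small writes, one per item keyed in at the counter.
checkouts = [("Rice", 2), ("Brown Bread", 1), ("Rice", 3)]
conn.executemany("INSERT INTO pos_sales (product, qty) VALUES (?, ?)", checkouts)
conn.commit()

# OLAP-style work: a read-only query that summarises many rows for a report.
report = conn.execute(
    "SELECT product, SUM(qty) FROM pos_sales GROUP BY product ORDER BY product"
).fetchall()
print(report)
```

The writes touch one row at a time, while the report reads across all rows and returns one summary line per product.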
What is a data mart?
Data marts are generally designed for a single subject area. An organisation may have data pertaining to different departments like Finance, HR, Marketing etc. stored in the data warehouse, and each department may have its own data mart. These data marts can be built on top of the data warehouse.
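One simple way to picture a data mart built on top of the warehouse is a view that exposes only one department's rows. The sketch below uses Python's built-in sqlite3 module; the warehouse_costs table, the finance_mart view and the figures are hypothetical, and real data marts are often separate physical tables or databases rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding rows for several departments.
conn.execute("CREATE TABLE warehouse_costs (department TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_costs VALUES (?, ?, ?)",
    [
        ("Finance", "Audit", 1200.0),
        ("HR", "Training", 800.0),
        ("Finance", "Software", 300.0),
    ],
)

# A data mart modelled as a view restricted to a single subject area.
conn.execute(
    "CREATE VIEW finance_mart AS "
    "SELECT item, amount FROM warehouse_costs WHERE department = 'Finance'"
)

finance_rows = conn.execute(
    "SELECT item, amount FROM finance_mart ORDER BY item"
).fetchall()
print(finance_rows)
```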
What is an ER model?
An ER model is an entity-relationship model, which is designed with the goal of normalising the data.
What is dimensional modelling?
A dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualify the data. The goal of a dimensional model is not to achieve a high degree of normalisation but to facilitate easy and fast data retrieval.
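A tiny dimensional model can be sketched with Python's built-in sqlite3 module. All table names, column names and rows below (dim_product, dim_customer, fact_sales, quantity_kg and the sample data) are invented for this example. The fact table holds one measure plus foreign keys, and the report joins it to a dimension table to label the numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- Fact table: one measure (quantity_kg) plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sale_date   TEXT,
    quantity_kg REAL
);
INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Wheat');
INSERT INTO dim_customer VALUES (1, 'Ramesh'), (2, 'Anita');
INSERT INTO fact_sales VALUES
    (1, 1, '2023-04-05', 20),
    (1, 2, '2023-04-05', 5),
    (2, 1, '2023-04-06', 10);
""")

# Retrieval needs only one join per dimension used in the report.
rows = conn.execute("""
    SELECT p.name, SUM(f.quantity_kg)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)
```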
What is a dimension?
A dimension is something that qualifies a quantity (measure).
If I just say "20kg", it does not mean anything. But "20kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)" makes sense. Product, customer and date are the dimensions that qualify the measure. Dimensions are mutually independent.
Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
What is a fact?
A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
What are the different types of dimensions?
In a data warehouse model, a dimension can be of the following types:
1. Conformed Dimension
2. Junk Dimension
3. Degenerate Dimension
4. Role Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimensions as:
1. Unchanging or static dimension (UCD)
2. Slowly changing dimension (SCD)
3. Rapidly changing Dimension (RCD)
What is aggregation and what is the benefit of aggregation?
A data warehouse usually captures data at the same degree of detail as is available in the source. This "degree of detail" is termed granularity. But not all reporting requirements from that data warehouse need the same degree of detail.
To understand this, let's consider an example from the retail business. A certain retail chain has 500 shops across Europe. All the shops record detail-level transactions for the products they sell, and that data is captured in a data warehouse.
Each shop manager can access the data warehouse and see which products were sold by whom and in what quantity on any given date. Thus the data warehouse helps the shop managers with detail-level data that can be used for inventory management, trend prediction etc.
Now think about the CEO of that retail chain. He does not really care which salesperson in London sold the highest number of chopsticks or which shop is the best seller of brown bread. All he is interested in is, perhaps, the percentage increase of his revenue margin across Europe, or maybe the year-on-year sales growth in Eastern Europe. Such data is aggregated in nature, because sales of goods in Eastern Europe are derived by summing up the individual sales data from each shop in Eastern Europe.
Therefore, to support different levels of data warehouse users, data aggregation is needed.
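The roll-up from shop level to region level can be sketched in a few lines of plain Python. The shop names, regions, revenue figures and the aggregate_by helper are made up for illustration. A real warehouse would normally do this with a GROUP BY query or a pre-built summary table.

```python
from collections import defaultdict

# Detail-level rows at shop granularity (made-up figures).
shop_sales = [
    {"shop": "London-01", "region": "West Europe", "revenue": 1200.0},
    {"shop": "Paris-03", "region": "West Europe", "revenue": 950.0},
    {"shop": "Warsaw-02", "region": "East Europe", "revenue": 700.0},
    {"shop": "Bucharest-01", "region": "East Europe", "revenue": 650.0},
]

def aggregate_by(rows, key, measure):
    """Sum a measure over all rows that share the same value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

# Shop managers work with shop_sales as it is; the CEO's view is the roll-up.
region_totals = aggregate_by(shop_sales, "region", "revenue")
print(region_totals)
```

The same detail rows serve both audiences; only the level of aggregation differs.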
What is slicing-dicing?
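Slicing means showing a slice of the data for a given dimension (e.g. Product) fixed at a single value (e.g. Brown Bread), together with the chosen measures (e.g. sales). Dicing goes one step further and restricts two or more dimensions at once, which yields a smaller sub-cube of the data.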
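As a small sketch of slicing, the plain-Python example below treats a list of rows as a tiny cube with two dimensions (product, city) and one measure (sales). The rows and the slice_cube helper are hypothetical and only meant to show the idea of fixing one dimension to one value.

```python
# Tiny made-up cube stored as flat rows: two dimensions and one measure.
cube = [
    {"product": "Brown Bread", "city": "London", "sales": 120},
    {"product": "Brown Bread", "city": "Paris", "sales": 80},
    {"product": "Chopsticks", "city": "London", "sales": 30},
]

def slice_cube(rows, dimension, value):
    """Keep only the rows where one dimension equals the given value."""
    return [r for r in rows if r[dimension] == value]

# Slice: dimension = product, value = Brown Bread, measure = sales.
brown_bread = slice_cube(cube, "product", "Brown Bread")
total_sales = sum(r["sales"] for r in brown_bread)
print(total_sales)
```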
Finally, thank you for sparing your valuable time to read my blog. Stay connected for my next post. Have a good time.