Basic Data Warehouse Concepts: Questions and Answers for Beginners in #Data-Science
Hello, everyone. I am back with basic data warehouse concepts for beginners in #Data-Science, covering important interview questions with their answers as follows:
What is a data warehouse?
A data warehouse is an electronic store of an organization's historical data, kept for the purpose of analysis and reporting. According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile, integrated and time-variant.
Explanatory Note:
Non-volatile means that data, once loaded into the warehouse, will not be deleted later. Time-variant means that data is recorded with respect to time, so the warehouse keeps the history of how values change.
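One way to picture these two properties is a store that only ever appends dated rows. The sketch below is a toy illustration in plain Python, not how a real warehouse is built; the names `warehouse_rows`, `load_snapshot` and `price_history`, and the sample prices and dates, are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical in-memory "warehouse": rows are only appended,
# never updated or deleted (non-volatile).
warehouse_rows = []

def load_snapshot(prices, as_of):
    """Append one dated row per product instead of overwriting old values."""
    for product, price in prices.items():
        # Every row carries the date it refers to (time-variant).
        warehouse_rows.append({"product": product, "price": price, "as_of": as_of})

def price_history(product):
    """Return (date, price) pairs for one product, in load order."""
    return [(r["as_of"], r["price"]) for r in warehouse_rows if r["product"] == product]

# Two loads for the same product: the earlier value is kept, not replaced.
load_snapshot({"rice": 1.20}, as_of=date(2024, 1, 1))
load_snapshot({"rice": 1.35}, as_of=date(2024, 2, 1))
print(price_history("rice"))
```

Because old rows stay in place, a report can compare the price on any two dates, which would not be possible if each load overwrote the previous value.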
What are the benefits of a data warehouse?
Historical data stored in a data warehouse helps to analyze different aspects of the business, including performance analysis, trend analysis and trend prediction, which ultimately increases the efficiency of business processes.
Why is a data warehouse used?
A data warehouse facilitates reporting on key business processes, typically through measures known as key performance indicators (KPIs). It can also be used for data mining, which helps with trend prediction, forecasting, pattern recognition etc.
What is the difference between OLTP and OLAP?
OLTP (online transaction processing) is the transaction system that collects business data, whereas OLAP (online analytical processing) is the reporting and analysis system on that data.
OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.
Explanatory Note:
In a department store, when we pay at the check-out counter, the salesperson keys all the data into a "Point-Of-Sale" machine. That data is transaction data, and the related system is an OLTP system. On the other hand, the manager of the store might want to view a report on out-of-stock materials in order to place a purchase order for them. Such a report will come from an OLAP system.
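The shop example can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `stock` and `sales` tables, the function names and the sample quantities are invented here, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

# One in-memory database stands in for both systems (an assumption made
# for brevity; in practice OLTP and OLAP usually run on separate systems).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("rice", 5), ("brown bread", 2), ("chopsticks", 10)],
)

def record_sale(product, quantity):
    """OLTP-style work: one small write transaction per check-out."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("INSERT INTO sales (product, quantity) VALUES (?, ?)", (product, quantity))
        conn.execute(
            "UPDATE stock SET quantity = quantity - ? WHERE product = ?",
            (quantity, product),
        )

def out_of_stock_report():
    """OLAP-style work: a read-only SELECT that answers a manager's question."""
    rows = conn.execute(
        "SELECT product FROM stock WHERE quantity <= 0 ORDER BY product"
    ).fetchall()
    return [product for (product,) in rows]

record_sale("brown bread", 2)
record_sale("rice", 3)
print(out_of_stock_report())
```

The check-out path only writes a few rows at a time, while the report only reads, which is the split between the two kinds of workload described above.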
What is a data mart?
Data marts are generally designed for a single subject area. An organisation may have data pertaining to different departments such as Finance, HR and Marketing stored in its data warehouse, and each department may have a separate data mart. These data marts can be built on top of the data warehouse.
What is an ER model?
An ER model is an entity-relationship model, which is designed with the goal of normalising the data.
What is dimensional modelling?
A dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements, along with the foreign keys from the dimension tables that qualify the data. The goal of a dimensional model is not to achieve a high degree of normalisation but to facilitate easy and fast data retrieval.
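A minimal sketch of such a model, using Python's built-in `sqlite3` module, might look like the following. The table and column names (`dim_product`, `dim_customer`, `dim_date`, `fact_sales`, `quantity`) and the sample rows are assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for each measurement
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day  TEXT);

-- Fact table: the measurement plus one foreign key per dimension
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    quantity    REAL
);

INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Brown Bread');
INSERT INTO dim_customer VALUES (1, 'Ramesh'), (2, 'Asha');
INSERT INTO dim_date     VALUES (1, '5 April'), (2, '6 April');
INSERT INTO fact_sales   VALUES (1, 1, 1, 20.0), (2, 2, 2, 3.0);
""")

# Retrieval is a join from the fact table out to each dimension.
star_query = """
SELECT p.name, c.name, d.day, f.quantity
FROM fact_sales AS f
JOIN dim_product  AS p ON p.product_id  = f.product_id
JOIN dim_customer AS c ON c.customer_id = f.customer_id
JOIN dim_date     AS d ON d.date_id     = f.date_id
ORDER BY d.day, p.name
"""
rows = conn.execute(star_query).fetchall()
for row in rows:
    print(row)
```

Every query follows the same shape: start at the fact table and join to whichever dimensions the report needs, which is what keeps retrieval simple.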
What is a dimension?
A dimension is something that qualifies a quantity (measure).
If I just say "20kg", it does not mean anything. But "20kg of rice (product) was sold to Ramesh (customer) on 5th April (date)" makes sense. Product, customer and date are the dimensions that qualify the measure. Dimensions are mutually independent.
Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
What is a fact?
A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
What are the different types of dimension?
In a data warehouse model, dimensions can be of the following types:
1. Conformed Dimension
2. Junk Dimension
3. Degenerate Dimension
4. Role-Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimensions as:
1. Unchanging or static dimension (UCD)
2. Slowly changing dimension (SCD)
3. Rapidly changing dimension (RCD)
What is aggregation and what is the benefit of aggregation?
A data warehouse usually captures data at the same degree of detail as is available in the source. This "degree of detail" is termed granularity. But not all reporting requirements on that data warehouse need the same degree of detail.
To understand this, let's consider an example from the retail business. A certain retail chain has 500 shops across Europe. All the shops record detail-level transactions for the products they sell, and that data is captured in a data warehouse.
Each shop manager can access the data warehouse and see which products were sold, by whom and in what quantity, on any given date. Thus the data warehouse provides shop managers with detail-level data that can be used for inventory management, trend prediction etc.
Now think about the CEO of that retail chain. The CEO does not really care which salesperson in London sold the highest number of chopsticks or which shop is the best seller of brown bread. All the CEO is interested in is, perhaps, the percentage increase in revenue margin across Europe, or maybe the year-on-year sales growth in Eastern Europe. Such data is aggregated in nature, because the sales of goods in Eastern Europe are derived by summing up the individual sales data from each shop in Eastern Europe.
Therefore, to support different levels of data warehouse users, data aggregation is needed.
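The roll-up from shop level to region level can be sketched in a few lines of Python. The shop names, region labels and sales figures below are hypothetical, and `aggregate_by` is a helper written only for this example.

```python
from collections import defaultdict

# Hypothetical detail-level rows: (shop, region, product, sales_amount)
detail_sales = [
    ("London-01", "West Europe", "brown bread", 120.0),
    ("London-01", "West Europe", "chopsticks", 30.0),
    ("Warsaw-01", "East Europe", "brown bread", 80.0),
    ("Prague-01", "East Europe", "chopsticks", 45.0),
]

def aggregate_by(rows, key_index, measure_index=3):
    """Sum the measure column for each distinct value of the key column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[measure_index]
    return dict(totals)

# Finer granularity: what a shop manager might look at.
sales_by_shop = aggregate_by(detail_sales, key_index=0)
# Coarser granularity: what the CEO might look at.
sales_by_region = aggregate_by(detail_sales, key_index=1)

print(sales_by_shop)
print(sales_by_region)
```

The same detail rows feed both views; only the grouping key changes, which is why a warehouse that stores fine-grained data can still serve coarse-grained reports.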
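What is slicing and dicing?
Slicing means showing a slice of the data: one dimension (e.g. Product) is fixed at a certain value (e.g. Brown Bread), and the measures (e.g. sales) are shown for that value only. Dicing goes one step further and restricts the data on two or more dimensions at once (e.g. Product = Brown Bread and Region = Eastern Europe).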
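As a rough sketch, a slice can be written as a filter that fixes one dimension at one value. A dice, in the simple form shown here, applies the same filter to more than one dimension at once. The `cube` rows, the `select_cells` helper and all the figures are assumptions made up for the example.

```python
# Hypothetical cube cells: one dict per (product, region, month) combination.
cube = [
    {"product": "Brown Bread", "region": "East Europe", "month": "Jan", "sales": 80.0},
    {"product": "Brown Bread", "region": "West Europe", "month": "Jan", "sales": 120.0},
    {"product": "Chopsticks", "region": "East Europe", "month": "Jan", "sales": 45.0},
    {"product": "Brown Bread", "region": "East Europe", "month": "Feb", "sales": 95.0},
]

def select_cells(cells, **fixed):
    """Keep the cells whose dimension values match every keyword given."""
    return [c for c in cells if all(c[dim] == val for dim, val in fixed.items())]

# Slice: one dimension (product) fixed at one value.
brown_bread_slice = select_cells(cube, product="Brown Bread")
slice_total = sum(c["sales"] for c in brown_bread_slice)

# Dice: two dimensions fixed at the same time.
east_brown_bread_dice = select_cells(cube, product="Brown Bread", region="East Europe")

print(slice_total)
print(east_brown_bread_dice)
```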
Finally, thank you for sparing your valuable time to read my blog. Stay connected for my next post. Thank you, and have a good time.