data warehousing concepts
#1

ABSTRACT

Data warehousing is the process where organizations extract value from their informational assets though the use of special stores called data warehouses. In general data warehouse is defined to be subject-oriented, integrated, time-variant, and non-volatile. Using both internal and external systems sources of data, a data warehouse is created.
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
Data Warehouses are increasingly used by enterprises to increase efficiency and competitiveness. Using Scorecarding, Data Mining and on-line analytical processing (OLAP) analysis, business value can be extracted from Data Warehouses.Generating positive return-on-investments (ROI) from Data Warehouses requires a blend of business intuitiveness and technical skills.

This paper presents these strategies and technologies that will enhance the ROI of Data Warehousing.
Reply
#2
It is estimated that as high as 75% of the effort spent on building a data warehouse can be attributed to back-end issues, such as readying the data and transporting it into the data warehouse (Atre, 1998). Data quality tools are used in data warehousing to ready the data and ensure that clean data populates the warehouse, thus enhancing usability of the warehouse. This research focuses on the problems in the data that are addressed by data quality tools. Specific questions of the data can elicit information that will determine which features of the data quality tools are appropriate in which circumstances. The objective of the effort is to develop a tool to support the identification of data quality issues and the selection of tools for addressing those issues. A secondary objective is to provide information on specific tools regarding price, platform, and unique features of the tool.
Reply
#3
[attachment=5114]
data warehousing concepts


ABSTRACT


Data warehousing is the process where organizations extract value from their informational assets though the use of special stores called data warehouses. In general data warehouse is defined to be subject-oriented, integrated, time-variant, and non-volatile. Using both internal and external systems sources of data, a data warehouse is created.
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
Data Warehouses are increasingly used by enterprises to increase efficiency and competitiveness. Using Scorecarding, Data Mining and on-line analytical processing (OLAP) analysis, business value can be extracted from Data Warehouses.Generating positive return-on-investments (ROI) from Data Warehouses requires a blend of business intuitiveness and technical skills.


Reference: http://studentbank.in/report-data-wareho...z11YKzRIFM
Reply
#4
[attachment=7158]
Data Warehouse

What is Data Warehouse?

Collection of integrated, subject-oriented, time-variant and non-volatile data in support of managements decision making process.

Described as the "single point of truth", the "corporate memory", the sole historical register of virtually all transactions that occur in the life of an organization.


Transaction System vs. Data Warehouse

Transaction System



Supports day-to-day operational processes
Contains raw, detailed data that has not been refined or cleansed
Volatile -- data changes from day-to-day, with frequent updates
Technical issues drive the data structure and system design
Disparate data structures, physical locations, query types, etc.
Users rely on technical analysts for reporting needs
Operational processes impacted by queries run off of system

Data Warehouse


Supports management analysis and decision-making processes
Contains summarized, refined, and cleansed information
Non-volatile -- provides a data “snapshot”; adjustments are not permitted, or are limited
Business analysis requirements drive the data structure and system design
Integrated, consistent information on a single technology platform
Users have direct, fast access via On-line Analytical Processing tools
Minimal impact on operational processes


Reply
#5
PRESENTED BY:
PAWAN KUMAR
POOJA SINGH
POONAM SINGH
PRABHAT BHATT

[attachment=12444]
What is Data Warehousing?
Data warehousing is subject- oriented, integrated, time- variant, and non-volatile collection of data in support of management’s decision-making process.
Why Data warehousing?
 Planning and Decision Making.
 Avalability of Supportive Hardware and Software.
 Need of Integrated Business Model.
 Need of Knowledge System.
BENEFITS
 Historical information for comparative and competitive analysis.
 Enhanced data quali ty and completeness.
 Supplementing disaster recovery plans with another data back up source.
The difference bewteen OLTP and data warehousing.
A DBMS built for online transaction processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind.
Building layer by layer
 Exterior Database layer.
Take additional data from outside database includes competitive and purcahasing trends.
 Information Access Layer.
it symbolize the tool which user uses everyday like excel,lotus,etc
 Data Access Layer.
it allows information access layer to chatter to the operational layer)
 Data Directory Layer.
its necessary to have some form of data directory or repositionary of metadata information
 Process Management Layer.
process management layer is concerned in scheduling various tasks
 Application Messaging Layer
process management layer is concerned in scheduling various tasks
 Data Warehouse Layer.
is the core warehouse where the genuine and authentic data is used for information purposes
 Data Staging Layer.
the final layer of the architecture is data performance.its also known as copy management and imitation management.

Reply
#6
Presented By
Mayank Agarwal

[attachment=12614]
ABSTRACT
Data entering the data warehouse comes from operational environment in almost every case.Data warehousing provides architectures and tools for business executives to syste-matically organize ,understand ,and use their data to make stragetic decisions.A large number of organizations have found that data warehouse systems are valuable tools in today’s competive,fast-evolving world. In the last several years ,many firms have spent millions of dollars in building enterprise wide data warehouses. Many people feel that with competition mounting in every industry ,data warehousing is the latest must have marketing weapon –a way to keep customers by learning more about their needs.
So you may ask,full of intrigue,” what exactly is a data warehose “. Data warehouses have been defined in many ways,making it difficult to formulate a rigorous definition. Loosely speaking , a data warehouse refers to a database that is maintened separately from an organization,s operational databases. Data warehouse systems allow for integration of a variety of applications systems . They support information processing by providing a solid platform of consolidated historical data for analysis.
Data warehousing is a more formalised methodology of these techniques. For example, many sales analysis systems and executive information systems (EIS) get their data from summary files rather then operational transaction files. The method of using summary files instead of operational data is in essence what data warehousing is all about. Some data warehousing tools neglect the importance of modelling and building a datawarehouse and focus on the storage and retrieval of data only. These tools might have strong analytical facilities, but lack the qualities you need to build and maintain a corporatewide data warehouse. These tools belong on the PC rather than the host.Your corporate wide (or division wide) data warehouse needs to be scalable, secure, openand, above all, suitable for publication.
1.1 Introduction to data warehouse
“The data warehouse is always a physically separate store of data transformed from the application data found in the operational environment”.
Data entering the data warehouse comes from operational environment in almost every case.Data warehousing provides architectures and tools for business executives to syste-matically organize ,understand ,and use their data to make stragetic decisions.A large number of organizations have found that data warehouse systems are valuable tools in today’s competive,fast-evolving world. In the last several years ,many firms have spent millions of dollars in building enterprise wide data warehouses. Many people feel that with competition mounting in every industry ,data warehousing is the latest must have marketing weapon –a way to keep customers by learning more about their needs.
So you may ask,full of intrigue,” what exactly is a data warehose “. Data warehouses have been defined in many ways,making it difficult to formulate a rigorous definition. Loosely speaking , a data warehouse refers to a database that is maintened separately from an organization,s operational databases. Data warehouse systems allow for integration of a variety of applications systems . They support information processing by providing a solid platform of consolidated historical data for analysis.
Data warehousing is a more formalised methodology of these techniques. For example, many sales analysis systems and executive information systems (EIS) get their data from summary files rather then operational transaction files. The method
of using summary files instead of operational data is in essence what data warehousing is allabout.Some data warehousing tools neglect the importance of modelling and building a datawarehouse and focus on the storage and retrieval of data only. These tools might havestrong analytical facilities, but lack the qualities you need to build and maintain a corporatewide data warehouse. These tools belong on the PC rather than the host.Your corporate wide (or division wide) data warehouse needs to be scalable, secure, openand, above all, suitable for publication.
• SCABLE means that your data warehouse must be able to handle both a growing volume and variety of data and a growing number of users that can access it. Most companies prefer for this reason to store their corporate wide data warehouse in a relational database above a multi dimensional data base storage. (You can model your data dimensional and store it in a relational database. More about dimensional modelling techniques later.
• SECURE means that your data warehouse administrator can centrally control who is
allowed to access what data and when.
• OPEN means that the data in your data warehouse is open to a wide range of query andother front end tools. For this reason a relational data base should be your first choice for a corporate wide data warehouse. The proprietary data storage structures that are used by some data analysis tools can be fed from this central data warehouse.
Reply
#7
[attachment=14179]
Data Warehouse
What is Datawarehouse

“A data warehouse is a collection of business information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations”
Characteristics of Datawarehouse
Various development Stages of Datawarehouse
Business Case Assessment
Enterprise Infrastructure Evaluation
Project Planning
Finalize Project requirement
Data Analysis
Prototyping
Database Design
Various development Stages of Datawarehouse
ETL Framework Design
ETL Package Development
BI Application Development
Data Validation
Implementation
Release Evaluation
Data Modeling
WHAT IS A DATA MODEL?
A data model is an abstraction of some aspect of
the real world (system).
WHY A DATA MODEL?
Helps to visualise the business
A model is a means of communication.
Models help elicit and document requirements.
Models reduce the cost of change.
Model is the essence of DW architecture based on which DW will be implemented
What do we want to do with the data?
Model depends on what kind of data analysis we want to do:
Different Data Analysis Techniques
Query and reporting
Display Query Results
Multidimensional analysis
Analyze data content by looking at it in different perspectives
Data mining
discover patterns and clustering attributes in data
Levels Of Modeling
Conceptual modeling
Describe data requirements from a business point of view without technical details
Logical modeling
Refine conceptual models
Data structure oriented, platform independent
Physical modeling
Detailed specification of what is physically implemented using specific technology
Conceptual Model
A conceptual model shows data through business eyes.
All entities which have business meaning.
Important relationships
Few significant attributes in the entities.
Few identifiers or candidate keys.
Logical Model
Replaces many-to-many relationships with associative entities.
Defines a full population of entity attributes.
May use non-physical entities for domains and sub-types.
Establishes entity identifiers.
Has no specifics for any RDBMS or configuration.
Physical Model
A Physical data model may include
Referential Integrity
Indexes
Views
Alternate keys and other constraints
Tablespaces and physical storage objects.
What needs to be modeled during a data warehouse project
STAGING AREA
YES ! (maybe multiple data models are required)
ODS
YES !
DATAWAREHOUSE/DATAMART
YES!
Data Modeling - Techniques
Modeling techniques
E-R Modeling
Dimensional Modeling
Implementation and modeling styles
Modeling versus implementation
Modeling: describe what should be built to non-technical folks
Implementation: describe what is actually built to technical folks
Implementation and modeling styles
Relational modeling
Use for implementation
Difficult to understand by non-technical folks
Dimensional modeling
Use for modeling during analysis and design phases
Can be implemented using other modeling styles e.g. object-oriented, relational
Limitations of E-R Modeling
Poor Performance
Tend to be very complex and difficult to navigate.
Dimensional Modeling
Dimensional modeling uses three basic concepts : measures, facts, dimensions.
Is powerful in representing the requirements of the business user in the context of database tables.
Focuses on numeric data, such as values counts, weights, balances and occurrences.

Dimensional modeling
Must identify
Business process to be supported
Grain (level of detail)
Dimensions
Facts

Conventions used in Dimensional modeling
Facts
Measures(Variables)
Dimensions
Dimension members
Dimension hierarchies

Facts
A fact is a collection of related data items, consisting of measures and context data.
Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process.
Facts are measured, “continuously valued”, rapidly changing information. Can be calculated and/or derived.
Fact Table
A table that is used to store business information (measures) that can be used in mathematical equations.
Quantities
Percentages
Prices
Calculated Values

Dimensions
A dimension is a collection of members or units of the same type of views.
Dimensions determine the contextual background for the facts.
Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how

Dimension Table
Table used to store qualitative data about fact records
Who
What
When
Where
Why

Dimension data should be
verbose, descriptive
complete
no misspellings, impossible values
indexed
equally available
documented ( metadata to explain origin, interpretation of each attribute)

Dimensional model
Visualize a dimensional model as a CUBE (hypercube because dimensions can be more than
3 in number)
Operations for OLAP
Drill Down :Higher level of detail
Roll Up: summarized level of data
(The navigation path is determined by hierarchies within dimensions.)
Slice: cuts through the cube. Users can focus on specific perspectives
Dice: rotates the cube to another perspective (change the dimension)
Drill down …. Roll up
Slice and Dice
Dimensions
Collection of members or units of the same type of views.
determine the contextual background for the facts.
the parameters over which we want to perform OLAP (eg. Time, Location/region, Customers)
Member is a distinct name to determine data item’s position (eg. Time - Month, quarter)
Hierarchy arrange members into hierarchies or levels

Hierarchies
Aggregates
Aggregate Tables are pre-stored summarized tables… created at a higher level of granularity across any or all of the dimensions.

If the existing granularity is Day wise sales, then creating a separate month wise sales table is an example of Aggregate Table.
The use of such aggregates is the single most effective tool the data warehouse designer has to improve query performance.

Usage of Aggregates can increase the performance of Queries by several times.

Measures
A measure is a numeric attribute of a fact, representing the performance or behaviour of the business relative to dimensions.
The actual numbers are called as variables.
eg. sales in money, sales volume, quantity supplied, supply cost, transaction amount
A measure is determined by combinations of the members of the dimensions and is located on facts.
Types of Facts
Additive
Able to add the facts along all the dimensions
Discrete numerical measures eg. Retail sales in $
Semi Additive
Snapshot, taken at a point in time
Measures of Intensity
Not additive along time dimension eg. Account balance, Inventory balance
Added and divided by number of time period to get a time-average
Non Additive
Numeric measures that cannot be added across any dimensions
Intensity measure averaged across all dimensions eg. Room temperature
Textual facts - AVOID THEM

Common structures for Data Marts Big Grinenormalize!
Star
Single fact table surrounded by denormalized dimension tables
The fact table primary key is the composite of the foreign keys (primary keys of dimension tables)
Fact table contains transaction type information.
Many star schemas in a data mart
Easily understood by end users, more disk storage required

Example of Star Schema
Common structures for Data MartsBig Grinenormalize!
Snowflake
Single fact table surrounded by normalized dimension tables
Normalizes dimension table to save data storage space.
When dimensions become very very large
Less intuitive, slower performance due to joins
May want to use both approaches, especially if supporting multiple end-user tools.

Example of Snow flake schema
Keys …
Primary Keys
uniquely identify a record
Foreign Keys
primary key of another table referred here
Surrogate Keys
system-generated key for dimensions
key on its own has no meaning
integer key, less space

More Keys …
Smart Keys
primary key out of various attributes of dimension
AVOID THEM!
Join to Fact table should be on single surrogate key
Production Keys
DO NOT USE Production defined attributes
Business may reuse/change them - DW cannot!

Basic Dimensional Modeling Techniques
Slowing changing Dimensions
Rapidly changing Small Dimensions
Large Dimensions
Rapidly changing Large Dimensions
Degenerate Dimensions
Junk Dimensions

Slowly Changing Dimensions
A dimension is considered a Slowly Changing Dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.
Types of SCD
Type I – Retains the recent updated Value
Type II – Entire history of the dimensional data is maintained.
Version
Flag
Date
Type III – Only Current and one Previous value get maintained.
The Time Dimension
Time_key
day_of_week
day_number_in_month
day_number_overall
week_number_in_year
month
quarter
fiscal_period
holiday_flag
weekday_flag
last_day_in_month_flag
season
event
Time Dimension
An exclusive Time dimension is required because the SQL date semantics and functions cannot generate several important attributes required for analytical purposes.
Attributes like weekdays, weekends, fiscal period, holidays, season cannot be generated by SQL statements.
Moreover SQL date stamps occupy more space largely increasing the size of the fact table.
Joins on such SQL generated date-stamps are costly decreasing the query speed significantly.
Time Dimension
The holiday flag and season attributes are useful for holiday VS non-holiday analysis and season business analysis.
Event attribute is needed to record special days like strike days, etc..
Data Modeling for Data Warehouse
12 Steps :
1. Study ER 2.Evaluate and Analyse
3. Review Dimension 4. Add Time Dimension
5. Identify Facts 6. Granularity
7. Merge Facts 8. Review Facts
9. Name Facts 10. Size the model
11. Record Metadata 12. Validate mode
ETL Overview
Extraction
Source Systems (Multiple Source Systems)
Flat files, Excel, Legacy Systems, RDBMS etc.
Frequency of Extraction
Staging Area (If any? How many?)
Most Transformations from Source to Staging
Cleansing and Data Quality
Data integrity, De-duplication, completeness, correctness
Transformation
Usage of tools
Reusability of Transformations
Reusability of Mappings
Different tools
Informatica
Oracle Warehouse Builder
ETI
Sagent
PL/SQL scripts
Open Source Tools
Loading
Loading Frequency
Optimized Loading
Indexing
Partitioning
Aggregation
Sum
Average
Max
Update Strategy
Error Handling
STAGING AREA - Some Clarity
Staging Area
optional
to cleanse the source data
Accepts data from different sources
Data model is required at staging area
Multiple data models may be required for parking different sources and for transformed data to be pushed out to warehouse
ODS - Some Clarity
Operational Data Store
Optional
Granular, detailed level data
May feed warehouse (eg when warehouse is aggregated)
Usually a relational model
May keep data for a smaller time period than warehouse

Data Warehouse Architecture
DW Architecture
Architecture Choices depend on
Current infrastructure
Business environment
Desired management and control structure
resources
commitment …..
Data Warehouse/data mart
Types of Data Warehouse
Enterprise Data Warehouse
Data Mart
Enterprise data warehouse
Contains data drawn from multiple operational systems
Supports time- series and trend analysis across different business areas
Can be used as a transient storage area to clean all data and ensure consistency
Can be used to populate data marts
Can be used for everyday and strategic decision making
Data Mart
Logical subset of enterprise data warehouse
Organized around a single business process
Based on granular data
May or may not contain aggregates
Object of analytical processing by the end user.
Less expensive and much smaller than a full blown corporate data warehouse.

DW Implementation Approaches
Top Down
Bottom-up
Combination of both
Choices depend on:
current infrastructure
resources
architecture
ROI
Implementation speed
Top Down Implementation
Bottom Up Implementation
OLAP
What is OLAP??????
OLAP - On Line Analytical Processing
OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information.
OLAP transforms raw data so that it reflects the real dimensionality of the enterprise as understood by the user.
Data Warehousing vs. OLAP
OLAP focuses on
Data transformed into information that meets the end-user’s analytical requirements
Data modeling and computation processes is consistent
OLTP and DW provides the source data whereas, OLAP turns that data into information.
OLAP - Functionality
OLAP functionality is characterized by
Dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities.
Calculations and modeling applied across dimensions, through hierarchies and/or across members
Trend analysis over sequential time periods
Slicing subsets for on-screen viewing
Drill -down to deeper levels of consolidation
Reach-through to underlying detail data
Rotation to new dimensional comparisons in the viewing area
OLAP - Functionality
OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity.
OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what -if" data model scenarios.
OLAP Functional Requirements
Fast Access and Calculations
Speed is critical to maintain an analyst’s train of thought.
An analyst needs to navigate throughout the data which requires aggregations, or roll-ups.
Powerful Analytical Capabilities
There is more complicated calculations to OLAP than simple aggregations, or roll-ups.
OLAP Functional Requirements
Flexibility
viewing: graphs, charts, row or columns
definitions: format of numbers, name changes
analysis: Sales analyze data differently than marketing
interfaces: section wise,report look
Reply
#8



to get information about the topic "data warehouse computer science"full report ppt and related topic refer the page link bellow

http://studentbank.in/report-dbms-data-m...arehousing

http://studentbank.in/report-integration...ng-systems

http://studentbank.in/report-a-survey-of...on-and-lib
Reply

Important Note..!

If you are not satisfied with above reply ,..Please

ASK HERE

So that we will collect data for you and will made reply to the request....OR try below "QUICK REPLY" box to add a reply to this page
Popular Searches: powered by smf data communication networking concepts, data warehousing concepts tutorial ppt, project about warehousing, introduction to technology concepts fbla, warehousing project, warehousing used by shampoos, olap acronym,

[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Possibly Related Threads...
Thread Author Replies Views Last Post
  Block Chain and Data Science jntuworldforum 0 8,144 06-10-2018, 12:15 PM
Last Post: jntuworldforum
  Data Encryption Standard (DES) seminar class 2 9,364 20-02-2016, 01:59 PM
Last Post: seminar report asees
  Skin Tone based Secret Data hiding in Images seminar class 9 7,035 23-12-2015, 04:18 PM
Last Post: HelloGFS
Brick XML Data Compression computer science crazy 2 2,396 07-10-2014, 09:26 PM
Last Post: seminar report asees
  Data Security in Local Network using Distributed Firewalls computer science crazy 10 15,021 30-03-2014, 04:40 AM
Last Post: Guest
  GREEN CLOUD -A Data Center Approach computer topic 0 1,543 25-03-2014, 10:13 PM
Last Post: computer topic
  3D-OPTICAL DATA STORAGE TECHNOLOGY computer science crazy 3 8,521 12-09-2013, 08:28 PM
Last Post: Guest
  Security in Data Warehousing seminar surveyer 3 10,017 12-08-2013, 10:24 AM
Last Post: computer topic
Star DATA MINING AND WAREHOUSE seminar projects crazy 2 3,374 05-02-2013, 12:00 PM
Last Post: seminar details
Thumbs Up Fiber Distributed Data Interface Computer Science Clay 1 8,296 23-01-2013, 03:48 PM
Last Post: seminar details

Forum Jump: