19-02-2011, 04:12 PM
presented by:
Amrita mishra
PARALLEL DATABASE SYSTEMS
INTRODUCTION
A parallel database system (PDBS) is a DBMS implemented on a parallel computer which is made of a number of nodes (processors and memories) connected by a fast network within a cabinet.
It strives to exploit modern multiprocessor architectures using software-oriented solutions for data management.
OBJECTIVE
Problems of conventional DBMS
- high disk access time.
- very large databases can't be supported within a single system.
PDBS is the only viable solution for increasing the I/O bandwidth through parallelism and for storing huge databases in a single system.
ADVANTAGES OF PDBS
High Performance – Increased throughput (inter-query parallelism) & decreased response time (intra-query parallelism).
High Availability – Using data replication.
Extensibility – Linear scaleup and Linear speedup.
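The slide does not define the two extensibility terms, so the following is a minimal sketch using the usual definitions: speedup compares the time for the same job on 1 node and on n nodes, and scaleup compares a job on 1 node with an n-times larger job on n nodes. The function names and the timing figures are hypothetical, chosen only for illustration.

```python
def speedup(time_on_one_node, time_on_n_nodes):
    """Same problem size, more nodes. Linear speedup means the result equals n."""
    return time_on_one_node / time_on_n_nodes


def scaleup(time_small_job_on_one_node, time_n_times_larger_job_on_n_nodes):
    """Problem size grows with node count. Linear scaleup means the result stays at 1."""
    return time_small_job_on_one_node / time_n_times_larger_job_on_n_nodes


# Hypothetical timings in seconds, for n = 4 nodes.
n = 4
print(speedup(120.0, 30.0))    # equals n here, so the speedup is linear
print(scaleup(120.0, 120.0))   # equals 1 here, so the scaleup is linear
```

A speedup below n, or a scaleup below 1, would indicate that adding nodes is paying for overheads such as startup, interference or skew.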
PARALLEL DBMS ARCHITECTURE
Shared Memory
Advantages – Simplicity, Load Balancing.
Problems – Cost, Limited Extensibility, Low Availability.
Shared Disk
Advantages – Cost, Extensibility, Load Balancing, Availability.
Problems – Higher Complexity, Potential Coherence Problems.
Shared Nothing
Advantages – Cost, Extensibility, Availability.
Problems – Complexity, Addition of new nodes requires reorganizing the database.
PARALLEL DBMS TECHNIQUES
DATA ALLOCATION – Methods that spread the database across the system’s disks to ensure efficient parallel I/O.
Partitioning (Fragmentation) – 3 strategies
# Round Robin – The i-th tuple goes to partition (i mod n), for n partitions.
# Hashing – Apply a hash function to some attribute to give the partition number.
# Range Partitioning – Distribute tuples based on value ranges of some attribute.
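The three strategies above can be sketched as follows. This is an illustrative sketch only, not how any particular DBMS implements them: the function names, the use of CRC32 as the hash function, and the `boundaries` list of split points are all assumptions made for the example.

```python
import zlib
from bisect import bisect_right


def round_robin_partition(tuples, n):
    """Send the i-th tuple to partition (i mod n)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Hash the partitioning attribute; the hash value modulo n is the partition number.

    CRC32 is an assumed stand-in for whatever hash function a real system uses.
    """
    parts = [[] for _ in range(n)]
    for t in tuples:
        h = zlib.crc32(str(key(t)).encode("utf-8"))
        parts[h % n].append(t)
    return parts


def range_partition(tuples, boundaries, key):
    """Place each tuple by the value range its attribute falls in.

    `boundaries` is a sorted list of split points; k split points give k + 1
    partitions, and a value equal to a split point goes to the upper partition.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, key(t))].append(t)
    return parts


rows = list(range(10))
print(round_robin_partition(rows, 3))   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(range_partition([5, 10, 15, 20, 25], [10, 20], key=lambda x: x))
# [[5], [10, 15], [20, 25]]
```

Round robin spreads tuples evenly but gives no way to locate a tuple by value. Hashing and range partitioning both allow a lookup on the partitioning attribute to go to a single partition, and range partitioning also keeps neighbouring values together for range queries.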
USES OF DATA FRAGMENTATION
Maximize system performance.
Minimize response time (through intra-query parallelism).
Maximize throughput (through inter-query parallelism).
Problems: Skewed data distributions lead to non-uniform partitioning & hurt load balancing.
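To make the skew problem concrete, here is a small sketch that range-partitions two hypothetical key sets with the same split points and compares the resulting partition sizes. The `skew_ratio` measure (largest partition divided by the mean partition size) is an assumption chosen for illustration, not a metric taken from the slides.

```python
from bisect import bisect_right
from collections import Counter


def partition_sizes(keys, boundaries):
    """Count how many keys fall into each range partition."""
    counts = Counter(bisect_right(boundaries, k) for k in keys)
    return [counts.get(p, 0) for p in range(len(boundaries) + 1)]


def skew_ratio(sizes):
    """Largest partition relative to the mean; 1.0 means perfectly balanced."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


boundaries = [25, 50, 75]                        # 4 partitions
uniform_keys = list(range(100))                  # evenly spread values
skewed_keys = [1] * 70 + list(range(30, 60))     # 70% of tuples share one value

print(partition_sizes(uniform_keys, boundaries))  # [25, 25, 25, 25]
print(partition_sizes(skewed_keys, boundaries))   # [70, 20, 10, 0]
print(skew_ratio(partition_sizes(skewed_keys, boundaries)))
```

In the skewed case one node holds 70 of the 100 tuples while another holds none, so a parallel scan finishes only when the overloaded node does.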