Object Oriented Database Systems
1.1 Modern Database Applications involve complex, specialised data structures
- design (CAD), engineering examples:
- building layout
- mechanics
- electronics (VLSI)
- software production
- chemical structures
- cartography (GIS: Geographic Information Systems)
- image processing
- knowledge engineering
- industrial production (CAM, CIM)
- others, e.g. office automation
1.2 Characteristics of Design Applications
- revolve around artifacts
- objects built out of other objects
- iterative
- multiple levels of abstraction
- tasks are shared among designers
1.3 What Object-Oriented Design is good at:
cf. OODB System Manifesto (Atkinson et al., 1989)
- O1 complex objects
- O2 object identity
- O3 encapsulation
- O4 types or classes
- O5 inheritance
- O6 overriding, overloading and late binding
- O7 computational completeness
- O8 extensibility
1.4 What Relational Databases are good at:
cf. OODB System Manifesto (Atkinson et al., 1989)
- D1 Persistence
- D2 Storage Management
- D3 Concurrency
- D4 Recovery
- D5 Ad Hoc Query Facility
1.5 Requirements of Modern Database Applications
- complex data structures: modeling, maintenance, access (O1, O2, O3, O6, D1)
- extensible type system (O4, O5, O8)
- navigation and query (D5)
- high performance (O7, D2, D3, D4)
If only relational databases were used, then the complex
data structures and the type system would have to be maintained by
external programs. These would have to be specially written for
each new application.
If object oriented programming languages (without databases) were used,
then special procedures would have to be written to store, access,
navigate and query the data. Storage management, concurrency and
recovery mechanisms would have to be specially written for
each new application.
1.6 Relational Database Issues:
Because the relational model is so simple, relational databases ...
- are fast and efficient
- have a "simple" formal model and semantics
- support data independence (physical & logical)
But users cannot ...
- define types (only a fixed number of built-in types is available)
- express nested relationships: e.g. ((Street, Number) City)
- represent/manipulate complex entities as a single unit
- sufficiently express data that does not map well to tables
- write methods (database cannot represent behavior)
Users must explicitly ...
- manage various types of relationships (e.g. is-a, association, aggregation)
- define keys (integrity problems)
- write procedures for versioning
- write procedures for long duration transactions
Because SQL is not computationally complete ...
- some computations are not possible, e.g. find all rooms near the location of room B2
- transitive closure is not computable without recursive queries, which the SQL standard only added in SQL:1999 (parts explosion problem)
- some applications require an external programming language
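The parts explosion problem amounts to computing the transitive closure of a part-of relation. The following is a minimal Python sketch of the iteration that an external programming language has to supply; the `PART_OF` dictionary, the part names and the `explode` function are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical bill of materials: assembly -> direct components.
PART_OF = {
    "car": ["engine", "chassis"],
    "engine": ["piston", "valve"],
    "chassis": ["axle"],
}

def explode(part, part_of=PART_OF):
    """Return all direct and indirect components of `part` (transitive closure)."""
    seen = set()
    queue = deque(part_of.get(part, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against shared parts and cycles
        seen.add(current)
        queue.extend(part_of.get(current, []))
    return seen
```

The loop runs until no new parts are found, so its length depends on the data rather than on the query text, which is what a fixed, non-recursive query cannot express.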
1.7 Semantic Data Models
- Entity Relationship Model
- Extended Relational Model
- Semantic Data Model
- Functional Data Model
- Object Oriented Model
2.1 Object identity (O2)
Entities may not have identifiers:
- an entity may not have a unique name (e.g., literals versus objects)
- an entity may have more than 1 unique name (e.g., references)
- an entity may change its name over a period of time
2.2 Examples
1> a:= 5
2> b:= 5
3> c:= a
4> a:= 6
5> b:= "Hello World"
2.3 Examples - continued
- equal values: a:= 5 and b:= 5
- equal variable names but different values: a:= 5 and a:= 6
- same variable name but values of different types: b:= 5 and b:= "Hello World"
- c:= a means either a deep copy or a shallow copy
2.4 Identity versus Value (or State)
- different objects can have the same value
- identity: objects are identical if they have the same identifier
- equality: objects are equal if they have the same value(s)
- equality does not imply identity; conversely, an object keeps its identity even when its value changes
- deep equality (values must be compared at all levels of nesting) versus shallow equality (only the first level is compared, by reference to the sub-objects)
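The distinctions above can be illustrated with a short Python sketch. It assumes that Python's `is` operator stands in for the identity test and `==` for the (deep) equality test; the variable names are hypothetical.

```python
import copy

a = [[1, 2], [3]]
b = [[1, 2], [3]]          # equal value, but a different object
c = a                      # a second name for the same object
shallow = copy.copy(a)     # new outer list, inner lists are shared with a
deep = copy.deepcopy(a)    # new objects at every level

equal_not_identical = (a == b) and (a is not b)
identical = c is a
shallow_shares_parts = shallow[0] is a[0]
deep_shares_nothing = deep[0] is not a[0]

# An update made through one name is visible through every reference
# to the same object, but not through equal-valued independent copies.
a[0].append(99)
```

After the update, `c` and `shallow` see the new element in the first inner list, while `b` and `deep` do not.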
The Relational Model is Value-Based:
- instances (rows) are identified by primary keys
- keys are user-defined and can be changed by users
- results in the need for referential integrity
2.5 Object Identity
- object identity is independent of value and updates
- no misleading references to objects
- there is a function I that maps an object into its identity
Object Identifiers ...
- are (system-wide) unique
- are managed by the system
- never change during object-lifetime
- are never reused after object deletion
- do not carry any semantics
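A minimal sketch of these properties, assuming a toy in-memory store: the `ObjectStore` class and its methods are hypothetical, and a simple counter stands in for whatever identifier scheme a real system would use.

```python
import itertools

class ObjectStore:
    """Toy store that hands out opaque, never-reused object identifiers."""

    def __init__(self):
        self._next_oid = itertools.count(1)  # managed by the system, not the user
        self._objects = {}

    def insert(self, value):
        oid = next(self._next_oid)  # the OID is not derived from the value
        self._objects[oid] = value
        return oid

    def update(self, oid, value):
        self._objects[oid] = value  # the value changes, the OID does not

    def delete(self, oid):
        del self._objects[oid]      # the counter never goes back, so no reuse

    def get(self, oid):
        return self._objects[oid]

store = ObjectStore()
first = store.insert({"name": "Smith"})
store.update(first, {"name": "Jones"})
store.delete(first)
second = store.insert({"name": "Smith"})
```

Here `second` differs from `first` even though the stored value is the same as the one originally inserted under `first`.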
2.6 Object Sharing
Example:
- employee database - two employees live in the same suburb
- suburb: equal value or identical object?
In Relational Databases:
- reference by foreign key for entity instances
- reference by value for attributes
Object Identity implies:
- different structures can refer to the same object
- avoids ambiguity and redundancy
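The employee/suburb example can be sketched in Python as follows. The class names, field names and sample values are hypothetical; the point is only the difference between two employees referring to the identical suburb object and an employee holding an equal-valued copy.

```python
from dataclasses import dataclass

@dataclass
class Suburb:
    name: str
    postcode: str

@dataclass
class Employee:
    name: str
    suburb: Suburb

shared = Suburb("Newtown", "2042")
alice = Employee("Alice", shared)                      # refers to the shared object
bob = Employee("Bob", shared)                          # refers to the same object
carol = Employee("Carol", Suburb("Newtown", "2042"))   # equal value, separate object

# One update to the shared object is seen by Alice and Bob, but not by Carol.
shared.postcode = "2043"
```

With sharing, the postcode is stored once and cannot become inconsistent between Alice and Bob; Carol's copy has to be updated separately.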
2.7 Advantages of Object Identity:
- facilitates object sharing
- users do not need to worry about it, managed by the system
- objects can still have additional user-controlled names; these names can differ between applications and can be changed freely
- semantics of retrieval and manipulation are clear
- consistency rules can be easily specified
But the system has more to do:
- operations for object assignment, deep and shallow copy needed
- tests for equality needed
- complex objects can be graphs: need to be managed by the system
- the semantics of the system is more complicated
3.1 Complex Objects (O1)
Complex objects are built from simple objects using constructors:
- data abstraction: types
- a type defines a representation and a set of operations
- representation = any other type
- operation = program (method) that can access the representation
- type constructors: tuple, set, array
- orthogonality of objects and constructors:
constructors can be used for any object
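As a rough illustration of orthogonal constructors, the sketch below nests Python stand-ins for the three constructors: `NamedTuple` for tuple, `frozenset` for set, and a plain `tuple` used as an ordered array. The `Point` and `Room` types and the sample data are hypothetical.

```python
from typing import NamedTuple

class Point(NamedTuple):      # tuple constructor applied to simple values
    x: float
    y: float

class Room(NamedTuple):       # tuple constructor applied to complex values
    name: str
    outline: tuple            # array constructor: ordered corner points
    neighbours: frozenset     # set constructor: unordered room names

b2 = Room(
    name="B2",
    outline=(Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3)),
    neighbours=frozenset({"B1", "B3"}),
)

# Orthogonality: a set whose elements are tuples that contain arrays and sets.
floor = frozenset({b2})
```

Any constructor can be applied to the result of any other, which is what distinguishes this from a flat relational row.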
3.2 Object Description
Attributes vs. Properties
- Properties => Unidirectional
- Attributes => Bidirectional
- An attribute can be modeled as a pair of properties, each the inverse of the other.
Attribute Values
- simple types (literals, strings, integers) versus abstract data types
- single-valued versus multi-valued (set-values)
Attribute Domain
- set of values of similar type
Class Attributes
- associate a value with a type/class that applies to the
class as a whole, e.g. minimum salary of class employee
3.3 Association and Aggregation
Association: is-associated-to relationship
- associate objects from several independent classes
- when an association instance is deleted, the
participating objects continue to exist
Aggregation: has-attribute and is-part-of relationships
- building composite objects from their component objects
- e.g. aggregate attribute values of an object to form the whole object
- e.g. aggregate objects that are related by a particular relationship
instance into a higher level aggregate object
- if an aggregate instance is deleted the component objects are also deleted
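The different deletion semantics can be sketched with a toy store. The `Database` class, its method names and the sample objects are hypothetical; a real system would attach these rules to the schema rather than to separate method calls.

```python
class Database:
    """Toy store contrasting association deletes and aggregate deletes."""

    def __init__(self):
        self.objects = set()       # names of all stored objects
        self.associations = set()  # unordered pairs of associated objects
        self.parts = {}            # aggregate name -> set of component names

    def add(self, name, parts=()):
        self.objects.add(name)
        self.parts[name] = set(parts)
        for part in parts:
            if part not in self.objects:
                self.add(part)

    def associate(self, a, b):
        self.associations.add(frozenset((a, b)))

    def dissociate(self, a, b):
        # Only the link disappears; both participants continue to exist.
        self.associations.discard(frozenset((a, b)))

    def delete_aggregate(self, name):
        # Components are deleted together with the aggregate (cascade).
        for part in self.parts.pop(name, set()):
            self.delete_aggregate(part)
        self.objects.discard(name)
        self.associations = {
            link for link in self.associations if name not in link
        }

db = Database()
db.add("car", parts=["engine", "chassis"])
db.add("driver")
db.associate("driver", "car")
db.dissociate("driver", "car")
after_dissociate = set(db.objects)
db.delete_aggregate("car")
after_delete = set(db.objects)
```

Removing the association leaves all four objects in place, whereas deleting the aggregate removes the car together with its engine and chassis and leaves only the driver.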
3.4 Operations for Complex Objects
- retrieve object and subobjects
- subject to retrieval predicates
- restricted to attributes/components of interest
- create and delete objects/subobjects (structure-building operations)
- deletion with/without components
- navigation within object structure
3.5 Sharing Revisited
- object identity facilitates object sharing
- objects have independent existence
- but parts of objects may not have independent existence
- dependent parts only exist while container exists
Sharing parts is dangerous if parts do not have independent existence!
| | independent (own existence) | dependent (no own existence) |
|---|---|---|
| sharable | e.g. module, class | e.g. public method |
| not sharable | e.g. private class | e.g. private method |
4.1 Encapsulation (O3)
2 Levels in Object-Oriented Programming:
- 1) specification is visible for application programs
- interface describes allowable operations
- 2) implementation is encapsulated, hidden
- data part (state, values, attributes)
- procedural part (operations, methods)
-> encapsulation, logical data independence
- application programs are protected from implementation details
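The two levels can be sketched as follows. Python enforces privacy only by convention (a leading underscore), so this is an approximation of encapsulation rather than a strict one; the `Stack` class and the `drain` function are hypothetical.

```python
class Stack:
    """Specification: push, pop, is_empty. Everything else is hidden."""

    def __init__(self):
        self._items = []   # data part; the underscore marks it as private

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items

def drain(stack):
    """Application code: relies only on the published interface."""
    out = []
    while not stack.is_empty():
        out.append(stack.pop())
    return out

s = Stack()
for n in (1, 2, 3):
    s.push(n)
drained = drain(s)
```

Because `drain` never touches `_items`, the list could be replaced by another representation without changing the application code.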
2 Levels in Relational Databases:
- 1) data
- 2) program (ad hoc query language + programming language)
-> data independent from programming
- allows ad hoc queries
- but table-specific methods cannot be defined
4.2 Encapsulation: Pros and Cons
Pros:
- extensibility, software engineering
Cons:
- ad hoc queries and similar operations are not allowed under strict encapsulation
(not all ad hoc queries raise maintainability issues, so there is no
reason to prohibit them outright)
- optimization is harder, and an underlying formal theory is lacking
4.3 Overriding, Overloading and Late Binding (O6)
- overriding: operations are defined at the top level and redefined by subclasses
- overloading: different programs under the same name, selected depending on context
- late binding: the program to run is selected at run-time, not at compile-time
-> hides complexity from application programs
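A short Python sketch of overriding and late binding follows; the `Shape`, `Rectangle` and `Square` classes are hypothetical. Python has no compile-time overloading, so only the other two mechanisms are shown.

```python
class Shape:
    def area(self):
        raise NotImplementedError  # subclasses must override

    def describe(self):
        # self.area() is bound at run-time to the receiver's own class.
        return f"{type(self).__name__} with area {self.area()}"

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):            # overrides Shape.area
        return self.width * self.height

class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)
        self.side = side

    def area(self):            # overrides Rectangle.area
        return self.side ** 2

shapes = [Rectangle(2, 3), Square(2)]
descriptions = [shape.describe() for shape in shapes]
```

The loop calls the same `describe` operation on every element; which `area` implementation runs is decided per object at run-time, so the application code does not need to know the concrete classes.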
4.4 Computational Completeness (O7)
- programming languages are usually computationally complete
- SQL alone is not complete, but SQL + programming language is complete
- different from "resource completeness"
4.5 Extensibility (O8)
- users can define their own types, methods, etc.
- no distinction in usage between user-defined and system types
- there may be a performance difference between user-defined and system types
5.1 Types or Classes (O4)
Types (e.g. C++, Java)
- summarize common features of a set of objects
- type-checking at compile-time for consistency
Classes (e.g. Smalltalk)
- similar to types but can be manipulated at run-time
- object factory (for creating new objects)
- object warehouse (extension = all instances of a class)
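In Python, classes are run-time values, which makes the factory and warehouse roles easy to sketch. The `Employee` class, its `extension` list and the `Manager` class built at run-time are all hypothetical names used for illustration.

```python
class Employee:
    extension = []   # object warehouse: every instance created so far

    def __init__(self, name):
        self.name = name
        Employee.extension.append(self)

# Object factory: calling the class creates a new instance.
ann = Employee("Ann")
bob = Employee("Bob")

# Run-time manipulation: a new class is built and used while the program runs.
Manager = type("Manager", (Employee,), {})
cat = Manager("Cat")

names = [e.name for e in Employee.extension]
```

The extension of `Employee` includes the `Manager` instance, because members of a subclass are also members of its superclass.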
5.2 Natural Types versus Role Types
natural types: e.g. gender, species
- object belongs to at most one class of each natural type
- "classification"
role types: e.g. role of employee, customer, family relationships
- object can have different roles, simultaneously or at different times
5.3 Classes/Types: Pros and Cons
Pros:
- simplification, modularization, encapsulation
- operations/attributes can apply to instance or to all class members
simultaneously
- user-definable
Cons:
- role types imply that objects can be members of different classes
- class library: large vocabulary of classes and methods
6.1 Class Hierarchy
Hierarchy of Classes, Subclasses and Superclasses
- e.g. programmer -> computer scientist -> employee
- is-a relationship between subclass/class
- member-of relationship: objects are members of a class and its superclasses
Specialisation/Generalisation
- facilitates incremental design
- specialisation: top-down conceptual refinement
- separation based on differentiating features
- generalisation: bottom-up conceptual synthesis
- grouping based on suppressing differences
- normally a combination of both specialisation and
generalisation processes is employed
6.2 Multiple and Flexible Hierarchies
- tree hierarchy: each class has one immediate superclass
- poly-hierarchy: class can have several immediate superclasses
- because of role types: class may belong to different superclasses
- different contexts may require different hierarchies
- type lattice: a unique smallest common superclass and a unique largest
common subclass exist for each set of classes (i.e., multiple paths
exist but it can be calculated where they intersect)
6.3 Inheritance (O5)
Attribute Inheritance
- object has specific class attributes
- object inherits attributes from superclasses
Complexity of Inheritance
- simple inheritance (in tree hierarchies)
- multiple inheritance (in type lattices and poly-hierarchies)
- can lead to name conflicts
Degree of Inheritance
- selective inheritance
- default inheritance: can be overridden at lower levels
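A name conflict under multiple inheritance, and two ways of dealing with it, can be sketched in Python. The class names are hypothetical, and the resolution rule shown (leftmost superclass first, following the method resolution order) is Python's own; other systems resolve such conflicts differently or reject them.

```python
class Employee:
    def title(self):
        return "employee"

class Student:
    def title(self):
        return "student"

class TeachingAssistant(Employee, Student):
    pass   # inherits title() from both superclasses: a name conflict

class Tutor(Employee, Student):
    def title(self):
        # Default inheritance overridden: choose one superclass explicitly.
        return Student.title(self)

ta_title = TeachingAssistant().title()
tutor_title = Tutor().title()
lookup_order = [cls.__name__ for cls in TeachingAssistant.__mro__]
```

For `TeachingAssistant` the language silently picks `Employee.title`, while `Tutor` overrides the default at the lower level and selects the `Student` version.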
6.4 Types of Inheritance
- substitution inheritance
- based on behavior not values
- an instance of the inheriting class A can be used in any context in which
an instance of B is expected
- inclusion inheritance
- based on classification structure
- an instance of A is also an instance of B
- constraint inheritance
- subcase of inclusion inheritance
- instance of A has the same operations and fields as instance of B
- but: instance of A fulfills certain further constraints
- e.g. teenager -> person
- specialization inheritance
- subcase of inclusion inheritance
- but: instance of A has some extra fields compared to instance of B
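The constraint and specialization cases can be sketched in Python, reusing the teenager -> person example. The age range 13 to 19, the `Employee` class with its `salary` field, and the `greet` function are assumptions made for this illustration.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class Teenager(Person):
    """Constraint inheritance: same fields as Person, plus an age constraint."""

    def __init__(self, name, age):
        if not 13 <= age <= 19:   # assumed definition of "teenager"
            raise ValueError("a Teenager must be between 13 and 19 years old")
        super().__init__(name, age)

class Employee(Person):
    """Specialization inheritance: one extra field compared to Person."""

    def __init__(self, name, age, salary):
        super().__init__(name, age)
        self.salary = salary

def greet(person):
    # Written against Person; an instance of any subclass can be substituted.
    return f"Hello, {person.name}"
```

Both subclasses are cases of inclusion inheritance (every `Teenager` and every `Employee` is also a `Person`), and `greet` shows substitution: it accepts either wherever a `Person` is expected.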
6.5 Pros and Cons of Inheritance
Pros:
- code reutilisation
- additional semantics are represented
- modeling discipline
Cons:
- ambiguity and name conflicts in multiple inheritance
- maintenance
6.6 OO Typing System
- substitutability
- static type checking
- mutability
- subtyping by specialisation
A type system cannot provide all 4; any 3 can be chosen,
whichever are the most important.
7.1 Comparison
| Feature | Pros: why is it useful | Cons: why is it difficult | How is it implemented in Object-Oriented Programming | How is it implemented in Relational Databases |
|---|---|---|---|---|
| O1 complex objects | modularity, object sharing | operations needed | attributes, constructors | only system-defined types (e.g. date) |
| O2 object identity | consistency | system must maintain it | object ID | keys, referential integrity |
| O3 encapsulation | extensibility, software engineering | optimization, no ad hoc queries | implementation, specification | data separate from program |
| O4 types or classes | modularization, user-definable | role types, maintenance | is implemented | tables, no methods, no pointers |
| O5 inheritance | code reutilisation | name conflicts, maintenance | attribute and method inheritance | --- |
| O6 overriding, overloading and late binding | simplification for user | no compile time type checking | via class hierarchy | --- |
| O7 computational completeness | Church Turing hypothesis | --- | is complete | needs programming language |
| O8 extensibility | hide complexity | optimization | user defined types | --- |
| D1 persistence | easier for programmer | difficult for complex structures, some data must be transient | --- | is implemented |
| D2 storage management | easier for programmer | --- | memory allocation | is implemented |
| D3 concurrency | multiple users | many possible application programs | threads | is implemented |
| D4 recovery | stability, security | many possible application programs | --- | is implemented |
| D5 ad hoc query facility | direct data access | difficult for complex structures | --- | is implemented |
7.2 Summary
- Relational model and OO model have conflicting advantages/disadvantages
- There may never be a single widely accepted OODB model (in the way that
relational algebra is for relational databases)
- Different approaches (OO, Relational DB or OODB) may be necessary
for different applications