28-03-2011, 12:12 PM
Introduction
• A data type defines a collection of data objects and a set of predefined operations on those objects
• A descriptor is the collection of the attributes of a variable
• An object represents an instance of a user-defined (abstract data) type
Primitive Data Types
• Almost all programming languages provide a set of primitive data types
• Primitive data types: Those not defined in terms of other data types
• Some primitive data types are merely reflections of the hardware
• Others require only a little non-hardware support for their implementation
Primitive Data Types: Integer
• Almost always an exact reflection of the hardware so the mapping is trivial
• There may be as many as eight different integer types in a language
• Java’s signed integer sizes: byte, short, int, long
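As a rough illustration of how integer types mirror the hardware, the Python sketch below computes the two's-complement range for each of Java's four signed integer widths. The helper name `signed_range` is hypothetical; the bit widths (8, 16, 32, 64) are the ones the Java language fixes for these types.

```python
def signed_range(bits):
    """Return (min, max) for a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# Bit widths of Java's signed integer types.
JAVA_INT_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}

for name, bits in JAVA_INT_BITS.items():
    low, high = signed_range(bits)
    print(f"{name:5s} {bits:2d} bits: {low} .. {high}")
```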
Primitive Data Types: Floating Point
• Model real numbers, but only as approximations
• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)
• Usually exactly like the hardware, but not always
• IEEE Floating-Point Standard 754
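The "only as approximations" point can be seen in a short Python sketch. It assumes the usual case where a Python float is an IEEE 754 double; the helper `to_single` is a hypothetical name that rounds a value to single precision using the standard `struct` module.

```python
import struct

# 0.1 and 0.2 have no exact binary representation, so their sum is
# only approximately 0.3.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(abs(total - 0.3) < 1e-9)   # True

def to_single(x):
    """Round a double-precision value to single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Single precision keeps fewer bits, so 0.1 is approximated differently.
print(to_single(0.1) == 0.1)     # False
# 0.5 is a power of two and is exact in both formats.
print(to_single(0.5) == 0.5)     # True
```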
Primitive Data Types: Decimal
• For business applications (money)
– Essential to COBOL
– C# offers a decimal data type
• Store a fixed number of decimal digits
• Advantage: accuracy
• Disadvantages: limited range, wastes memory
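The accuracy advantage can be sketched in Python using the standard `decimal` module as a stand-in for a decimal type. This is an assumption for illustration only: `Decimal` is not the fixed-width representation COBOL or C# use, but it does store decimal digits rather than binary fractions.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly.
float_total = sum([0.10] * 3)

# Decimal stores decimal digits, so the same sum is exact.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```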
Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for “true” and one for “false”
• Could be implemented as bits, but often as bytes
– Advantage: readability
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
– Includes characters from most natural languages
– Originally used in Java
– C# and JavaScript also support Unicode
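A small Python sketch of characters as numeric codings. The built-in functions `ord` and `chr` map between a character and its code; the UTF-16 encoding step is included only to show the 16-bit unit size for these two particular characters.

```python
# ord() gives the numeric code of a character; chr() reverses the mapping.
print(ord("A"))          # 65, the same value in ASCII and Unicode
print(chr(65))           # A

# A character outside ASCII has a code above 127.
print(ord("\u00e9"))     # 233 (e with acute accent)

# Encoded as UTF-16, each of these two characters takes one 16-bit unit.
print(len("A\u00e9".encode("utf-16-le")))  # 4 bytes, i.e. 2 units
```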
Character String Types
• Values are sequences of characters
• Design issues:
– Is it a primitive type or just a special kind of array?
– Should the length of strings be static or dynamic?
Character String Type Operations
• Typical operations:
– Assignment and copying
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
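The typical operations listed above can be sketched in Python, where strings are a built-in type. The variable names are arbitrary, and pattern matching uses the standard `re` module.

```python
import re

s = "data"
t = s                          # assignment
print(t == s, "abc" < "abd")   # comparison: True True
u = s + " types"               # catenation
print(u)                       # data types
print(u[5:10])                 # substring reference: types
m = re.search(r"t[a-z]+s", u)  # pattern matching
print(m.group())               # types
```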
Character String Type in Certain Languages
• C and C++
– Not primitive
– Use char arrays and a library of functions that provide operations
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching
• Java
– Primitive via the String class
Character String Length Options
• Static: COBOL, Java’s String class
• Limited Dynamic Length: C and C++
– In C-based languages, a special character is used to indicate the end of a string’s characters, rather than maintaining the length
• Dynamic (no maximum): SNOBOL4, Perl, JavaScript
• Ada supports all three string length options
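To make the limited dynamic length option concrete, here is a Python sketch that imitates a C-style string: a fixed-size buffer whose contents end at a terminating zero byte. The function `c_strlen` is a hypothetical stand-in that mimics what C's `strlen` does; the 8-byte buffer size is an arbitrary choice.

```python
def c_strlen(buffer):
    """Count bytes up to, but not including, the first zero byte."""
    length = 0
    while buffer[length] != 0:
        length += 1
    return length

# A fixed 8-byte buffer holding "cat" followed by the terminating zero byte.
buffer = bytearray(8)
buffer[0:4] = b"cat\x00"
print(c_strlen(buffer))   # 3

# The string can change length, but only up to the buffer size minus one.
buffer[0:6] = b"horse\x00"
print(c_strlen(buffer))   # 5
```

No length field is stored anywhere; the length is recovered by scanning for the terminator.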
Character String Type Evaluation
• Aid to writability
• As a primitive type with static length, they are inexpensive to provide--why not have them?
• Dynamic length is nice, but is it worth the expense?
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time descriptor for length (but not in C and C++)
• Dynamic length: need run-time descriptor; allocation/de-allocation is the biggest implementation problem
[Figure: compile-time and run-time string descriptors]
User-Defined Ordinal Types
• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean
Enumeration Types
• All possible values, which are named constants, are provided in the definition
• C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
• Design issues
– Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a color as a number
• Aid to reliability, e.g., compiler can check:
– operations (don’t allow colors to be added)
– No enumeration variable can be assigned a value outside its defined range
• Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types
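The reliability points can be sketched with Python's standard `enum` module. One caveat: Python performs these checks at run time, whereas the languages named above can perform them at compile time. The `Color` type is a hypothetical example.

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# Enumeration values are not coerced to integers.
print(Color.RED == 1)          # False

# Arithmetic on enumeration values is rejected.
try:
    Color.RED + Color.GREEN
except TypeError:
    print("colors cannot be added")

# A value outside the defined range cannot become a Color.
try:
    Color(7)
except ValueError:
    print("7 is not a valid Color")
```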
Subrange Types
• An ordered contiguous subsequence of an ordinal type
– Example: 12..18 is a subrange of integer type
• Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Subrange Evaluation
• Aid to readability
– Makes it clear to readers that variables of a subrange type can store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an error
Implementation of User-Defined Ordinal Types
• Enumeration types are implemented as integers
• Subrange types are implemented like the parent types with code inserted (by the compiler) to restrict assignments to subrange variables
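The inserted range check can be sketched in Python. The class `Subrange` and its method `assign` are hypothetical names; the `if` test stands for the code a compiler would generate before each assignment to a subrange variable.

```python
class Subrange:
    """Holds an integer restricted to the inclusive range low..high."""

    def __init__(self, low, high, value):
        self.low, self.high = low, high
        self.assign(value)

    def assign(self, value):
        # The check a compiler would insert before each assignment.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        self.value = value

index = Subrange(1, 100, 1)   # like Ada's: subtype Index is Integer range 1..100
index.assign(42)              # accepted
try:
    index.assign(101)         # rejected by the range check
except ValueError as err:
    print(err)                # 101 is outside 1..100
```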
Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions in element references range checked?
• When are subscript ranges bound?
• When does allocation take place?
• What is the maximum number of subscripts?
• Can array objects be initialized?
• Is any kind of slice allowed?
Array Indexing
• Indexing (or subscripting) is a mapping from indices to elements
array_name (index_value_list) → an element
• Index Syntax
– FORTRAN, PL/I, Ada use parentheses
• Ada explicitly uses parentheses to show uniformity between array references and function calls because both are mappings
– Most other languages use brackets
Arrays Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean, char, enumeration)
• Ada: integer or enumeration (includes Boolean and char)
• Java: integer types only
• C, C++, Perl, and Fortran do not specify range checking
• Java, ML, C# specify range checking
Subscript Binding and Array Categories
• Static: subscript ranges are statically bound and storage allocation is static (before run-time)
– Advantage: efficiency (no dynamic allocation)
• Fixed stack-dynamic: subscript ranges are statically bound, but the allocation is done at declaration time
– Advantage: space efficiency
• Stack-dynamic: subscript ranges are dynamically bound and the storage allocation is dynamic (done at run-time)
– Advantage: flexibility (the size of an array need not be known until the array is to be used)
• Fixed heap-dynamic: similar to fixed stack-dynamic, in that storage binding is dynamic but fixed after allocation (i.e., binding is done when requested, and storage is allocated from the heap, not the stack)
• Heap-dynamic: binding of subscript ranges and storage allocation is dynamic and can change any number of times
– Advantage: flexibility (arrays can grow or shrink during program execution)
Subscript Binding and Array Categories (continued)
• C and C++ arrays that include the static modifier are static
• C and C++ arrays without the static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic
• C and C++ provide fixed heap-dynamic arrays
• C# includes a second array class, ArrayList, that provides heap-dynamic arrays
• Perl and JavaScript support heap-dynamic arrays
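As a rough analogy for the heap-dynamic category, a Python list can grow and shrink during execution. Python is not one of the languages listed above; it is used here only to show the behavior.

```python
# A Python list behaves like a heap-dynamic array: its length is not fixed.
values = [4, 5, 7]
values.append(83)      # grows to 4 elements
print(len(values))     # 4
del values[0:2]        # shrinks to 2 elements
print(values)          # [7, 83]
```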
Array Initialization
• Some languages allow initialization at the time of storage allocation
– C, C++, Java, C# example
int list [] = {4, 5, 7, 83};
– Character strings in C and C++
char name [] = "freddie";
– Arrays of strings in C and C++
char *names [] = {"Bob", "Jake", "Joe"};
– Java initialization of String objects
String[] names = {"Bob", "Jake", "Joe"};
Arrays Operations
• APL provides the most powerful array processing operations for vectors and matrices as well as unary operators (for example, to reverse column elements)
• Ada allows array assignment and also catenation
• Fortran provides elemental operations, so called because they are between pairs of array elements
– For example, + operator between two arrays results in an array of the sums of the element pairs of the two arrays
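An elemental addition can be sketched in plain Python. The function name `elementwise_add` is hypothetical; in Fortran the same effect is written directly as the sum of two arrays.

```python
def elementwise_add(a, b):
    """Add two equal-length lists pairwise, like Fortran's + on two arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [x + y for x, y in zip(a, b)]

print(elementwise_add([1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```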
Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all columns have the same number of elements
• A jagged matrix has rows with a varying number of elements
– Possible when multi-dimensioned arrays actually appear as arrays of arrays
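A short Python sketch of the difference, using lists of lists (that is, arrays of arrays). The helper `is_rectangular` is a hypothetical name.

```python
rectangular = [[1, 2, 3],
               [4, 5, 6]]        # every row has 3 elements
jagged = [[1],
          [2, 3],
          [4, 5, 6]]             # rows have 1, 2, and 3 elements

def is_rectangular(rows):
    """True when all rows have the same length."""
    return len({len(row) for row in rows}) <= 1

print(is_rectangular(rectangular))  # True
print(is_rectangular(jagged))       # False
```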
Slices
• A slice is some substructure of an array; nothing more than a referencing mechanism
• Slices are only useful in languages that have array operations
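The "referencing mechanism" idea can be sketched with Python's built-in `memoryview`, whose slices refer to the original storage instead of copying it. Ordinary Python list slices do copy, so they would not show this.

```python
data = bytearray([10, 20, 30, 40, 50])
view = memoryview(data)[1:4]   # refers to elements 1..3; no copy is made
print(list(view))              # [20, 30, 40]
view[0] = 99                   # writing through the slice changes the array
print(list(data))              # [10, 99, 30, 40, 50]
```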
Implementation of Arrays
• Access function maps subscript expressions to an address in the array
• Access function for single-dimensioned arrays:
address(list[k]) = address (list[lower_bound])
+ ((k-lower_bound) * element_size)
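The access function above translates directly into a small Python sketch. The function name and the sample values (base address 1000, 4-byte elements, lower bound 1) are hypothetical and chosen only for illustration.

```python
def element_address(base_address, lower_bound, element_size, k):
    """Address of list[k], where base_address is the address of list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size

# Assumed values: array starts at address 1000, 4-byte elements, indices from 1.
print(element_address(1000, 1, 4, 1))   # 1000
print(element_address(1000, 1, 4, 5))   # 1016
```

When the lower bound is fixed at 0, as in C-based languages, the subtraction drops out and the offset is simply k * element_size.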