If you want to do any serious data analysis in R, you need to learn some of
the object-oriented features of the language. The term "object-oriented"
is difficult to define. Wikipedia provides the following definition:
In computer science, object-oriented programming, OOP for short, is a
computer programming paradigm. The idea behind object-oriented programming
is that a computer program is composed of a collection of individual units,
or objects, that act on each other, as opposed to a traditional view in
which a program is a list of instructions to the computer. Each object is
capable of receiving messages, processing data, and sending messages to
other objects. Object-oriented programming is claimed to give more
flexibility, easing changes to programs, and is widely popular in large
scale software engineering. Furthermore, proponents of OOP claim that OOP
is easier to learn for those new to computer programming than previous
approaches, and that the OOP approach is often simpler to develop and to
maintain, lending itself to more direct analysis, coding, and understanding
of complex situations and procedures than other programming methods.
en.wikipedia.org/wiki/Object-oriented_programming
Wikipedia then presents six fundamental concepts associated with OOP.
* Class: the unit of definition of data and behavior (functionality)
for some kind-of-thing, a class (for example, Dog) is the basis of
modularity and structure in an object-oriented computer program. A class
should typically be recognizable to a non-programmer familiar with the
problem domain, and the code for a class should be coherent and decoupled
(as should the code for any good pre-OOP function). With such modularity,
the structure of a program will correspond to the aspects of the problem
that the program is intended to solve.
* Object: an instance of a class, an object (for example, "Rin Tin
Tin" the Dog) is the run-time manifestation of a particular exemplar of a
class. Each object has its own data, though the code within a class is
shared for economy.
In R, there is an lm class for the output of a linear regression model. For
example, the statement:
bivariate.model.1 <- lm(y~x1+x2)
creates an object, bivariate.model.1, of class lm.
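Here is a minimal sketch of that statement in action. The data are simulated
purely for illustration (the sample size and the true coefficients are my own
assumptions); only the variable names y, x1, and x2 come from the statement
above.

```r
# Simulated data, for illustration only.
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + 2 * x1 - x2 + rnorm(50)

# Fit the regression; the result is an object of class lm.
bivariate.model.1 <- lm(y ~ x1 + x2)

class(bivariate.model.1)            # "lm"
inherits(bivariate.model.1, "lm")   # TRUE
```

The class function reports the class attribute that R attaches to the fitted
model, and inherits asks whether the object belongs to a given class.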
* Encapsulation: a type of privacy applied to the data and some of
the methods (that is, functions or subroutines) in a class, encapsulation
ensures that an object can be changed only through established channels
(namely, the class's public methods). Each object exposes an interface:
those public methods, which specify how other objects may read or modify
it. An interface can prevent, for example, any caller from adding a list of
children to a Dog when the Dog is less than one year old.
There is an update function in R that will take an existing lm object and
modify the fit by adding or removing terms from the regression model. The
coef function extracts model coefficients from an lm object.
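The following sketch shows update and coef working on an lm object. The
simulated data and the object names full.model and reduced.model are
assumptions made for this example.

```r
# Simulated data, for illustration only.
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + 2 * x1 - x2 + rnorm(50)

full.model <- lm(y ~ x1 + x2)

# Refit the model with x2 removed. In the update formula, "." stands
# for "whatever was on this side of the original formula".
reduced.model <- update(full.model, . ~ . - x2)

# Extract the coefficients without reaching inside the objects.
coef(full.model)
coef(reduced.model)
```

You could also get at the coefficients with full.model$coefficients, but
going through coef means your code does not depend on how the object happens
to store its results.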
* Inheritance: a mechanism for creating subclasses, inheritance
provides a way to define a (sub)class as a specialization or subtype or
extension of a more general class (as Dog is a subclass of Canidae); a
subclass acquires all the data and methods of all of its superclasses, but
it can add or change data or methods as the programmer chooses. Inheritance
is the "is-a" relationship: a Dog is-a Canidae. This is in contrast to
composition, the "has-a" relationship, which user-defined datatypes brought
to computer science: a Dog has-a mother (another Dog) and has-a father,
etc.
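Other model classes in R build on the lm class. For example, the glm class,
used for generalized linear models, inherits from lm: a glm object has the
class vector c("glm", "lm"). Not every regression class does this, though.
The lme class, used for linear mixed effects models, is a separate class
that does not inherit from lm, although it supports many of the same
functions.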
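You can check which classes a fitted model belongs to with class and
inherits. This is a minimal sketch using a simulated binary outcome; the data
and the name logistic.model are assumptions made for the example.

```r
# Simulated binary outcome, for illustration only.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.5 + x))

logistic.model <- glm(y ~ x, family = binomial)

# The class attribute lists the most specific class first.
class(logistic.model)             # "glm" "lm"
inherits(logistic.model, "lm")    # TRUE
```

Because "lm" appears in the class vector, a function written for lm objects
is used for a glm object whenever no glm-specific version exists.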
* Abstraction: the ability of a program to ignore the details of an
object's (sub)class and work at a more generic level when appropriate. For
example, "Rin Tin Tin" the Dog may be treated as a Dog much of the time,
but when appropriate he is abstracted to the level of Canidae (superclass
of Dog) or Carnivora (superclass of Canidae), and so on.
The coef function produces the same type of results whether it is given an
lm object or a glm object.
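As a rough illustration, the sketch below calls coef on both kinds of model.
The simulated data and the object names are assumptions made for the example.

```r
# Simulated data, for illustration only.
set.seed(1)
x <- rnorm(100)
y.continuous <- 1 + 2 * x + rnorm(100)
y.binary     <- rbinom(100, size = 1, prob = plogis(0.5 + x))

linear.model   <- lm(y.continuous ~ x)
logistic.model <- glm(y.binary ~ x, family = binomial)

# The same call works on both objects, and both results are
# named numeric vectors with one entry per model term.
coef(linear.model)
coef(logistic.model)
```

Code that only needs the coefficients can therefore ignore which of the two
classes it was handed.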
* Polymorphism: polymorphism is behavior that varies depending on the
class in which the behavior is invoked. For example, the result of bark()
for a Dog would differ from the result of bark() for a Jackal; and in a
more sophisticated animal-emulation program, bark() would differ for a
Chihuahua and a Saint Bernard.
The predict function produces predicted values for an lm object. For a glm
object, it also produces predicted values, but allows you to specify whether
you want to predict on the original response scale or on the scale of the
linear predictor (that is, after the appropriate link function has been
applied). The coef function for an lme object (in contrast to lm and glm
objects) is more complex because there are estimates at the various levels
of the linear mixed effects model (e.g., estimates of coefficients between
subjects and coefficients within subjects).
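The two prediction scales for a glm object can be sketched as follows, using
a logistic regression on simulated data. The data, the new.data values, and
the object names are assumptions made for the example.

```r
# Simulated binary outcome, for illustration only.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.5 + x))

logistic.model <- glm(y ~ x, family = binomial)
new.data <- data.frame(x = c(-1, 0, 1))

# type = "link" predicts on the linear predictor (log odds) scale;
# type = "response" predicts on the original (probability) scale.
link.scale     <- predict(logistic.model, newdata = new.data, type = "link")
response.scale <- predict(logistic.model, newdata = new.data, type = "response")

# For the logit link, plogis() is the inverse link, so it maps
# the first set of predictions onto the second.
all.equal(plogis(link.scale), response.scale)
```

The same call on an lm object needs no type argument for this purpose,
because a linear regression has only the one scale.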
When I get a chance I want to discuss the difference between S3 and S4
objects in R. Here are some references that discuss S3 and S4 objects:
This page was last modified on
08/21/07.