Thursday, October 2, 2014

Iteration/Enumerator

Using Iterators to Overcome Array-based Problems

Using simple array-based iteration not only ties algorithms to arrays, but also requires that the
logic for determining which elements stay, which go, and in which order to process them be known in
advance.

An iterator (also known as an enumerator) solves these problems by providing a generic interface for
looping over a set of data, so that the underlying data structure or storage mechanism—such as an array,
a database, and so on—is hidden.

Whereas simple iteration generally requires you to write specific code
to handle where the data is sourced from, or even what kind of ordering or preprocessing is required, an
iterator enables you to write simpler, more generic algorithms.

Iterator Operations

public interface Iterator {
    /** Positions the iterator over the first element. */
    public void first();

    /** Positions the iterator over the last element. */
    public void last();

    /** Returns true once the iterator has moved past either end of the data. */
    public boolean isDone();

    /** Moves the iterator forward to the next element. */
    public void next();

    /** Moves the iterator backward to the previous element. */
    public void previous();

    /** Returns the current element, or throws if the iterator is out of bounds. */
    public Object current() throws IteratorOutOfBoundsException;
}

Not all data structures allow traversing the data in both directions, nor does it always make sense to do so.

For this reason, it is acceptable for any of the traversal methods—first(), last(), next(), and previous()—to throw an UnsupportedOperationException to indicate missing or unimplemented behavior.

public class IteratorOutOfBoundsException extends RuntimeException {
}

The Iterable Interface

Iterable is an interface that provides a generic way to obtain an iterator from any data structure that supports it.

public interface Iterable {
    /** Returns a new iterator positioned over this structure's data. */
    public Iterator iterator();
}

Using an Iterator

Once an iterator has been obtained—
either by explicit construction or as an argument to a method—position it at the beginning or end as
appropriate, then loop until isDone() returns true:

Iterator iterator = ...;
iterator.first();
while (!iterator.isDone()) {
    Object object = iterator.current();
    ...
    iterator.next();
}

The same loop can also be written as a for loop:

Iterator iterator = ...;
for (iterator.first(); !iterator.isDone(); iterator.next()) {
    Object object = iterator.current();
    ...
}

Standard Iterators

Array Iterator

The most obvious iterator implementation is one that wraps an array. 
By encapsulating an array within an iterator, you can start writing applications that operate on arrays now and still extend easily to other data structures in the future.
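
Here is a minimal sketch of such an array iterator, implementing the Iterator interface shown above (the class and field names are illustrative, not from the original text):

public class ArrayIterator implements Iterator {
    private final Object[] array;
    private final int start;  // index of the first element to visit
    private final int end;    // index of the last element to visit
    private int current = -1; // index of the current element

    public ArrayIterator(Object[] array) {
        this.array = array;
        this.start = 0;
        this.end = array.length - 1;
    }

    public void first() { current = start; }

    public void last() { current = end; }

    /** Done whenever the position falls outside [start, end]. */
    public boolean isDone() { return current < start || current > end; }

    public void next() { ++current; }

    public void previous() { --current; }

    public Object current() throws IteratorOutOfBoundsException {
        if (isDone()) {
            throw new IteratorOutOfBoundsException();
        }
        return array[current];
    }
}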


Reverse Iterator

Sometimes data needs to be processed in the opposite order from the one in which an iterator naturally presents it. With a reverse iterator, this behavior can be achieved without re-sorting and without duplicated code.

When the application calls first(), the reverse iterator actually calls last() on the underlying iterator. When the application calls next(), the underlying iterator’s previous() method is invoked, and so on. In this way, the behavior of the iterator can be reversed without changing the client
code that displays the results, and without re-sorting the array.
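
A minimal sketch of this decorator, assuming the Iterator interface shown earlier (the class name is illustrative):

public class ReverseIterator implements Iterator {
    private final Iterator iterator;

    public ReverseIterator(Iterator iterator) {
        this.iterator = iterator;
    }

    // Each traversal call delegates to the opposite operation
    // of the wrapped iterator.
    public void first() { iterator.last(); }

    public void last() { iterator.first(); }

    public boolean isDone() { return iterator.isDone(); }

    public void next() { iterator.previous(); }

    public void previous() { iterator.next(); }

    public Object current() throws IteratorOutOfBoundsException {
        return iterator.current();
    }
}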

Filtering Iterator

One of the more interesting and useful advantages of using iterators is the capability to wrap or decorate
(see the Decorator pattern [Gamma, 1995]) another iterator to filter the return values.

The filter iterator works by wrapping another iterator and only returning values that satisfy some condition,
known as a predicate.

Each time the underlying iterator is called, the returned value is passed to the
predicate to determine whether it should be kept or discarded.

public interface Predicate {
    /** Returns true if the given value meets the selection criteria. */
    public boolean evaluate(Object object);
}

The predicate defines a single method, evaluate(), that is called for each value and returns a boolean indicating whether the value meets the selection criteria.
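
A minimal sketch of such a filtering iterator, assuming only the Iterator and Predicate interfaces shown above (the skipForwards()/skipBackwards() helpers are illustrative names, not from the original text):

public class FilterIterator implements Iterator {
    private final Iterator iterator;
    private final Predicate predicate;

    public FilterIterator(Iterator iterator, Predicate predicate) {
        this.iterator = iterator;
        this.predicate = predicate;
    }

    public void first() {
        iterator.first();
        skipForwards();
    }

    public void last() {
        iterator.last();
        skipBackwards();
    }

    public boolean isDone() { return iterator.isDone(); }

    public void next() {
        iterator.next();
        skipForwards();
    }

    public void previous() {
        iterator.previous();
        skipBackwards();
    }

    public Object current() throws IteratorOutOfBoundsException {
        return iterator.current();
    }

    // Advances past any values rejected by the predicate.
    private void skipForwards() {
        while (!iterator.isDone() && !predicate.evaluate(iterator.current())) {
            iterator.next();
        }
    }

    // Retreats past any values rejected by the predicate.
    private void skipBackwards() {
        while (!iterator.isDone() && !predicate.evaluate(iterator.current())) {
            iterator.previous();
        }
    }
}

Because FilterIterator itself implements Iterator, it drops into the client loops shown earlier unchanged, and filters can be stacked by wrapping one FilterIterator in another.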


Introduction

Deterministic Algorithms:

The result of a deterministic algorithm can be determined exactly based on its inputs.

Heuristic:

Sometimes a problem is so difficult that finding a precise solution can be too costly in terms of time or resources. In this case, a heuristic may be a more practical approach. Rather than trying to find a perfect solution, a heuristic uses certain well-known characteristics of a problem to produce an approximate solution.

Heuristics are often used to sift through data, removing or ignoring values that are irrelevant so that the
more computationally expensive parts of an algorithm can operate on a smaller set of data.

The major drawback of using a heuristic is that it is usually not possible to determine how it
will perform all of the time.

This leads to a level of uncertainty in the overall
algorithm that may or may not be acceptable depending on the application.

Understanding Complexity in Relation to Algorithms

  • performance: the amount of CPU/memory/disk usage
  • complexity: how well the algorithm scales

It’s more important, therefore, to ascertain how a given algorithm behaves as the size of the problem
increases.

Of greater interest than raw running time is how the number of operations performed varies with the size of the problem.

Algorithm Complexity

If the problem size were to increase by an order of magnitude, how does that affect
the number of operations required to perform a simple function? Does the number remain the same?
Does it double? Does it increase linearly with the size of the problem? Does it increase exponentially?

By measuring the complexity of an algorithm,
we hope to predict how it will perform: complexity affects performance, but not vice versa.

It is important to remember, though, that complexity doesn’t provide a precise
measure of expected performance, but rather places certain bounds or limits on the achievable performance.

Understanding Big-O Notation

  • O(1): Pronounced “order 1” and denoting a function that runs in constant time
  • O(N): Pronounced “order N” and denoting a function that runs in linear time
  • O(N²): Pronounced “order N squared” and denoting a function that runs in quadratic time
  • O(log N): Pronounced “order log N” and denoting a function that runs in logarithmic time
  • O(N log N): Pronounced “order N log N” and denoting a function that runs in time proportional to the size of the problem multiplied by its logarithm
  • O(N!): Pronounced “order N factorial” and denoting a function that runs in factorial time



Constant Time: O(1)

O(1) actually means that an algorithm takes constant time to run; in other words, performance isn’t affected by the size of the problem.

Example: addressing main memory in a computer, such as an array lookup by index.

Note that O(1) still doesn’t guarantee that the algorithm will be very fast; it only guarantees that the running time is independent of the problem size.
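
As a small illustration (the class and method names here are hypothetical), an array lookup performs the same amount of work no matter how large the array is:

public class ConstantTimeExample {
    // One operation regardless of array.length: constant time, O(1).
    public static Object lookup(Object[] array, int index) {
        return array[index];
    }
}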

Linear Time: O(N)

An algorithm runs in O(N) if the number of operations required to perform a function is directly proportional to the number of items being processed.

Even if you doubled or even tripled the number of registers in operation at any one time, the
processing time is still officially O(N), because big-O notation disregards constant factors.
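
A linear search is a typical O(N) example: in the worst case it must examine every one of the N elements once. A minimal sketch (class and method names are hypothetical):

public class LinearTimeExample {
    // Worst case examines all N elements: linear time, O(N).
    public static int indexOf(Object[] array, Object target) {
        for (int i = 0; i < array.length; ++i) {
            if (array[i].equals(target)) {
                return i; // found at position i
            }
        }
        return -1; // not found after N comparisons
    }
}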

Quadratic Time: O(N²)


The number of handshakes required for a group of size N turns out to be (N² – N) / 2: each of the N people shakes hands with the other N – 1 people, and dividing by 2 avoids counting each handshake twice, giving N(N – 1) / 2 = (N² – N) / 2.

Because big-O notation disregards any constants—in this case, the divisor 2—we’re left with N² – N.

As N becomes larger, subtracting N from N² has less and less of an overall effect, so we can safely ignore the subtraction, leaving us with a complexity of O(N²).
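
A pair of nested loops shows where the quadratic behavior comes from: the outer loop runs N times, and the inner loop runs up to N times for each of those. A minimal sketch (class and method names are hypothetical):

public class QuadraticTimeExample {
    // Counts one handshake per distinct pair (i, j): quadratic time, O(N^2).
    public static int countHandshakes(int n) {
        int handshakes = 0;
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                ++handshakes;
            }
        }
        return handshakes; // equals (n * n - n) / 2
    }
}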


Any algorithm with a complexity of O(N²) is pretty much guaranteed to be useless for solving all but the smallest of problems.

Logarithmic Time: O(log N) and O(N log N)

This means that even when the size of the input data set increases by a factor of a million,
the run time will only increase by a factor of about log₂(1,000,000) ≈ 20.

Achieving logarithmic running times usually requires your algorithm to somehow discard large portions of the input data set.

Examples: binary search and binary search trees (BSTs).
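
As a sketch of the idea, a binary search over a sorted array halves the remaining search space with every comparison, so it performs at most about log₂ N comparisons (class and method names are hypothetical):

public class LogarithmicTimeExample {
    // Each comparison discards half the remaining elements: logarithmic time, O(log N).
    public static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // midpoint, avoiding integer overflow
            if (sorted[mid] < target) {
                low = mid + 1;   // discard the lower half
            } else if (sorted[mid] > target) {
                high = mid - 1;  // discard the upper half
            } else {
                return mid;      // found
            }
        }
        return -1; // not found
    }
}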

O(N log N) is still better than O(N²) but not quite as good as O(N). Efficient sorting algorithms such as merge sort typically run in O(N log N) time.

Factorial Time: O(N!)

Some algorithms can perform even more poorly than O(N²); compare how quickly O(N²) and O(N!) grow as N increases.

Example: generating every permutation of N items (there are N! of them). Note that calculating the factorial of a number itself requires only O(N) multiplications.
