Sparse Arrays

This technical paper was written by David Clarke of Dragon Thoughts Ltd and is Copyright David Clarke © 2002
This document describes sparse arrays and some related data structures.
The purpose of this document is to provide some general information about Sparse Arrays.
It does not seek to be a complete reference for data structures and discusses some issues with reference to a Java 2 based environment.

Target Audience

The intended audience is students or developers who need to have an array like structure where few elements contain actual values.

Sparse Array

A Sparse Array is a data structure where individual elements can be quickly accessed by index, like an array. What distinguishes a Sparse Array from any other array, is that many of the items do not contain values.

Arrays

Arrays are fundamental data structures in most programming languages. They provide fast access to individual items by storing values or references to values in a contiguous set of memory locations.

Compactness

Memory must be allocated in advance for all values or references. Sometimes, this is not practical when the quantity of elements cannot be easily guessed in advance, or leads to excessive memory overheads when a small number of elements as usually used, but there is a possibility of a large number.

Speed of Access

Because the memory is allocated contiguously with each element occupying a constant sized piece of memory, the position of each element can be calculated with simple arithmetic, making access to elements extremely fast.

Comparison to a List

Lists represent a data structure as a set of nodes linked to each other in a linear or circular manner. Finding an item, requires that the list is searched in a linear fashion. This means that as the number of elements becomes large, the average time to find an item goes up in proportion.

Summary

Compactness Speed of Access Handling of Empty Elements
Array Poor Unless the size is know in advance and the majority of elements are filled, when it becomes very good. Very fast Constant Space must be allocated with an indication of "emptiness".
Linked List Good There is some overhead for additional references to maintain the list, but unused elements do not need to be stored if all items are consecutive. Slow  Average access proportional to number of elements (including empty ones) Elements must be allocate to represent empty items, and addition or removal of items in the list affects the index of all following elements.
Tree Reasonable There is some overhead for additional references to maintain the list, and in Java, there is a need to build an entire object for each key of each element. Fast Average access proportional to Log2(number of elements) Empty elements do not need to be represented.
Sparse Array Very good There is some overhead for the lookup table of valid references. Fast Average access proportional to Log2(number of elements) Empty elements do not need to be represented.

.

This technical paper was written by David Clarke of Dragon Thoughts Ltd and is Copyright David Clarke © 2002