Data Structures and Abstract Data Types (ADTs) (and Lists)
Contents
Data Structures
A data structure is a container for data (e.g., numbers, strings, custom objects, etc.) with specific properties and has operations for accessing the data. For example, an array is a data structure that stores a set amount of data sequentially in memory and allows us to modify that data by indexing into the array. Data structures are not programming language specific, so we can talk about arrays in C++ as well as in Java, JavaScript, C#, Objective C, etc.
In C++, arrays are the simplest data structure to talk about since they are built in. However, we can roll our own, as well. We've seen an example of one of these already—a smart array, or, as we called it, Array
. We first talked about Array
when we introduced templates. Our Array
data structure stores data of any type using a standard array and provides methods for accessing that data, namely get()
and set()
. It also keeps track of the size.
Each operation can be analyzed with respect to time complexity. For example, the get()
and set()
methods of Array
are both O(1) operations.
An introduction to ADTs
Abstract Data Types, more commonly referred to as ADTs, are a way of talking about classes of data structures that share the same operations, but are implemented differently. That is, they are abstracted from the implementation. The implementation, however, will dictate the complexity of the operation. So we need to consider the specific data structure.
We have not seen ADTs yet in this class. Possibly the easiest to explain is the List ADT, which we describe next.
(Back to top)List (ADT)
The List ADT stores a list of items. That is, every item except for the first one (which we call the head of the list) has a predecessor and every item except for the first (or tail) as a successor. Items are referred to by their position (rather than index). So, position 1 means the first thing (head). In an array, this would be index 0. Typical operations for a List are as follows:
Operation name | Description |
---|---|
isEmpty() | Return whether or not the list is empty |
size() | Return the number of items in the list |
get(pos) | Return the item at position pos |
set(pos, item) | Replace the item at position pos with the given item |
add(pos, item) | Insert a new item at position pos |
append(item) | Insert a new item after the tail of the list |
remove(pos) | Removes the item at position pos |
We could very well implement the List ADT with an array. In our array, the index of an item is its position in the list minus 1. For add and remove, we need to expand our array and shift every item after the position being added or removed. Here's what an implementation of an List for integers using an array might look like in C++ (such an implementation is typically called an ArrayList or Vector, though in many libraries, both ArrayLists and Vectors will be available, with subtle differences; the library documentation should specify what those differences are):
In this implementation (which is not the only way to implement a List with arrays), the get()
and set
methods are very fast—O(1), actually. This is because arrays have random access. However, every time a new item is added, we copy the entire array. This means the add()
and append()
methods are comparatively quite expensive: O(n).
What are our alternatives? There is one classic data structure, which we have not yet learned about: a linked list.
(Back to top)Linked lists (data structure)
A linked list is logically on the same level as an array. That is, it is a data structure, not an ADT. However, it can be used to implement a List ADT. Linked lists consist of nodes, each of which is consists of one or two links to other nodes. The two simplest types of linked lists are singly linked lists and doubly linked lists. We'll describe the differences in a bit. They both include a property head
, which keeps track of the node at the beginning of the list and tail
, which keeps track of the node at the end of the list. Unlike arrays, linked lists are not random access. To reach an item at position 3, we must start at the head and then follow the nodes one at at time until we end up at position 3. This is a O(n) operations, unlike with arrays, for which access is O(1).
Singly linked list
In a singly linked list, each node has two fields: a data
field, which keep track of the item being stored, and a next
field, which points to the node in the following position. Here's what the node might look like in code. We'll call it SinglyLinkedNode
:
Inserting into a linked list is easy if we are inserting at the beginning or end of the list. To add to the beginning of the list, we need only update the head
property so that it points to the new node, and then make the new node point to the old head. To append to the list, we must update the old tail to point to the new node, and update the tail
property. Both of these operations are O(1). To insert into the middle, however, requires that we traverse the nodes until we have the node just before the item to insert (the predecessor) and the node just after (the follower). We can then update the predecessor to point to the new node, and the new node to point to the follower. However, this operation is O(n), as in the worst case, we have to traverse the entire list. Deletion is similar to insertion.
Here's what our List ADT for integers might look like implemented with a singly linked list:
Doubly linked list
Doubly linked lists are like singly linked lists, except that each node has a third field, previous
, which points to the node in the previous position. A doubly linked list has the advantage that we can traverse nodes forwards and backwards. Here's what a DoublyLinkedNode
might look like:
Inserting into a doubly linked list is as easy as singly linked lists for prepending and appending. For inserting in the middle of the list, a double linked list allows us to traverse from either the head or the tail, which can cut our time in half. However, this still results in a complexity of O(n) for arbitrary insertions (and deletions).
Advantages and disadvantages of linked lists and arrays
So why would we use a linked list over an array? While arrays are random access, inserting a new element anywhere in an array requires copying the entire array, which is a O(n) operation. With linked lists, insertion in the middle is is similar, but inserting at the ends is O(1). In addition, we do not need to have the entire size of the list stored in adjacent memory blocks, which is helpful for a very large list. For example, if your computer has room to store 1,000,000 numbers, but not in contagious memory, linked lists are beneficial. The down side is that linked lists store more information per element. Rather than a single integer, for example, an entry in a doubly linked lists also needs the next and previous pointers. That means storing 1,000,000 integers in an array takes up less total memory than storing them in a linked list.
To summarize: arrays offer fast access and less memory overhead, but require contiguous storage and are expensive to expand. Linked lists off fast insertion and deletion at the ends and do not require that data be stored contiguously in memory. However, access and insertion into the middle of the list is expensive.
(Back to top)