Chapter 4: Data Processing

4.1 Introduction

Modern computers can process vast amounts of data representing many aspects of the world. From these big data sets, we can learn about human behavior in unprecedented ways: how language is used, what photos are taken, what topics are discussed, and how people engage with their surroundings. To process large data sets efficiently, programs are organized into pipelines of manipulations on sequential streams of data. In this chapter, we consider a suite of techniques process and manipulate sequential data streams.

In Chapter 2, we introduced a sequence interface, implemented in Python by built-in data types such as tuple and list. In this chapter, we extend the concept of sequential data to include collections that have unbounded or even infinite size. Two mathematical examples of infinite sequences are the positive integers and the Fibonacci numbers. Sequential data sets of unbounded length also appear in other computational domains. For instance, the sequence of all emails ever written grows longer with every second and therefore does not have a fixed length. Likewise, the sequence of telephone calls sent through a cell tower, the sequence of mouse movements made by a computer user, and the sequence of acceleration measurements from sensors on an aircraft all extend without bound as the world evolves.

Continue: 4.2 Implicit Sequences

Chapter 4Hide contents

4.1 Introduction

4.2 Implicit Sequences

4.3 Declarative Programming

4.4 Unification

4.5 Distributed Computing

4.6 Distributed Data Processing

4.7 Parallel Computing

Chapter 4: Data Processing

4.1 Introduction