Could a computer surprise us? Rather than programmers crafting data-processing rules by hand, could a computer automatically learn these rules by looking at data?

To control something, first you need to be able to observe it.

**Lose Function**
The loss function takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score, capturing how well the network has done

**Backpropagation Algorithm** ****
This adjustment is the job of the *optimizer*, which implements what’s called the *Backpropagation* algorithm: the central algorithm in deep learning.

**Training Loop**
This is the training loop, which, repeated a sufficient number of times (typically tens of iterations over thousands of examples), yields weight values that minimize the loss function. A network with a minimal loss is one for which the outputs are as close as they can be to the targets: a trained network. Once again, it’s a simple mechanism that, once scaled, ends up looking like magic.
**Layer**
The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form.

**Representations**
Specifically, layers extract representations out of the data fed into them—hopefully, representations that are more meaningful for the problem at hand.

**Data distillation**
Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters—the layers.

**The Compilation Step**

*A loss function*— How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
*An optimizer*— The mechanism through which the network will update itself based on the data it sees and its loss function.
*Metrics to monitor during training and testing*— Here, we’ll only care about accuracy (the fraction of the images that were correctly classified).

**Overfitting**
The test-set accuracy turns out to be 97.8%—that’s quite a bit lower than the training set accuracy. This gap between training accuracy and test accuracy is an example of overfitting: the fact that machine-learning models tend to perform worse on new data than on their training data.

**Tensor**
Numpy arrays, also called tensors. At its core, a tensor is a container for data—almost always numerical data. So, it’s a container for numbers. You may be already familiar with matrices, which are 2D tensors: tensors are a generalization of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).

**Scalars (0D tensors)**
A tensor that contains only one number is called a scalar (or scalar tensor, or 0-dimensional tensor, or 0D tensor). In Numpy, a float32 or float64 number is a scalar tensor (or scalar array). You can display the number of axes of a Numpy tensor via the ndim attribute; a scalar tensor has 0 axes (ndim == 0). The number of axes of a tensor is also called its rank.

**Vectors (1D tensors)**
An array of numbers is called a vector, or 1D tensor. A 1D tensor is said to have exactly one axis.

**Dimensionality**
Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times. In the latter case, it’s technically more correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes), but the ambiguous notation 5D tensor is common regardless.

**Matrices (2D tensors)**
An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to rows and columns). You can visually interpret a matrix as a rectangular grid of numbers.

**3D tensors**
If you pack such matrices in a new array, you obtain a 3D tensor, which you can visually interpret as a cube of numbers.

**Tenors Attributes**

**Number of axes (rank)**— For instance, a 3D tensor has three axes, and a matrix has two axes. This is also called the tensor’s ndim in Python libraries such as Numpy.
**Shape**— This is a tuple of integers that describes how many dimensions the tensor has along each axis. For instance, the previous matrix example has shape (3, 5), and the 3D tensor example has shape (3, 3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape, ().
*Data type* (usually called dtype in Python libraries)—This is the type of the data contained in the tensor; for instance, a tensor’s type could be float32, uint8, float64, and so on. On rare occasions, you may see a char tensor. Note that string tensors don’t exist in Numpy (or in most other libraries), because tensors live in preallocated, contiguous memory segments: and strings, being variable length, would preclude the use of this implementation.

**Types of Data**

*Vector Data – 2D tensors of shape (samples, features)*
Bach single data point can be encoded as a vector, and thus a batch of data will be encoded as a 2D tensor (that is, an array of vectors), where the first axis is the samples axis and the second axis is the features axis.

*Timeseries data or sequence data— 3D tensors of shape (samples, timesteps, features)*
Whenever time matters in your data (or the notion of sequence order), it makes sense to store it in a 3D tensor with an explicit time axis. Each sample can be encoded as a sequence of vectors (a 2D tensor), and thus a batch of data will be encoded as a 3D tensor.

The time axis is always the second axis (axis of index 1), by convention.

*Images— 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width)*
Images typically have three dimensions: height, width, and color depth. Although grayscale images (like our MNIST digits) have only a single color channel and could thus be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional color channel for grayscale images.

There are two conventions for shapes of images tensors: the channels-last convention (used by TensorFlow) and the channels-first convention (used by Theano).

*5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)*

Video data is one of the few types of real-world data for which you’ll need 5D tensors. A video can be understood as a sequence of frames, each frame being a color image. Because each frame can be stored in a 3D tensor (height, width, color_depth), a sequence of frames can be stored in a 4D tensor (frames, height, width, color_depth), and thus a batch of different videos can be stored in a 5D tensor of shape (-samples, frames, height, width, color_depth).