Preface
1 Introduction to R
2 Linear Algebra
2.1 Linear Algebra with R
2.1.1 Introduction
2.1.2 Matrix Notation
3 Introduction to Machine Learning and Deep Learning
3.1 Training, Validation and Test Data
3.2 Bias and Variance
3.3 Underfitting and Overfitting
3.3.1 Bayes Error
3.4 Maximum Likelihood Estimation
3.5 Quantifying Loss
3.5.1 The Cross-Entropy Loss
3.5.2 Negative Log-Likelihood
3.5.3 Entropy
3.5.4 Cross-Entropy
3.5.5 Kullback-Leibler Divergence
3.5.6 Summarizing the Measurement of Loss
4 Introduction to Neural Networks
4.1 Types of Neural Network Architectures
4.1.1 Feedforward Neural Networks (FFNNs)
4.1.2 Convolutional Neural Networks (Convnets)
4.1.3 Recurrent Neural Networks (RNNs)
4.2 Forward Propagation
4.2.1 Notations
4.2.2 Input Matrix
4.2.3 Bias matrix
4.2.4 Weight matrix for Layer-1
4.2.5 Activation function at Layer-1
4.2.6 Weight matrix of Layer-2
4.2.7 Activation function at Layer-2
4.3 Activation Functions
4.3.1 Sigmoid
4.3.2 Hyperbolic tangent (tanh)
4.3.3 Rectified Linear Unit (ReLU)
4.3.4 leakyReLU
4.3.5 Softmax
4.4 Derivatives of Activation Functions
4.4.1 Derivative of the Sigmoid
4.4.2 Derivative of the tanh
4.4.3 Derivative of the ReLU
4.4.4 Derivative of the leakyReLU
4.4.5 Derivative of the Softmax
4.5 Loss Functions
4.6 Derivative of the Cost Function
4.6.1 Derivative of Cross Entropy Loss with Sigmoid
4.6.2 Derivative of Cross Entropy Loss with Softmax
4.7 Back Propagation
4.7.1 Backpropagate to the output layer
4.7.2 Backpropagate to the second hidden layer
4.7.3 Backpropagate to the first hidden layer
4.7.4 Vectorization of backprop equations
4.8 Writing a Simple Neural Network Application
4.8.1 Image Classification using Sigmoid Activation Neural Network
4.8.2 Importance of Normalization
5 Deep Neural Networks
5.1 Writing a Deep Neural Network (DNN) algorithm
5.2 Implementing a DNN using Keras
6 Regularization and Hyperparameter Tuning
6.1 Initialization
6.1.1 Zero initialization
6.1.2 Random initialization
6.1.3 Xavier initialization
6.1.4 He initialization
6.2 Gradient Descent
6.2.1 Gradient Descent or Batch Gradient Descent
6.2.2 Stochastic Gradient Descent
6.2.3 Mini Batch Gradient Descent
6.3 Dealing with NaNs
6.3.1 Hyperparameters and Weight Initialization
6.3.2 Normalization
6.3.3 Using different Activation functions
6.3.4 Use of NanGuardMode, DebugMode, or MonitorMode
6.3.5 Numerical Stability
6.3.6 Algorithm Related
6.3.7 NaN Introduced by AllocEmpty
6.4 Optimization Algorithms
6.4.1 Simple Update
6.4.2 Momentum based Optimization Update
6.4.3 Nesterov Momentum Optimization Update
6.4.4 Adagrad (Adaptive Gradient Algorithm) Optimization Update
6.4.5 RMSProp (Root Mean Square Propagation) with Momentum Optimization Update
6.4.6 Adam Optimization (Adaptive Moment Estimation) with Momentum Update
6.4.7 Vanishing Gradient and Numerical Stability
6.5 Gradient Checking
6.6 Second order methods
6.7 Per-parameter adaptive learning rate methods
6.8 Annealing the learning rate
6.9 Regularization
6.9.1 Dropout Regularization
6.9.2 ℓ2 Regularization
6.9.3 Combining dropout and ℓ2 regularization?
6.10