# Keras - *Overview*

**Keras is a machine learning framework written in Python.** It is a wrapper for low-level machine learning libraries such as Theano or TensorFlow.

**Standard network model life cycle in Keras:**

- Define the network architecture
- Compile the network
- Fit the network
- Evaluate the network
- Use the network to make predictions

# 1. Define the network

**Keras neural networks are defined as a sequence of layers.** Neural network models are created with the `Sequential` class.

```
from keras.models import Sequential
model = Sequential()
# a valid model, but cannot be used without any layers in it
```

**Use `add` to add layers to the network.**

```
from keras.layers import Dense
from keras.models import Sequential
# 2 neurons in the input layer from `input_dim=2`
# 5 neurons in the first layer from `Dense(5)`
# 1 neuron in the output layer from `Dense(1)` as it is the last layer
model1 = Sequential()
model1.add(Dense(5, input_dim=2))
model1.add(Dense(1))
assert len(model1.layers) == 2 # input layer is not counted
# you can also give the layers as a list
layers = [Dense(5, input_dim=2), Dense(1)]
model2 = Sequential(layers)
assert len(model2.layers) == 2
```
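
To make the layer stack above more concrete, the following plain-Python sketch counts the trainable parameters such a network would hold. It assumes each `Dense` layer is fully connected and has one bias per neuron; the helper names `dense_param_count` and `count_params` are hypothetical and not part of Keras.

```python
def dense_param_count(n_inputs, units, use_bias=True):
    """Parameters of a fully connected layer: one weight per
    (input, neuron) pair plus, optionally, one bias per neuron."""
    return n_inputs * units + (units if use_bias else 0)


def count_params(input_dim, layer_units):
    """Sum the parameter counts of a stack of dense layers."""
    total = 0
    n_inputs = input_dim
    for units in layer_units:
        total += dense_param_count(n_inputs, units)
        n_inputs = units  # this layer's outputs feed the next layer
    return total


# Same architecture as model1 above: 2 inputs -> Dense(5) -> Dense(1)
total_params = count_params(input_dim=2, layer_units=[5, 1])
print(total_params)  # (2*5 + 5) + (5*1 + 1) = 21
```

In Keras itself, `model1.summary()` should report the per-layer counts for comparison.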

**The first layer must define the number of inputs to expect.** Different layer types offer different ways to define the input shape, but the `input_shape` parameter always works.

```
from keras.layers import Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(32, input_shape=(784,)))
assert model1.input_shape == (None, 784)
# None means "any positive integer"
# thus this network takes batches of any size but each input must have 784 values
model2 = Sequential()
model2.add(Dense(32, input_dim=784)) # shorthand that works with some layers
assert model2.input_shape == (None, 784)
# sometimes you need a fixed batch size
model3 = Sequential()
model3.add(Dense(32, batch_size=32, input_shape=(6, 8)))
assert model3.input_shape == (32, 6, 8)
```
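
The shapes above can also be reasoned about without building a model. This small sketch assumes the usual behaviour of a dense layer, which transforms only the last axis and leaves the batch axis and any middle axes unchanged. `dense_output_shape` is a hypothetical helper, not a Keras function.

```python
def dense_output_shape(input_shape, units):
    """A dense layer replaces the size of the last axis with `units`."""
    return tuple(input_shape[:-1]) + (units,)


# None stands for "any batch size", as in the Keras shapes above
print(dense_output_shape((None, 784), 32))  # (None, 32)
print(dense_output_shape((32, 6, 8), 32))   # (32, 6, 32)
```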

**Activation functions are set with an `Activation` layer or the `activation` parameter.**

```
import keras.backend as K
from keras.layers import Activation, Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(64, input_shape=(32,), activation='tanh'))
# you can also add activation as a layer,
# which is applied to the PREVIOUS layer;
# so this is functionally the same as model1 definition
model2 = Sequential()
model2.add(Dense(64, input_shape=(32,)))
model2.add(Activation('tanh'))
# note that model2 has two layers already
assert len(model1.layers) == 1
assert len(model2.layers) == 2
# but the activation functions are the same
assert model1.layers[0].activation == model2.layers[1].activation
# you can also use element-wise functions as activation functions
model3 = Sequential()
model3.add(Dense(64, input_shape=(32,), activation=K.tanh))
```

**All standard activation functions:**

```
softmax = Softmax function
softplus = Softplus function
softsign = Softsign function
linear = Linear "identity" function
tanh = Hyperbolic tangent function
sigmoid = Sigmoid function
hard_sigmoid = Hard sigmoid, faster to compute
elu = Exponential Linear Unit
relu = Rectified Linear Unit
selu = Scaled Exponential Linear Unit, equal to: scale * elu(x, alpha)
where scale and alpha are pre-defined constants.
Use with `lecun_normal` initialization and `AlphaDropout` dropout.
```
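
For reference, here is a plain-Python sketch of the textbook formulas behind several of these names. It is meant only to show what each function computes on a single value (or a list, for softmax). It is not the Keras backend implementation and is not hardened against overflow for large inputs. The SELU constants are the values published with the SELU definition.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def softplus(x):
    return math.log(1.0 + math.exp(x))


def softsign(x):
    return x / (1.0 + abs(x))


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    # scale * elu(x, alpha) with the pre-defined constants
    return SELU_SCALE * elu(x, SELU_ALPHA)


def softmax(values):
    # subtract the maximum first so the exponentials cannot overflow
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))              # 0.5
print(softmax([1.0, 2.0, 3.0]))  # three probabilities that sum to 1
```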

**Good starting activation functions** used in the output layer of different predictive modeling problems:

- Regression: Linear activation function `linear` and the number of neurons matching the number of wanted outputs.
- Binary Classification (2 classes): Logistic activation function `sigmoid` and one neuron.
- Multiclass Classification (>2 classes): Softmax activation function `softmax` and one output neuron per class value, assuming a one-hot encoded output pattern.
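
The one-hot encoded output pattern mentioned for multiclass classification can be sketched in plain Python as follows. The `one_hot` helper is hypothetical and assumes integer class labels starting at 0. Keras ships a utility for the same job (`keras.utils.to_categorical`), which you would normally use instead.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1.0 per row."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        encoded.append(row)
    return encoded


targets = one_hot([0, 2, 1], num_classes=3)
print(targets)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row has as many entries as the output layer has neurons, which is why the softmax layer needs one neuron per class.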

# 2. Compile the network

**A defined network must be compiled before it can run.** Compilation turns the high-level layer definitions into matrix operations.

```
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD() # uses default values
model.compile(optimizer=sgd, loss=losses.mean_squared_error)
```

**The most common optimization algorithms:**

- Stochastic Gradient Descent `sgd`
- ADAM `adam`
- RMSprop `rmsprop`
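
As a rough illustration of what the simplest of these optimizers does, the sketch below runs gradient descent with optional classical momentum on a one-parameter toy problem. It is an assumption-laden simplification: a single scalar weight, a hand-written gradient, and no mini-batches or Nesterov variant. `sgd_minimize` and `grad` are hypothetical names, not Keras APIs.

```python
def sgd_minimize(grad_fn, w, learning_rate=0.1, momentum=0.0, steps=200):
    """Repeatedly step against the gradient, optionally with momentum."""
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # momentum keeps a decaying memory of earlier steps
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    return w


def grad(w):
    # gradient of the toy loss (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)


w_plain = sgd_minimize(grad, w=0.0)
w_momentum = sgd_minimize(grad, w=0.0, momentum=0.9)
print(w_plain, w_momentum)  # both should end up close to 3
```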

**A loss function returns the value that the optimizer tries to reduce.** The loss function is also called the cost function. Its return value is an inverted "score" of the model: the lower the loss, the better the model.

**Good starting loss functions** for different predictive modeling problems:

- Regression: Mean squared error `mean_squared_error`.
- Binary Classification: Logarithmic loss, also known as cross entropy, `binary_crossentropy`.
- Multiclass Classification: Multiclass logarithmic loss `categorical_crossentropy`.
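
To show what these three losses measure, here is a plain-Python sketch of the standard formulas. It works on Python lists rather than tensors, the `eps` clipping value is an assumption chosen for illustration, and the categorical version handles a single one-hot sample. It is not how Keras implements them internally.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss for 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Log loss for one sample: one-hot target vs. class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))               # 2.0
print(binary_crossentropy([1, 0], [0.5, 0.5]))                  # ln(2)
print(categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1]))     # -ln(0.7)
```

In all three cases a perfect prediction gives a loss near zero, and worse predictions give larger values.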

**Compiling the network is done with `compile`.** You can also specify extra metrics to be gathered during the following fitting phase.

```
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy']
)
```

**Metric functions are like loss functions, but their results are not used in training.** The loss function is used to optimize the model; metric functions let you judge the performance of your model.
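
As an illustration of a metric, the sketch below computes accuracy for a binary classifier from plain Python lists. The 0.5 threshold and the name `binary_accuracy_sketch` are assumptions for this example; it only mirrors the idea behind the `'accuracy'` metric, not its Keras implementation.

```python
def binary_accuracy_sketch(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(
        1 for t, p in zip(y_true, y_pred)
        if (p > threshold) == (t == 1)
    )
    return hits / len(y_true)


# two of the four predictions fall on the correct side of the threshold
print(binary_accuracy_sketch([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5
```

Accuracy changes only when a prediction crosses the threshold, which is one reason it is reported as a metric while a smooth loss drives the optimizer.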

**You can also define custom metric functions.** The function must take `y_true` (true labels) and `y_pred` (predicted labels).

```
import keras.backend as K
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
def mean_pred(y_true, y_pred):
    # custom metric: the mean of the predicted values
    return K.mean(y_pred)
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy', mean_pred]
)
```

# 3. Fit the network

**Once the network is compiled, it can be fit.** Fitting means training the network weights on a training dataset. The training data must be specified as both:

- a matrix of input patterns `X`
- an array of matching output patterns `y`

**Epochs** are the number of passes over the training dataset made during fitting/training. Each epoch can be partitioned into groups of input-output pairs called **batches**. Batching is an efficiency optimization that ensures not too many input patterns are loaded into memory at a time.

```
history = model.fit(x_train, y_train, batch_size=10, epochs=100)
# history contains summary of the training, loss and metrics recorded each epoch
```
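
The relationship between epochs and batches can be sketched in plain Python. The toy data, the helper `iterate_batches`, and the absence of shuffling are all simplifications for illustration; this is not what `model.fit` does internally.

```python
import math


def iterate_batches(x, y, batch_size):
    """Yield consecutive (inputs, outputs) slices of at most batch_size items."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


# 25 toy samples split into batches of 10: sizes 10, 10 and 5
x_demo = list(range(25))
y_demo = [2 * v for v in x_demo]
batches = list(iterate_batches(x_demo, y_demo, batch_size=10))

steps_per_epoch = math.ceil(len(x_demo) / 10)
total_steps = steps_per_epoch * 100  # with epochs=100, as in the call above
print(len(batches), steps_per_epoch, total_steps)  # 3 3 300
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.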

# 4. Evaluate the network

**Once the network is trained, it can be evaluated.** We should evaluate the performance of the network on a separate dataset that was not used for training.

```
# model.evaluate returns loss + all metrics defined in model.compile
loss, accuracy = model.evaluate(x_test, y_test)
```
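
One simple way to obtain such a separate dataset is to hold out part of the available data before training. The sketch below does this with the standard library only. `holdout_split`, the 20% test fraction, and the fixed seed are assumptions for illustration and not part of Keras.

```python
import random


def holdout_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle sample indices, then reserve a fraction of them for testing."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(x, train_idx), pick(x, test_idx), pick(y, train_idx), pick(y, test_idx)


x_all = list(range(10))
y_all = [v * v for v in x_all]
x_tr, x_te, y_tr, y_te = holdout_split(x_all, y_all)
print(len(x_tr), len(x_te))  # 8 2
```

Inputs and outputs are selected with the same indices, so every input stays paired with its matching output.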

# 5. Use the network to make predictions

Finally, once we are satisfied with the performance of the network, we can use it to make predictions on new data.

```
predictions = model.predict(my_x)
```
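
For classification networks the predictions are probabilities, so a final step usually converts them into class labels. The following sketch assumes plain Python lists standing in for the arrays a model would return; `to_class_labels` and `to_binary_labels` are hypothetical helpers, and the 0.5 threshold is an assumption.

```python
def to_class_labels(probabilities):
    """Softmax output: pick the index of the largest probability per row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def to_binary_labels(probabilities, threshold=0.5):
    """Sigmoid output: label 1 when the probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


print(to_class_labels([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]))  # [1, 0]
print(to_binary_labels([0.2, 0.9]))                         # [0, 1]
```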