Keras - Overview
Keras is a machine learning framework written in Python. It is a wrapper for low-level machine learning libraries such as Theano and TensorFlow.
Standard network model life cycle in Keras:
- Define the network architecture
- Compile the network
- Fit the network
- Evaluate the network
- Use the network to make predictions
1. Define the network
Keras neural networks are defined as a sequence of layers. Neural network models are created with the `Sequential` class.
from keras.models import Sequential
model = Sequential()
# a valid model, but cannot be used without any layers in it
You use `add` to add layers to the network.
from keras.layers import Dense
from keras.models import Sequential
# 2 neurons in the input layer from `input_dim=2`
# 5 neurons in the first layer from `Dense(5)`
# 1 neuron in the output layer from `Dense(1)` as it is the last layer
model1 = Sequential()
model1.add(Dense(5, input_dim=2))
model1.add(Dense(1))
assert len(model1.layers) == 2 # input layer is not counted
# you can also give the layers as a list
layers = [Dense(5, input_dim=2), Dense(1)]
model2 = Sequential(layers)
assert len(model2.layers) == 2
The first layer must define the number of inputs to expect. Different network types have various ways to define the input shape, but the `input_shape` parameter always works.
from keras.layers import Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(32, input_shape=(784,)))
assert model1.input_shape == (None, 784)
# None means "any positive integer"
# thus this network takes batches of any size but each input must have 784 values
model2 = Sequential()
model2.add(Dense(32, input_dim=784)) # shorthand that works with some layers
assert model2.input_shape == (None, 784)
# sometimes you need a fixed batch size
model3 = Sequential()
model3.add(Dense(32, batch_size=32, input_shape=(6, 8)))
assert model3.input_shape == (32, 6, 8)
Activation functions are set with the `Activation` layer or the `activation` parameter.
import keras.backend as K
from keras.layers import Activation, Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(64, input_shape=(32,), activation='tanh'))
# you can also add activation as a layer,
# which is applied to the PREVIOUS layer;
# so this is functionally the same as model1 definition
model2 = Sequential()
model2.add(Dense(64, input_shape=(32,)))
model2.add(Activation('tanh'))
# note that model2 has two layers already
assert len(model1.layers) == 1
assert len(model2.layers) == 2
# but the activation functions are the same
assert model1.layers[0].activation == model2.layers[1].activation
# you can also use element-wise functions as activation functions
model3 = Sequential()
model3.add(Dense(64, input_shape=(32,), activation=K.tanh))
All standard activation functions:
- `softmax`: Softmax function
- `softplus`: Softplus function
- `softsign`: Softsign function
- `linear`: Linear "identity" function
- `tanh`: Hyperbolic tangent function
- `sigmoid`: Sigmoid function
- `hard_sigmoid`: Hard sigmoid, faster to compute
- `elu`: Exponential Linear Unit
- `relu`: Rectified Linear Unit
- `selu`: Scaled Exponential Linear Unit, equal to `scale * elu(x, alpha)` where `scale` and `alpha` are pre-defined constants. Use with `lecun_normal` initialization and `AlphaDropout` dropout, as sketched below.
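As a concrete example of the `selu` note above, a minimal sketch combining `selu`, `lecun_normal` initialization, and `AlphaDropout` (the layer sizes and dropout rate are illustrative assumptions):
from keras.layers import AlphaDropout, Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(64, input_shape=(32,), activation='selu', kernel_initializer='lecun_normal'))
model.add(AlphaDropout(0.1))  # dropout variant that keeps the self-normalizing property of selu
model.add(Dense(64, activation='selu', kernel_initializer='lecun_normal'))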
Good starting activation functions for the output layer of different predictive modeling problems (see the sketch after this list):
- Regression: linear activation function `linear`, with the number of neurons matching the number of wanted outputs.
- Binary classification (2 classes): logistic activation function `sigmoid`, with one neuron.
- Multiclass classification (>2 classes): softmax activation function `softmax`, with one output neuron per class value, assuming a one-hot encoded output pattern.
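A sketch of output layers matching these recommendations (the input dimension, hidden layer size, and `n_classes` are illustrative assumptions):
from keras.layers import Dense
from keras.models import Sequential
# regression: one linear neuron per wanted output value
reg = Sequential()
reg.add(Dense(10, input_dim=8, activation='relu'))
reg.add(Dense(1, activation='linear'))
# binary classification: a single sigmoid neuron
binary = Sequential()
binary.add(Dense(10, input_dim=8, activation='relu'))
binary.add(Dense(1, activation='sigmoid'))
# multiclass classification: one softmax neuron per class
n_classes = 3
multi = Sequential()
multi.add(Dense(10, input_dim=8, activation='relu'))
multi.add(Dense(n_classes, activation='softmax'))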
2. Compile the network
A defined network must be compiled before it can run. Compilation turns the high-level layer definitions into a series of matrix operations.
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD() # uses default values
model.compile(optimizer=sgd, loss=losses.mean_squared_error)
The most common optimization algorithms (see the sketch after this list):
- Stochastic Gradient Descent `sgd`
- ADAM `adam`
- RMSprop `rmsprop`
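An optimizer can be passed to `compile` either as a string name, which uses default parameters, or as a configured instance (the placeholder model and learning rate below are illustrative assumptions):
from keras import optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(1, input_dim=8))
# by name: default parameters are used
model.compile(optimizer='rmsprop', loss='mean_squared_error')
# as an instance: parameters can be tuned
adam = optimizers.Adam(lr=0.001)
model.compile(optimizer=adam, loss='mean_squared_error')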
A loss function returns the value the optimizer will try to minimize. A loss function is also called a cost function. The loss is an inverted "score" of the model: the lower the loss, the better the model.
Good starting loss functions for different predictive modeling problems (see the sketch after this list):
- Regression: mean squared error `mean_squared_error`.
- Binary classification: logarithmic loss, aka cross-entropy, `binary_crossentropy`.
- Multiclass classification: multiclass logarithmic loss `categorical_crossentropy`.
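Like optimizers, losses can be given as string names or as functions from `keras.losses`; a sketch of the compile call for each problem type (the single-neuron placeholder model is an illustrative assumption, and the output layer would normally follow the activation recommendations above):
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(1, input_dim=8))
# regression
model.compile(optimizer='sgd', loss='mean_squared_error')
# binary classification (pairs with a sigmoid output neuron)
model.compile(optimizer='sgd', loss='binary_crossentropy')
# multiclass classification (pairs with a softmax output and one-hot targets)
model.compile(optimizer='sgd', loss='categorical_crossentropy')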
Compiling the network is done with `compile`. You can also specify extra metrics to be gathered in the following fitting phase.
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy']
)
Metric functions are like loss functions, except that their results are not used while training. The loss function is used to optimize the model, while metric functions let you judge the performance of your model.
You can also define custom metric functions. The function must take `y_true` (true labels) and `y_pred` (predicted labels).
import keras.backend as K
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
def mean_pred(y_true, y_pred):
    return K.mean(y_pred)
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy', mean_pred]
)
3. Fit the network
Once the network is compiled, it can be fit, which means training the network weights on a training dataset.
Fitting requires the training data to be specified:
- a matrix of input patterns `X`
- an array of matching output patterns `y`
The number of epochs specifies how many times the training dataset is repeated during fitting/training. Each epoch can be partitioned into groups of input-output pairs called batches. Batching is an efficiency optimization, ensuring that not too many input patterns are loaded into memory at a time.
history = model.fit(x_train, y_train, batch_size=10, epochs=100)
# history contains a summary of the training: loss and metrics recorded each epoch
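The per-epoch values live in the `history.history` dict; a quick sketch of reading them (note: depending on the Keras version, the accuracy key is 'acc' or 'accuracy'):
print(history.history['loss'])  # training loss, one value per epoch
print(history.history['acc'])   # training accuracy; the key is 'accuracy' in newer Keras versions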
4. Evaluate the network
Once the network is trained, it can be evaluated. We should evaluate the performance of the network on a separate dataset.
# model.evaluate returns loss + all metrics defined in model.compile
loss, accuracy = model.evaluate(x_test, y_test)
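`evaluate` returns values in the order given by `model.metrics_names`, so extra metrics are unpacked accordingly; a sketch assuming the custom `mean_pred` metric from the compile example above:
print(model.metrics_names)  # e.g. ['loss', 'acc', 'mean_pred']
loss, accuracy, mean_prediction = model.evaluate(x_test, y_test)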
5. Use the network to make predictions
Finally, once we are satisfied with the network's performance, we can use it to make predictions on new data.
predictions = model.predict(my_x)
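To tie the five steps together, a minimal end-to-end sketch on a toy binary classification problem (the random data and hyperparameters are illustrative, not a meaningful model):
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
# toy data: 100 samples with 8 features each, random 0/1 labels
x_train = np.random.random((100, 8))
y_train = np.random.randint(2, size=(100, 1))
# 1. define
model = Sequential()
model.add(Dense(10, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# 2. compile
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
# 3. fit
history = model.fit(x_train, y_train, batch_size=10, epochs=5)
# 4. evaluate (on the training data here only for brevity)
loss, accuracy = model.evaluate(x_train, y_train)
# 5. predict
predictions = model.predict(x_train[:3])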