Keras - Overview
Keras is a machine learning framework written in Python. It is a wrapper for low-level machine learning libraries such as Theano and TensorFlow.
Standard network model life cycle in Keras:
- Define the network architecture
- Compile the network
- Fit the network
- Evaluate the network
- Use the network to make predictions
1. Define the network
Keras neural networks are defined as a sequence of layers. Neural network models are created with the `Sequential` class.
from keras.models import Sequential
model = Sequential()
# a valid model, but cannot be used without any layers in it
You use `add` to add layers to the network.
from keras.layers import Dense
from keras.models import Sequential
# 2 neurons in the input layer from `input_dim=2`
# 5 neurons in the first layer from `Dense(5)`
# 1 neuron in the output layer from `Dense(1)` as it is the last layer
model1 = Sequential()
model1.add(Dense(5, input_dim=2))
model1.add(Dense(1))
assert len(model1.layers) == 2 # input layer is not counted
# you can also give the layers as a list
layers = [Dense(5, input_dim=2), Dense(1)]
model2 = Sequential(layers)
assert len(model2.layers) == 2
The first layer must define the number of inputs to expect. Different network types have various ways to define the input shape, but the `input_shape` parameter always works.
from keras.layers import Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(32, input_shape=(784,)))
assert model1.input_shape == (None, 784)
# None means "any positive integer"
# thus this network takes batches of any size but each input must have 784 values
model2 = Sequential()
model2.add(Dense(32, input_dim=784)) # shorthand that works with some layers
assert model2.input_shape == (None, 784)
# sometimes you need a fixed batch size
model3 = Sequential()
model3.add(Dense(32, batch_size=32, input_shape=(6, 8)))
assert model3.input_shape == (32, 6, 8)
Activation functions are set with the `Activation` layer or the `activation` parameter.
import keras.backend as K
from keras.layers import Activation, Dense
from keras.models import Sequential
model1 = Sequential()
model1.add(Dense(64, input_shape=(32,), activation='tanh'))
# you can also add activation as a layer,
# which is applied to the PREVIOUS layer;
# so this is functionally the same as model1 definition
model2 = Sequential()
model2.add(Dense(64, input_shape=(32,)))
model2.add(Activation('tanh'))
# note that model2 has two layers already
assert len(model1.layers) == 1
assert len(model2.layers) == 2
# but the activation functions are the same
assert model1.layers[0].activation == model2.layers[1].activation
# you can also use element-wise functions as activation functions
model3 = Sequential()
model3.add(Dense(64, input_shape=(32,), activation=K.tanh))
All standard activation functions:
- `softmax`: Softmax function
- `softplus`: Softplus function
- `softsign`: Softsign function
- `linear`: Linear "identity" function
- `tanh`: Hyperbolic tangent function
- `sigmoid`: Sigmoid function
- `hard_sigmoid`: Hard sigmoid, faster to compute
- `elu`: Exponential Linear Unit
- `relu`: Rectified Linear Unit
- `selu`: Scaled Exponential Linear Unit, equal to `scale * elu(x, alpha)` where `scale` and `alpha` are pre-defined constants. Use with `lecun_normal` initialization and `AlphaDropout` dropout, as sketched below.
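As a concrete example of the `selu` note above, a minimal sketch combining `selu`, `lecun_normal` initialization, and `AlphaDropout` (the layer sizes and dropout rate are illustrative assumptions):
from keras.layers import AlphaDropout, Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(64, input_shape=(32,), activation='selu', kernel_initializer='lecun_normal'))
model.add(AlphaDropout(0.1))  # dropout variant that keeps the self-normalizing property of selu
model.add(Dense(64, activation='selu', kernel_initializer='lecun_normal'))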
Good starting activation functions for the output layer of different predictive modeling problems (see the sketch after this list):
- Regression: linear activation function `linear`, with the number of neurons matching the number of wanted outputs.
- Binary classification (2 classes): logistic activation function `sigmoid`, with one neuron.
- Multiclass classification (>2 classes): softmax activation function `softmax`, with one output neuron per class value, assuming a one-hot encoded output pattern.
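A sketch of output layers matching these recommendations (the input dimension, hidden layer size, and `n_classes` are illustrative assumptions):
from keras.layers import Dense
from keras.models import Sequential
# regression: one linear neuron per wanted output value
reg = Sequential()
reg.add(Dense(10, input_dim=8, activation='relu'))
reg.add(Dense(1, activation='linear'))
# binary classification: a single sigmoid neuron
binary = Sequential()
binary.add(Dense(10, input_dim=8, activation='relu'))
binary.add(Dense(1, activation='sigmoid'))
# multiclass classification: one softmax neuron per class
n_classes = 3
multi = Sequential()
multi.add(Dense(10, input_dim=8, activation='relu'))
multi.add(Dense(n_classes, activation='softmax'))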
2. Compile the network
A defined network must be compiled before it can run. Compilation turns the high-level layer definitions into a series of matrix operations.
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD() # uses default values
model.compile(optimizer=sgd, loss=losses.mean_squared_error)
The most common optimization algorithms (see the sketch after this list):
- Stochastic Gradient Descent `sgd`
- ADAM `adam`
- RMSprop `rmsprop`
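An optimizer can be passed to `compile` either as a string name, which uses default parameters, or as a configured instance (the placeholder model and learning rate below are illustrative assumptions):
from keras import optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(1, input_dim=8))
# by name: default parameters are used
model.compile(optimizer='rmsprop', loss='mean_squared_error')
# as an instance: parameters can be tuned
adam = optimizers.Adam(lr=0.001)
model.compile(optimizer=adam, loss='mean_squared_error')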
A loss function returns the value the optimizer will try to minimize. A loss function is also called a cost function. The loss is an inverted "score" of the model: the lower the loss, the better the model.
Good starting loss functions for different predictive modeling problems (see the sketch after this list):
- Regression: mean squared error `mean_squared_error`.
- Binary classification: logarithmic loss, aka cross-entropy, `binary_crossentropy`.
- Multiclass classification: multiclass logarithmic loss `categorical_crossentropy`.
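Like optimizers, losses can be given as string names or as functions from `keras.losses`; a sketch of the compile call for each problem type (the single-neuron placeholder model is an illustrative assumption, and the output layer would normally follow the activation recommendations above):
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(1, input_dim=8))
# regression
model.compile(optimizer='sgd', loss='mean_squared_error')
# binary classification (pairs with a sigmoid output neuron)
model.compile(optimizer='sgd', loss='binary_crossentropy')
# multiclass classification (pairs with a softmax output and one-hot targets)
model.compile(optimizer='sgd', loss='categorical_crossentropy')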
Compiling the network is done with `compile`. You can also specify extra metrics to be gathered in the following fitting phase.
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy']
)
Metric functions are like loss functions, except that their results are not used while training. The loss function is used to optimize the model, while metric functions let you judge the performance of your model.
You can also define custom metric functions. The function must take `y_true` (true labels) and `y_pred` (predicted labels).
import keras.backend as K
from keras import losses, optimizers
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
def mean_pred(y_true, y_pred):
    return K.mean(y_pred)
sgd = optimizers.SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(
    optimizer=sgd,
    loss=losses.mean_squared_error,
    metrics=['accuracy', mean_pred]
)
3. Fit the network
Once the network is compiled, it can be fit, which means training the network weights on a training dataset.
Fitting requires the training data to be specified:
- a matrix of input patterns `X`
- an array of matching output patterns `y`
The number of epochs specifies how many times the training dataset is repeated during fitting/training. Each epoch can be partitioned into groups of input-output pairs called batches. Batching is an efficiency optimization, ensuring that not too many input patterns are loaded into memory at a time.
history = model.fit(x_train, y_train, batch_size=10, epochs=100)
# history contains a summary of the training: loss and metrics recorded each epoch
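The per-epoch values live in the `history.history` dict; a quick sketch of reading them (note: depending on the Keras version, the accuracy key is 'acc' or 'accuracy'):
print(history.history['loss'])  # training loss, one value per epoch
print(history.history['acc'])   # training accuracy; the key is 'accuracy' in newer Keras versions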
4. Evaluate the network
Once the network is trained, it can be evaluated. We should evaluate the performance of the network on a separate dataset.
# model.evaluate returns loss + all metrics defined in model.compile
loss, accuracy = model.evaluate(x_test, y_test)
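`evaluate` returns values in the order given by `model.metrics_names`, so extra metrics are unpacked accordingly; a sketch assuming the custom `mean_pred` metric from the compile example above:
print(model.metrics_names)  # e.g. ['loss', 'acc', 'mean_pred']
loss, accuracy, mean_prediction = model.evaluate(x_test, y_test)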
5. Use the network to make predictions
Finally, once we are satisfied with the network's performance, we can use it to make predictions on new data.
predictions = model.predict(my_x)
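To tie the five steps together, a minimal end-to-end sketch on a toy binary classification problem (the random data and hyperparameters are illustrative, not a meaningful model):
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
# toy data: 100 samples with 8 features each, random 0/1 labels
x_train = np.random.random((100, 8))
y_train = np.random.randint(2, size=(100, 1))
# 1. define
model = Sequential()
model.add(Dense(10, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# 2. compile
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
# 3. fit
history = model.fit(x_train, y_train, batch_size=10, epochs=5)
# 4. evaluate (on the training data here only for brevity)
loss, accuracy = model.evaluate(x_train, y_train)
# 5. predict
predictions = model.predict(x_train[:3])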