TensorFlow Serving

Updated at 2018-07-19 23:15

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

Servables are the underlying objects that clients use to perform computation, such as lookups or inference. A servable can be anything from a lookup table to a single model to a tuple of inference models; a typical servable is a TensorFlow SavedModelBundle.

Servables have one or more versions. Clients may request either the latest version or a specific version.

A servable stream is the sequence of versions of a servable, sorted by increasing version number.

A TensorFlow Serving model is represented as one or more servables. A model is called a composite model if it is represented by more than one servable, for example two trained models and an embedding table.

Sources are plugin modules that find and provide servables. A source supplies one loader instance for each servable version it makes available. A source can also maintain state that is shared across multiple servables or versions.

Aspired versions are the servable versions that should be loaded and ready. Sources send the list of aspired versions for a servable stream to the manager. A new list replaces the previous one: any previously loaded version that is no longer on the list is unloaded.
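The effect of a new aspired-versions list can be pictured as a set difference. The following is a minimal Python sketch of that idea only. It is not TensorFlow Serving's actual (C++) manager API, and the function name reconcile is hypothetical.

```python
def reconcile(loaded, aspired):
    """Return (to_load, to_unload) version lists for one servable stream.

    Toy model only: a real manager also applies a version policy and
    performs loading and unloading asynchronously.
    """
    loaded, aspired = set(loaded), set(aspired)
    to_load = sorted(aspired - loaded)    # aspired, but not loaded yet
    to_unload = sorted(loaded - aspired)  # loaded, but no longer aspired
    return to_load, to_unload


# Versions 1 and 2 are loaded; the source now aspires versions 2 and 3.
print(reconcile(loaded=[1, 2], aspired=[2, 3]))  # ([3], [1])
```

Version 2 appears in both lists, so it stays loaded and is not touched.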

Loaders standardize the API for loading and unloading a servable.

Managers handle the lifecycle of servables. This mainly consists of loading, serving and unloading servables.

Servable lifecycle in a nutshell:

  1. Sources create loaders for servable versions.
  2. Sources send loaders as aspired versions to the manager.
  3. The manager loads and serves them to the clients.
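The three steps above can be sketched as a toy model in Python. All class and function names here (Loader, Manager, source, set_aspired_versions, get_servable) are hypothetical stand-ins chosen for illustration. The real components are C++ classes with richer interfaces, and this manager never unloads anything.

```python
class Loader:
    """Hypothetical loader: loads and unloads one servable version."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.servable = None

    def load(self):
        # Stand-in for reading a model from storage: the "servable" here
        # is just a function that tags its input with name and version.
        self.servable = lambda x: f"{self.name} v{self.version}: {x}"

    def unload(self):
        self.servable = None


def source(name, versions):
    """Step 1: a hypothetical source creates one loader per version."""
    return [Loader(name, v) for v in versions]


class Manager:
    """Hypothetical manager that loads aspired versions and serves them."""

    def __init__(self):
        self.loaded = {}  # (name, version) -> Loader

    def set_aspired_versions(self, loaders):
        # Step 2: loaders arrive as aspired versions; load the new ones.
        for loader in loaders:
            key = (loader.name, loader.version)
            if key not in self.loaded:
                loader.load()
                self.loaded[key] = loader

    def get_servable(self, name, version=None):
        # Step 3: serve the latest version unless a specific one is requested.
        if version is None:
            version = max(v for (n, v) in self.loaded if n == name)
        return self.loaded[(name, version)].servable


manager = Manager()
manager.set_aspired_versions(source("mnist", [1, 2]))
print(manager.get_servable("mnist")("image"))     # latest version
print(manager.get_servable("mnist", 1)("image"))  # specific version
```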

The batching widget groups multiple requests into a single batch. Batching can significantly reduce the cost of performing inference, especially on hardware accelerators such as GPUs.
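To illustrate why batching helps, here is a small Python sketch that counts how many times a model function is called with and without grouping. It is only an illustration of the idea, not the batching widget's interface. The names batch_requests, run_batched and double_all are made up for this example, and a real batcher would also flush incomplete batches after a timeout.

```python
def batch_requests(requests, max_batch_size):
    """Group individual requests into lists of at most max_batch_size."""
    return [requests[i:i + max_batch_size]
            for i in range(0, len(requests), max_batch_size)]


def run_batched(model_fn, requests, max_batch_size=4):
    """Call model_fn once per batch instead of once per request."""
    results = []
    for batch in batch_requests(requests, max_batch_size):
        results.extend(model_fn(batch))  # one "inference call" per batch
    return results


calls = []  # records the size of every batch the model function receives


def double_all(batch):
    """Stand-in for a model: doubles every input in the batch."""
    calls.append(len(batch))
    return [2 * x for x in batch]


results = run_batched(double_all, list(range(10)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(calls)    # [4, 4, 2]
```

Ten requests are answered with three model calls instead of ten; on a GPU the per-call overhead is what batching amortizes.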

Models are hosted with the model server (tensorflow_model_server). First, train a TensorFlow model and export it with SavedModelBuilder. Then start the server and point it at the export directory:

tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/tmp/mnist_model/
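The model server looks for versions in the base path. The Python sketch below assumes the usual export layout, where each version lives in a numbered subdirectory (for example /tmp/mnist_model/1/), and shows how the available versions and the latest one could be derived from such a tree. The helper list_versions is hypothetical and works on a throwaway temporary directory, not on a real export.

```python
import os
import tempfile


def list_versions(model_base_path):
    """Return the numeric version subdirectories of model_base_path, ascending."""
    versions = []
    for entry in os.listdir(model_base_path):
        full_path = os.path.join(model_base_path, entry)
        if entry.isdigit() and os.path.isdir(full_path):
            versions.append(int(entry))
    return sorted(versions)


# Build a temporary tree that mimics a base path with two exported versions
# plus one unrelated directory that must be ignored.
base = tempfile.mkdtemp()
for name in ("1", "2", "notes"):
    os.makedirs(os.path.join(base, name))

versions = list_versions(base)
print(versions)       # [1, 2]
print(max(versions))  # 2, the version a "latest" request would resolve to
```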

Clients get results through the gRPC API (for example with the Python library) or through the REST API.

REST API:

POST http://host:port/<URI>:<VERB>

URI: /v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
VERB: classify|regress|predict
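As a sketch of how the URL pattern above expands, the following Python helpers build the request URL and a JSON body for a predict call. The helper names, the host localhost and the port 8501 are assumptions made for illustration. The body uses the "instances" row format of the REST predict API, with a made-up input value.

```python
import json

VERBS = ("classify", "regress", "predict")


def build_url(host, port, model_name, verb, version=None):
    """Expand http://host:port/<URI>:<VERB> for a model and optional version."""
    if verb not in VERBS:
        raise ValueError(f"verb must be one of {VERBS}, got {verb!r}")
    uri = f"/v1/models/{model_name}"
    if version is not None:
        uri += f"/versions/{version}"
    return f"http://{host}:{port}{uri}:{verb}"


def build_predict_body(instances, signature_name=None):
    """Serialize a predict request body in row ("instances") format."""
    body = {"instances": instances}
    if signature_name is not None:
        body["signature_name"] = signature_name
    return json.dumps(body)


# Latest version of the model (host and port are placeholders).
print(build_url("localhost", 8501, "mnist", "predict"))
# A specific version.
print(build_url("localhost", 8501, "mnist", "predict", version=3))
# Body with one made-up input row.
print(build_predict_body([[0.0, 1.0]], signature_name="predict_images"))
```

The resulting URL and body can be sent as an HTTP POST with any client, for example urllib.request from the standard library.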

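Python library (gRPC):

Install the client package:

pip install tensorflow-serving-api

Then build and send a request:

from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# Connect to the gRPC port the model server was started with
# ('localhost' is an assumed host).
channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'mnist'
request.model_spec.signature_name = 'predict_images'
# Fill request.inputs with the input tensors of the signature before sending.
# Send the request asynchronously with a 5 second timeout.
result_future = stub.Predict.future(request, 5.0)
# _create_rpc_callback, label and result_counter are defined elsewhere in the client code.
result_future.add_done_callback(
    _create_rpc_callback(label[0], result_counter))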