Horovod is a distributed-training plugin for TensorFlow/Keras.
- Uses TensorFlow's custom-operation mechanism.
- Uses Message Passing Interface (MPI) for worker discovery and work coordination.
- Uses NVIDIA NCCL, NVIDIA's collective-communication (all-reduce) library optimized for GPUs, for the actual reduction.
Horovod's power comes from ring-allreduce, a high-performance computing strategy whose benefits for machine learning were demonstrated by Baidu's Silicon Valley AI Lab in 2017. Horovod uses the NVIDIA Collective Communications Library (NCCL) implementation of ring-allreduce, which is further optimized for NVIDIA hardware.
In the ring, each node:
- Receives from exactly one neighbor.
- Sends to exactly one neighbor.
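This send/receive pattern can be simulated in plain Python. The sketch below is a minimal, dependency-free model of ring-allreduce (a reduce-scatter phase followed by an allgather phase), not Horovod's actual NCCL-backed implementation; in the real system each buffer lives on a separate GPU and the "sends" are NVLink/network transfers.

```python
def ring_allreduce(vectors):
    """Simulate ring-allreduce over n nodes.

    Each node starts with one vector (length divisible by n); all nodes
    end with the elementwise sum, as in a gradient all-reduce.
    """
    n = len(vectors)
    chunk = len(vectors[0]) // n          # each node "owns" one chunk
    bufs = [list(v) for v in vectors]

    def seg(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. For n-1 steps, every node sends one chunk
    # to its right neighbor and adds the chunk arriving from its left.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        outgoing = [(i, (i - step) % n, bufs[i][seg((i - step) % n)])
                    for i in range(n)]
        for i, c, data in outgoing:
            dst = bufs[(i + 1) % n]
            for k, x in enumerate(data):
                dst[seg(c).start + k] += x

    # Phase 2: allgather. Fully reduced chunks circulate around the ring,
    # overwriting stale copies, until every node has every chunk.
    for step in range(n - 1):
        outgoing = [(i, (i + 1 - step) % n, bufs[i][seg((i + 1 - step) % n)])
                    for i in range(n)]
        for i, c, data in outgoing:
            bufs[(i + 1) % n][seg(c)] = data

    return bufs

# Four nodes, each with a 4-element "gradient"; all end with the sum.
grads = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
result = ring_allreduce(grads)
```

Each step moves only 1/n of a node's data, which is why ring-allreduce's per-node bandwidth cost stays roughly constant as nodes are added.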
Message Passing Interface (MPI) is a communication protocol for programming parallel computers, and the de facto standard in high-performance computing today. MPI offers synchronization and communication between a set of processes in a language-independent way. Usually you start as many processes as you have CPUs/GPUs, one process per device.
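As a dependency-free illustration of this SPMD (single program, multiple data) model, the sketch below uses Python's `multiprocessing` as a stand-in for real MPI: the same worker function runs in every process, and each process uses its rank and the world size to pick its shard of the work. With MPI proper you would query rank/size via `MPI_Comm_rank`/`MPI_Comm_size` (e.g. through mpi4py) and launch the processes with `mpirun`; the pool, worker, and shard names here are illustrative.

```python
import multiprocessing as mp

def worker(args):
    """Runs in every process; rank and size decide this process's shard."""
    rank, size, data = args
    shard = data[rank::size]   # strided partition of the input
    return sum(shard)          # local partial result

def run(size=4):
    data = list(range(100))
    # "fork" start method assumed (Linux); a real MPI job is launched
    # externally (mpirun -np 4 python train.py) rather than from Python.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=size) as pool:
        partials = pool.map(worker, [(r, size, data) for r in range(size)])
    return sum(partials)       # the final "reduce" across ranks
```

Horovod sits on top of this model: MPI (or Gloo) supplies the ranks and coordination, while NCCL performs the heavy all-reduce traffic.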
Maximize the number of GPUs per machine: network traffic is frequently the bottleneck for Horovod, and GPUs within one machine communicate over much faster links (NVLink/PCIe) than the network between machines.