Neural Networks - Capsule
Capsule neural networks are a neural networks intended to make machines better able to understand the world through images or video using 3D positioning.
Capsule is a group of neurons. Each neuron output represents different properties of the same entity.
Basic capsules have two parts:
- probability that an entity of a specific type is present
- "pose matrix" to represent position, orientation, scale, deformation, velocity and color.
Vision systems should utilize geometry. Capsules are designed to track different parts of an object, such as a nose and ears, and their relative positions in space.
Viewpoints are hard for convolutional neural networks. Viewpoint changes have complicated effects on pixels but simple, linear effects on the pose matrix.
A capsule in one layer votes for the pose matrix of the capsules in the layer above. A familiar object can be detected by looking for agreement between votes for its pose matrix. These votes come from parts that have already been detected. Each of these votes are weighted by an assignment coefficient.
This works because high-dimensional coincidental clusters do not happen by chance.
Human sight has concept of coordinate frames. If you see an object from totally different angle but it retains hard pixels, you won't be able to recognize the object.
Unfamiliar viewpoints will have much better prediction accuracy when using capsules. Familiar viewpoints will have the same prediction accuracy.