Duration: ~4h
Deep learning has been the most significant breakthrough of the past 10 years in pattern recognition and machine learning. It has substantially improved the effectiveness of prediction models across many research topics and application fields, ranging from computer vision, natural language processing, and embodied AI to more traditional areas of pattern recognition. This paradigm shift has moved research methodology towards a data-oriented approach, in which learning involves every step of the prediction pipeline, from feature extraction to classification. While research efforts have concentrated on the design of effective feature extraction and prediction architectures, computation has moved from CPU-only approaches to the dominant use of GPUs and massively parallel devices, a shift driven by large-scale, high-dimensional datasets.

In this context, the optimization and careful design of neural architectures play an increasingly important role, directly affecting the pace of research, the effectiveness of state-of-the-art models, and their applicability at production scale. Architectural choices have a substantial impact on training and execution times, and thus ultimately on the speed of progress in many research areas related to pattern recognition. This is clearly the case for fields that rely on high-dimensional data, such as video processing, action recognition, and high-resolution image understanding, and it has recently become a prominent issue for fields that involve sequential predictions, such as reinforcement learning, embodied AI, and natural language understanding. Overall, effective and efficient solutions are needed in most research areas related to pattern recognition and machine learning.

The goal of this tutorial is to present techniques for training deep neural networks on multiple GPUs to shorten the training time required by data-intensive applications. Working with deep learning tools, frameworks, and workflows for neural network training, participants will learn how to implement multi-GPU training with Horovod to reduce the complexity of writing efficient distributed software. The tutorial targets any research field related to pattern recognition, from computer vision to natural language processing and multimedia, in which data-intensive and computationally intensive architectures are needed to solve key research issues.

At the conclusion of the tutorial, participants will have an understanding of:
- Various approaches to multi-GPU training.
- Algorithmic and engineering challenges in the large-scale training of neural networks.
- The linear neuron model, its loss function, and the optimization logic of gradient descent.
- Concepts for converting a single-GPU implementation into a Horovod multi-GPU implementation to reduce the complexity of writing efficient distributed software.
- Techniques that improve the overall performance of the entire pipeline.
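
As a rough illustration of how the linear neuron, its loss function, and gradient descent relate to data-parallel multi-GPU training, the sketch below fits a single linear neuron with a mean squared error loss and simulates several workers that each compute gradients on their own shard of the data. The per-worker gradients are averaged before every update, which is the role an allreduce operation plays in frameworks such as Horovod. This is a minimal pure-Python sketch that does not use Horovod or any GPU; the function names, the toy dataset, the learning rate, and the number of workers are assumptions made for illustration only.

```python
import random


def predict(w, b, x):
    """Linear neuron: y_hat = w * x + b."""
    return w * x + b


def mse_gradients(w, b, batch):
    """Gradients of the mean squared error over a batch of (x, y) pairs."""
    n = len(batch)
    grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in batch) / n
    grad_b = sum(2 * (predict(w, b, x) - y) for x, y in batch) / n
    return grad_w, grad_b


def allreduce_average(per_worker_grads):
    """Hypothetical stand-in for an allreduce: average gradients across workers."""
    k = len(per_worker_grads)
    return tuple(sum(g[i] for g in per_worker_grads) / k for i in range(2))


def train(data, num_workers=4, lr=0.1, steps=500):
    # Each simulated worker holds an equal-sized shard of the dataset.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Every worker computes gradients on its own shard only.
        grads = [mse_gradients(w, b, shard) for shard in shards]
        # Averaging keeps all workers' parameters identical after the update.
        grad_w, grad_b = allreduce_average(grads)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(64)]
data = [(x, 3.0 * x + 1.0) for x in xs]  # noise-free toy target: y = 3x + 1
w, b = train(data)
print(f"learned w={w:.4f}, b={b:.4f}")
```

Because the shards have equal size, the averaged gradient equals the gradient over the full dataset, so the simulated multi-worker run follows the same trajectory as single-worker gradient descent while each worker processes only a fraction of the data per step.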