DEEP LEARNING FOR MULTI-GPUS

Deep Learning has been the most significant breakthrough of the past 10 years in the field of pattern recognition and machine learning. It has brought substantial advances in the effectiveness of prediction models across many research topics and application fields, from computer vision, natural language processing, and embodied AI to more traditional areas of pattern recognition. This paradigm shift has radically changed research methodology towards a data-oriented approach, in which learning involves all steps of the prediction pipeline, from feature extraction to classification. While research efforts have concentrated on the design of effective feature extraction and prediction architectures, computation has moved from CPU-only approaches to the dominant use of GPUs and massively parallel devices, empowered by large-scale, high-dimensional datasets.

In this context, optimization and careful design of neural architectures play an increasingly important role, directly affecting the research pace, the effectiveness of state-of-the-art models, and their applicability at production scale. Architectural choices indeed have an outstanding impact on training and execution times, ultimately affecting the speed of progress in many research areas related to pattern recognition. This is clearly the case for fields that rely on high-dimensional data, such as video processing, action recognition, and high-resolution image understanding, and it has recently become a hot topic for fields that involve sequential predictions, such as reinforcement learning, embodied AI, and natural language understanding. Overall, effective and efficient solutions are needed in most research areas related to pattern recognition and machine learning.

The goal of this tutorial is to present techniques for training deep neural networks on multiple GPUs in order to shorten the training time required by data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, participants learn how to implement multi-GPU training with Horovod, which reduces the complexity of writing efficient distributed software. This tutorial targets any research field related to pattern recognition, ranging from computer vision to natural language processing and multimedia, in which data- and computation-intensive architectures are needed to solve key research issues.
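To give a flavour of the kind of transformation covered, the snippet below is a minimal sketch, assuming PyTorch and the Horovod API, of how a single-GPU training loop can be adapted for multi-GPU data parallelism; the model, data, and hyperparameters are toy placeholders rather than the tutorial's actual material.

import torch
import horovod.torch as hvd

hvd.init()                                       # one process per GPU
torch.cuda.set_device(hvd.local_rank())          # pin each process to its own GPU

model = torch.nn.Linear(1024, 10).cuda()         # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via allreduce
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Make sure every worker starts from identical parameters
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):                          # placeholder training loop
    x = torch.randn(32, 1024).cuda()             # each worker would see its own data shard
    y = torch.randint(0, 10, (32,)).cuda()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

A script along these lines is typically launched with one process per GPU, for example horovodrun -np 4 python train.py on a single node with four GPUs; Horovod then averages gradients across workers after every backward pass.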

Tutorial organisers

“NVIDIA AI Technology Centre IT”

Giuseppe Fiameni 

Giuseppe Fiameni is a Solution Architect at NVIDIA, where he oversees the NVIDIA AI Technology Centre in Italy, a collaboration among NVIDIA, CINI, and CINECA to accelerate academic research in the field of Artificial Intelligence through collaboration projects. He has been working as an HPC specialist at CINECA, the largest HPC facility in Italy, for more than 14 years, providing support for large-scale data analytics workloads. Scientific area: Data Science, Deep Learning, Artificial Intelligence.

Frédéric Parienté

Frédéric Parienté is a senior manager in the Solutions Architecture and Engineering group at NVIDIA and the deputy director for the NVIDIA AI Technology Center (NVAITC) in EMEA. Frédéric first joined NVIDIA as a market development manager for Accelerated Computing in 2015. Previously, he spent 18 years at Sun Microsystems–Oracle as a performance software engineer and was the regional Director of ISV Engineering when he left Oracle in 2015. Frédéric graduated in General Engineering from ENSTA ParisTech, in Mechanical Engineering from the University of Illinois, and in Finance from Université Paris Dauphine.

Duration: ~4h

At the conclusion of the tutorial, participants will have an understanding of:

– Various approaches to multi-GPU training

– Algorithmic and engineering challenges in the large-scale training of neural networks

– The linear neuron model, its loss function, and the optimization logic of gradient descent (a minimal sketch follows this list)

– Concepts for transforming a single-GPU implementation into a Horovod multi-GPU implementation to reduce the complexity of writing efficient distributed software

– Techniques that improve the overall performance of the entire pipeline
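For the linear neuron item above, the following is a minimal, hypothetical NumPy sketch, not taken from the tutorial material, of a linear neuron trained with a mean-squared-error loss by plain gradient descent; the synthetic data, learning rate, and iteration count are arbitrary toy choices.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                    # toy inputs
true_w = np.array([2.0, -1.0, 0.5, 3.0])         # ground-truth weights
true_b = 0.7                                     # ground-truth bias
y = X @ true_w + true_b                          # noiseless linear targets

w, b, lr = np.zeros(4), 0.0, 0.1                 # parameters and learning rate
for epoch in range(200):
    pred = X @ w + b                             # linear neuron: w·x + b
    err = pred - y
    loss = (err ** 2).mean()                     # mean-squared-error loss
    grad_w = 2 * X.T @ err / len(X)              # dL/dw
    grad_b = 2 * err.mean()                      # dL/db
    w -= lr * grad_w                             # gradient descent update
    b -= lr * grad_b

print(loss, w, b)                                # loss approaches 0; w and b approach the true values

The same loss-gradient-update loop carries over to deep networks, where frameworks compute the gradients automatically and, in the multi-GPU setting, average them across workers as in the Horovod sketch above.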