LaMDA: Language models for dialog applications
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …
GPipe: Efficient training of giant neural networks using pipeline parallelism
Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases, increasing …
GShard: Scaling giant models with conditional computation and automatic sharding
Neural network scaling has been critical for improving the model quality in many real-world
machine learning applications with vast amounts of training data and compute. Although this …
MLPerf training benchmark
Machine learning is experiencing an explosion of software and hardware solutions,
and needs industry-standard performance benchmarks to drive design and enable …
MapCG: Writing parallel program portable between CPU and GPU
Graphics Processing Units (GPU) have been playing an important role in the general purpose
computing market recently. The common approach to program GPU today is to write GPU …
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning
research, with a particular focus towards sequence-to-sequence models. Lingvo models …
Image classification at supercomputer scale
Deep learning is extremely computationally intensive, and hardware vendors have
responded by building faster accelerators in large clusters. Training deep learning models at …
GSPMD: general and scalable parallelization for ML computation graphs
We present GSPMD, an automatic, compiler-based parallelization system for common machine
learning computations. It allows users to write programs in the same way as for a single …
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications
AutoFDO is a system to simplify real-world deployment of feedback-directed optimization (FDO).
The system works by sampling hardware performance monitors on production machines …
Overlap communication with dependent computation via decomposition in large deep learning models
Large deep learning models have shown great potential with state-of-the-art results in many
tasks. However, running these large models is quite challenging on an accelerator (GPU or …