User profiles for Dehao Chen

Dehao Chen

- Verified email at google.com - Cited by 4907

Dehao Chen

- Verified email at emory.edu - Cited by 175

LaMDA: Language models for dialog applications

…, D Lepikhin, J Qin, D Chen, Y Xu, Z Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

GPipe: Efficient training of giant neural networks using pipeline parallelism

…, A Bapna, O Firat, D Chen, M Chen… - Advances in neural …, 2019 - proceedings.neurips.cc
Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases, increasing …

GShard: Scaling giant models with conditional computation and automatic sharding

D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat… - arXiv preprint arXiv …, 2020 - arxiv.org
Neural network scaling has been critical for improving the model quality in many real-world
machine learning applications with vast amounts of training data and compute. Although this …

MLPerf training benchmark

…, P Bailis, V Bittorf, D Brooks, D Chen… - Proceedings of …, 2020 - proceedings.mlsys.org
Machine learning is experiencing an explosion of software and hardware solutions,
and needs industry-standard performance benchmarks to drive design and enable …

MapCG: Writing parallel program portable between CPU and GPU

C Hong, D Chen, W Chen, W Zheng, H Lin - Proceedings of the 19th …, 2010 - dl.acm.org
Graphics Processing Units (GPU) have been playing an important role in the general purpose
computing market recently. The common approach to program GPU today is to write GPU …

Lingvo: a modular and scalable framework for sequence-to-sequence modeling

J Shen, P Nguyen, Y Wu, Z Chen, MX Chen… - arXiv preprint arXiv …, 2019 - arxiv.org
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning
research, with a particular focus towards sequence-to-sequence models. Lingvo models …

Image classification at supercomputer scale

C Ying, S Kumar, D Chen, T Wang, Y Cheng - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning is extremely computationally intensive, and hardware vendors have
responded by building faster accelerators in large clusters. Training deep learning models at …

GSPMD: general and scalable parallelization for ML computation graphs

Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang… - arXiv preprint arXiv …, 2021 - arxiv.org
We present GSPMD, an automatic, compiler-based parallelization system for common machine
learning computations. It allows users to write programs in the same way as for a single …

AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications

D Chen, DX Li, T Moseley - … of the 2016 International Symposium on …, 2016 - dl.acm.org
AutoFDO is a system to simplify real-world deployment of feedback-directed optimization (FDO).
The system works by sampling hardware performance monitors on production machines …

Overlap communication with dependent computation via decomposition in large deep learning models

…, A Davis, B Ilbeyi, B Hechtman, D Chen… - Proceedings of the 28th …, 2022 - dl.acm.org
Large deep learning models have shown great potential with state-of-the-art results in many
tasks. However, running these large models is quite challenging on an accelerator (GPU or …