Google Scholar

User profiles for Dehao Chen

Dehao Chen

- Verified email at google.com - Cited by 4907

Dehao Chen

- Verified email at emory.edu - Cited by 175

[PDF] arxiv.org

LaMDA: Language models for dialog applications

…, D Lepikhin, J Qin, D Chen, Y Xu, Z Chen… - arXiv preprint arXiv …, 2022 - arxiv.org

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

Cited by 1137 - All 7 versions

[PDF] neurips.cc

GPipe: Efficient training of giant neural networks using pipeline parallelism

…, A Bapna, O Firat, D Chen, M Chen… - Advances in neural …, 2019 - proceedings.neurips.cc

Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases, increasing …

Cited by 1470 - All 14 versions

[PDF] arxiv.org

GShard: Scaling giant models with conditional computation and automatic sharding

D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat… - arXiv preprint arXiv …, 2020 - arxiv.org

Neural network scaling has been critical for improving the model quality in many real-world
machine learning applications with vast amounts of training data and compute. Although this …

Cited by 713 - All 7 versions

[PDF] mlsys.org

MLPerf training benchmark

…, P Bailis, V Bittorf, D Brooks, D Chen… - Proceedings of …, 2020 - proceedings.mlsys.org

Machine learning is experiencing an explosion of software and hardware solutions,
and needs industry-standard performance benchmarks to drive design and enable …

Cited by 309 - All 5 versions

[PDF] tsinghua.edu.cn

MapCG: Writing parallel program portable between CPU and GPU

C Hong, D Chen, W Chen, W Zheng, H Lin - Proceedings of the 19th …, 2010 - dl.acm.org

Graphics Processing Units (GPU) have been playing an important role in the general purpose
computing market recently. The common approach to program GPU today is to write GPU …

Cited by 226 - All 13 versions

[PDF] arxiv.org

Lingvo: a modular and scalable framework for sequence-to-sequence modeling

J Shen, P Nguyen, Y Wu, Z Chen, MX Chen… - arXiv preprint arXiv …, 2019 - arxiv.org

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning
research, with a particular focus towards sequence-to-sequence models. Lingvo models …

Cited by 199 - All 6 versions

[PDF] arxiv.org

Image classification at supercomputer scale

C Ying, S Kumar, D Chen, T Wang, Y Cheng - arXiv preprint arXiv …, 2018 - arxiv.org

Deep learning is extremely computationally intensive, and hardware vendors have
responded by building faster accelerators in large clusters. Training deep learning models at …

Cited by 148 - All 3 versions

[PDF] arxiv.org

GSPMD: general and scalable parallelization for ML computation graphs

Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang… - arXiv preprint arXiv …, 2021 - arxiv.org

We present GSPMD, an automatic, compiler-based parallelization system for common machine
learning computations. It allows users to write programs in the same way as for a single …

Cited by 88 - All 2 versions

[PDF] acm.org

AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications

D Chen, DX Li, T Moseley - … of the 2016 International Symposium on …, 2016 - dl.acm.org

AutoFDO is a system to simplify real-world deployment of feedback-directed optimization (FDO).
The system works by sampling hardware performance monitors on production machines …

Cited by 114 - All 8 versions

[PDF] acm.org

Overlap communication with dependent computation via decomposition in large deep learning models

…, A Davis, B Ilbeyi, B Hechtman, D Chen… - Proceedings of the 28th …, 2022 - dl.acm.org

Large deep learning models have shown great potential with state-of-the-art results in many
tasks. However, running these large models is quite challenging on an accelerator (GPU or …

Cited by 26 - All 2 versions