Gradient descent finds global minima of deep neural networks. S. Du, J. Lee, H. Li, L. Wang, X. Zhai. International Conference on Machine Learning, 1675–1685, 2019. Cited by 1289.

Gradient descent provably optimizes over-parameterized neural networks. S. S. Du, X. Zhai, B. Poczos, A. Singh. arXiv preprint arXiv:1810.02054, 2018. Cited by 786.

Generalization bounds of SGLD for non-convex learning: two theoretical viewpoints. W. Mou, L. Wang, X. Zhai, K. Zheng. Conference on Learning Theory, 605–638, 2018. Cited by 149.

On the multiple descent of minimum-norm interpolants and restricted lower isometry of kernels. T. Liang, A. Rakhlin, X. Zhai. Conference on Learning Theory, 2683–2711, 2020. Cited by 123.

How many samples are needed to estimate a convolutional neural network? S. S. Du, Y. Wang, X. Zhai, S. Balakrishnan, R. R. Salakhutdinov, A. Singh. Advances in Neural Information Processing Systems 31, 2018. Cited by 80.

Consistency of interpolation with Laplace kernels is a high-dimensional phenomenon. A. Rakhlin, X. Zhai. Conference on Learning Theory, 2595–2623, 2019. Cited by 79.

On the risk of minimum-norm interpolants and restricted lower isometry of kernels. T. Liang, A. Rakhlin, X. Zhai. arXiv preprint arXiv:1908.10292, 2019. Cited by 28.

How many samples are needed to estimate a convolutional or recurrent neural network? S. S. Du, Y. Wang, X. Zhai, S. Balakrishnan, R. Salakhutdinov, A. Singh. arXiv preprint arXiv:1805.07883, 2018. Cited by 16.

Near optimal stratified sampling. T. Yu, X. Zhai, S. Sra. arXiv preprint arXiv:1906.11289, 2019. Cited by 3.