Events

Department of Mathematics, SMU

126 Clements Hall

Mamikon Gulian, Applied Math, Brown, *Fractional Laplacians in Bounded Domains, Levy Processes, and Feynman-Kac Formulas*

Prof. Jianguo Liu, Math, Duke, *Analysis of some machine learning algorithms: stochastic gradient descent and online PCA*

In this talk I will present some mathematical questions in machine learning including:

(1) Semi-group of the stochastic gradient descent (SGD) and online principal component analysis (PCA) and diffusion approximation.

SGD and its variants are the most common tools in the supervised learning and it is widely believed that the behavior of SGDs shall be described by stochastic differential equations (SDE). I will present a simple and rigorous justification of this claim by using small jumpapproximation theory in Markov process and stability and truncation error analysis of semigroup solutions of SGD and the semigroup solution to stochastic differential equation (SDE). This is a joint work with Lei Li (Duke) and Yuanyuan Feng (CMU).

(2) Estimates on the escape time for SGD to escape from saddle points and local maximums and analysis on mini-batch SGD

Estimating the escape time is a central question in machine learning in very high dimensional and non-convex statistical optimization. I will present a result on estimating of escape time using the theory of large deviation of random dynamical system. I will use the diffusion approximation to analyze the effects of batch size for the deep neural networks. We will explain that small batch size is helpful for SGD algorithms to escape unstable stationary points and sharp minimizers. This is a joint work with Lei Li (Duke) and Junchi Li (Princeton).

(3) Application of SGD in optical tomography

Many of inverse problems can be formatted as statistics optimization problems and online deep learning methods such as SGD can be used to effectively solve the prohibitive memory and computation problem in very high dimensions. I will present a successful example of online learning in optical tomography which has many applications in medial image. This is a joint work with Ke Chen and Qin Li (U Wisconsin-Madison).

Prof. Anne Gelb, Math, Dartmouth, *Reducing the effects of bad data measurements using variance based weighted joint sparsity*

We introduce the variance based joint sparsity (VBJS) method for sparse signal recovery and image reconstruction from multiple measurement vectors. Joint sparsity techniques employing $\ell_{2,1}$ minimization are typically used, but the algorithm is computationally intensive and requires fine tuning of parameters. The VBJS method uses a weighted $\ell_1$ joint sparsity algorithm, where the weights depend on the pixel-wise variance. The VBJS method is accurate, robust, cost efficient and also reduces the effects of false data.

Prof. George Biros, ICES, UT Austin, *Randomized linear algebra for hierarchical and kernel matrices*

Hierarchical matrices have blocks that can be approximated well by

low-rank decomposition. Kernel matrices are hierarchical matrices

whose entries are related to pairwise distances between points. These

matrices find many applications in science, engineering, and machine

learning. I will present algorithms for constructing low-rank

approximations and using them to accelerate matrix-vector products and

the solution of linear systems.

In particular, Professor Biros will present two algorithms: ASKIT and GOFMM. ASKIT

(approximate skeletonization kernel independent treecode) dependents

only on the local intrinsic dimension of the dataset as opposed to the

dimension of the ambient space of the dataset. GOFMM (Geometry

Oblivious Fast Multipole Method) generalizes the FMM method to

arbitrary symmetric positive definite matrices--no geometric

information is necessary. Under certain circumstances, GOFMM enables

an approximate matrix-vector multiplication in O(N log N) or even O(N)

time. As a highlight, we will report results for Gaussian kernel

matrices with 100 million points in 64 dimensions, and for eight

million points in 1000 dimensions.

Prof. Michael Mascagni, Math & CS, Florida State, *The "White Rat" of Numerical Reproducibility*

We explore an application from the author's work in neuroscience. A code used to investigate neural development modeled 100 neurons with all-to-all excitatory connectivity. We used a simple ordinary differential equation system to model each neuron, and this model was used to produce a paper published in the Journal of Neurophysiology. Later a colleague used our code to continue this work, and found he could not reproduce our results. This lead us to thoroughly investigate this code and we discovered that it offered many different ways to thwart reproducibility.

Numerical reproducibility is considered a task that directly follows from the determinism in computations. However, reproducibility has become an intense concern and and issue for research. In fact, the author developed an international workshop of numerical reproducibility that is now a regular offering at the annual Supercomputing XX conference. We will show how this particular code provides a lack of reproducibility from the following sources:

- The non-associativity of floating-point operations in two ways
- Differences in library mathematical functions whose reliability and correctness we take for granted

This code's sensitivity makes it a very powerful tool to explore many different manifestations of numerical reproducibility. However, this code is by no means exceptional, as in neuroscience these types of models are used extensively to gain insights on the functioning of the nervous system. In addition, these types of models are widely used in many other fields of study.

Prof. Lin Lin, Math, Berkeley, *Accelerating Hartree-Fock-like equations*

Hartree-Fock-like equations are very widely used in quantum chemistry and materials science, but the computational cost for solving such these equations is quite high. In a simplified mathematical setting, the solution requires the computation of low-lying eigenpairs of a large matrix in the form A+B. Here applying A to a vector is easy but A has a large spectral radius, while applying B (the Fock operator) is costly but B has a small spectral radius. It turns out that most eigensolvers are not well equipped to solve such problems efficiently. We will discuss some recently developed strategies to significantly accelerate such calculations, and to enable the solution of Hartree-Fock-like equations for more than 4000 atoms in a planewave basis set. Some of the techniques have been included in community electronic structure software packages such as Quantum ESPRESSO. We also find that the setup of the Hartree-Fock-like equations introduces interesting questions from the perspective of numerical analysis.

* Refreshment starts 15 minutes before talks.

Further details available on math colloquia wiki page.