The faculty members of the Department of Statistical Science are prominent scholars, researchers, and consultants, as well as dedicated teachers.

All are actively engaged in research that is published in professional journals. Their research has been funded by major grants from private organizations and government agencies, including the Department of Energy, the Advanced Research Projects Agency, the National Science Foundation, the Office of Naval Research, the Department of Education, the Air Force Office of Scientific Research, the National Institutes of Health, the Department of Veterans Affairs, and the National Aeronautics and Space Administration.

Faculty members are actively working in the following areas:

### Analysis of Censored and Incomplete Data

In many life-testing and reliability studies, the experimenter may not obtain complete information on failure times for all experimental units. Developing novel statistical methodologies to analyze different kinds of censored and incomplete data is an important research area. We have developed parametric frequentist and Bayesian methods, as well as nonparametric methods, for analyzing censored and incomplete data. We have also developed the related computational algorithms, and these new methodologies have been shown to be efficient and practical.

**Faculty**: H. K. Tony Ng

### Analysis of Data from Epidemiology Studies

Epidemiology is the study of factors affecting the health and illness of populations, and serves as the foundation of interventions made in the interest of public health and preventive medicine. Statistics plays an important role in epidemiology, from the collection of data to the drawing of conclusions. I have worked with epidemiologists and researchers in medical fields on statistical modeling and analysis of data from medical studies. For instance, we have analyzed data regarding Parkinson’s disease to gain an understanding of trace element variations in the cerebrospinal fluid and serum of Parkinson’s disease patients.

**Faculty**: H. K. Tony Ng

### Analysis of Degradation Data

Many failures of systems result from a gradual and irreversible accumulation of damage that occurs during a system’s life cycle. This is known as a degradation process. Information on product reliability can be obtained by analyzing degradation data. We have been working on the development of degradation models and the related statistical analyses. We have applied these degradation-modeling techniques to analyze data from the pharmaceutical industry and data related to power grid reliability. We have also edited a book, “Statistical Modeling for Degradation Data,” to provide timely discussions of methodological developments in the analysis of degradation data.

**Faculty**: H. K. Tony Ng

### Bayesian Statistics

With advances in theory and high-performance computing, Bayesian statistics has become extremely useful nearly everywhere. My work in biostatistics and bioinformatics largely relies on the development of new Bayesian methods and the application of Bayesian theory, with a focus on two interwoven subareas: Bayesian spatial models and Bayesian hierarchical modeling. I also work on classical topics such as Bayesian variable selection and Bayesian treed models.
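The core Bayesian machinery behind such methods can be illustrated with a textbook conjugate update (a generic illustration with made-up numbers, not drawn from the research described above):

```python
# Beta-Binomial conjugate update (hypothetical illustrative numbers).
# Prior Beta(a, b); observe k successes in n Bernoulli trials;
# the posterior is again a Beta distribution.
a, b = 2.0, 2.0                         # prior pseudo-counts
k, n = 7, 10                            # observed data
post_a, post_b = a + k, b + (n - k)     # posterior is Beta(9, 5)
post_mean = post_a / (post_a + post_b)  # posterior mean = 9/14
print(post_mean)
```

Conjugacy gives closed-form posteriors in simple models; the hierarchical and spatial models mentioned above generally require Markov chain Monte Carlo or similar computation instead.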

**Faculty**: Sherry Wang

### Complex-valued data analysis

An emerging field of statistics, driven by applications in wireless communication, is the analysis of data from digital communication systems, radar, sonar, and similar sources, which are not real-valued but complex-valued. When statistical models describe the transmission of data across 5G communication networks with multiple inputs and multiple outputs (MIMO), the raw data are complex responses: each datum has both amplitude and phase. All transmission systems and receiving antennas have responses that are not in phase and need to be put into phase.

Distribution theory for such data poses special challenges, particularly since the parameters of interest in these models are themselves typically complex-valued. Standard distributions such as the multivariate normal and Wishart need to be generalized to the complex multivariate normal, complex Wishart, and many other complex-valued distributions. A common misconception in statistics is that time series and Kalman filtering models, when applied in high-tech contexts, are used in the real-response versions typically taught in statistics classes. They are not: models with complex-valued responses and complex distribution theory are used for microwave transmission and for the most common application of filtering theory, radar and sonar tracking of moving objects.

The likelihood functions derived from such complex data are necessarily real-valued functions of complex-valued parameters, which ensures that they are not complex-analytic. This complicates the development of statistical likelihood theory, which requires differentiation of such likelihoods. The emergence of this subject requires a substantial rewriting of the basics of mathematical statistics to accommodate inference for complex parameters from complex data.
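As a small sketch of the distribution theory involved (an illustrative construction, not the research code itself), a circularly-symmetric complex normal variate can be built from two independent real Gaussians carrying half the variance each:

```python
import math
import random

def complex_normal(mean, variance, rng):
    """Draw from a circularly-symmetric complex normal CN(mean, variance):
    real and imaginary parts are independent N(0, variance/2)."""
    sd = math.sqrt(variance / 2.0)
    return mean + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

rng = random.Random(1)
draws = [complex_normal(1 + 2j, 4.0, rng) for _ in range(200_000)]

# The empirical second moment E|z - mean|^2 should be close to the variance 4
emp_var = sum(abs(z - (1 + 2j)) ** 2 for z in draws) / len(draws)
print(emp_var)
```

Each draw has both amplitude and phase, matching the "complex response" structure described above; the circular symmetry means the phase of `z - mean` is uniform on the circle.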

**Faculty**: Ron Butler

### Design of early-stage cancer clinical trials

Phase 1 trials of cancer treatments are first-in-human studies used to evaluate the safety of new treatments. Most such studies use traditional rule-based designs that are familiar and simple to implement but are inflexible and have poor statistical properties. Contemporary model-based designs address these problems, but are not in wide use because they are difficult and costly to implement in many centers. We are devising and evaluating new designs that are simple to apply but have better statistical properties than the traditional methods.

**Faculty**: Daniel Heitjan

### Analysis of Data from Genetic Studies

In the past decades, the rapid growth of technologies in genetics has produced many opportunities for statisticians to analyze data from genetic studies. Testing for association in genetic studies is one method for understanding how genetic factors contribute to human disease. I have worked with researchers in bioinformatics to develop novel methodologies for analyzing data from genome-wide association studies (GWAS). We also study the validity of different statistical methods and make suitable adjustments when a GWAS involves correlated diseases. We have successfully developed statistical procedures to detect differentially expressed genes and to analyze real data obtained from GWAS, and we have extended the results to the X-chromosome and to gene-environment interaction.

**Faculty**: H. K. Tony Ng

### Geometric and Topological Data Analysis

A defining characteristic of many modern data applications is their unstructured nature. The basic unit of analysis may be something other than a traditional observation, such as a regular array with a fixed number of rows and columns and a single observation in each cell. Such questions are not amenable to traditional statistical procedures based on simply structured array data. Geometric and topological data analysis provides a mathematical representation of the shape of data and extracts structural information from a complex data set. Statistical inference on the results of geometric and topological data analysis provides direct inference on the shape of data.

**Faculty**: Chul Moon

### Information Geometry in Statistics with Applications in Reliability

Information geometry and statistical manifolds provide a differential-geometric approach to statistical inference and prediction. Statistical inference methods based on non-additive entropy (Tsallis statistics) and the Amari-Chentsov structure are of interest. We have developed Bayesian analysis based on statistical manifolds and studied its properties. Statistical methods based on information geometry are applied to reliability and survival analysis and provide guidelines for practitioners.

**Faculty**: H. K. Tony Ng

### Laplace and Fourier transform inversion

The physical and engineering sciences have traditionally used Laplace, Fourier, and z-transforms as a means of analyzing the behavior of complex random systems. Such transforms underlie most of systems theory, but it is typically the inverses of these transforms, as time-domain functions, that are of greater interest. For example, in any stochastic network or electric circuit, the equivalent transmittance between two nodes in the system is a Laplace or z-transform of an associated impulse response function, but it is the impulse response function in time that is of more practical interest. The inversion of such transforms thus becomes an important mathematical concern. Numerical inversion draws on the theory of complex variables, numerical analysis, and the mathematics of computation as it relates to floating-point arithmetic.
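One widely used numerical inversion scheme, the fixed-Talbot contour method, can be sketched in a few lines (an illustrative implementation under textbook formulas, not the specific algorithms studied here):

```python
import cmath
import math

def talbot_invert(F, t, M=32):
    """Numerically invert a Laplace transform F(s) at time t > 0 by
    evaluating F along the fixed-Talbot deformed Bromwich contour."""
    r = 2.0 * M / (5.0 * t)                  # contour scale parameter
    total = 0.5 * cmath.exp(r * t) * F(r)    # endpoint term at theta = 0
    for k in range(1, M):
        theta = k * math.pi / M
        cot = math.cos(theta) / math.sin(theta)
        s = r * theta * (cot + 1j)           # point on the Talbot contour
        sigma = theta + (theta * cot - 1.0) * cot
        total += cmath.exp(t * s) * F(s) * (1.0 + 1j * sigma)
    return (r / M) * total.real

# F(s) = 1/(s+1) is the transform of f(t) = exp(-t); check at t = 1
approx = talbot_invert(lambda s: 1.0 / (s + 1.0), t=1.0)
print(approx)
```

The floating-point concerns mentioned above are visible here: the contour trades the oscillatory Bromwich integrand for terms like `exp(t*s)` that can overflow or cancel catastrophically when `F` has singularities near the contour.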

**Faculty**: Ron Butler

### Meta-analysis Methods in Biomedical Studies

At the dawn of the big data era, there is an increasingly urgent need for meta-analysis, i.e., the statistical procedure that synthesizes information across a collection of relevant studies, to avoid indecisive or potentially conflicting conclusions from individual studies and to leverage the “wisdom of crowds” for more effective and reliable scientific discoveries. I have actively led my research team to develop new meta-analysis methods for (i) integrating multiple pathway enrichment studies and (ii) analyzing rare binary adverse events. The first project has been supported by NIH, with the goal of developing statistical methods and computational tools for integrative gene set enrichment analysis to efficiently synthesize diverse mRNA expression data from multiple studies. Meta-analysis is also valuable in drug safety evaluation, where the number of cases (adverse events) can be very limited in a single study. The U.S. Food and Drug Administration (FDA) released a draft guidance for industry titled “Meta-Analyses of Randomized Controlled Clinical Trials to Evaluate the Safety of Human Drugs or Biological Products” in November 2018, which demonstrates the importance of meta-analysis in the development of new drugs. Such meta-analysis often involves binary outcomes of rare events, which is the focus of the second project.
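The basic synthesis step can be sketched with standard inverse-variance-weighted fixed-effect pooling (a generic textbook estimator with hypothetical numbers, not the new methods described above, which target settings where this simple estimator breaks down):

```python
def fixed_effect_meta(estimates, variances):
    """Fixed-effect meta-analysis: pool per-study estimates,
    weighting each by the inverse of its variance, and return
    the pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical log odds ratios from three studies, with their variances
pooled, pooled_var = fixed_effect_meta([0.42, 0.31, 0.55], [0.04, 0.09, 0.16])
print(pooled, pooled_var)
```

With rare binary events, per-study variance estimates can be zero or undefined (zero-event arms), which is precisely why specialized rare-event methods are needed.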

**Faculty**: Sherry Wang

### Measuring Sensitivity to Nonignorability

Most data sets of any consequence have some missing observations. When the propensity to be missing is associated with the values of the observations, we say that the data are nonignorably missing. Nonignorability can lead to bias and other problems when one applies standard statistical analyses to the data. In principle, one can eliminate such problems by estimating models that account for nonignorability, but these models are notoriously non-robust and difficult to fit. An alternative approach is to measure the sensitivity to nonignorability, that is, to evaluate whether nonignorability, if it exists, is sufficient to change parameter estimates from their values under standard ignorable models. A primitive version of this idea is to tally the fraction of missing observations in a univariate data set; if the fraction is small, then presumably the potential bias arising from nonignorability is also small. We have developed methods and software to measure sensitivity for a broad range of data structures, missingness types, and statistical models.

**Faculty**: Daniel Heitjan

### Non-probability sampling

Lynne Stokes directs a team of Ph.D. students working on two projects related to Gulf of Mexico fisheries. The first, funded through a contract with NOAA, is developing and evaluating new methods for estimating catch of recreational anglers. These methods augment data from traditional surveys of anglers with real-time electronic self-reports. These new methods are being considered as replacements or supplements for NOAA’s current data collection methods. The second project, The Great Red Snapper Count, is a two-year $10 million effort by a multi-disciplinary team of 21 researchers who will provide a fisheries independent estimate of the abundance of red snapper in the Gulf of Mexico. The SMU team provides statistical support for the project, which will require integrating a variety of data collection and estimation strategies across the Gulf.

**Faculty**: Lynne Stokes

### Optimal Plans in Accelerated Life and Degradation Testing Experiments

In reliability testing experiments, one is often interested in maximizing the information obtained from the experiment, subject to constraints on the experimental time and the number of experimental units. Based on likelihood and asymptotic theory, optimal solutions are obtained numerically by direct search and discrete optimization algorithms under different models. To study the performance of the optimal plans in real-life situations where the model assumptions are violated, detailed sensitivity analyses are also provided. We have obtained results for different kinds of accelerated life testing and degradation testing experiments. We have also provided a comprehensive comparison of optimal constant-stress and step-stress life-testing experiments and discussed the merits of different types of optimal experimental designs.

**Faculty**: H. K. Tony Ng

### Order Statistics

An order statistic is the realized ranked value of a random variable in a sample. The study of order statistics can be useful in a range of problems, such as evaluating the reliability of a manufacturing system that depends on performance of many similar parts or the risk to a life insurance company for its portfolio of policies. Inference from order statistics can provide robust and cost-effective testing and estimation. An example of efficient estimation using the theory of order statistics is ranked set sampling.
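Ranked set sampling can be sketched in a short simulation (an idealized illustration assuming perfect judgment ranking): across repeated draws, the ranked set sample mean typically has smaller variance than a simple random sample mean of the same size.

```python
import random
import statistics

def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a balanced ranked set sample: in each cycle, for each rank r,
    draw set_size units, rank them (here by their exact values, i.e.
    perfect ranking), and measure only the r-th smallest."""
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            units = sorted(rng.sample(population, set_size))
            sample.append(units[r])
    return sample

rng = random.Random(7)
pop = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
n = 3 * 100  # total measured units per sample, same for both schemes

srs_means = [statistics.mean(rng.sample(pop, n)) for _ in range(500)]
rss_means = [statistics.mean(ranked_set_sample(pop, 3, 100, rng))
             for _ in range(500)]

# The RSS estimator of the mean typically has smaller variance than SRS
print(statistics.pvariance(srs_means), statistics.pvariance(rss_means))
```

The efficiency gain comes at the cost of ranking many more units than are measured, which is why ranked set sampling pays off when ranking is cheap but measurement is expensive.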

**Faculty**: Tony Ng, Xinlei Wang, Lynne Stokes

### Ranking and Selection

Decision-makers are frequently confronted with the problem of selecting from among a set of possible choices. Ranking and selection addresses the problem of how to choose the best among a group of items, where the quality of those items is measured imperfectly. Another aspect of the problem that we have studied is how to assess the quality of the measures themselves; i.e., ranking the rankers. Our approaches have included various ways of modeling the evaluation process. Applications have been wide-ranging, from wine-tasting, to proposal evaluation, to diving scores.

**Faculty**: Jing Cao, Lynne Stokes, Monnie McGee

### Real-time prediction in clinical trials

Clinical trial planning involves the specification of a projected duration of enrollment and follow-up needed to achieve the targeted study power. If pre-trial estimates of enrollment and event rates are inaccurate, projections can be faulty, leading to inadequate power or other mis-allocation of resources. We have developed an array of methods that use the accruing trial data to efficiently and correctly predict future enrollment counts, times of occurrence of landmark events, estimated final treatment effects, and ultimate significance of the trial.

**Faculty**: Daniel Heitjan

### Saddlepoint approximations and higher-order asymptotic theory

Modern methods used in statistics and probability often require the computation of probabilities from complicated models in which what is known is the underlying transform theory for the distributions of interest rather than their explicit expressions. It is in this context that saddlepoint methods aid in the computation of such probabilities. Of particular relevance are the majority of probability computations used in stochastic modeling. The companion subject of higher-order asymptotic theory provides tools for making more precise computations than those normally derived from central limit theory as based on the theory of weak convergence.
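A minimal sketch of the classical saddlepoint density approximation, applied to a case where the exact density is known so the accuracy is visible (an illustrative example, not the research itself): for a sum of n independent Exp(1) variables, the cumulant generating function is K(s) = -n log(1 - s), and the saddlepoint equation K'(s) = x solves in closed form.

```python
import math

def saddlepoint_density(K, K2, x, s_hat):
    """Saddlepoint density approximation
    f_hat(x) = exp(K(s) - s*x) / sqrt(2*pi*K''(s)),
    evaluated at the saddlepoint s solving K'(s) = x."""
    return math.exp(K(s_hat) - s_hat * x) / math.sqrt(2.0 * math.pi * K2(s_hat))

# Sum of n iid Exp(1) variables: cumulant generating function and derivatives
n = 10
K  = lambda s: -n * math.log(1.0 - s)
K1 = lambda s: n / (1.0 - s)
K2 = lambda s: n / (1.0 - s) ** 2

x = 12.0
s_hat = 1.0 - n / x                       # closed-form saddlepoint
assert abs(K1(s_hat) - x) < 1e-9          # s_hat solves K'(s) = x

approx = saddlepoint_density(K, K2, x, s_hat)
exact = x ** (n - 1) * math.exp(-x) / math.factorial(n - 1)  # Gamma(n, 1)
print(approx, exact)
```

For the gamma family the approximation is exact up to Stirling's formula for the normalizing constant, so the relative error here is under one percent; in models where only the transform is available, no exact density exists to compare against.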

**Faculty**: Ron Butler

### Statistical Inference of Component Characteristics from System Lifetimes

In system reliability engineering, engineers and researchers are often interested in the lifetime distribution of a system as well as the lifetime distributions of the components that make up the system. In many cases, the lifetimes of an n-component coherent system can be observed in a life-test, but the lifetimes of the components cannot. We have developed a paradigm for obtaining reliability information on the components of a system through suitable statistical analysis. Computational algorithms and statistical methods for reliability assessment and comparison have been successfully investigated, and these methodologies have been applied to analyze real furniture reliability data.

**Faculty**: H. K. Tony Ng

### Statistical Analysis of “Omics” data

Due to advances in data acquisition technologies, enormous amounts of “omics” data have been generated to accelerate the pace of scientific discovery, and the volume continues to expand exponentially. Deep understanding of such massive, highly complex data requires innovation in statistical ideas, methods, and computational algorithms. Over the past ten years, supported by both NIH and NSF, I have been developing statistical and bioinformatic methods for preprocessing, modeling, and analyzing increasingly large (i.e., higher-volume, higher-density, and higher-dimension), complex, and diverse data efficiently, and collaborating with biomedical researchers to improve the understanding of biological processes, discover genetic diagnosis and prognosis markers, and ultimately promote prevention and treatment of complex human diseases.

**Faculty**: Sherry Wang

### Statistics for Artificial Intelligence

Artificial intelligence is intelligence exhibited by machines running algorithms designed to mimic human cognition. Statistics must play a vital role in artificial intelligence, as estimation uncertainty exists for virtually every algorithm. However, many hot AI topics have yet to be adequately addressed by statisticians, and existing methods are based on ad hoc algorithms or optimization procedures that do not allow for accurate model specification or statistical inference. In recent years, I have been working on several topics that largely fall within machine learning, with an attempt to seamlessly fuse statistics into AI: (i) supervised dimension reduction for ultrahigh-dimensional data where the sample size is much smaller than the dimensionality, (ii) multiple instance learning, and (iii) semi-supervised learning using only positive and unlabeled data. These topics have applications ranging from genomics, genetics, tumor immunology, and chemical simulation to text mining.

**Faculty**: Sherry Wang

### Stochastic processes, feedback systems and networks

This subject involves the study and modeling of random phenomena over space and time, with particular emphasis on how components of a system interact to create the dynamics of the stochastic phenomenon. Feedback processes and mechanisms are an integral part of the subject. Such models include Markov chains, semi-Markov processes, diffusion processes, and their underlying renewal theory. This body of models represents the majority of mathematical models used in the physical sciences, engineering sciences, and stochastic finance.

**Faculty**: Ron Butler