The faculty members of the Department of Statistical Science are prominent scholars, researchers, and consultants, as well as dedicated teachers.

All are actively engaged in research that is published in professional journals. Their research has been funded by major grants from private organizations and governmental agencies, including the Department of Energy, Advanced Research Projects Agency, National Science Foundation, Office of Naval Research, Department of Education, Air Force Office of Scientific Research, the National Institutes of Health, Department of Veterans Affairs, and the National Aeronautics and Space Administration.

Faculty members are actively working in the following areas:

### Complex-valued data analysis

An emerging field of statistics, driven by applications in wireless communication, is the analysis of data from digital communication systems, radar, sonar, and similar sources that are complex-valued rather than real-valued. When statistical models are used to describe the transmission of data across 5G communication networks with multiple inputs and multiple outputs (MIMO), the raw data are complex responses, so each datum has both an amplitude and a phase. All transmission systems and receiving antennas have responses that are out of phase and need to be brought into phase. Distribution theory for such data poses special challenges to statistical theory, particularly because the parameters of interest in such models are themselves typically complex-valued. Standard distributions such as the multivariate normal and the Wishart need to be generalized to the complex multivariate normal, the complex Wishart, and many other complex-valued distributions. A common misconception in statistics is that time series and Kalman filtering models, when applied in high-tech contexts, are used in the real-response versions typically taught in statistics classes. They are not. Rather, models with complex-valued responses, built on complex distribution theory, are used for microwave transmission and for the most common application of filtering theory: radar and sonar tracking of moving objects. The likelihood functions derived from such complex data are necessarily real-valued functions of complex-valued parameters, and a nonconstant real-valued function cannot be complex-analytic. This complicates the development of statistical likelihood theory, which requires differentiation of such likelihoods. The emergence of this subject requires a substantial rewriting of the basics of mathematical statistics to accommodate inference for complex parameters from complex data.
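To make the non-analyticity point concrete, the sketch below evaluates the log-density of a single observation from a circularly symmetric complex normal distribution and compares difference quotients in the location parameter taken along the real and imaginary directions. For a complex-analytic function the two quotients would agree; here they do not. The function names, the choice of model, and the numerical values are illustrative assumptions, not taken from the faculty's work.

```python
import math

def complex_normal_loglik(z, mu, sigma2):
    """Log-density of one observation z from a circularly symmetric
    complex normal with complex mean mu and real variance sigma2.
    The result is real even though z and mu are complex."""
    return -math.log(math.pi * sigma2) - abs(z - mu) ** 2 / sigma2

def directional_quotient(f, mu, direction, h=1e-3):
    """Central difference quotient of f at mu along a complex direction.
    For an analytic f this would not depend on the direction."""
    step = h * direction
    return (f(mu + step) - f(mu - step)) / (2 * step)

# Hypothetical observation and parameter values, chosen for illustration.
z_obs = 1 + 2j

def loglik(mu):
    return complex_normal_loglik(z_obs, mu, sigma2=1.0)

q_real = directional_quotient(loglik, 0j, direction=1)   # along the real axis
q_imag = directional_quotient(loglik, 0j, direction=1j)  # along the imaginary axis

print("quotient along real axis:     ", q_real)
print("quotient along imaginary axis:", q_imag)
```

Because the two quotients differ, the ordinary complex derivative of the log-likelihood does not exist, which is why likelihood theory for complex parameters needs a different notion of differentiation.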

**Faculty**: Ron Butler

### Design of early-stage cancer clinical trials

Phase 1 trials of cancer treatments are first-in-human studies used to evaluate the safety of new treatments. Most such studies use traditional rule-based designs that are familiar and simple to implement but are inflexible and have poor statistical properties. Contemporary model-based designs address these problems, but are not in wide use because they are difficult and costly to implement in many centers. We are devising and evaluating new designs that are simple to apply but have better statistical properties than the traditional methods.

**Faculty**: Daniel Heitjan

### Geometric and Topological Data Analysis

A defining characteristic of many modern data applications is their unstructured nature. The basic unit of analysis may be something other than a traditional observation recorded in a regular array with a fixed number of rows and columns and a single value in each cell. Such data are not amenable to traditional statistical procedures designed for simple array-structured data. Geometric and topological data analysis provides a mathematical representation of the shape of data and extracts structural information from a complex data set. We have developed statistical approaches for geometric and topological data analysis that provide direct inference on the shape of data.

**Faculty**: Chul Moon

### Laplace and Fourier transform inversion

The physical and engineering sciences have traditionally used Laplace, Fourier, and z-transforms as a means of analyzing the behavior of complex random systems. Such transforms underlie most of systems theory, but it is typically the inverses of these transforms, as time-domain functions, that are of greater interest. For example, in any stochastic network or electric circuit, the equivalent transmittance between any two nodes in the system is a Laplace or z-transform of an associated impulse response function, but it is the impulse response function in time that is of more practical interest. Thus, the inversion of such transforms becomes an important mathematical concern. Numerical inversion of such transforms draws on the theory of complex variables, numerical analysis, and the mathematics of computation as it relates to floating-point arithmetic.
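As a small illustration of numerical inversion, the sketch below applies the classical Gaver-Stehfest algorithm to the transform F(s) = 1/(s + 1), whose inverse is exp(-t). This is a textbook method chosen for brevity and is not necessarily the approach used in the research described above; the function names are hypothetical. The weights alternate in sign and grow quickly with the number of terms, which is one way floating-point arithmetic enters the problem, so they are computed here in exact rational arithmetic before conversion to floats.

```python
import math
from fractions import Fraction

def stehfest_coefficients(n_terms):
    """Gaver-Stehfest weights V_1..V_N for an even number of terms N."""
    if n_terms <= 0 or n_terms % 2 != 0:
        raise ValueError("n_terms must be a positive even integer")
    half = n_terms // 2
    coeffs = []
    for k in range(1, n_terms + 1):
        total = Fraction(0)
        for j in range((k + 1) // 2, min(k, half) + 1):
            numerator = j ** half * math.factorial(2 * j)
            denominator = (
                math.factorial(half - j)
                * math.factorial(j)
                * math.factorial(j - 1)
                * math.factorial(k - j)
                * math.factorial(2 * j - k)
            )
            total += Fraction(numerator, denominator)
        # Exact rational sum, signed, then rounded once to a float.
        coeffs.append(float((-1) ** (k + half) * total))
    return coeffs

def invert_laplace(transform, t, n_terms=12):
    """Approximate f(t) from its Laplace transform F(s), for t > 0.
    Only real values of s are used, so F must be computable on the
    positive real axis."""
    if t <= 0:
        raise ValueError("t must be positive")
    a = math.log(2.0) / t
    weights = stehfest_coefficients(n_terms)
    return a * sum(v * transform(k * a) for k, v in enumerate(weights, start=1))

# Illustrative check against a transform with a known inverse.
approx = invert_laplace(lambda s: 1.0 / (s + 1.0), t=1.0)
print(approx, math.exp(-1.0))
```

Increasing `n_terms` improves accuracy only up to a point in double precision, because cancellation among the large weights eventually dominates.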

**Faculty**: Ron Butler

### Mixed-valued time series analysis

Multivariate time series are routinely modeled and analyzed with the well-known vector autoregressive (VAR) models. The main reasons are that the imposed linearity makes them easy to compute, they are easily understood by a wide audience, and they provide predictions. Though VAR models are well understood from a theoretical and methodological point of view, and are quite useful for the analysis of continuous-valued data, they are inappropriate for multivariate time series in which some components are integer-valued, such as the daily number of new patient admissions to a hospital, the number of crimes in a particular region, or the trading volume during a time period. The goal is to develop new statistical tools and models for analyzing multivariate mixed-valued time series data. This is significant because multivariate time series data, both discrete- and continuous-valued, are collected in diverse scientific areas such as demography, econometrics, sociology, public health, and neurobiology for the purposes of forecasting, planning, and informing policy.

**Faculty**: Raanju Sundararajan

### Multivariate and high-dimensional time series analysis

Time series data from various sources often appear in multivariate and high-dimensional form. Numerous important problems in application areas such as neuroscience, finance, environmental science, and engineering involve analyzing time series data. As an example in neuroscience, functional MRI (fMRI) data are recorded as a high-dimensional time series, with signals from several thousand spatial locations in the brain. The interest here is in understanding time-varying interactions between different brain locations and in relating these interactions to neurological disorders. As an example in engineering, the operation of power systems based on renewable energy sources such as wind depends on modeling and forecasting multivariate time series data. Managing the renewable energy grid is critical to using that energy source effectively, and time series methods play a central role in the decision-making process for these systems. The problems identified in these areas require new time series methods that are computationally feasible and grounded in theory. Ongoing research focuses on developing such methods, which have theoretical and methodological importance in time series analysis.

**Faculty**: Raanju Sundararajan

### Measuring Sensitivity to Nonignorability

Most data sets of any consequence have some missing observations. When the propensity to be missing is associated with the values of the observations, we say that the data are nonignorably missing. Nonignorability can lead to bias and other problems when one applies standard statistical analyses to the data. In principle, one can eliminate such problems by estimating models that account for nonignorability, but these models are notoriously non-robust and difficult to fit. An alternative approach is to measure the sensitivity to nonignorability, that is, to evaluate whether nonignorability, if it exists, is sufficient to change parameter estimates from their values under standard ignorable models. A primitive version of this idea is to tally the fraction of missing observations in a univariate data set; if the fraction is small, then presumably the potential bias arising from nonignorability is also small. We have developed methods and software to measure sensitivity for a broad range of data structures, missingness types, and statistical models.
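The two ideas in this paragraph, the bias caused by nonignorable missingness and the primitive missing-fraction check, can be sketched with a small simulation. Everything below is an illustrative assumption: the logistic missingness mechanism, the `slope` parameter, and the function names are hypothetical and are not the methods or software mentioned above.

```python
import math
import random

def missing_fraction(values):
    """Share of entries that are missing (None or NaN)."""
    if not values:
        raise ValueError("values must be non-empty")
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return n_missing / len(values)

def simulate_nonignorable(n, slope, seed=0):
    """Draw n standard normal values, then delete each one with a
    probability that increases with the value itself (assumed logistic
    mechanism), so larger values are more likely to be missing."""
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = []
    for y in full:
        p_missing = 1.0 / (1.0 + math.exp(-slope * y))
        observed.append(None if rng.random() < p_missing else y)
    return full, observed

full, observed = simulate_nonignorable(n=5000, slope=2.0)
seen = [y for y in observed if y is not None]

print("fraction missing:     ", missing_fraction(observed))
print("mean of complete data:", sum(full) / len(full))
print("mean of observed data:", sum(seen) / len(seen))
```

In this setup the mean of the observed values falls well below the mean of the complete data, and the missing fraction is large, so the primitive check would correctly flag the analysis as potentially sensitive.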

**Faculty**: Daniel Heitjan

### Nonparametric Statistics

Nonparametric statistics aims to infer an unknown quantity while making few underlying assumptions. Because nonparametric methods make fewer assumptions, they can be useful when existing information about the application is insufficient. In many cases, nonparametric methods can provide simpler and more robust inference than parametric methods. Empirical likelihood is an example of a nonparametric method of inference.

**Faculty**: Chul Moon

### Non-probability sampling

Lynne Stokes directs a team of Ph.D. students working on two projects related to Gulf of Mexico fisheries. The first, funded through a contract with NOAA, is developing and evaluating new methods for estimating the catch of recreational anglers. These methods augment data from traditional surveys of anglers with real-time electronic self-reports, and they are being considered as replacements or supplements for NOAA’s current data collection methods. The second project, The Great Red Snapper Count, is a two-year, $10 million effort by a multi-disciplinary team of 21 researchers who will provide a fisheries-independent estimate of the abundance of red snapper in the Gulf of Mexico. The SMU team provides statistical support for the project, which will require integrating a variety of data collection and estimation strategies across the Gulf.

**Faculty**: Lynne Stokes

### Order Statistics

An order statistic is a sample value identified by its rank within the sample, such as the minimum, the median, or the maximum. The study of order statistics can be useful in a range of problems, such as evaluating the reliability of a manufacturing system that depends on the performance of many similar parts, or the risk to a life insurance company from its portfolio of policies. Inference from order statistics can provide robust and cost-effective testing and estimation. An example of efficient estimation using the theory of order statistics is ranked set sampling.
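A minimal sketch of balanced ranked set sampling follows, assuming perfect ranking: in each cycle, for each rank r, a fresh set of units is drawn and ranked, and only the r-th smallest is measured. In practice the ranking is done by inexpensive judgment rather than by the measurement itself; here the true values are used for ranking purely to keep the simulation short. The function name and the normal population are illustrative assumptions.

```python
import random
import statistics

def ranked_set_sample(draw, set_size, cycles, rng):
    """Balanced ranked set sample of set_size * cycles measured units.

    draw(rng) returns one unit from the population. Ranking uses the
    true values (perfect ranking), which is a simulation convenience.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = sorted(draw(rng) for _ in range(set_size))
            sample.append(candidates[rank])  # keep one order statistic per set
    return sample

def draw_unit(rng):
    """Hypothetical population: standard normal measurements."""
    return rng.gauss(0.0, 1.0)

rng = random.Random(1)
rss = ranked_set_sample(draw_unit, set_size=3, cycles=200, rng=rng)
srs = [draw_unit(rng) for _ in range(len(rss))]  # simple random sample, same size

print("units measured:", len(rss))
print("RSS mean:", statistics.fmean(rss))
print("SRS mean:", statistics.fmean(srs))
```

Both sample means estimate the population mean without bias; the appeal of ranked set sampling is that, for the same number of measured units, its mean typically has smaller variance when ranking is cheap relative to measurement.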

**Faculty**: Xinlei Wang, Lynne Stokes, Chul Moon

### Ranking and Selection

Decision-makers are frequently confronted with the problem of selecting from among a set of possible choices. Ranking and selection addresses the problem of how to choose the best among a group of items, where the quality of those items is measured imperfectly. Another aspect of the problem that we have studied is how to assess the quality of the measures themselves; i.e., ranking the rankers. Our approaches have included various ways of modeling the evaluation process. Applications have been wide-ranging, from wine-tasting, to proposal evaluation, to diving scores.

**Faculty**: Jing Cao, Lynne Stokes, Monnie McGee

### Real-time prediction in clinical trials

Clinical trial planning involves specifying the projected duration of enrollment and follow-up needed to achieve the targeted study power. If pre-trial estimates of enrollment and event rates are inaccurate, projections can be faulty, leading to inadequate power or other misallocation of resources. We have developed an array of methods that use the accruing trial data to efficiently and correctly predict future enrollment counts, times of occurrence of landmark events, estimated final treatment effects, and ultimate significance of the trial.

**Faculty**: Daniel Heitjan

### Saddlepoint approximations and higher-order asymptotic theory

Modern methods used in statistics and probability often require the computation of probabilities from complicated models in which what is known is the underlying transform theory for the distributions of interest rather than their explicit expressions. It is in this context that saddlepoint methods aid in the computation of such probabilities. Of particular relevance are the majority of probability computations used in stochastic modeling. The companion subject of higher-order asymptotic theory provides tools for making more precise computations than those normally derived from central limit theory, which is based on the theory of weak convergence.
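As a small worked case, the sketch below computes the standard saddlepoint density approximation for the sum of n independent unit-rate exponential variables, using only the cumulant generating function K(s) = -n log(1 - s). This example is chosen because the exact answer (a gamma density) is available for comparison; it is an illustration with hypothetical function names, not a method taken from the research described above.

```python
import math

def saddlepoint_density(x, n):
    """Saddlepoint approximation to the density at x > 0 of a sum of n
    unit-rate exponentials, built from K(s) = -n * log(1 - s), s < 1."""
    s_hat = 1.0 - n / x                      # solves K'(s) = n / (1 - s) = x
    k_value = -n * math.log(1.0 - s_hat)     # K at the saddlepoint
    k_second = n / (1.0 - s_hat) ** 2        # K'' at the saddlepoint
    return math.exp(k_value - s_hat * x) / math.sqrt(2.0 * math.pi * k_second)

def exact_density(x, n):
    """Exact Gamma(n, 1) density, for comparison."""
    return math.exp((n - 1) * math.log(x) - x - math.lgamma(n))

n = 5
for x in (2.0, 5.0, 10.0):
    approx = saddlepoint_density(x, n)
    exact = exact_density(x, n)
    print(f"x={x:5.1f}  saddlepoint={approx:.6f}  exact={exact:.6f}  "
          f"ratio={approx / exact:.4f}")
```

In this example the ratio of approximate to exact density is the same at every x, including far into the tails, so renormalizing the approximation would recover the exact density. A normal approximation from the central limit theorem has no comparable uniform relative accuracy.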

**Faculty**: Ron Butler

### Stochastic processes, feedback systems and networks

This subject involves the study and modeling of random phenomena over space and time, with particular emphasis on how the components of a system interact to create the dynamics of stochastic phenomena. Feedback processes and mechanisms are an integral part of this subject. Such models include Markov chains, semi-Markov processes, diffusion processes, and their underlying renewal theory. This body of models represents the majority of mathematical models used in the physical sciences, engineering sciences, and stochastic finance.

**Faculty**: Ron Butler