July 10 - 14, 2023

Title: **Statistical Foundations: From Theory to Practice**

Name: Vianey Leos Barajas

Tutorial: Statistics plays a foundational role in data science and applied mathematics. In addition to the theory, we must understand the role of statistics and probability in practice and be able to connect them to real-world applications. This is most often seen through the use of probability density or mass functions that allocate probability to events of interest. However, sample size, sampling mechanisms, and other factors can and do have an impact on the data we collect in practice. In this short course, we will use R to learn about statistical theory as well as simulate data from a variety of distributions and models, e.g., linear models, generalized linear models, and hierarchical models.

Title: **Data Driven Mathematical Models and Simulation Techniques**

Name: Keisha Cook

Tutorial: Applied mathematics research involves problems that rely on samples of data taken from the outside world. This can be seen in fields such as biology, physics, engineering, and environmental science. To understand and predict information about data that has yet to be collected, we rely on simulations of real-world scenarios. We can use the collected data to compute known parameters. The relevant parameters can be used to develop models that simulate the behavior of the collected data. To predict unknown parameters and future outcomes of our scenarios, we rely on inference methods. In this short course, we will learn how to build a mathematical model from data, simulate data that closely represents the collected data, and use the simulations to make model predictions. Overall, we want the data to influence our model development decisions and for the information we learn from the models to help us understand the data.
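
As a rough illustration of the workflow described above (estimate parameters from data, simulate from the fitted model, then predict), here is a minimal Python sketch. It is not taken from the course materials: the linear model, the "collected" data (generated here with a slope of 2.0, an intercept of 1.0, and a noise standard deviation of 0.5), and the function names `fit_linear_model` and `simulate` are all assumptions made for the example.

```python
import numpy as np


def fit_linear_model(t, y):
    """Estimate slope, intercept, and residual noise level by least squares."""
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)
    sigma = residuals.std(ddof=2)  # two fitted parameters
    return slope, intercept, sigma


def simulate(t, slope, intercept, sigma, rng):
    """Draw one synthetic data set from the fitted model."""
    return slope * t + intercept + rng.normal(0.0, sigma, size=t.shape)


rng = np.random.default_rng(0)

# Hypothetical stand-in for collected data; real data would be loaded instead.
t_obs = np.linspace(0.0, 10.0, 50)
y_obs = 2.0 * t_obs + 1.0 + rng.normal(0.0, 0.5, size=t_obs.shape)

# Step 1: use the data to estimate the model parameters.
slope, intercept, sigma = fit_linear_model(t_obs, y_obs)

# Step 2: simulate data that should resemble the collected data.
y_sim = simulate(t_obs, slope, intercept, sigma, rng)

# Step 3: use the fitted model to predict at a point not yet observed.
t_new = np.array([12.0])
y_pred = slope * t_new + intercept
```

Comparing `y_sim` against `y_obs` (for example, by overlaying histograms of the residuals) is one simple way to check whether the fitted model reproduces the behavior of the collected data.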

Title: **Using Machine Learning Techniques to Explore and Analyze LiDAR 3D Point Clouds**

Name: F. Patricia Medina

Tutorial: LiDAR is an optical remote sensing method that uses laser beams to estimate the spatial coordinates of desired targets on Earth. LiDAR has been used extensively for self-driving car technology, urban planning, and forestry (climate change applications). Features include many physical properties such as intensity. These 3D point clouds include classes from a natural environment such as water, man-made structures, vegetation, and the ground. Our goal would be to classify the 3D point clouds using an ML pipeline. This short course will give participants an opportunity to gain an understanding of the data, learn how to clean data using tools such as Python or OpenRefine, and practice feature engineering. The second session would include an overview of some dimensionality reduction methods such as principal component analysis. We will then introduce neural networks in the context of classification and discuss their implementation. Other classification techniques will be introduced. We also expect to share different data visualization options and practice storytelling through final 5-minute presentations.
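
To make the dimensionality reduction step mentioned above more concrete, the following is a minimal Python sketch of principal component analysis computed with a singular value decomposition. It is an illustration under stated assumptions, not the course's pipeline: the point cloud is synthetic, the four columns (x, y, z, intensity) are a hypothetical feature set, and the helper `pca` is written for this example.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    scores = X_centered @ components.T
    return scores, components, explained_variance_ratio


rng = np.random.default_rng(0)

# Hypothetical stand-in for a LiDAR point cloud: columns x, y, z, intensity.
n_points = 500
xy = rng.uniform(0.0, 100.0, size=(n_points, 2))
z = 0.05 * xy[:, 0] + rng.normal(0.0, 0.5, size=n_points)
intensity = rng.uniform(0.0, 1.0, size=n_points)
points = np.column_stack([xy, z, intensity])

# Reduce the four features to two principal components.
scores, components, ratio = pca(points, n_components=2)
```

Because the columns here have very different scales, the leading components are dominated by x and y; in practice, features are often standardized before PCA so that no single feature dominates purely because of its units.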

Title: **Using Machine Learning to Find Interesting Phenomena in Large Image Archives**

Name: Umaa Rebbapragada

Tutorial: Finding and classifying interesting phenomena in large image archives is often performed manually and is therefore a time- and labor-intensive process. Once a scientist has found interesting phenomena in a mission archive, they may wish to find more examples. Data science methods can streamline discovery of interesting phenomena with a high level of accuracy, dramatically reducing the time and cost of finding the relevant features while enabling interpretability. This short course will cover how data science can be used to classify features in image archives, focusing in particular on supervised learning techniques that assume a catalog of labeled image features already exists.