The sponsors and projects for G-RIPS Berlin 2022 include:
Project 1: Cray Germany GmbH
Sponsor: Cray Germany GmbH
Title: High performance computing (HPC) for real-world simulations
Project Description: Object stores have been recognized as one of the crucial building blocks for Exascale HPC systems, since they permit novel usage of permanent data storage outside of the POSIX file system paradigm. DAOS, the Distributed Asynchronous Object Storage, is one of the most scalable examples. By combining NVRAM and SSD storage, it offers high performance for both metadata and block storage while remaining resilient to failures of the storage infrastructure.
In this project we propose to investigate capabilities for offloading work from user processes to the DAOS object store. The technical question that follows is how the offloading of data actions to DAOS can be implemented. We propose to design an API that lets a user process trigger data processing on the DAOS server in one of three ways: by shipping code to the server, similar to stored procedures; by activating software containers deployed on the server; or by using lazy objects within DAOS.
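As a rough illustration of the stored-procedure variant, the following Python sketch uses a purely hypothetical in-memory MockObjectStore. The class and its put, get, register_procedure and offload methods are assumptions made for this example; they are not part of the DAOS API, and the project's actual API design is still to be worked out.

```python
class MockObjectStore:
    """In-memory stand-in for a storage server (hypothetical, NOT the DAOS API)."""

    def __init__(self):
        self._objects = {}     # object key -> stored bytes
        self._procedures = {}  # procedure name -> callable run "server-side"

    def put(self, key, data):
        """Store a copy of the data under the given key."""
        self._objects[key] = bytes(data)

    def get(self, key):
        """Return the full object, as a client without offloading would fetch it."""
        return self._objects[key]

    def register_procedure(self, name, func):
        """Ship code to the store, similar to defining a stored procedure."""
        self._procedures[name] = func

    def offload(self, name, key):
        """Run a registered procedure next to the data and return only its result."""
        return self._procedures[name](self._objects[key])


def max_reading(raw):
    """Example data action: largest value in a comma-separated byte string."""
    return max(int(v) for v in raw.decode().split(","))


store = MockObjectStore()
store.put("sensor-log", b"3,1,4,1,5,9,2,6")
store.register_procedure("max_reading", max_reading)

# Only the small result travels back to the caller, not the whole object.
result = store.offload("max_reading", "sensor-log")
print(result)
```

In this sketch the client receives a single integer instead of the full object, which is the kind of data movement that offloading is meant to avoid.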
Project 2: Biotronik GmbH / 1000shapes GmbH
Sponsor: Biotronik GmbH / 1000shapes GmbH
Title: Auto-ML for bio-medical data-analysis
Project Description: The amount of available bio-medical data has rapidly increased in recent years. Not only in the scientific context, but also in hospitals and in industry, more and more detailed data on a wide variety of diseases are being collected and are available in public databases. Analyzing this data has become a bottleneck, not only because of the size of the data-sets but also because designing appropriate models has become very time-consuming, given the many possible options and algorithms. Some years ago there might have been a standard way to analyze a particular data type (e.g. genomics data); these days, however, a one-size-fits-all analysis approach rarely exists for a particular data type. In fact, the contrary is the case: designing a good machine learning model involves putting together multiple components, each with many (hyper-)parameters and calibration steps. This often involves steps for preprocessing, feature selection, classification, interpretation and so on. As a result, designing a suitable, well-working machine learning model often involves a time-consuming, cyclical process: putting together a multi-component analysis pipeline, fine-tuning the parameters, evaluating the results, replacing some of the components with other algorithms, fine-tuning again, evaluating again, and repeating. This goes on until a good combination of algorithms and parameters is found. However, in most cases it cannot be determined whether this was the best combination for the data-set: perhaps another algorithm that was never tried would have produced a better outcome.
The main idea of Auto-ML is that machine learning algorithms take over the construction of the analysis pipeline, which was previously done manually, including a smart approach to hyper-parameter optimization. Several frameworks available today implement the Auto-ML toolbox and allow experiments with one's own data-sets (see e.g. the review by Waring et al. [1]).
In this project we will use bio-medical data-sets from our project sponsors to evaluate the power of the Auto-ML approach and compare it to current state-of-the-art solutions. The aim is to generate new machine-learning models for disease diagnosis.
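To make the idea of an automated search over pipelines concrete, here is a deliberately tiny Python sketch. It uses plain random search over a toy search space; the data, the two preprocessing options and the threshold classifier are all hypothetical, and real Auto-ML frameworks use far richer search spaces, proper validation splits and smarter search strategies.

```python
import random

# Toy 1-D data-set (hypothetical): the label is 1 when the feature exceeds 5.
DATA = [(float(x), int(x > 5)) for x in range(11)]

# Search space, part 1: the preprocessing step of the pipeline.
PREPROCESSORS = {
    "identity": lambda x: x,
    "scale_to_unit": lambda x: x / 10.0,
}


def accuracy(preprocess_name, threshold):
    """Fraction of samples that a threshold classifier labels correctly."""
    preprocess = PREPROCESSORS[preprocess_name]
    hits = sum(int(preprocess(x) > threshold) == y for x, y in DATA)
    return hits / len(DATA)


def random_search(trials, seed=0):
    """Sample pipelines at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        # Pick a preprocessing step and a hyper-parameter (the threshold).
        name = rng.choice(sorted(PREPROCESSORS))
        threshold = rng.uniform(0.0, 10.0)
        score = accuracy(name, threshold)
        if best is None or score > best[0]:
            best = (score, name, threshold)
    return best


best = random_search(trials=200)
print("best accuracy, preprocessing, threshold:", best)
```

The loop automates the manual cycle of choosing components, tuning a parameter and evaluating the result. Like the manual process, it only reports the best combination it happened to try.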
[1] https://doi.org/10.1016/j.artmed.2020.101822
Project 3: FICO and Gurobi Optimization
Sponsor: FICO and Gurobi Optimization
Title: Machine learning for combinatorial optimization
Project Description: While most combinatorial optimization solvers are presented as general-purpose, one-size-fits-all algorithms, this project’s main scientific question is the following: Is machine learning a viable option for improving traditional combinatorial optimization solvers on specific problem distributions, when historical data is available?
This general problem captures a practical scenario highly relevant to many application areas, where a practitioner repeatedly solves problem instances from a specific distribution, with redundant patterns and characteristics. For example, managing a large-scale energy distribution network requires solving very similar combinatorial optimization (CO) problems on a daily basis, with a fixed power grid structure while only the demand changes over time. This change of demand is hard to capture by hand-engineered expert rules, and ML-enhanced approaches offer a possible solution to detect typical patterns in the demand history. Other examples include crew scheduling problems that have to be solved daily or weekly with minor variations, or vehicle routing where the traffic conditions change over time, but the overall transportation network does not.
In 2021, the NeurIPS conference, one of the major venues for machine learning research, featured a competition on precisely this question [1]. Researchers submitted many different approaches to three different challenges: creating primal solutions, improving dual bounds, and configuring solvers by parameter tuning. The task of the G-RIPS project will be to select one of the three challenges and improve upon the existing approaches submitted to last year’s competition.
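For readers unfamiliar with the terms, the following Python sketch illustrates primal and dual bounds on a toy 0/1 knapsack problem. It is not taken from the competition, which concerns mixed-integer programs solved by full-scale solvers; the function name and the example instance are assumptions chosen for illustration. The primal bound is the value of a feasible solution built greedily, and the dual bound is the optimum of the linear programming relaxation, which for knapsack can be computed by filling the capacity in order of value density.

```python
def knapsack_bounds(items, capacity):
    """Return (primal_bound, dual_bound) for a 0/1 knapsack (maximization).

    items: list of (value, weight) pairs with positive weights.
    """
    # Rank items by value per unit of weight, highest first.
    ranked = sorted(items, key=lambda it: it[0] / it[1], reverse=True)

    # Primal bound: value of a feasible solution that takes whole items greedily.
    primal, remaining = 0.0, capacity
    for value, weight in ranked:
        if weight <= remaining:
            primal += value
            remaining -= weight

    # Dual bound: LP relaxation optimum, where the last item may be split.
    dual, remaining = 0.0, capacity
    for value, weight in ranked:
        take = min(weight, remaining)
        dual += value * take / weight
        remaining -= take
        if remaining == 0:
            break
    return primal, dual


# Hypothetical instance: (value, weight) pairs and a capacity of 50.
primal, dual = knapsack_bounds([(60, 10), (100, 20), (120, 30)], capacity=50)
print(primal, dual)
```

For this instance the optimal value is 220 (the second and third items), which lies between the two bounds. Better primal solutions raise the lower bound and tighter relaxations lower the upper bound, and closing the gap between them proves optimality.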
The project is supervised by MODAL SynLab in collaboration with industry partners FICO [2] and Gurobi Optimization [3], both of which provide market-leading optimization solvers for mixed-integer programming.
[1] https://www.ecole.ai/2021/ml4co-competition/
[2] https://www.fico.com/en/products/fico-xpress-optimization
[3] https://www.gurobi.com/