Graduate-level Research in Industrial Projects for Students (GRIPS) – Berlin 2019

June 24 - August 16, 2019

Sponsors and Projects

The sponsors and projects for 2019 include:

Project 1: Deutsche Bahn

Project 2: 1000shapes (Biotechnology)

Project 3: Deloitte Deutschland

 

Project 1: Deutsche Bahn

Sponsor

Deutsche Bahn (DB) is Germany’s major railway company. It transports on average 5.4 million customers every day over a rail network that consists of 33,500 km of track and 5,645 train stations. DB operates in over 130 countries worldwide. It provides its customers with mobility and logistics services, and operates and controls the related rail, road, ocean and air traffic networks.

Project

You will learn to think about railway networks from a planner’s perspective. Making up ICE rotations sounds easy at first, but you will soon find out that many constraints have to be taken into account, and do not forget the size of Germany’s rail network! This makes finding and understanding suitable mathematical programming models a challenge of its own. Dealing with huge data sets will be your daily business: you will write scripts to process the data and extract useful information. Past project assignments included investigating how robust optimization methodology can be incorporated into the optimization process, and developing a rotation plan for a situation in which only a restricted number of train conductors is available, e.g. in a strike scenario.
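To give a flavor of the combinatorial core of rotation planning, here is a deliberately tiny sketch (our illustration, not DB's actual model): a handful of trips is chained into one cyclic vehicle rotation so that the number of empty repositioning runs ("deadheads") between consecutive trips is minimized by brute force. Real rotation planning adds timetables, maintenance rules, fleet sizes and far larger instances, which is exactly why integer programming models and solvers are needed.

```python
# Toy rotation planning: order trips cyclically to minimize empty runs.
from itertools import permutations

# Hypothetical trips: (departure station, arrival station)
trips = [("Berlin", "Munich"), ("Munich", "Hamburg"),
         ("Hamburg", "Berlin"), ("Cologne", "Frankfurt")]

def deadheads(order):
    """Count empty runs needed between consecutive trips in a cyclic rotation."""
    return sum(1 for a, b in zip(order, order[1:] + order[:1])
               if a[1] != b[0])  # arrival of one trip != departure of the next

best = min(permutations(trips), key=deadheads)
print(deadheads(best))  # → 2: the isolated Cologne–Frankfurt trip forces two empty runs
```

Brute force over permutations is factorial in the number of trips; with thousands of ICE trips per week, this is where the integer programming models mentioned above take over.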

Hosting Lab

The RailLab, located at the Zuse Institute, cooperates with DB Fernverkehr to develop an optimization core that helps to operate the Intercity-Express (ICE), Germany’s fastest and most prestigious train, as efficiently as possible. This is achieved by determining how the ICEs should rotate within Germany, thereby reducing the number of empty trips. The software has now been deployed in production at DB Fernverkehr for several years.

Requirements

The prospective participant should:

  • have a good command of a high-level programming language (preferably C++) and experience in writing scripts, e.g. in Python or Shell,
  • have attended classes in the area of combinatorial optimization, linear and integer programming, or acquired the foundations of this field by some other means, and
  • be prepared to work with huge datasets from industry partners (which involves cleaning and preprocessing to overcome inconsistencies and incompleteness).

Ideally he or she:

  • is familiar with procedures in the area of rail traffic and/or logistics,
  • has experience in working in a Linux/Unix environment and
  • has experience with collaborative work on source code (e.g. working with revision control systems).

 

Project 2: 1000shapes (Biotechnology)

Sponsor

1000shapes GmbH is a ZIB spin-off that transfers research in the life sciences into products for clinical applications. 1000shapes provides advanced solutions in image and geometry processing for 2D and 3D product design, covering the full spectrum from measurement and analysis through planning to manufacturing. In the medical field, 1000shapes is interested in analyzing data based on medical images, such as x-ray, CT or MRI data.

Project

The project will deal with the integrative analysis of large medical data sets coming from a major study of knee osteoarthritis, one of the most common causes of disability in adults. Based on clinical, imaging, genomics and proteomics data, the project team will work on and with state-of-the-art algorithms for analyzing this data. The ultimate goal is to integrate the single data sources into a large modelling framework that allows detection and diagnosis of the disease.

Problems and (some) hope: Most of the data coming from available bio-medical data sources, such as images or proteomics data, is ultra-high-dimensional and very noisy. At the same time, this data exhibits a very particular structure, in the sense that it is highly sparse. Thus the information content of this data is much lower than its actual dimension suggests, which motivates the prerequisite for every subsequent step in this project: reducing the dimension of the data with as little loss of information as possible.
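The idea of exploiting sparsity for dimension reduction can be sketched in a few lines (a toy illustration under our own assumptions, not the project's actual method): a noisy high-dimensional vector whose information sits in a few large entries is reduced by hard thresholding, keeping only the coordinates that rise clearly above the noise level.

```python
# Toy sparse dimension reduction via hard thresholding.
import random

random.seed(0)
n = 1000
signal = [0.0] * n
signal[10], signal[200], signal[777] = 5.0, -4.0, 3.0   # truly sparse content
noisy = [s + random.gauss(0, 0.1) for s in signal]       # add Gaussian noise

threshold = 1.0  # assumed known noise scale for this toy example
support = {i for i, v in enumerate(noisy) if abs(v) > threshold}
print(sorted(support))  # → [10, 200, 777]: 1000 dimensions reduced to 3
```

The toy works because the noise here really is Gaussian and the support arbitrary; the difficulty described below is precisely that real proteomics data violates these convenient assumptions.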

Unfortunately, the sparsity structure of this data is complex, (in most cases) not known a priori, and usually does not coincide with commonly assumed patterns such as joint sparsity or Gaussian noise. This means that although the data is highly sparse, both the sparsity structure and the noise distribution are non-standard. As a consequence, specifically adapted dimension reduction strategies such as compressed sensing do not yet readily exist, e.g. for proteomics data.

However, methods exist that make it possible to identify the sparsity structure of the contained information in very high-dimensional, noisy -omics and imaging data. Once this has been achieved, the next step is integrating the (low-dimensional) information into one unified model. We will use a network-based approach, modelling the various biological levels through a multiplex network built from existing databases of known protein/protein or gene/protein interactions. The hope is that this model can shed some light on the mechanisms of osteoarthritis and maybe even enable new ways of early diagnosis of this disease.
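The multiplex idea above can be sketched with plain dictionaries (all node names and edges here are invented placeholders, not data from the study): each biological level is one network layer over related node sets, and inter-layer couplings encode known gene/protein relations.

```python
# Minimal multiplex network: per-layer edge sets plus inter-layer couplings.
layers = {
    "gene":    {("gA", "gB"), ("gB", "gC")},   # e.g. gene co-expression layer
    "protein": {("pA", "pB")},                 # e.g. protein interaction layer
}
couplings = {("gA", "pA"), ("gB", "pB")}       # gene encodes protein

def neighbors(node):
    """Collect a node's neighbors across all layers and inter-layer couplings."""
    out = set()
    for edges in list(layers.values()) + [couplings]:
        for u, v in edges:
            if u == node: out.add(v)
            if v == node: out.add(u)
    return out

print(sorted(neighbors("gB")))  # → ['gA', 'gC', 'pB']: within-layer plus coupled protein
```

In the project, the layers would come from the dimension-reduced -omics data and the couplings from interaction databases, so that signals can propagate across biological levels.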

Hosting Lab

The members of the MedLab develop new mathematical methods that allow identification of disease-specific signatures within modern large-scale bio-medical datasets, such as genomics or proteomics sources. Having these signatures (e.g. changing concentrations of a blood protein during some viral infection) makes it possible not only to build new diagnostic tests but also to gain insights into disease mechanisms. This is based on the insight that changes in cells, while they undergo transformation from a “normal” to a malignant state (e.g. during infections), happen on many biological levels, including genes, proteins and metabolites. Integrative analysis of all these levels allows generation of more detailed and informative models of a disease compared to just analyzing the effect of single biomarkers, such as blood values or protein levels.

Requirements

The prospective participant should:

  • have a background in mathematics, bioinformatics or computer science,
  • have experience in network analysis,
  • have experience with a high-level programming language (e.g. C/C++, Java or Python) and a statistical software package such as R,
  • have attended classes in the area of data mining or acquired the foundations of this field by some other means,
  • be prepared to work with very large datasets from industry partners (which involves preprocessing, e.g. to overcome inconsistencies and incompleteness),
  • ideally, be familiar with the biological background and have already worked with biological data sets, and
  • have experience in working in a Linux/Unix environment.

 

Project 3: Deloitte Deutschland

Sponsor

Deloitte provides audit, risk advisory, tax, financial advisory, and consulting services to public and private clients spanning multiple industries; legal advisory services in Germany are provided by Deloitte Legal. With a globally connected network of member firms in more than 150 countries, Deloitte brings excellent capabilities and high-quality service to clients, delivering the insights they need to address their most complex business challenges. Deloitte’s approximately 286,000 professionals are committed to making an impact that matters.

Project

The goal of this project is to analyze data anomalies in the context of credit card data. Based on anonymized credit card transactions, time series patterns need to be analyzed in order to detect payment fraud. Competing fraud detection models, based on both statistical and machine learning approaches, will be investigated and compared with respect to accuracy, complexity and feasibility. Additionally, the results will be visualized. You will work together with experienced data scientists and learn how to visualize and communicate results within a heterogeneous team. Furthermore, you will gain deep insights into the challenges practitioners face, especially when solutions to such real-world problems need to be implemented in applied risk management processes.
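A minimal statistical baseline of the kind such a comparison might start from (our sketch with invented numbers, not Deloitte's method): flag transactions whose amount deviates from a card's history by more than three standard deviations.

```python
# Simple z-score-style outlier flag on transaction amounts.
import statistics

history = [12.5, 9.9, 15.0, 11.2, 13.8, 10.4, 14.1, 12.0]  # hypothetical past amounts
new_transactions = [13.0, 250.0, 11.5]

mean = statistics.mean(history)
std = statistics.stdev(history)

flags = [abs(x - mean) > 3 * std for x in new_transactions]
print(flags)  # → [False, True, False]: only the 250.0 payment is flagged
```

Machine learning approaches replace the fixed three-sigma rule with models learned from labeled fraud cases; the project's comparison of accuracy, complexity and feasibility starts from exactly this kind of baseline.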

Hosting Lab

This project will be hosted by the Med Lab together with the Analytics Institute.

The Analytics Institute has been working as a think tank and accelerator in the business and technology sectors since 2014. Since its creation, it has enabled companies to think in new and different ways about using data analytics and Big Data to explore and exploit available opportunities.

The Analytics Institute operates at the intersection of business, academia and technology as an expert and catalyst for analytics in the marketplace, enabling clients, partners and stakeholders to develop, implement and improve sustainable analytics solutions within their business and technological infrastructure.

Team

Our team bundles the expertise of data scientists, data engineers, designers and experienced industry experts to create customized analytics solutions according to our clients’ needs.

Requirements

The prospective participant should:

  • have a solid statistical or mathematical background,
  • have a good command of a programming language and experience in writing scripts, e.g. in R or Python, and
  • have experience with Big Data.

 

Ideally he or she:

  • is familiar with methodologies of outlier detection,
  • has already worked with machine learning algorithms,
  • is interested in financial risk management and
  • has experience in visualizing large data sets.