Introduction to data analytics with Apache Spark + Hands-On/Walkthrough part

René Jäkel
Technische Universität Dresden

The efficient and intelligent handling of large, often distributed and heterogeneous data sets increasingly determines scientific and economic competitiveness in most application areas. Mobile applications, social networks, multimedia collections, sensor networks, data-intensive scientific experiments and complex simulations now generate a huge data deluge. Processing and analyzing these data sets with innovative methods opens up new opportunities for their exploitation and for new insights. However, the resulting resource requirements exceed the capabilities of state-of-the-art methods for the acquisition, integration, analysis and visualization of data. In recent years, many promising approaches for processing large data sets have been developed and are available as community frameworks in the big data area. These frameworks are becoming increasingly interesting for domain scientists to evaluate, with uses ranging from specialized implementations based on deep learning to the processing and analysis of large-scale, stream-based sensor data. Furthermore, the sophisticated and specialized hardware options available in high-performance computing could foster the efficient execution of big data analytics workloads.


Part of Science at Extreme Scales: Where Big Data Meets Large-Scale Computing Tutorials