ICS 2012 WORKSHOPS & TUTORIALS


The 26th International Conference on Supercomputing (ICS2012) program will include workshops and tutorials scheduled on Monday, June 25th and on Friday, June 29th.

Workshops

Tutorials



Schedule


Monday 25th June

Morning

W1: "Future HPC systems: the Challenges of power-constrained performance"

CANCELLED T1: "POTRA: a framework for Building Power Models For Next Generation Multicore Architectures"

CANCELLED T3: "Parallel & Cloud Programming with Haskell"

T5: "Parallel Programming with Cilk Plus using the Intel Compiler and Autotuning"

W2: "PAPAH: Performance, Applications, and Parallelism for Android and HTML5 (PAPAH 2012)"

Afternoon

CANCELLED T2: "Power Architecture processors: architecture and performance"

CANCELLED T4: "CUDA Programming, Profiling and Optimization"

T6: "SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters"

Friday 29th June

Morning

CANCELLED W3: "First International Workshop on Energy Efficiency in Supercomputing (EESC 2012)"

W4: "Second International Workshop on High-performance Infrastructure for Scalable Tools (WHIST 2012)"

W5: "2nd International Workshop on Runtime and Operating Systems for Supercomputers (ROSS12)"

CANCELLED T7: "A practical approach to performance analysis and modeling of large-scale systems"

CANCELLED T9: "Developing Scientific Applications with the Eclipse Parallel Tools Platform"

Afternoon

CANCELLED T8: "Designing High-End Computing Systems and Programming Models with InfiniBand and High-speed Ethernet"

CANCELLED T10: "High Performance Computing in Biomedical Informatics"





Workshops



W1: "Future HPC systems: the Challenges of power-constrained performance"

Duration and schedule:

Full Day (Monday, June 25th 9:00 - 18:00)

Organizers

Sanzio Bassini (CINECA, IT), Adolfy Hoisie (Pacific Northwest National Laboratory, USA), Darren J. Kerbyson (Pacific Northwest National Laboratory, USA), Dirk Pleiter (Julich Supercomputing Centre and University of Regensburg, Germany), and Fabio Schifano (University of Ferrara, Italy)

Description

In the past, the computational requirements of important scientific applications have often driven innovations in high-performance computing. Today, major changes in computer architecture are taking place, often driven by consumer electronics. In addition, as systems continue to scale in size, we expect power consumption to become a major concern for future generations of supercomputers. Current systems consume more than a Megawatt per Petaflop; at that rate, achieving Exascale levels of computation - a 100x improvement in performance over today - would imply power budgets on the order of a Gigawatt if power requirements were to scale similarly. The optimization of power and energy at all levels, from applications to system software and to hardware at both processor and system scales, is therefore required.

The challenges ahead are manifold. Increasing parallelism, memory systems, interconnection networks, storage, and uncertainties in programming models all add to the complexity. The recent trend of integrating accelerators into large-scale systems provides additional challenges in marshaling the increased parallelism and data movement. Realizing energy savings more rapidly will require significant increases in measurement resolution and in optimization techniques. The interplay between performance, power, and reliability also leads to complex trade-offs. This workshop aims to review the latest developments in HPC systems, which in the near future may evolve in directions substantially different from today's paradigms. We are interested in assessing their potential impact on scientific computing, and in state-of-the-art tools and techniques for measuring and optimizing performance as well as power. We will also discuss new hardware capabilities that may become crucial in the near future for accurate monitoring and optimization of performance and power.

Workshop's web page

Back to schedule

W2: "PAPAH: Performance, Applications, and Parallelism for Android and HTML5 (PAPAH 2012)"

Duration and schedule:

Full Day (Monday, June 25th 9:00 - 18:00)

Organizers

Alex Nicolau (University of California, Irvine, USA), Alex Veidenbaum (University of California, Irvine, USA)

Description

Android is the OS of choice for the majority of smartphones today, and its importance is likely to grow in the future. HTML5 aims to enhance Web applications with support for the latest multimedia and graphics. Both Android and HTML5 are specifically designed to adapt to a huge variety of platforms, displays, and performance/power configurations, and the myriad applications developed under them are increasingly demanding in terms of performance, communication, and power management. At the same time, smartphone processors are becoming multicore, which provides opportunities for performance/power optimization of both the OS and the browser on such platforms. This workshop is a forum for discussion and presentation of cutting-edge research on analysis and optimization of Android and HTML5 performance in the context of current and future parallel architectures - from mobile devices up to servers.

Workshop's web page

Back to schedule

CANCELLED W3: "First International Workshop on Energy Efficiency in Supercomputing (EESC 2012)"

Duration and schedule:

Full Day workshop (Friday, June 29th 9:00 - 18:00)

Organizers

Arndt Bode (LRZ, TU Munich, DE) and Bronis R. de Supinski (Lawrence Livermore National Laboratory, USA)

Description

One of the major challenges on the way towards exascale computing is the ever-growing demand for electrical power and energy by HPC systems. The reasons are the ever-growing number of transistors per processor and the steep increase in the number of processors used to push the performance of HPC systems. The increased parallelism requires more system infrastructure (e.g., interconnect, memory, I/O nodes, and cooling) in order to harness most of the available compute power. This parallelism in turn leads to higher system power levels and energy consumption, which requires a bigger support infrastructure (e.g., building design, power supply, and cooling) to handle the increased power and cooling requirements.

The challenge that the HPC community faces in the next decade will be to reduce the power requirements on every level of HPC systems while still increasing compute performance. This workshop will provide a forum for researchers to present and to exchange ideas concerning power monitoring, modeling and saving methods for all levels of HPC.


Back to schedule

W4: "Second International Workshop on High-performance Infrastructure for Scalable Tools (WHIST 2012)"

Duration and schedule:

Full Day workshop (Friday, June 29th 9:00 - 18:00)

Organizers

Todd Gamblin (Lawrence Livermore National Laboratory, USA), Nathan Tallent (Pacific Northwest National Laboratory, USA)

Description

From laptops to supercomputers, increasingly complex multicore and accelerator hardware is driving rapid growth in concurrency. At the high end, exascale systems are expected to support over 100 million threads, primarily due to increased intra-node concurrency. To take full advantage of this increased concurrency, new software and programming models are necessary. With increased system and application complexity, scalable tools are critical for diagnosing the root causes of performance and correctness problems.

To diagnose and correct problems in highly concurrent systems, tools themselves are becoming more complex. Tools will require sophisticated infrastructure to measure, analyze, diagnose, and present the causes of an execution's anomalies. In many cases, tools will combine online and offline analysis. They may use sophisticated modeling and statistical analysis techniques; they may attempt to correct problems; and they may have to survive faults. To manage this complexity, there is a need for abstractions that simplify tool design and for infrastructure that is reusable and extensible.

Workshop's web page

Back to schedule

W5: "2nd International Workshop on Runtime and Operating Systems for Supercomputers (ROSS12)"

Duration and schedule:

Full Day (Friday, June 29th 9:00 - 18:00)

Organizers

Torsten Hoefler (NCSA, University of Illinois at Urbana-Champaign, USA) and Kamil Iskra (Argonne National Laboratory, USA)

Description

The complexity of node architectures in supercomputers increases as we cross petaflop milestones on the way towards Exascale. Increasing levels of parallelism in multi- and many-core chips and emerging heterogeneity of computational resources coupled with energy and memory constraints force a re-evaluation of our approaches towards operating systems and runtime environments. The International Workshop on Runtime and Operating Systems for Supercomputers provides a forum for researchers to exchange ideas and discuss research questions that are relevant to upcoming supercomputers.

Workshop's web page

Back to schedule

Tutorials



CANCELLED T1: "POTRA: a framework for Building Power Models For Next Generation Multicore Architectures"

Duration and schedule:

Half Day (Monday, June 25th 9:00 - 13:00)

Organizers

Ramon Bertran (Barcelona Supercomputing Center, ES), Marc Gonzalez (Universitat Politecnica de Catalunya, ES)

Description

The tutorial will cover the main issues in building power models based on hardware counters. First, we describe the basic options for setting up a platform on which to build a model (a device for obtaining actual power measurements, sampling that device, and collecting the measurements). Second, we describe how to break down a multicore architecture into components whose individual contributions to overall power consumption can be estimated. The tutorial describes the set of counters commonly available in the latest generation of multicore architectures, identifies which of them best describe the activity of each architectural component, and shows how to linearly correlate the activity of each component with measured power. We describe a basic algorithm that uses stochastic methods to build power models. Finally, the tutorial describes how to validate a power model in terms of accuracy and responsiveness. Each step of the tutorial is applied to a current multicore architecture.
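As a rough illustration of the counter-based approach sketched above (a minimal sketch, not the POTRA framework itself), the following C++ snippet fits a linear power model P ~ c0 + sum_i c_i * activity_i to a handful of made-up samples by ordinary least squares; the number of components, the activity values, and the measured powers are all assumptions chosen purely for illustration.

// Minimal sketch: least-squares fit of a linear power model
// P ~= c0 + sum_i c_i * activity_i from per-sample component activity
// (e.g. events per cycle) and externally measured power samples.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A*x = b with naive Gaussian elimination and partial pivoting
// (adequate for a handful of coefficients; a real tool would use a
// robust numerical library).
static std::vector<double> solve(Matrix A, std::vector<double> b) {
    const int n = static_cast<int>(b.size());
    for (int k = 0; k < n; ++k) {
        int pivot = k;
        for (int i = k + 1; i < n; ++i)
            if (std::abs(A[i][k]) > std::abs(A[pivot][k])) pivot = i;
        std::swap(A[k], A[pivot]);
        std::swap(b[k], b[pivot]);
        for (int i = k + 1; i < n; ++i) {
            const double f = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    }
    std::vector<double> x(n);
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

int main() {
    // Hypothetical training samples: activity of three architectural
    // components (e.g. core, L2, memory controller) and the power measured
    // for each sample on an external device.  Values are made up.
    const std::vector<std::vector<double>> activity = {
        {0.9, 0.20, 0.05}, {0.5, 0.40, 0.10}, {0.2, 0.10, 0.30},
        {0.7, 0.30, 0.20}, {0.3, 0.05, 0.02}, {0.8, 0.25, 0.15}};
    const std::vector<double> power = {38.0, 31.0, 27.0, 36.0, 22.0, 37.5};

    const int samples = static_cast<int>(activity.size());
    const int k = static_cast<int>(activity[0].size()) + 1;  // +1: intercept

    // Build the normal equations (X^T X) c = X^T p for the coefficients c.
    Matrix XtX(k, std::vector<double>(k, 0.0));
    std::vector<double> Xtp(k, 0.0);
    for (int s = 0; s < samples; ++s) {
        std::vector<double> row = {1.0};  // intercept models static power
        row.insert(row.end(), activity[s].begin(), activity[s].end());
        for (int i = 0; i < k; ++i) {
            Xtp[i] += row[i] * power[s];
            for (int j = 0; j < k; ++j) XtX[i][j] += row[i] * row[j];
        }
    }
    const std::vector<double> coeff = solve(XtX, Xtp);

    std::printf("static power (intercept): %.2f W\n", coeff[0]);
    for (int i = 1; i < k; ++i)
        std::printf("component %d: %.2f W per unit of activity\n", i, coeff[i]);
    return 0;
}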

Tutorial's web page

Back to schedule

CANCELLED T2: "Power Architecture processors: architecture and performance"

Duration and schedule:

Half Day tutorial (Monday, June 25th 14:00 - 18:00)

Organizers

Jose Moreira (IBM T.J. Watson Research Center, USA)

Description

The Power Systems line of compute servers covers both commercial and scientific computing. This tutorial will discuss the architecture and organization of the latest Power Systems processor, Power7. We will pay particular attention to those features that impact the performance of programs running on Power Systems. We will discuss aspects of the Power7 core, the Power7 chip, and multi-chip systems. We will also discuss compilers, libraries, performance tools, and techniques for extracting high performance from applications on Power Systems. Finally, we will discuss future directions for Power Systems.

Tutorial's web page

Back to schedule

CANCELLED T3: "Parallel & Cloud Programming with Haskell"

Duration and schedule:

Half Day (Monday, June 25th 9:00 - 13:00)

Organizers

Dr Peter J. Braam (Parallel Scientific, Inc, USA)

Description

Haskell is a state-of-the-art functional programming language with a very advanced compiler, delivering performance comparable with C.

Several frameworks for parallel programming have been developed (Par, Repa, DPH, and Accelerate are among the best known), addressing key demands in this area such as targeting multiple heterogeneous architectures (GPUs, cores, and other accelerators, using monads and DSLs) and implementing nearly automatic, deterministic parallelization in the compiler with support for nested data-parallel computing. In the tutorial we will give examples of how these frameworks can be used to write deterministic parallel code. We will pay attention to the construction of the code, which is simple, but also to understanding profiles, where the effects of stream fusion will be discussed.

We will wrap up the tutorial with an overview of algorithms for sparse irregular problems, such as SpGEMM (sparse matrix x sparse matrix) algorithms and a Haskell version of the STINGER framework for streaming analytics.

Tutorial's web page

Back to schedule

CANCELLED T4: "CUDA Programming, Profiling and Optimization"

Duration and schedule:

Half Day (Monday, June 25th 14:00 - 19:00)

Organizers

Isaac Gelado (Barcelona Supercomputing Center, ES), Wen-mei Hwu (University of Illinois at Urbana Champaign, USA), Nacho Navarro (Universitat Politecnica de Catalunya/Barcelona Supercomputing Center, ES)

Description

This tutorial will focus on CUDA profiling and optimizations. The goal of this tutorial is to provide attendees with a Swiss Army Knife that will allow them to optimize their CUDA applications.

During the first part of the tutorial, attendees will learn how to use the NVIDIA CUDA profiler, the Extrae library, and the Paraver visualization tool to profile their applications. The tutorial focuses mainly on single-GPU applications, but we will also cover the basics of multi-GPU and GPU-cluster profiling.

In the second part of this tutorial we will guide attendees through the most common CUDA programming patterns, and we will teach a set of optimization techniques applicable to each pattern. During this part we will use the profiling tools previously explained to show the effect of each optimization technique.

Tutorial's web page

Back to schedule

T5: "Parallel Programming with Cilk Plus using the Intel Compiler and Autotuning"

Duration and schedule:

Half Day (Monday, June 25th 9:00 - 13:00)

Organizers

Knud Kirkegaard (Intel Corporation, USA)

Description

The latest release of the Intel Compiler (ICC v12.1) supports several extensions to C/C++ for parallel programming, collectively known as Intel® Cilk™ Plus. The Cilk keywords provide a straightforward way to convert a sequential program into a multithreaded program, thereby exploiting the thread-level parallelism available on multicore machines. The Cilk Plus array notation, on the other hand, is an expressive way to vectorize a computational kernel in order to exploit the SIMD parallelism within each core. This tutorial serves as an introduction to the extensions available in Intel Cilk Plus. We present a programming methodology, based on cache-oblivious techniques, that uses the different Intel Cilk Plus extensions together to achieve high performance on multicore architectures. In addition, we will introduce a newly developed autotuning tool that provides further performance improvement.
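For readers unfamiliar with these extensions, the short sketch below (illustrative only, not taken from the tutorial material) shows the two styles side by side: cilk_spawn/cilk_sync and cilk_for for thread-level parallelism, and array notation for SIMD-style element-wise operations. It assumes a compiler with Cilk Plus support, such as the Intel compiler; the function and the array sizes are arbitrary.

// Task parallelism with Cilk keywords plus an array-notation kernel.
#include <cilk/cilk.h>
#include <cstdio>

// cilk_spawn lets the first recursive call run in parallel with the
// second; cilk_sync waits for spawned work before 'a' is used.
long fib(int n) {
    if (n < 2) return n;
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;
    return a + b;
}

int main() {
    const int N = 1 << 16;
    static float a[N], b[N], c[N];

    // Iterations of a cilk_for loop may execute in parallel across workers.
    cilk_for (int i = 0; i < N; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * i;
    }

    // Array notation: an element-wise operation over whole array sections
    // (base[start:length]) that the compiler can vectorize with SIMD.
    c[0:N] = a[0:N] + b[0:N];

    std::printf("fib(30) = %ld, c[10] = %f\n", fib(30), c[10]);
    return 0;
}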

Tutorial's web page

Back to schedule

T6: "SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters"

Duration and schedule:

Half Day (Monday, June 25th 14:00 - 18:00)

Organizers

Jaejin Lee (Center for Manycore Programming, Seoul National University, KR)

Description

Open Computing Language (OpenCL) is a programming model for heterogeneous parallel computing systems. OpenCL provides a common abstraction layer across different multicore architectures, such as CPUs, GPUs, DSPs, and Cell BE processors. Programmers can write an OpenCL application once and run it on any OpenCL-compliant system. However, current OpenCL is restricted to a single heterogeneous system. To target heterogeneous CPU/GPU clusters, programmers must combine the OpenCL framework with a communication library, such as MPI. The same is true for CUDA.

This tutorial will cover the usage and internals of an OpenCL framework called SnuCL, which naturally extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster contains multiple CPUs and GPUs in each node, and the nodes are connected by an interconnection network, such as Gigabit Ethernet or InfiniBand switches. For such clusters, SnuCL provides the programmer with the illusion of a single heterogeneous system: a GPU or a set of CPU cores becomes an OpenCL compute device, and SnuCL allows the application to utilize compute devices in any compute node as if they were in the host node.

With SnuCL, OpenCL applications written for a single heterogeneous system with multiple OpenCL compute devices can run on the cluster without any modification, so SnuCL achieves both high performance and ease of programming. In addition, we characterize the performance of an OpenCL implementation (the SNU NPB suite) of the NAS Parallel Benchmarks (NPB) on the target heterogeneous parallel platform. We believe that understanding the performance characteristics of conventional workloads, such as the NPB, under an emerging programming model such as OpenCL is important for developers and researchers adopting that model. The source code of SnuCL and the SNU NPB suite is available at: http://aces.snu.ac.kr/Center_for_Manycore_Programming/Software.html
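For reference, the sketch below is ordinary single-system OpenCL host code (a vector addition dispatched to the first available device), not SnuCL-specific code; the premise described above is that host code of this form can also drive the compute devices of a cluster unmodified when linked against SnuCL. Error handling and resource release are omitted for brevity, and the kernel and problem size are illustrative.

// Single-device OpenCL vector addition: the usual host-side steps of
// platform/device discovery, context and queue creation, buffer setup,
// kernel build, launch, and read-back.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kSource =
    "__kernel void vadd(__global const float* a, __global const float* b,"
    "                   __global float* c) {"
    "  int i = get_global_id(0);"
    "  c[i] = a[i] + b[i];"
    "}";

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

    // Device buffers initialized from (or written back to) host memory.
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               n * sizeof(float), a.data(), nullptr);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               n * sizeof(float), b.data(), nullptr);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                               n * sizeof(float), nullptr, nullptr);

    // Build the kernel from source and bind its arguments.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, nullptr);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "vadd", nullptr);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &da);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &db);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &dc);

    // Launch over n work-items and read the result back synchronously.
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, dc, CL_TRUE, 0, n * sizeof(float), c.data(),
                        0, nullptr, nullptr);

    std::printf("c[0] = %f (expected 3.0)\n", c[0]);
    return 0;
}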

Tutorial's web page

Back to schedule

CANCELLED T7: "A practical approach to performance analysis and modeling of large-scale systems"

Duration and schedule:

Half Day (Friday, June 29th 9:00 - 13:00)

Organizers

Adolfy Hoisie and Darren J. Kerbyson (Performance and Architecture Lab, Pacific Northwest National Laboratory, USA)

Description

This tutorial presents a practical approach to the performance modeling of large-scale scientific applications on high-performance systems. The defining characteristic of the tutorial is the description of a proven modeling approach, developed at PAL, for full-blown scientific codes ranging from a few thousand to over 100,000 lines, which has been validated on systems containing thousands of processors. The goal is to impart a detailed understanding of the factors that contribute to the resulting performance of an application when mapped onto a given HPC platform. Performance modeling is the only technique that can quantitatively elucidate this understanding. We show how models are constructed and demonstrate how they are used to predict, explain, diagnose, and engineer application performance in existing or future codes and/or systems. Notably, our approach does not require the use of specific tools but rather is applicable across commonly used environments. Moreover, since our performance models are parametric in terms of machine and application characteristics, they give the user the ability to "experiment ahead" with different system configurations or algorithms/coding strategies. Both will be demonstrated in studies emphasizing the application of these modeling techniques, including verifying system performance, comparing large-scale systems, and examining possible future systems.
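To make the notion of a parametric model concrete, the toy sketch below (an illustration only, not the PAL methodology or any validated model) predicts the runtime of a hypothetical nearest-neighbour stencil code from a few machine parameters (per-core rate, network latency, bandwidth) and application parameters (problem size, work per cell, halo size, iterations), then sweeps the processor count to "experiment ahead"; every number in it is an assumption.

// Toy parametric performance model: T(P) = iterations * (compute(P) + comm(P))
// for a 2-D domain decomposition with four halo exchanges per iteration.
#include <cmath>
#include <cstdio>

struct Machine { double flops_per_core, latency_s, bandwidth_Bps; };
struct App     { double cells, flops_per_cell, bytes_per_halo_cell, iterations; };

// Predicted runtime when the domain is split evenly over P cores.
static double predict(const Machine& m, const App& a, int P) {
    const double cells_per_core = a.cells / P;
    const double compute = cells_per_core * a.flops_per_cell / m.flops_per_core;
    // Each core exchanges four halos of roughly sqrt(cells_per_core) cells.
    const double halo_bytes = 4.0 * std::sqrt(cells_per_core) * a.bytes_per_halo_cell;
    const double comm = 4.0 * m.latency_s + halo_bytes / m.bandwidth_Bps;
    return a.iterations * (compute + comm);
}

int main() {
    const Machine m{2.0e9 /* flop/s per core */, 2.0e-6 /* s */, 1.0e9 /* B/s */};
    const App a{1.0e8 /* cells */, 100.0, 8.0, 1000.0};
    // "Experiment ahead": vary the processor count and inspect the prediction.
    for (int P = 64; P <= 65536; P *= 4)
        std::printf("P = %6d  predicted time = %8.2f s\n", P, predict(m, a, P));
    return 0;
}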




Back to schedule

CANCELLED T8: "Designing High-End Computing Systems and Programming Models with InfiniBand and High-speed Ethernet"

Duration and schedule:

Half Day (Friday, June 29th 14:00 - 18:00)

Organizers

Dhabaleswar K. (DK) Panda (The Ohio State University, USA)

Description

Modern network architectures such as InfiniBand (IB) and High-speed Ethernet (HSE) have many novel features that were not available in previous networks. Apart from raw performance (8-, 16-, 24-, 32-, and 56-Gbps for IB and 10- and 40-Gbps for HSE), these architectures provide various other features, such as hardware protocol offload, remote memory access capabilities, hardware multicast, quality of service, rate control, multi-pathing, fault tolerance, and path migration capabilities. Owing to such capabilities, these architectures are quickly being adopted by many scientific, enterprise, and cloud computing platforms. Products with varying levels of hardware support are also becoming available. At the same time, multi-core computing platforms with varying architectures are emerging. Thus, current and future network architectures provide new ways to design next-generation High-End Computing (HEC) systems and programming models with multi-core architectures.

This tutorial aims to bring different network architectures (with emphasis on IB and HSE) to the audience in a single coherent presentation, to focus on their individual strengths and limitations, and to provide a comparative study of these standards.

Based on these emerging trends and the associated challenges, the goals of this tutorial are as follows:

1. Making the attendees familiar with the IB and HSE architectures and the associated benefits.

2. Demonstrating how the OpenFabrics stack is trying to provide a convergence between these two standards.

3. Providing an overview of available IB and HSE hardware/software solutions.

4. Illustrating sample performance numbers for different programming models, showing trends in various environments and how they take advantage of IB and HSE features.

In summary, the tutorial aims to make the attendees familiar with current high-speed network architectures (with a focus on IB and HSE), their benefits, the available hardware/software solutions built on these standards, and the latest trends in designing high-end computing, networking, and storage systems with these standards, and to provide a critical assessment of whether these technologies are ready for prime time.

Tutorial's web page

Back to schedule

CANCELLED T9: "Developing Scientific Applications with the Eclipse Parallel Tools Platform"

Duration and schedule:

Half Day (Friday, June 29th 9:00 - 13:00)

Organizers

Greg Watson (IBM, USA), Carsten Karbach (Julich Supercomputing Centre, DE)

Description

Many scientific application developers still use command-line tools and tools with diverse and sometimes confusing user interfaces for the different aspects of their development activities. In contrast, other computing disciplines have employed advanced application development environments, which have demonstrated considerable success in improving productivity, reducing defects, and reducing time-to-market. The Eclipse Parallel Tools Platform (PTP) combines tools for coding, static analysis, revision control, refactoring, debugging, job submission, and more, into a best-practice integrated environment for increasing developer productivity. Leveraging the successful open-source Eclipse platform, PTP helps manage the complexity of scientific code development and optimization on diverse platforms, and provides tools for gaining insights into complex code that are otherwise difficult to attain. This tutorial will provide attendees with a hands-on introduction to Eclipse PTP, and will cover installing, configuring, and using PTP for developing scientific applications using a variety of languages and programming models.

Tutorial's web page

Back to schedule

CANCELLED T10: "High Performance Computing in Biomedical Informatics"

Duration and schedule:

Half Day (Friday, June 29th 14:00 - 18:00)

Organizers

Hesham H. Ali (University of Nebraska at Omaha, USA)

Description

The field of Biomedical Informatics has been attracting a lot of attention in recent years. The massive size of the currently available biological and medical databases, and their high rate of growth, have a great influence on the types of research currently conducted, and researchers are focusing more than ever on maximizing the use of these databases. Hence, it would be of great advantage for researchers to utilize High Performance Computing (HPC) systems to explore the data stored in the available databases and to extract new information that would lead to a better understanding of various biological and medical phenomena.

The Biomedical Informatics domain is rich in applications that require extracting useful information from very large and continuously growing databases. The marriage between the bioinformatics domain and high performance computing is a natural one: the problems in this domain tend to be highly parallelizable and deal with large datasets, so using HPC is a natural fit.

In addition, from the IT point of view, the problem of efficiently collecting, sharing, mining, and analyzing the wealth of information available in a growing set of biological and clinical data has common roots with many other IT applications. This is particularly critical in managing biological and clinical data, since relevant data are available in many different shapes and forms, and hence employing all available data to extract meaningful properties is an enormous task. Heterogeneous data, obtained from microarrays, high-throughput sequencers, mass spectrometry experiments, and clinical records, can all be used to find potential correlations between genes/proteins and the susceptibility to a particular disease. Addressing these issues requires significant computational facilities, hence the need to integrate HPC research. How to efficiently manage the utilization of HPC systems in Biomedical Informatics is quickly emerging as one of the most urgent and critical problems in advancing biomedical research. The tutorial will address these issues with a particular focus on the following objectives:

1. Provide an overview of the exciting disciplines of Biomedical Informatics, including medical informatics, public health informatics, and bioinformatics, with a focus on the computationally intensive data mining/data analysis problems and their growing need for HPC systems.

2. Introduce the main computational problems in biomedical research, with a focus on the currently available algorithmic tools, and address the advantages and the shortcomings of each tool.

3. Introduce the audience to the concept of intelligent data integration and analysis tools, with a focus on the need to incorporate HPC systems. Such tools are critical for leveraging data collected from different resources to produce useful information in a timely manner. The success of these tools can further advance biomedical research and has the potential to lead to new discoveries directly related to efficiencies and innovations in healthcare.

Tutorial's web page

Back to schedule