Opening Remarks | 9:00-9:05 | Reception | |
Keynote Presentation | Session Chair: Wu Feng | 9:05-10:00 | Reception |
Nothing Amuses More Harmlessly Than Computation Prof. Dr. Michael Klein, member of the National Academy of Sciences and a Fellow of the Royal Society of London |
Highlighted Papers | Session Chair: Simon McIntosh-Smith | 10:30-12:00 | Reception |
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy Ivy Peng, Stefano Markidis, Roberto Gioiosa and Gokcen Kestor |
An efficient, distributed stochastic gradient descent algorithm for deep-learning applications Guojing Cong and Onkar Bhardwaj |
Large Scale Parallelization of Smoothed Particle Hydrodynamics Method on Heterogeneous Cluster Yingrui Wang, Leisheng Li and Rong Tian |
Graph Analytics and ML | Session Chair: Guojing Cong | 13:30-15:00 | Council |
Boosting the efficiency of HPCG and Graph500 with near-data processing Erik Vermij, Leandro Fiorin, Christoph Hagleitner and Koen Bertels |
GCN: GPU-based Cube CNN Framework for Hyperspectral Image Classification Han Dong, Tao Li, Jiabing Leng, Lingyan Kong and Gang Bai |
Nearly Balanced Work Partitioning for Heterogeneous Algorithms Mallipeddi Hardhik, Dip Sankar Banerjee, Kiran Raj Ramamoorthy, Kishore Kothapalli and Kannan Srinathan |
Enhancing Programming Runtime Systems | Session Chair: Hans Vandierendonck | 13:30-15:00 | 3.30 |
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations Adrián Castelló, Sangmin Seo, Rafael Mayo Gual, Pavan Balaji, Enrique S. Quintana-Orti and Antonio J. Peña |
Locality-Aware Dynamic Task Graph Scheduling Jordyn Maglalang, Sriram Krishnamoorthy and Kunal Agrawal |
Practical Experience with Transactional Lock Elision Tingzhe Zhou, Pantea Zardoshti and Michael Spear |
Linear Algebra Algorithms | Session Chair: James Lin | 13:30-15:00 | 3.31 |
Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning
Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti | |||
High-performance and Memory-saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU
Yusuke Nagasaka, Akira Nukada and Satoshi Matsuoka | |||
Constrained Tensor Factorization with Accelerated AO-ADMM
Shaden Smith, Alec Beri and George Karypis | |||
Data and Networks | Session Chair: Ram Kesavan | 15:30-17:00 | Council |
Efficient Data Sharing on Heterogeneous Systems
Victor Garcia-Flores, Eduard Ayguade and Antonio J. Peña | |||
HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip
Vikram Narayana, Shuai Sun, Armin Mehrabian, Volker Sorger and Tarek El-Ghazawi | |||
ES2: Aiming at an Optimal Virtual I/O Event Path
Xiaokang Hu, Wang Zhang, Jian Li, Ruhui Ma, Haibing Guan and Feng Wu | |||
GPU & Runtime Systems | Session Chair: Pavan Balaji | 15:30-17:00 | 3.30 |
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling
Akshay Venkatesh, Ching-Hsiang Chu, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti and Dhabaleswar Panda | |||
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning
Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Hashmi, Bracy Elton and Dhabaleswar Panda | |||
Overlapping Data Transfers with Computation on GPU with Tiles
Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren and John Shalf | |||
Graphs and Networks | Session Chair: Wentong Cai | 15:30-17:00 | 3.31 |
Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning
Jiawen Sun, Hans Vandierendonck and Dimitrios Nikolopoulos | |||
Parallel Algorithm for the Computation of Cycles in Relative Neighborhood Graphs
Hari Sundar and Parmeshwar Khurd | |||
High Performance Query Processing for Web Scale RDF Data using BSP Style Communication and Balanced Distribution
Minho Bae, Junho Eum, Donghoon Kim and Sangyoon Oh | |||
Drinks Reception and Awards | Session Chair: Simon McIntosh-Smith | 17:00-19:30 | Reception |
Keynote Presentation | Session Chair: Daniel Katz | 9:00-10:00 | Reception |
An Overview of Communication Avoiding Algorithms for Dense and Sparse Linear Algebra Dr. Laura Grigori, who leads Alpines, a joint group between INRIA Paris and J.L. Lions Laboratory, UPMC. | |||
Storage | Session Chair: Matthew Curtis-Maury | 10:30-12:00 | Council |
OptiMatch: Enabling an Optimal Match between Green Power and Various Workloads for Renewable-Energy Powered Storage Systems
Xiaoyang Qu, Jiguang Wan, Fengguang Song, Xiaozhao Zhuang, Fei Wu and Changsheng Xie | |||
Favorable Block First: A Comprehensive Cache Scheme to Accelerate Partial Stripe Recovery of Triple Disk Failure Tolerant Arrays
Luyu Li, Houxiang Ji, Chentao Wu, Jie Li and Minyi Guo | |||
Non-sequential Striping for Distributed Storage Systems with Different Redundancy Schemes
Yanwen Xie, Dan Feng and Fang Wang | |||
IO & Cloud | Session Chair: Sangyoon Oh | 10:30-12:00 | 3.30 |
Predicting Response Latency Percentiles for Cloud Object Storage Systems
Yi Su, Dan Feng, Yu Hua and Zhan Shi | |||
WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows
Mehmet Fatih Aktaş, Javier Diaz-Montes, Ivan Rodero and Manish Parashar | |||
Scalable Write Allocation in the WAFL File System
Matthew Curtis-Maury, Ram Kesavan and Mrinal Bhattacharjee | |||
Numerical Applications | Session Chair: Frank Takes | 10:30-12:00 | 3.31 |
Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-memory Multicores
Minyoung Jung, Jinwoo Park, Johann Blieberger and Bernd Burgstaller | |||
Parallel Reconstruction of Three Dimensional Magnetohydrodynamic Equilibria in Plasma Confinement Devices
Sudip Seal, Mark Cianciosa, Steven Hirshman, Andreas Wingen, Robert Wilcox and Ezekial Unterberg | |||
Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors
Athena Elafrou, Georgios Goumas and Nectarios Koziris | |||
Networks | Session Chair: Tarek El-Ghazawi | 13:30-15:00 | Council |
Network aware Multi-user Computation Partitioning in Mobile Edge Clouds
Lei Yang, Zhenyu Wang, Jiannong Cao and Weigang Wu | |||
Fading-Resistant Link Scheduling in Wireless Networks
Chenxi Qiu and Haiying Shen | |||
Order/Radix Problem: Towards Low End-to-End Latency Interconnection Networks
Ryota Yasudo, Michihiro Koibuchi, Koji Nakano, Hiroki Matsutani and Hideharu Amano | |||
Cloud Scheduling | Session Chair: Manish Parashar | 13:30-15:00 | 3.30 |
Dynamic QoS-Aware Controller for Resource Allocation in Lambda Platform
Mohammadreza Hoseinyfarahabady, Javid Taheri, Zahir Tari and Albert Y. Zomaya | |||
CELIA: Cost-time Performance of Elastic Applications on Cloud
Sunimal Rathnayake, Dumitrel Loghin and Yong Meng Teo | |||
The Cloud as an OpenMP Offloading Device
Hervé Yviquel and Guido Araujo | |||
GPU Applications | Session Chair: Antonio Pena | 13:30-15:00 | 3.31 |
Simple and Fast Parallel Algorithms for the Voronoi Maps and the Euclidean Distance Map, with GPU implementations
Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano and Yasuaki Ito | |||
High-Performance Recommender System Training using Co-Clustering on CPU/GPU Clusters
Kubilay Atasu, Thomas Parnell, Celestine Dünner, Michail Vlachos and Haralampos Pozidis | |||
Exploiting GPUs for fast force-directed visualization of large-scale networks
Govert Brinkmann, Kristian Rietveld and Frank Takes | |||
Data & IO | Session Chair: Min Si | 15:30-17:00 | Council |
A Coflow-based Co-optimization Framework for High-performance Data Analytics
Long Cheng, Ying Wang, Yulong Pei and Dick Epema | |||
PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout
Zhipeng Li, Yinlong Xu, Yongkun Li, Chengjin Tian and Youhui Bai | |||
Data Caching in Next Generation Mobile Cloud Services, Online vs. Off-line
Yang Wang, Shuibing He, Xiaopeng Fan and Chengzhong Xu | |||
Computation and Optimization | Session Chair: Bernd Burgstaller | 15:30-17:00 | 3.30 |
Towards Highly Efficient DGEMM on the Emerging SW26010 Many-core Processor
Lijuan Jiang, Chao Yang, Yulong Ao and Wenjing Ma | |||
Optimizations of Two Compute-bound Scientific Kernels on SW26010 Many-core Processor
James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama and Satoshi Matsuoka | |||
Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions
Shixiong Xu and David Gregg | |||
Data Analytics | Session Chair: Taisuke Boku | 15:30-17:00 | 3.31 |
Runtime Data Layout Scheduling for Machine Learning Dataset
Yang You | |||
A Machine Learning Approach for Efficient Parallel Simulation of Beam Dynamics on GPUs
Kamesh Arumugam, Desh Ranjan, Mohammad Zubair, Balsa Terzic, Alexander Godunov and Tunazzina Islam | |||
Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization
Charalampos Stylianopoulos, Magnus Almgren, Olaf Landsiedel and Marina Papatriantafilou | |||
Conference Dinner | 18:45-21:30 | Great Hall |
Keynote Presentation | Session Chair: Simon McIntosh-Smith | 9:00-10:00 | Reception |
Clouds, Things and Robots: The Transputer Revisited Prof. David May, CTO and co-founder of XMOS, and Professor of Computer Science at Bristol University, United Kingdom |
Graph Algorithms | Session Chair: Hari Sundar | 10:30-12:00 | Council |
Parallel Space-Time Kernel Density Estimation
Erik Saule, Dinesh Panchananam, Alexander Hohl, Wenwu Tang and Eric Delmelle | |||
Parallel Algorithm for Single-Source Earliest-Arrival Problem in Temporal Graphs
Peng Ni, Masatoshi Hanai, Wen Jun Tan, Chen Wang and Wentong Cai | |||
Greed is Good: Parallel Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures
Mustafa Kemal Taş, Kamer Kaya and Erik Saule | |||
Performance & Power Tuning for Heterogeneous Platforms | Session Chair: Rong Ge | 10:30-12:00 | 3.30 |
A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
Isuru Fernando, Sanath Jayasena, Milinda Fernando and Hari Sundar | |||
Autotuning GPU Kernels via Static and Predictive Analysis
Robert Lim, Boyana Norris and Allen Malony | |||
A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance
Aniket Chakrabarti, Srinivasan Parthasarathy and Christopher Stewart | |||
Various Parallel Algorithms | Session Chair: Simon McIntosh-Smith | 13:30-15:00 | Council |
Scheduling independent tasks in parallel under power constraints
Laurent Philippe, Ayham Kassab, Jean-Marc Nicod and Veronika Rehn-Sonigo | |||
A Novel Minimum Time Parallel 2-D Discrete Wavelet Transform Algorithm for General Purpose Processors
Eduardo Moscoso Rubino, Alberto Jose Alvares, Raul Marin Prades and Pedro Sanz Valero | |||
A Parallel TSP-Based Algorithm for Partitioning Graphs
Harshvardhan Das and Subodh Kumar | |||
Resilience & Power Aware Scheduling | Session Chair: Federico Silla | 13:30-15:00 | 3.30 |
E-Storm: Replication-based State Management in Distributed Stream Processing Systems
Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein and Rajkumar Buyya | |||
Resilience for Stencil Computations with Latent Errors
Aiman Fang, Aurélien Cavelan, Yves Robert and Andrew Chien | |||
Application-Aware Power Coordination on Power Bounded NUMA Multicore Systems
Rong Ge, Pengfei Zou and Xizhou Feng |
P2S2 Workshop – Agenda | Location: Council |
Opening Remarks | 8:15 |
Keynote Presentation Dr. Antonio Pena, Barcelona Supercomputing Center |
Coffee Break | 9:30-10:00 |
Session 2: Scalable Distributed Computing Session Chair: Min Si, Argonne National Laboratory |
Improving Valiant Routing for Slim Fly Networks Deyu Han, Zhaofeng Wang, David P. Bunde |
Efficient Broadcasting Algorithm in Harary-like Networks Puspal Bhabak, Hovhannes Harutyunyan, Peter Kropf |
Programming Model for Developing Supercomputer Combinatorial Solvers Ghaith Tarawneh, Andrey Mokhov, Matthew Naylor, Alex Rast, Simon W. Moore, David B. Thomas, Alex Yakovlev, Andrew Brown |
Efficient Scalable Computing through Flexible Applications and Adaptive Workloads Sergio Iserte, Rafael Mayo, Enrique S. Quintana-Orti, Vicenc Beltran, Antonio J. Pena |
Lunch Break | 12:00-13:30 |
Session 3: On-node Optimizations Session Chair: Taisuke Boku, University of Tsukuba |
A Region-Aware Multi-Objective Auto-Tuner for Parallel Programs Klaus Kofler, Juan J. Durillo, Philipp Gschwandtner, Thomas Fahringer |
A Parallel Shared-Memory Architecture for OWL Ontology Classification Zixi Quan, Volker Haarslev |
Communication-Computation Overlapping with Dynamic Loop Scheduling for Preconditioned Parallel Iterative Solvers on Multicore/Manycore Clusters Kengo Nakajima, Toshihiro Hanawa |
OpenMP memkind: An Extension for Heterogeneous Physical Memories Xi Wang, John D. Leidel, Yong Chen |
Afternoon Break | 15:30-16:00 |
Session 4: Invited Papers Session Chair: John D. Leidel, Tactical Computing Laboratories |
On a Storage System Software Stack for Extreme Scale Data Centric Computing Sai Narasimhamurthy |
Toward Highly Productive Parallel Programming on Large Scale Accelerated Computing Taisuke Boku, Hitoshi Murai, Masahiro Nakao, Jinpil Lee, Mitsuhisa Sato, Akihiro Tabuchi, Keisuke Tsugane |
Towards Portable and Adaptable Asynchronous Communication for One-Sided Applications Min Si, Jeff Hammond, Masamichi Takagi, Yutaka Ishikawa |
Integrating Memory Perspective into the BSC Performance Tools Harald Servat, Jesus Labarta, Hans-Christian Hoppe, Judit Gimenez, Antonio J. Pena |
SRMPDS Workshop – Agenda | Location: 3.31 |
Opening Remarks | 8:00 |
Thermal-aware Job Scheduling of MapReduce Applications on High-Performance Clusters Shubbhi Taneja, Yi Zhou, Mohammed Ibrahim Alghamdi, Xiao Qin |
Anticipation Preference-Based Heuristic Scheduling in Grid Virtual Organizations Victor Toporkov, Dmitry Yemelyanov, Anna Toporkova |
Scheduling Optimization in Ophthalmology using Multi-Objective Integer Models Alejandro Betancourt, Alberto Colino, Angelina Lazaro |
Coffee Break | 9:30-10:00 |
Optimizing Memory Management in Deeply Heterogeneous HPC Accelerators Anna Pupykina, Giovanni Agosta |
Blockchain-based multi-level scoring system for P2P clusters Josef Gattermayer, Pavel Tvrdik |
SWAS: Stealing Work using Approximate System-load Information Stavros Tzilis, Miquel Pericàs, Pedro Trancoso, Ioannis Sourdis |
AWASN Workshop – Agenda | Location: 3.32 |
Opening Remarks | 10:30 |
Effect on Group Detection Based on Human Proximity for Human Relationship Extraction in Daily Life Yuko Hirabe, Manato Fujimoto, Yutaka Arakawa, Hirohiko Suwa, Keiichi Yasumoto |
Trajectory Data Cleansing Using HMM Qin Wang, Min-Te Sun and Kazuya Sakai |
EMS Workshop – Agenda | Location: 3.32 |
Opening Remarks | 14:00 |
Using the integrated GPU to improve CPU sort performance Grigore Lupescu, Emil Slusanschi, Nicolae Tapus |
Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee |
OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators Chun-Chieh Yang, Shao-Chung Wang, Min-Yi Hsu, Yuan-Ming Chang, Yuan-Shin Hwang, Jenq-Kuen Lee |
Coffee Break | TBC |
Embedded Accelerators for Scientific High-Performance Computing: an Energy Study of Gaussian Elimination Workloads Beau Johnston, Brian Lee, Luke Angove, Alistair Rendell |
In-place Irregular Computation for Message-passing Chip-multiprocessors Zhang Youhui, Zhang Youyang, Li Yanhua, Fei Xiang, Zheng Weimin |
BIOHPC Workshop – Agenda | Location: 3.30 |
Opening Remarks | 8:00 |
Implementation of an efficient Blind Docking technique on HPC architectures for the discovery of allosteric inhibitors Horacio Perez-Sanchez, Jose P. Ceron-Carrasco, Jose M. Cecilia |
Heterogeneous Hardware Support in BEAGLE, a High-Performance Computing Library for Statistical Phylogenetics Daniel L. Ayres, Michael P. Cummings |
Coffee Break | 9:30-10:00 |
Parallel Desolvation Energy Term Calculation for Blind Docking on GPU Architectures Hocine Saadi, Nadia Nouali-Taboudjemat, Abdellatif Rahmoun, Baldomero Imbernón, Horacio Pérez-Sánchez, José M. Cecilia |
A Parallel Cellular Automaton Tumor Growth Model with Dynamic Load Balancing for Multicore Programming Alberto G. Salguero, Manuel I. Capel and Antonio J. Tomeu |
PSTI Workshop – Agenda | Location: 3.30 |
Opening Remarks | 14:00 |
An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework Ajai V. George, Sankar Manoj, Sanket Rajan Gupte, Santonu Sarkar |
Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture Michael Wagner, Victor López, Julián Morillo, Carlo Cavazzoni, Fabio Affinito, Judit Giménez, Jesús Labarta |
Towards a Better Expressiveness of the Speedup Metric in MPI Context Jean-Baptiste Besnard, Allen D. Malony, Sameer Shende, Marc Pérache, Patrick Carribault, Julien Jaeger |
HPC4BD Workshop – Agenda | Location: 3.33 |
Opening Remarks | 8:30 |
An Efficient Filter Strategy for Theta-Join Query in Distributed Environment Wenjie Liu, Zhanhuai Li, Yuntao Zhou |
Exploiting key-value data stores scalability for HPC Cesare Cugnasco, Yolanda Becerra, Jordi Torres, Eduard Ayguadé |
Coffee Break | 9:30-10:00 |
Deriving Highly Efficient Implementations of Parallel PageRank Bart van Strien, Kristian Rietveld, Harry Wijshoff |
EDDS: An Enhanced Density-based Method for Evolutionary of Clustering Data Streams Ammar Thaher Yaseen Al Abd Alazeez, Hongbo Du, Sabah Jassim |
HUCAA Workshop – Agenda | Location: 3.33 |
Opening Remarks | 14:00 |
An Improved Abstract GPU Model with Data Transfer Thomas C. Carroll, Prudence W.H. Wong |
A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs Carlos Reano, Federico Silla |
Coffee Break | 9:30-10:00 |
Towards a Scalable and Adaptable Resource Allocation Framework in Cloud Environments Huanhuan Xiong, Christos Filelis-Papadopoulos, Dapeng Dong, Gabriel G. Castane, John P. Morrison |
SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory Manjunath Gorentla Venkata, Ferrol Aderholdt, Zachary Parchman |
Turning GPUs into Floating Devices over The Cluster: The Beauty of GPU Migration Javier Prades, Federico Silla |