Opening Remarks 9:00-9:05 Reception
Keynote Presentation Session Chair: Wu Feng 9:05-10:00 Reception
Nothing Amuses More Harmlessly Than Computation
Prof. Dr. Michael Klein, member of the National Academy of Sciences and a Fellow of the Royal Society of London
Highlighted Papers Session Chair: Simon McIntosh-Smith 10:30-12:00 Reception
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Ivy Peng, Stefano Markidis, Roberto Gioiosa and Gokcen Kestor
An efficient, distributed stochastic gradient descent algorithm for deep-learning applications
Guojing Cong and Onkar Bhardwaj
Large Scale Parallelization of Smoothed Particle Hydrodynamics Method on Heterogeneous Cluster
Yingrui Wang, Leisheng Li and Rong Tian
Graph Analytics and ML Session Chair: Guojing Cong 13:30-15:00 Council
Boosting the efficiency of HPCG and Graph500 with near-data processing
Erik Vermij, Leandro Fiorin, Christoph Hagleitner and Koen Bertels
GCN: GPU-based Cube CNN Framework for Hyperspectral Image Classification
Han Dong, Tao Li, Jiabing Leng, Lingyan Kong and Gang Bai
Nearly Balanced Work Partitioning for Heterogeneous Algorithms
Mallipeddi Hardhik, Dip Sankar Banerjee, Kiran Raj Ramamoorthy, Kishore Kothapalli and Kannan Srinathan
Enhancing Programming Runtime Systems Session Chair: Hans Vandierendonck 13:30-15:00 3.30
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations
Adrián Castelló, Sangmin Seo, Rafael Mayo Gual, Pavan Balaji, Enrique S. Quintana-Orti and Antonio J. Peña
Locality-Aware Dynamic Task Graph Scheduling
Jordyn Maglalang, Sriram Krishnamoorthy and Kunal Agrawal
Practical Experience with Transactional Lock Elision
Tingzhe Zhou, Pantea Zardoshti and Michael Spear
Linear Algebra Algorithms Session Chair: James Lin 13:30-15:00 3.31
Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning
Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti
High-performance and Memory-saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU
Yusuke Nagasaka, Akira Nukada and Satoshi Matsuoka
Constrained Tensor Factorization with Accelerated AO-ADMM
Shaden Smith, Alec Beri and George Karypis
Data and Networks Session Chair: Ram Kesavan 15:30-17:00 Council
Efficient Data Sharing on Heterogeneous Systems
Victor Garcia-Flores, Eduard Ayguade and Antonio J. Peña
HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip
Vikram Narayana, Shuai Sun, Armin Mehrabian, Volker Sorger and Tarek El-Ghazawi
ES2: Aiming at an Optimal Virtual I/O Event Path
Xiaokang Hu, Wang Zhang, Jian Li, Ruhui Ma, Haibing Guan and Feng Wu
GPU & Runtime Systems Session Chair: Pavan Balaji 15:30-17:00 3.30
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling
Akshay Venkatesh, Ching-Hsiang Chu, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti and Dhabaleswar Panda
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning
Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Hashmi, Bracy Elton and Dhabaleswar Panda
Overlapping Data Transfers with Computation on GPU with Tiles
Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren and John Shalf
Graphs and Networks Session Chair: Wentong Cai 15:30-17:00 3.31
Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning
Jiawen Sun, Hans Vandierendonck and Dimitrios Nikolopoulos
Parallel Algorithm for the Computation of Cycles in Relative Neighborhood Graphs
Hari Sundar and Parmeshwar Khurd
High Performance Query Processing for Web Scale RDF Data using BSP Style Communication and Balanced Distribution
Minho Bae, Junho Eum, Donghoon Kim and Sangyoon Oh
Keynote Presentation Session Chair: Daniel Katz 9:00-10:00 Reception
An Overview of Communication Avoiding Algorithms for Dense and Sparse Linear Algebra
Dr. Laura Grigori, who leads Alpines, a joint group between INRIA Paris and J.L. Lions Laboratory, UPMC.
Storage Session Chair: Matthew Curtis-Maury 10:30-12:00 Council
OptiMatch: Enabling an Optimal Match between Green Power and Various Workloads for Renewable-Energy Powered Storage Systems
Xiaoyang Qu, Jiguang Wan, Fengguang Song, Xiaozhao Zhuang, Fei Wu and Changsheng Xie
Favorable Block First: A Comprehensive Cache Scheme to Accelerate Partial Stripe Recovery of Triple Disk Failure Tolerant Arrays
Luyu Li, Houxiang Ji, Chentao Wu, Jie Li and Minyi Guo
Non-sequential Striping for Distributed Storage Systems with Different Redundancy Schemes
Yanwen Xie, Dan Feng and Fang Wang
IO & Cloud Session Chair: Sangyoon Oh 10:30-12:00 3.30
Predicting Response Latency Percentiles for Cloud Object Storage Systems
Yi Su, Dan Feng, Yu Hua and Zhan Shi
WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows
Mehmet Fatih Aktaş, Javier Diaz-Montes, Ivan Rodero and Manish Parashar
Scalable Write Allocation in the WAFL File System
Matthew Curtis-Maury, Ram Kesavan and Mrinal Bhattacharjee
Numerical Applications Session Chair: Frank Takes 10:30-12:00 3.31
Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-memory Multicores
Minyoung Jung, Jinwoo Park, Johann Blieberger and Bernd Burgstaller
Parallel Reconstruction of Three Dimensional Magnetohydrodynamic Equilibria in Plasma Confinement Devices
Sudip Seal, Mark Cianciosa, Steven Hirshman, Andreas Wingen, Robert Wilcox and Ezekial Unterberg
Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors
Athena Elafrou, Georgios Goumas and Nectarios Koziris
Networks Session Chair: Tarek El-Ghazawi 13:30-15:00 Council
Network aware Multi-user Computation Partitioning in Mobile Edge Clouds
Lei Yang, Zhenyu Wang, Jiannong Cao and Weigang Wu
Fading-Resistant Link Scheduling in Wireless Networks
Chenxi Qiu and Haiying Shen
Order/Radix Problem: Towards Low End-to-End Latency Interconnection Networks
Ryota Yasudo, Michihiro Koibuchi, Koji Nakano, Hiroki Matsutani and Hideharu Amano
Cloud Scheduling Session Chair: Manish Parashar 13:30-15:00 3.30
Dynamic QoS-Aware Controller for Resource Allocation in Lambda Platform
Mohammadreza Hoseinyfarahabady, Javid Taheri, Zahir Tari and Albert Y. Zomaya
CELIA: Cost-time Performance of Elastic Applications on Cloud
Sunimal Rathnayake, Dumitrel Loghin and Yong Meng Teo
The Cloud as an OpenMP Offloading Device
Hervé Yviquel and Guido Araujo
GPU Applications Session Chair: Antonio Pena 13:30-15:00 3.31
Simple and Fast Parallel Algorithms for the Voronoi Maps and the Euclidean Distance Map, with GPU implementations
Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano and Yasuaki Ito
High-Performance Recommender System Training using Co-Clustering on CPU/GPU Clusters
Kubilay Atasu, Thomas Parnell, Celestine Dünner, Michail Vlachos and Haralampos Pozidis
Exploiting GPUs for fast force-directed visualization of large-scale networks
Govert Brinkmann, Kristian Rietveld and Frank Takes
Data & IO Session Chair: Min Si 15:30-17:00 Council
A Coflow-based Co-optimization Framework for High-performance Data Analytics
Long Cheng, Ying Wang, Yulong Pei and Dick Epema
PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout
Zhipeng Li, Yinlong Xu, Yongkun Li, Chengjin Tian and Youhui Bai
Data Caching in Next Generation Mobile Cloud Services, Online vs. Off-line
Yang Wang, Shuibing He, Xiaopeng Fan and Chengzhong Xu
Computation and Optimization Session Chair: Bernd Burgstaller 15:30-17:00 3.30
Towards Highly Efficient DGEMM on the Emerging SW26010 Many-core Processor
Lijuan Jiang, Chao Yang, Yulong Ao and Wenjing Ma
Optimizations of Two Compute-bound Scientific Kernels on SW26010 Many-core Processor
James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama and Satoshi Matsuoka
Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions
Shixiong Xu and David Gregg
Data Analytics Session Chair: Taisuke Boku 15:30-17:00 3.31
Runtime Data Layout Scheduling for Machine Learning Dataset
Yang You
A Machine Learning Approach for Efficient Parallel Simulation of Beam Dynamics on GPUs
Kamesh Arumugam, Desh Ranjan, Mohammad Zubair, Balsa Terzic, Alexander Godunov and Tunazzina Islam
Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization
Charalampos Stylianopoulos, Magnus Almgren, Olaf Landsiedel and Marina Papatriantafilou
Keynote Presentation Session Chair: Simon McIntosh-Smith 9:00-10:00 Reception
Clouds, Things and Robots: The Transputer Revisited
Prof. David May, CTO and co-founder of XMOS, and Professor of Computer Science at Bristol University, United Kingdom
Graph Algorithms Session Chair: Hari Sundar 10:30-12:00 Council
Parallel Space-Time Kernel Density Estimation
Erik Saule, Dinesh Panchananam, Alexander Hohl, Wenwu Tang and Eric Delmelle
Parallel Algorithm for Single-Source Earliest-Arrival Problem in Temporal Graphs
Peng Ni, Masatoshi Hanai, Wen Jun Tan, Chen Wang and Wentong Cai
Greed is Good: Parallel Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures
Mustafa Kemal Taş, Kamer Kaya and Erik Saule
Performance & Power Tuning for Heterogeneous Platforms Session Chair: Rong Ge 10:30-12:00 3.30
A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
Isuru Fernando, Sanath Jayasena, Milinda Fernando and Hari Sundar
Autotuning GPU Kernels via Static and Predictive Analysis
Robert Lim, Boyana Norris and Allen Malony
A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance
Aniket Chakrabarti, Srinivasan Parthasarathy and Christopher Stewart
Various Parallel Algorithms Session Chair: Simon McIntosh-Smith 13:30-15:00 Council
Scheduling independent tasks in parallel under power constraints
Laurent Philippe, Ayham Kassab, Jean-Marc Nicod and Veronika Rehn-Sonigo
A Novel Minimum Time Parallel 2-D Discrete Wavelet Transform Algorithm for General Purpose Processors
Eduardo Moscoso Rubino, Alberto Jose Alvares, Raul Marin Prades and Pedro Sanz Valero
A Parallel TSP-Based Algorithm for Partitioning Graphs
Harshvardhan Das and Subodh Kumar
Resilience & Power Aware Scheduling Session Chair: Federico Silla 13:30-15:00 3.30
E-Storm: Replication-based State Management in Distributed Stream Processing Systems
Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein and Rajkumar Buyya
Resilience for Stencil Computations with Latent Errors
Aiman Fang, Aurélien Cavelan, Yves Robert and Andrew Chien
Application-Aware Power Coordination on Power Bounded NUMA Multicore Systems
Rong Ge, Pengfei Zou and Xizhou Feng


P2S2 Workshop – Agenda Location: Council
Opening Remarks 8:15
Keynote Presentation
Dr. Antonio Pena, Barcelona Supercomputing Center
Coffee Break 9:30-10:00
Session 2: Scalable Distributed Computing
Session Chair: Min Si, Argonne National Laboratory
Improving Valiant Routing for Slim Fly Networks
Deyu Han, Zhaofeng Wang, David P. Bunde
Efficient Broadcasting Algorithm in Harary-like Networks
Puspal Bhabak, Hovhannes Harutyunyan, Peter Kropf
Programming Model for Developing Supercomputer Combinatorial Solvers
Ghaith Tarawneh, Andrey Mokhov, Matthew Naylor, Alex Rast, Simon W. Moore, David B. Thomas, Alex Yakovlev, Andrew Brown
Efficient Scalable Computing through Flexible Applications and Adaptive Workloads
Sergio Iserte, Rafael Mayo, Enrique S. Quintana-Orti, Vicenc Beltran, Antonio J. Pena
Lunch Break 12:00-13:30
Session 3: On-node Optimizations
Session Chair: Taisuke Boku, University of Tsukuba
A Region-Aware Multi-Objective Auto-Tuner for Parallel Programs
Klaus Kofler, Juan J. Durillo, Philipp Gschwandtner, Thomas Fahringer
A Parallel Shared-Memory Architecture for OWL Ontology Classification
Zixi Quan, Volker Haarslev
Communication-Computation Overlapping with Dynamic Loop Scheduling for Preconditioned Parallel Iterative Solvers on Multicore/Manycore Clusters
Kengo Nakajima, Toshihiro Hanawa
OpenMP memkind: An Extension for Heterogeneous Physical Memories
Xi Wang, John D. Leidel, Yong Chen
Afternoon Break 15:30-16:00
Session 4: Invited Papers
Session Chair: John D. Leidel, Tactical Computing Laboratories
On a Storage System Software Stack for Extreme Scale Data Centric Computing
Sai Narasimhamurthy
Toward Highly Productive Parallel Programming on Large Scale Accelerated Computing
Taisuke Boku, Hitoshi Murai, Masahiro Nakao, Jinpil Lee, Mitsuhisa Sato, Akihiro Tabuchi, Keisuke Tsugane
Towards Portable and Adaptable Asynchronous Communication for One-Sided Applications
Min Si, Jeff Hammond, Masamichi Takagi, Yutaka Ishikawa
Integrating Memory Perspective into the BSC Performance Tools
Harald Servat, Jesus Labarta, Hans-Christian Hoppe, Judit Gimenez, Antonio J. Pena
SRMPDS Workshop – Agenda Location: 3.31
Opening Remarks 8:00
Thermal-aware Job Scheduling of MapReduce Applications on High-Performance Clusters
Shubbhi Taneja, Yi Zhou, Mohammed Ibrahim Alghamdi, Xiao Qin
Anticipation Preference-Based Heuristic Scheduling in Grid Virtual Organizations
Victor Toporkov, Dmitry Yemelyanov, Anna Toporkova
Scheduling Optimization in Ophthalmology using Multi-Objective Integer Models
Alejandro Betancourt, Alberto Colino, Angelina Lazaro
Coffee Break 9:30-10:00
Optimizing Memory Management in Deeply Heterogeneous HPC Accelerators
Anna Pupykina, Giovanni Agosta
Blockchain-based multi-level scoring system for P2P clusters
Josef Gattermayer, Pavel Tvrdik
SWAS: Stealing Work using Approximate System-load Information
Stavros Tzilis, Miquel Pericàs, Pedro Trancoso, Ioannis Sourdis
AWASN Workshop – Agenda Location: 3.32
Opening Remarks 10:30
Effect on Group Detection Based on Human Proximity for Human Relationship Extraction in Daily Life
Yuko Hirabe, Manato Fujimoto, Yutaka Arakawa, Hirohiko Suwa, Keiichi Yasumoto
Trajectory Data Cleansing Using HMM
Qin Wang, Min-Te Sun and Kazuya Sakai
EMS Workshop – Agenda Location: 3.32
Opening Remarks 14:00
Using the integrated GPU to improve CPU sort performance
Grigore Lupescu, Emil Slusanschi, Nicolae Tapus
Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM
Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee
OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators
Chun-Chieh Yang, Shao-Chung Wang, Min-Yi Hsu, Yuan-Ming Chang, Yuan-Shin Hwang, Jenq-Kuen Lee
Coffee Break TBC
Embedded Accelerators for Scientific High-Performance Computing: an Energy Study of Gaussian Elimination Workloads
Beau Johnston, Brian Lee, Luke Angove, Alistair Rendell
In-place Irregular Computation for Message-passing Chip-multiprocessors
Zhang Youhui, Zhang Youyang, Li Yanhua, Fei Xiang, Zheng Weimin
BIOHPC Workshop – Agenda Location: 3.30
Opening Remarks 8:00
Implementation of an efficient Blind Docking technique on HPC architectures for the discovery of allosteric inhibitors
Horacio Perez-Sanchez, Jose P. Ceron-Carrasco, Jose M. Cecilia
Heterogeneous Hardware Support in BEAGLE, a High-Performance Computing Library for Statistical Phylogenetics
Daniel L. Ayres, Michael P. Cummings
Coffee Break 9:30-10:00
Parallel Desolvation Energy Term Calculation for Blind Docking on GPU Architectures
Hocine Saadi, Nadia Nouali-Taboudjemat, Abdellatif Rahmoun, Baldomero Imbernón, Horacio Pérez-Sánchez, José M. Cecilia
A Parallel Cellular Automaton Tumor Growth Model with Dynamic Load Balancing for Multicore Programming
Alberto G. Salguero, Manuel I. Capel and Antonio J. Tomeu
PSTI Workshop – Agenda Location: 3.30
Opening Remarks 14:00
An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework
Ajai V. George, Sankar Manoj, Sanket Rajan Gupte, Santonu Sarkar
Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture
Michael Wagner, Victor López, Julián Morillo, Carlo Cavazzoni, Fabio Affinito, Judit Giménez, Jesús Labarta
Towards a Better Expressiveness of the Speedup Metric in MPI Context
Jean-Baptiste Besnard, Allen D. Malony, Sameer Shende, Marc Pérache, Patrick Carribault, Julien Jaeger
HPC4BD Workshop – Agenda Location: 3.33
Opening Remarks 8:30
An Efficient Filter Strategy for Theta-Join Query in Distributed Environment
Wenjie Liu, Zhanhuai Li, Yuntao Zhou
Exploiting key-value data stores scalability for HPC
Cesare Cugnasco, Yolanda Becerra, Jordi Torres, Eduard Ayguadé
Coffee Break 9:30-10:00
Deriving Highly Efficient Implementations of Parallel PageRank
Bart van Strien, Kristian Rietveld, Harry Wijshoff
EDDS: An Enhanced Density-based Method for Evolutionary Clustering of Data Streams
Ammar Thaher Yaseen Al Abd Alazeez, Hongbo Du, Sabah Jassim
HUCAA Workshop – Agenda Location: 3.33
Opening Remarks 14:00
An Improved Abstract GPU Model with Data Transfer
Thomas C. Carroll, Prudence W.H. Wong
A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs
Carlos Reano, Federico Silla
Coffee Break 9:30-10:00
Towards a Scalable and Adaptable Resource Allocation Framework in Cloud Environments
Huanhuan Xiong, Christos Filelis-Papadopoulos, Dapeng Dong, Gabriel G. Castane, John P. Morrison
SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory
Manjunath Gorentla Venkata, Ferrol Aderholdt, Zachary Parchman
Turning GPUs into Floating Devices over The Cluster: The Beauty of GPU Migration
Javier Prades, Federico Silla

