Last modified on 12/08/11 11:00:00

Projection and Modeling for Applications and Systems at Exascale (PROMISE)

The goal is to model both applications and systems in order to understand their behavior, detect bottlenecks, and design and optimize both for performance.

The Grand Question

  • What system design would meet the demand of exascale applications?
  • How can applications adapt to an exascale system?

The Grand Approach

  • Build Application models
  • Build system models
  • Bridge the two to allow quick application transformation and system redesign, and to project the resulting performance.
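
As an illustration of what "bridging" the two models could look like, here is a minimal sketch that assumes a roofline-style bound: a kernel is described only by its operation count and memory traffic, a system only by its peak compute rate and memory bandwidth. The class names, fields, and the bound itself are assumptions made for this example, not part of the PROMISE framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelModel:
    """Hypothetical application model: one kernel, two numbers."""
    name: str
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes moved to/from main memory


@dataclass(frozen=True)
class SystemModel:
    """Hypothetical system model: peak rates only."""
    name: str
    peak_flops: float     # flop/s
    mem_bandwidth: float  # bytes/s


def project_time(kernel: KernelModel, system: SystemModel) -> float:
    """Lower-bound runtime: the slower of compute and memory traffic."""
    compute_time = kernel.flops / system.peak_flops
    memory_time = kernel.bytes_moved / system.mem_bandwidth
    return max(compute_time, memory_time)


def bottleneck(kernel: KernelModel, system: SystemModel) -> str:
    """Report which resource bounds the projected time."""
    compute_time = kernel.flops / system.peak_flops
    memory_time = kernel.bytes_moved / system.mem_bandwidth
    return "compute" if compute_time >= memory_time else "memory"


# Project one kernel onto two candidate system designs.
kernel = KernelModel("example_kernel", flops=1e9, bytes_moved=4e9)
for system in (
    SystemModel("design_a", peak_flops=1e11, mem_bandwidth=1e10),
    SystemModel("design_b", peak_flops=1e11, mem_bandwidth=4e10),
):
    print(system.name, project_time(kernel, system), bottleneck(kernel, system))
```

Keeping the two models as separate objects means a system redesign or an application transformation only changes one side of the projection.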

The Grand Challenges

  • What should not be modeled, and what cannot be modeled?
  • What modeling accuracy can be achieved, and what modeling accuracy is REQUIRED?
  • How to design the framework?

Exhibit Questions

  • What's the projected performance of a kernel/a set of related kernels?
  • How much performance can be gained by using both CPU and GPU to calculate a kernel?
  • For hierarchical parallelism (with both coarse-grained and fine-grained parallelism), how should the application be decomposed onto a parallel system?
  • How much performance can be gained from a larger cache?
  • What's the interplay between multiple kernels? How many kernels do we need to port to reduce CPU/GPU transfer?
  • How much benefit would overlapping data transfer with GPU computation provide?
  • How to schedule/distribute tasks among heterogeneous nodes?
  • Modeling and projection may vary for different inputs. Can we draw a decision tree where we say if input>N, then use this code, otherwise use the other code/system?
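
Several of these questions (CPU vs. GPU, transfer overlap, input-dependent decisions) can be explored with a very small analytic model. The sketch below is one possible starting point, under stated assumptions: work and transfer volume scale linearly with input size, each GPU offload pays a fixed latency, and overlap is ideal (transfer and compute are fully pipelined). All names and parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OffloadParams:
    flops_per_elem: float  # work per input element
    bytes_per_elem: float  # CPU<->GPU transfer per input element
    cpu_flops: float       # sustained CPU rate, flop/s
    gpu_flops: float       # sustained GPU rate, flop/s
    link_bw: float         # CPU<->GPU link bandwidth, bytes/s
    gpu_latency: float     # fixed cost per offload, seconds


def cpu_time(n: float, p: OffloadParams) -> float:
    return n * p.flops_per_elem / p.cpu_flops


def gpu_time(n: float, p: OffloadParams, overlap: bool = False) -> float:
    transfer = n * p.bytes_per_elem / p.link_bw
    compute = n * p.flops_per_elem / p.gpu_flops
    # Ideal overlap hides the shorter phase behind the longer one.
    body = max(transfer, compute) if overlap else transfer + compute
    return p.gpu_latency + body


def choose_device(n: float, p: OffloadParams, overlap: bool = False) -> str:
    """One-node decision tree: 'if input > N use the GPU code'."""
    return "gpu" if gpu_time(n, p, overlap) < cpu_time(n, p) else "cpu"


def crossover_size(p: OffloadParams, overlap: bool = False) -> Optional[float]:
    """Input size N above which the GPU wins, or None if it never does."""
    per_elem_cpu = p.flops_per_elem / p.cpu_flops
    transfer = p.bytes_per_elem / p.link_bw
    compute = p.flops_per_elem / p.gpu_flops
    per_elem_gpu = max(transfer, compute) if overlap else transfer + compute
    if per_elem_cpu <= per_elem_gpu:
        return None
    return p.gpu_latency / (per_elem_cpu - per_elem_gpu)
```

Comparing `crossover_size(p)` with `crossover_size(p, overlap=True)` gives a first estimate of how much overlap lowers the input size at which offloading pays off; real pipelines have fill and drain costs that this model ignores.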

Application Modeling

The goal is to expose an application's parallelism, computation intensity, and patterns in data accesses and control flow, so that we can estimate its behavior and the transformations it may need when adapting to an underlying system.


  • GFMC
  • CFD
  • MILC
  • Flash

Levels of Modeling

  • individual kernel
  • multiple kernels
  • individual thread
  • individual node / inter-core
  • inter-node

Details to Model

  • Data flow
  • Control flow
  • Communication patterns (part of data flow?)
  • Computation intensity
  • Parallelism
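
To make "computation intensity" concrete, the following sketch computes flops per byte of memory traffic for a simple streaming kernel. It assumes no cache reuse and counts every load and store as main-memory traffic; the function name and parameters are illustrative only.

```python
def arithmetic_intensity(flops_per_elem: float,
                         loads_per_elem: int,
                         stores_per_elem: int,
                         word_bytes: int = 8) -> float:
    """Flops per byte of memory traffic, assuming no cache reuse."""
    bytes_per_elem = (loads_per_elem + stores_per_elem) * word_bytes
    return flops_per_elem / bytes_per_elem


# y[i] = a * x[i] + y[i]: one multiply and one add per element,
# two loads (x[i], y[i]) and one store (y[i]).
double_precision = arithmetic_intensity(2, loads_per_elem=2, stores_per_elem=1,
                                        word_bytes=8)
single_precision = arithmetic_intensity(2, loads_per_elem=2, stores_per_elem=1,
                                        word_bytes=4)
print(double_precision, single_precision)
```

Under these assumptions, switching from double to single precision halves the traffic per element and therefore doubles the intensity, which is one simple way a precision change can show up in an application model.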

Modeling Challenges

  • To what level of detail should we model: algorithm, data structure, or even register usage?
  • How to narrow down the search space?
  • Can we make the model modular/object oriented?


  • How to model multiple kernels and their data input/output (segmenting the code into a CPU part and a GPU part)?
  • Should we model the effects of double precision, control flow, cache accesses, and registers?

System Modeling

Modules to be modeled

  • CPU (B/G, Intel): instruction mix, memory, registers, cache, interconnect
  • GPU / multi-GPU
  • I/O
  • Network, GPUDirect
  • A cluster of CPUs and GPUs

Modeling Challenges

  • What should be modeled? Which factors is performance most sensitive to?


  • How to model caching effects on the GPU?
  • How to mix Vitali's empirical model with the structural GPU model?
  • Can we extend Vitali's model to model a different CPU architecture?
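
For the caching question, one simple structural starting point is to treat memory traffic as a mix of cache hits and misses and derive an effective bandwidth from the hit rate. This is a generic sketch with hypothetical parameters; it is not Vitali's model and ignores latency, bank conflicts, and coalescing.

```python
def effective_bandwidth(hit_rate: float, cache_bw: float, dram_bw: float) -> float:
    """Effective bytes/s when a fraction `hit_rate` of bytes is served from cache.

    Time per byte is the hit-rate-weighted average of the cache and
    DRAM times per byte, so the result is a weighted harmonic mean.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    time_per_byte = hit_rate / cache_bw + (1.0 - hit_rate) / dram_bw
    return 1.0 / time_per_byte


# Sweep the hit rate to see how sensitive the projection is to caching.
for hit_rate in (0.0, 0.5, 0.9, 1.0):
    print(hit_rate, effective_bandwidth(hit_rate, cache_bw=400e9, dram_bw=100e9))
```

An empirical model could supply the measured hit rate or the two bandwidths, while the structural part stays as above; that split is only one possible way of mixing the two approaches.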