= GenericIO =

GenericIO is a write-optimized library for writing self-describing scientific data files on large-scale parallel file systems.

== References ==

Habib et al., "HACC: Simulating Future Sky Surveys on State-of-the-Art Supercomputing Architectures," New Astronomy, 2015. [http://arxiv.org/abs/1410.2805 arXiv:1410.2805]

== Source Code ==

A source archive is available here: [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20160412.tar.gz genericio-20160412.tar.gz] (previous release: [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20150608.tar.gz genericio-20150608.tar.gz]). The source can also be obtained from git:

{{{
  git clone http://git.mcs.anl.gov/genericio.git
}}}

== Output file partitions (subfiles) ==

On an IBM BG/Q supercomputer, the number of subfiles (partitions) is chosen automatically based on the system's I/O nodes. Otherwise, by default, GenericIO picks the number of subfiles using a fairly naive hostname-based hashing scheme. This works reasonably well on small clusters but not on larger systems, where you may want to set these environment variables:

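{{{
  # Disable the hostname-based partitioning scheme
  export GENERICIO_PARTITIONS_USE_NAME=0
  # Number of rank partitions (subfiles) to write
  export GENERICIO_RANK_PARTITIONS=256
}}}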
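Conceptually, each rank is mapped to a partition, and all ranks sharing a partition write to the same subfile. The sketch below is hypothetical and is not GenericIO's actual code: it assumes a simple byte-sum hash of the hostname for the default mode and a round-robin `rank % partitions` assignment for the rank-based mode, purely to illustrate why a naive hostname hash can spread ranks unevenly on large systems whose node names differ only in a few digits. The `nid00000`-style node names are made up for the example.

{{{
#!python
# Hypothetical sketch, NOT GenericIO's implementation: two ways a rank
# could be assigned to a partition (subfile). Ranks that share a
# partition number write to the same subfile.
from collections import Counter


def hostname_partition(hostname: str, buckets: int = 256) -> int:
    # Assumed naive hash: sum of the hostname's bytes modulo `buckets`.
    # Names that differ only in a few digits land in nearby buckets
    # and often collide.
    return sum(hostname.encode()) % buckets


def rank_partition(rank: int, partitions: int) -> int:
    # Assumed rank-based mode: round-robin assignment, giving exactly
    # `partitions` subfiles once there are at least that many ranks.
    return rank % partitions


def ranks_per_subfile(assignments):
    # Map each partition number to the number of ranks writing to it.
    return Counter(assignments)


if __name__ == "__main__":
    nodes = [f"nid{n:05d}" for n in range(64)]  # made-up node names
    ranks_per_node = 16
    num_ranks = len(nodes) * ranks_per_node

    # Hostname mode: every rank on a node inherits the node's partition.
    by_host = [hostname_partition(h) for h in nodes for _ in range(ranks_per_node)]
    # Rank mode with 8 partitions.
    by_rank = [rank_partition(r, 8) for r in range(num_ranks)]

    for label, parts in (("hostname", by_host), ("rank", by_rank)):
        sizes = ranks_per_subfile(parts)
        print(f"{label} mode: {len(sizes)} subfiles, "
              f"{min(sizes.values())} to {max(sizes.values())} ranks each")
}}}

With these made-up names, the byte-sum hash collapses 64 nodes into only 15 subfiles holding between 16 and 112 ranks each, while the rank-based mode produces exactly 8 subfiles of 128 ranks each.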

Here the number of partitions (256 above) sets the number of subfiles. On a Lustre file system, for example, a good number of files approximately satisfies:

  (number of files) × (stripe count) ≈ (number of OSTs)

Titan, for example, has 1008 OSTs and a default stripe count of 4. Since 1008 / 4 = 252, we use approximately 256 files.
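The Lustre rule of thumb can be turned into a small helper for choosing a value for `GENERICIO_RANK_PARTITIONS`. This is a hypothetical sketch, not part of GenericIO: the function name `suggested_partitions` and the optional power-of-two rounding are assumptions, and the OST count and stripe count have to come from your own file system.

{{{
#!python
# Hypothetical helper (not part of GenericIO): suggest a subfile count
# from the Lustre rule of thumb  files * stripe_count ~ OSTs.


def suggested_partitions(num_osts: int, stripe_count: int,
                         power_of_two: bool = False) -> int:
    if num_osts <= 0 or stripe_count <= 0:
        raise ValueError("num_osts and stripe_count must be positive")
    # files ~ OSTs / stripe_count, rounded to the nearest integer
    # (halves round up), and never fewer than one file.
    files = max(1, (num_osts + stripe_count // 2) // stripe_count)
    if power_of_two:
        # Optionally snap to the nearest power of two (ties go up).
        lower = 1 << (files.bit_length() - 1)
        upper = lower << 1
        files = upper if upper - files <= files - lower else lower
    return files


if __name__ == "__main__":
    # Titan: 1008 OSTs, default stripe count 4.
    exact = suggested_partitions(1008, 4)
    rounded = suggested_partitions(1008, 4, power_of_two=True)
    print(f"1008 OSTs / stripe count 4 -> {exact} files")
    print(f"export GENERICIO_RANK_PARTITIONS={rounded}")
}}}

For Titan this gives 252 files, or 256 with power-of-two rounding. Whether a power of two is actually preferable is a judgment call; the rule of thumb itself only asks for roughly OSTs / stripe count files.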