Changes between Version 14 and Version 15 of WikiStart


Timestamp:
11/09/20 13:25:52 (3 years ago)
Author:
apope
Comment:

--

  • WikiStart

    v14 v15  
    1 = MOVED TO: [https://xgitlab.cels.anl.gov/hacc/genericio/-/wikis/home xgitlab GenericIO wiki] = 
     1= MOVED TO: [https://xgitlab.cels.anl.gov/hacc/genericio/ xgitlab GenericIO] = 
    22 
    3 = GenericIO = 
    4  
    5 GenericIO is a write-optimized library for writing self-describing scientific data files on large-scale parallel file systems. 
    6  
    7 == References == 
    8  
    9 Habib, et al., HACC: Simulating Future Sky Surveys on State-of-the-Art Supercomputing Architectures, New Astronomy, 2015 
    10 [http://arxiv.org/abs/1410.2805]. 
    11  
    12 == Source Code == 
    13  
    14 A source archive is available here: [http://www.mcs.anl.gov/~turam/genericio/genericio-20190417.tar.gz genericio-20190417.tar.gz] (previous releases:  [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20170925.tar.gz genericio-20170925.tar.gz] [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20160829.tar.gz genericio-20160829.tar.gz] [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20150608.tar.gz genericio-20160412.tar.gz] [http://www.alcf.anl.gov/~hfinkel/genericio/genericio-20150608.tar.gz genericio-20150608.tar.gz]), or from git: 
    15  
    16 {{{ 
    17   git clone http://git.mcs.anl.gov/genericio.git 
    18 }}} 
    19  
    20 == Output file partitions (subfiles) == 
    21  
    22 If you're running on an IBM BG/Q supercomputer, the number of subfiles (partitions) is chosen automatically based on the I/O nodes. Otherwise, by default, the GenericIO library picks the number of subfiles using a fairly naive hostname-based hashing scheme. This works reasonably well on small clusters, but not on larger systems. On a larger system, you might want to set these environment variables: 
    23  
    24 {{{ 
    25   GENERICIO_PARTITIONS_USE_NAME=0 
    26   GENERICIO_RANK_PARTITIONS=256 
    27 }}} 
    28  
    29 Here, the number of partitions (256 above) determines the number of subfiles used. If you're using a Lustre file system, for example, the optimal number of files satisfies: 
    30  
    31   # of files * stripe count  ~ # OSTs 
    32  
    33 On Titan, for example, there are 1008 OSTs and a default stripe count of 4, which gives 1008 / 4 = 252, so we use approximately 256 files. 
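
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the helper names (`lustre_file_count`, `nearest_power_of_two`, `genericio_partition_env`) are hypothetical and not part of GenericIO, and rounding to a power of two is an assumption made here to reproduce the "approximately 256" figure; the text only requires that files times stripe count be close to the number of OSTs.

```python
import math


def lustre_file_count(num_osts: int, stripe_count: int) -> int:
    """Return a file count such that files * stripe_count is close to num_osts."""
    if num_osts <= 0 or stripe_count <= 0:
        raise ValueError("num_osts and stripe_count must be positive")
    return max(1, round(num_osts / stripe_count))


def nearest_power_of_two(n: int) -> int:
    """Round n to the nearest power of two on a log scale (an assumption, see lead-in)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 ** round(math.log2(n))


def genericio_partition_env(num_partitions: int) -> dict:
    """Build the two environment variables named in the text as a dict of strings."""
    return {
        "GENERICIO_PARTITIONS_USE_NAME": "0",
        "GENERICIO_RANK_PARTITIONS": str(num_partitions),
    }


# Titan figures quoted in the text: 1008 OSTs, default stripe count of 4.
files = lustre_file_count(1008, 4)        # 252
partitions = nearest_power_of_two(files)  # 256
env = genericio_partition_env(partitions)
```

The resulting dictionary could be merged into `os.environ` before launching the MPI job, or passed as the `env` argument of `subprocess.run`.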
    34  
    35 == Benchmarks == 
    36  
    37 Once you build the library and associated programs (using make), you can run, for example: 
    38  
    39 {{{ 
    40   $ mpirun -np 8 ./mpi/GenericIOBenchmarkWrite /tmp/out.gio 123456 2 
    41   Wrote 9 variables to /tmp/out (4691036 bytes) in 0.2361s: 18.9484 MB/s 
    42 }}} 
    43  
    44 {{{ 
    45   $ mpirun -np 8 ./mpi/GenericIOBenchmarkRead /tmp/out.gio 
    46   Read 9 variables from /tmp/out (4688028 bytes) in 0.223067s: 20.0426 MB/s [excluding header read] 
    47 }}} 
    48  
    49 The read benchmark always reads all of the input data. The write benchmark takes two numerical parameters: the first is the number of data rows to write, and the second is a random seed (which slightly perturbs the per-rank output sizes). Each row is 36 bytes for these benchmarks. 
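
As a sanity check on the sample output above, the reported rates can be reproduced from the byte counts and timings. The sketch below assumes that "MB" in the benchmark output means 2^20 bytes; this is inferred from the sample numbers, not from the library's documentation, and `throughput_mib_per_s` is a hypothetical helper name.

```python
def throughput_mib_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time into MiB/s (1 MiB = 2**20 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 2**20


# Byte counts and timings copied from the sample benchmark output.
write_rate = throughput_mib_per_s(4691036, 0.2361)
read_rate = throughput_mib_per_s(4688028, 0.223067)

print(f"write: {write_rate:.4f} MiB/s, read: {read_rate:.4f} MiB/s")
```

Both values agree with the reported 18.9484 MB/s and 20.0426 MB/s to within the rounding of the printed timings.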
    50  
    51 The write benchmark can be passed the -c parameter to enable output compression. Both benchmarks take an optional -a parameter to request that homogeneous aggregates (i.e. "float4") be used instead of using separate arrays for each position/velocity component. 
    52  
    53 == Python module == 
    54  
    55 The repository includes a genericio Python module that can read genericio-formatted files and return numpy arrays. This is included in the standard build. To use it, once you've built genericio, you can read genericio data as follows: 
    56  
    57 {{{ 
    58 $ export PYTHONPATH=${GENERICIO_DIR}/python 
    59 $ python 
    60 >>> import genericio 
    61 >>> genericio.gio_inspect('m000-99.fofproperties') 
    62 Number of Elements: 1691 
    63 [data type] Variable name 
    64 --------------------------------------------- 
    65 [i 32] fof_halo_count 
    66 [i 64] fof_halo_tag 
    67 [f 32] fof_halo_mass 
    68 [f 32] fof_halo_mean_x 
    69 [f 32] fof_halo_mean_y 
    70 [f 32] fof_halo_mean_z 
    71 [f 32] fof_halo_mean_vx 
    72 [f 32] fof_halo_mean_vy 
    73 [f 32] fof_halo_mean_vz 
    74 [f 32] fof_halo_vel_disp 
    75  
    76 (i=integer,f=floating point, number bits size) 
    77 >>> genericio.gio_read('m000-99.fofproperties','fof_halo_mass') 
    78 array([[  4.58575588e+13], 
    79        [  5.00464689e+13], 
    80        [  5.07078771e+12], 
    81        ...,  
    82        [  1.35221006e+13], 
    83        [  5.29125710e+12], 
    84        [  7.12849857e+12]], dtype=float32) 
    85  
    86 }}}
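
In the session above, `gio_read` returns a two-dimensional array with one column (note the nested brackets), so it is often convenient to flatten it before further processing. The following sketch assumes only that shape; it uses a stand-in array built from the six values visible in the excerpt rather than calling `genericio`, and `summarize_column` is a hypothetical helper.

```python
import numpy as np

# Stand-in for the gio_read result: the six values visible in the excerpt above,
# kept in the same (N, 1) float32 layout.
masses = np.array(
    [[4.58575588e+13],
     [5.00464689e+13],
     [5.07078771e+12],
     [1.35221006e+13],
     [5.29125710e+12],
     [7.12849857e+12]],
    dtype=np.float32,
)


def summarize_column(column):
    """Flatten an (N, 1) column to 1-D and return simple summary statistics."""
    flat = np.asarray(column).ravel()
    return {
        "count": int(flat.size),
        "min": float(flat.min()),
        "max": float(flat.max()),
    }


summary = summarize_column(masses)
print(summary)
```

With a real file, `masses` would instead be the array returned by `genericio.gio_read('m000-99.fofproperties', 'fof_halo_mass')`.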