########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.2.0 October 2003
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory

See the source files for authors of specific codes.
Compiled on May 11 2008 at 10:53:06
Current time (1210525032) is Mon May 12 01:57:12 2008

Hostname: 'pcc01'
########################################################################
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 7500
NB     : 80
PMAP   : Row-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
   2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

Begin of PTRANS section.
M: 3750
N: 3750
MB: 80
NB: 80
P: 1
Q: 1
TIME     M     N  MB  NB   P   Q     TIME  CHECK     GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL  3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
CPU   3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
WALL  3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
CPU   3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
WALL  3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
CPU   3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
WALL  3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
CPU   3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
WALL  3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00
CPU   3750  3750  80  80   1   1     0.54 PASSED    0.210  0.00

Finished 5 tests, with the following results:
    5 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.
Current time (1210525050) is Mon May 12 01:57:30 2008

End of PTRANS section.
Begin of HPL section.
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 7500
NB     : 80
PMAP   : Row-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
   2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4        7500    80     1     1              83.82          3.357e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0143299 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0227387 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N ) = 0.0048247 ...... PASSED
============================================================================

Finished 1 tests with the following results:
    1 tests completed and passed residual checks,
    0 tests completed and failed residual checks,
    0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
Current time (1210525137) is Mon May 12 01:58:57 2008

End of HPL section.
Begin of StarDGEMM section.
Scaled residual: 0.00931701
Node(s) with error 0
Minimum Gflop/s 3.649542
Average Gflop/s 3.649542
Maximum Gflop/s 3.649542
Current time (1210525184) is Mon May 12 01:59:44 2008

End of StarDGEMM section.
Begin of SingleDGEMM section.
Scaled residual: 0.00933652
Node(s) with error 0
Node selected 0
Single DGEMM Gflop/s 3.845152
Current time (1210525229) is Mon May 12 02:00:29 2008

End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 18750000, Offset = 0
Total memory required = 0.4191 GiB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112485 microseconds.
   (= 112485 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:            2.1664       0.1389       0.1385       0.1390
Scale:           1.9919       0.1507       0.1506       0.1511
Add:             2.4362       0.1851       0.1847       0.1855
Triad:           2.3551       0.1912       0.1911       0.1913
-------------------------------------------------------------
Results Comparison:
        Expected  : 21624389648437501952.000000 4324877929687499776.000000 5766503906250000384.000000
        Observed  : 21624389650922651648.000000 4324877930758327808.000000 5766503904477968384.000000
Solution Validates
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 2.166394
Average Copy GB/s 2.166394
Maximum Copy GB/s 2.166394
Minimum Scale GB/s 1.991897
Average Scale GB/s 1.991897
Maximum Scale GB/s 1.991897
Minimum Add GB/s 2.436185
Average Add GB/s 2.436185
Maximum Add GB/s 2.436185
Minimum Triad GB/s 2.355110
Average Triad GB/s 2.355110
Maximum Triad GB/s 2.355110
Current time (1210525236) is Mon May 12 02:00:36 2008

End of StarSTREAM section.
Begin of SingleSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 18750000, Offset = 0
Total memory required = 0.4191 GiB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 111926 microseconds.
   (= 111926 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:            2.1665       0.1546       0.1385       0.1415
Scale:           1.9936       0.1674       0.1505       0.1511
Add:             2.4362       0.2056       0.1847       0.1855
Triad:           2.3553       0.2125       0.1911       0.1914
-------------------------------------------------------------
Results Comparison:
        Expected  : 21624389648437501952.000000 4324877929687499776.000000 5766503906250000384.000000
        Observed  : 21624389650922651648.000000 4324877930758327808.000000 5766503904477968384.000000
Solution Validates
-------------------------------------------------------------
Node(s) with error 0
Node selected 0
Single STREAM Copy GB/s 2.166453
Single STREAM Scale GB/s 1.993648
Single STREAM Add GB/s 2.436185
Single STREAM Triad GB/s 2.355307
Current time (1210525244) is Mon May 12 02:00:44 2008

End of SingleSTREAM section.
Begin of MPIRandomAccess section.
Running on 1 processors (PowerofTwo)
Total Main table size = 2^25 = 33554432 words
PE Main table size = 2^25 = 33554432 words/PE
Default number of updates (RECOMMENDED) = 134217728
CPU time used = 25.577599 seconds
Real time used = 25.914288 seconds
0.005179294 Billion(10^9) Updates per second [GUP/s]
0.005179294 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 13.648853 seconds
Verification: Real time used = 13.648193 seconds
Found 0 errors in 33554432 locations (passed).
Current time (1210525284) is Mon May 12 02:01:24 2008

End of MPIRandomAccess section.
Begin of StarRandomAccess section.
Main table size = 2^25 = 33554432 words
Number of updates = 134217728
CPU time used = 12.744796 seconds
Real time used = 12.744084 seconds
0.010531767 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 33554432 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.010532
Average GUP/s 0.010532
Maximum GUP/s 0.010532
Current time (1210525309) is Mon May 12 02:01:49 2008

End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Main table size = 2^25 = 33554432 words
Number of updates = 134217728
CPU time used = 12.736796 seconds
Real time used = 12.739226 seconds
0.010535783 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 33554432 locations (passed).
Node(s) with error 0
Node selected 0
Single GUP/s 0.010536
Current time (1210525335) is Mon May 12 02:02:15 2008

End of SingleRandomAccess section.
Begin of MPIFFT section.
Number of nodes: 1
Vector size: 4194304
Generation time: 0.348
Tuning: 0.506
Computing: 1.435
Inverse FFT: 1.526
max(|x-x0|): 1.665e-15
Gflop/s: 0.321
Current time (1210525340) is Mon May 12 02:02:20 2008

End of MPIFFT section.
Begin of StarFFT section.
Vector size: 8388608
Generation time: 0.696
Tuning: 0.001
Computing: 1.955
Inverse FFT: 1.842
max(|x-x0|): 2.184e-15
Node(s) with error 0
Minimum Gflop/s 0.493548
Average Gflop/s 0.493548
Maximum Gflop/s 0.493548
Current time (1210525345) is Mon May 12 02:02:25 2008

End of StarFFT section.
Begin of SingleFFT section.
Vector size: 8388608
Generation time: 0.695
Tuning: 0.001
Computing: 1.965
Inverse FFT: 1.847
max(|x-x0|): 2.184e-15
Node(s) with error 0
Node selected 0
Single FFT Gflop/s 0.491048
Current time (1210525350) is Mon May 12 02:02:30 2008

End of SingleFFT section.
Begin of LatencyBandwidth section.
Current time (1210525350) is Mon May 12 02:02:30 2008

End of LatencyBandwidth section.
Begin of Summary section.
VersionMajor=1
VersionMinor=2
VersionMicro=0
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=1
MPI_Wtick=1.000000e-06
HPL_Tflops=0.00335659
HPL_time=83.8156
HPL_eps=1.11022e-16
HPL_RnormI=2.29231e-11
HPL_Anorm1=1921.14
HPL_AnormI=1923.26
HPL_Xnorm1=4726.51
HPL_XnormI=2.96683
HPL_N=7500
HPL_NB=80
HPL_nprow=1
HPL_npcol=1
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=1
HPLMinProcs=1
DGEMM_N=4329
StarDGEMM_Gflops=3.64954
SingleDGEMM_Gflops=3.84515
PTRANS_GBs=0.209902
PTRANS_time=0.535731
PTRANS_residual=0
PTRANS_n=3750
PTRANS_nb=80
PTRANS_nprow=1
PTRANS_npcol=1
MPIRandomAccess_N=33554432
MPIRandomAccess_time=25.9143
MPIRandomAccess_CheckTime=13.6482
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=134217728
MPIRandomAccess_GUPs=0.00517929
MPIRandomAccess_TimeBound=-1
MPIRandomAccess_Algorithm=0
RandomAccess_N=33554432
StarRandomAccess_GUPs=0.0105318
SingleRandomAccess_GUPs=0.0105358
STREAM_VectorSize=18750000
STREAM_Threads=1
StarSTREAM_Copy=2.16639
StarSTREAM_Scale=1.9919
StarSTREAM_Add=2.43619
StarSTREAM_Triad=2.35511
SingleSTREAM_Copy=2.16645
SingleSTREAM_Scale=1.99365
SingleSTREAM_Add=2.43619
SingleSTREAM_Triad=2.35531
FFT_N=8388608
StarFFT_Gflops=0.493548
SingleFFT_Gflops=0.491048
MPIFFT_N=4194304
MPIFFT_Gflops=0.321458
MPIFFT_maxErr=1.66533e-15
MPIFFT_Procs=1
MaxPingPongLatency_usec=-1
RandomlyOrderedRingLatency_usec=-1
MinPingPongBandwidth_GBytes=-1
NaturallyOrderedRingBandwidth_GBytes=-1
RandomlyOrderedRingBandwidth_GBytes=-1
MinPingPongLatency_usec=-1
AvgPingPongLatency_usec=-1
MaxPingPongBandwidth_GBytes=-1
AvgPingPongBandwidth_GBytes=-1
NaturallyOrderedRingLatency_usec=-1
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=1.90735e-06
MPIFFT_time1=0.17767
MPIFFT_time2=0.204256
MPIFFT_time3=0.0726149
MPIFFT_time4=0.756234
MPIFFT_time5=0.151247
MPIFFT_time6=1.19209e-06
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1210525350) is Mon May 12 02:02:30 2008

########################################################################