########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.2.0 October 2003
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory

See the source files for authors of specific codes.
Compiled on May 11 2008 at 10:53:06
Current time (1211742262) is Mon May 26 04:04:22 2008

Hostname: 'pcc01'
########################################################################
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 23000
NB     : 100
PMAP   : Row-major process mapping
P      : 2 4
Q      : 4 2
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
   2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

Begin of PTRANS section.
M: 11500
N: 11500
MB: 100
NB: 100
P: 2 4
Q: 4 2
TIME     M     N  MB  NB   P   Q     TIME  CHECK     GB/s     RESID
---- ----- ----- --- --- --- --- -------- ------ -------- ---------
WALL 11500 11500 100 100   2   4     6.22 PASSED    0.170      0.00
CPU  11500 11500 100 100   2   4     1.90 PASSED    0.556      0.00
WALL 11500 11500 100 100   2   4     4.50 PASSED    0.170      0.00
CPU  11500 11500 100 100   2   4     1.77 PASSED    0.597      0.00
WALL 11500 11500 100 100   2   4     3.61 PASSED    0.170      0.00
CPU  11500 11500 100 100   2   4     1.29 PASSED    0.821      0.00
WALL 11500 11500 100 100   2   4     5.90 PASSED    0.170      0.00
CPU  11500 11500 100 100   2   4     2.01 PASSED    0.526      0.00
WALL 11500 11500 100 100   2   4     5.82 PASSED    0.170      0.00
CPU  11500 11500 100 100   2   4     1.93 PASSED    0.548      0.00
WALL 11500 11500 100 100   4   2     6.27 PASSED    0.169      0.00
CPU  11500 11500 100 100   4   2     2.22 PASSED    0.477      0.00
WALL 11500 11500 100 100   4   2     6.43 PASSED    0.164      0.00
CPU  11500 11500 100 100   4   2     2.10 PASSED    0.503      0.00
WALL 11500 11500 100 100   4   2     6.79 PASSED    0.156      0.00
CPU  11500 11500 100 100   4   2     2.50 PASSED    0.422      0.00
WALL 11500 11500 100 100   4   2     6.63 PASSED    0.156      0.00
CPU  11500 11500 100 100   4   2     2.44 PASSED    0.433      0.00
WALL 11500 11500 100 100   4   2     6.03 FAILED    0.156 420209.56
CPU  11500 11500 100 100   4   2     2.34 FAILED    0.451 420209.56

Finished 10 tests, with the following results:
    9 tests completed and passed residual checks.
    1 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.
Current time (1211742372) is Mon May 26 04:06:12 2008

End of PTRANS section.
Begin of HPL section.
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 23000
NB     : 100
PMAP   : Row-major process mapping
P      : 2 4
Q      : 4 2
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
   2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       23000   100     2     4             376.79          2.153e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 25066486.5704129 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 53949925.8129770 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N ) = 9803806.3202825 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.373599
||A||_oo . . . . . . . . . . . . . . . . . . . = 5827.145943
||A||_1 . . . . . . . . . . . . . . . . . . . = 5836.795619
||x||_oo . . . . . . . . . . . . . . . . . . . = 2.561046
||x||_1 . . . . . . . . . . . . . . . . . . . = 10686.375976
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       23000   100     4     2             394.13          2.058e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 187371130.3255046 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 181331182.6709508 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N ) = 31912348.3180181 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 2.792642
||A||_oo . . . . . . . . . . . . . . . . . . . = 5827.145943
||A||_1 . . . . . . . . . . . . . . . . . . . = 5836.795619
||x||_oo . . . . . . . . . . . . . . . . . . . = 5.881153
||x||_1 . . . . . . . . . . . . . . . . . . . = 23766.105388
============================================================================

Finished 2 tests with the following results:
    0 tests completed and passed residual checks,
    2 tests completed and failed residual checks,
    0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
Current time (1211743153) is Mon May 26 04:19:13 2008

End of HPL section.
Begin of StarDGEMM section.
Scaled residual: 0.00780208
Node(s) with error 0
Minimum Gflop/s 3.396643
Average Gflop/s 3.860182
Maximum Gflop/s 4.285028
Current time (1211743217) is Mon May 26 04:20:17 2008

End of StarDGEMM section.
Begin of SingleDGEMM section.
Node(s) with error 0
Node selected 4
Single DGEMM Gflop/s 4.304754
Current time (1211743267) is Mon May 26 04:21:07 2008

End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 22041666, Offset = 0
Total memory required = 0.4927 GiB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 89633 microseconds.
   (= 89633 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function   Rate (GB/s)   Avg time   Min time   Max time
Copy:           2.8783     0.1227     0.1225     0.1229
Scale:          2.7884     0.1267     0.1265     0.1270
Add:            3.3522     0.1580     0.1578     0.1582
Triad:          3.3919     0.1561     0.1560     0.1564
-------------------------------------------------------------
Results Comparison:
    Expected : 25420670617851564032.000000 5084134123570312192.000000 6778845498093750272.000000
    Observed : 25420670626439462912.000000 5084134125187556352.000000 6778845495926718464.000000
Solution Validates
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 1.302050
Average Copy GB/s 2.276894
Maximum Copy GB/s 2.925020
Minimum Scale GB/s 1.307495
Average Scale GB/s 2.242335
Maximum Scale GB/s 2.850173
Minimum Add GB/s 1.509462
Average Add GB/s 2.654397
Maximum Add GB/s 3.430990
Minimum Triad GB/s 1.514830
Average Triad GB/s 2.650628
Maximum Triad GB/s 3.391852
Current time (1211743281) is Mon May 26 04:21:21 2008

End of StarSTREAM section.
Begin of SingleSTREAM section.
Node(s) with error 0
Node selected 7
Single STREAM Copy GB/s 2.775676
Single STREAM Scale GB/s 2.717648
Single STREAM Add GB/s 3.107871
Single STREAM Triad GB/s 3.058885
Current time (1211743288) is Mon May 26 04:21:28 2008

End of SingleSTREAM section.
Begin of MPIRandomAccess section.
Running on 8 processors (PowerofTwo)
Total Main table size = 2^28 = 268435456 words
PE Main table size = 2^25 = 33554432 words/PE
Default number of updates (RECOMMENDED) = 1073741824
CPU time used = 177.395087 seconds
Real time used = 512.216004 seconds
0.002096268 Billion(10^9) Updates per second [GUP/s]
0.000262033 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 57.575598 seconds
Verification: Real time used = 154.285936 seconds
Found 0 errors in 268435456 locations (passed).
Current time (1211743955) is Mon May 26 04:32:35 2008

End of MPIRandomAccess section.
Begin of StarRandomAccess section.
Main table size = 2^25 = 33554432 words
Number of updates = 134217728
CPU time used = 10.964685 seconds
Real time used = 10.965271 seconds
0.012240256 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 33554432 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.007301
Average GUP/s 0.010115
Maximum GUP/s 0.012244
Current time (1211743992) is Mon May 26 04:33:12 2008

End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Node(s) with error 0
Node selected 2
Single GUP/s 0.012241
Current time (1211744014) is Mon May 26 04:33:34 2008

End of SingleRandomAccess section.
Begin of MPIFFT section.
Number of nodes: 8
Vector size: 33554432
Generation time: 0.336
Tuning: 0.457
Computing: 9.200
Inverse FFT: 10.190
max(|x-x0|): 2.025e-15
Gflop/s: 0.456
Current time (1211744035) is Mon May 26 04:33:55 2008

End of MPIFFT section.
Begin of StarFFT section.
Vector size: 8388608
Generation time: 0.672
Tuning: 0.001
Computing: 1.644
Inverse FFT: 1.516
max(|x-x0|): 2.184e-15
Node(s) with error 0
Minimum Gflop/s 0.489899
Average Gflop/s 0.547558
Maximum Gflop/s 0.603275
Current time (1211744041) is Mon May 26 04:34:01 2008

End of StarFFT section.
Begin of SingleFFT section.
Node(s) with error 0
Node selected 7
Single FFT Gflop/s 0.529298
Current time (1211744046) is Mon May 26 04:34:06 2008

End of SingleFFT section.
Begin of LatencyBandwidth section.

------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany

Details - level 2
-----------------

MPI_Wtime granularity.
Max. MPI_Wtick is 0.000001 sec
wtick is set to 0.000001 sec

Message Length: 8
Latency min / avg / max: 0.047684 / 0.047684 / 0.047684 msecs
Bandwidth min / avg / max: 0.168 / 0.168 / 0.168 MByte/s

MPI_Wtime granularity is ok.
message size: 8
max time : 10.000000 secs
latency for msg: 0.047684 msecs
estimation for ping pong: 4.291534 msecs
max number of ping pong pairs = 2330
max client pings = max server pongs = 48
stride for latency = 1
Message Length: 8
Latency min / avg / max: 0.004560 / 0.038977 / 0.047684 msecs
Bandwidth min / avg / max: 0.168 / 0.296 / 1.754 MByte/s

Message Length: 2000000
Latency min / avg / max: 17.199039 / 17.199039 / 17.199039 msecs
Bandwidth min / avg / max: 116.286 / 116.286 / 116.286 MByte/s

MPI_Wtime granularity is ok.
message size: 2000000
max time : 30.000000 secs
latency for msg: 17.199039 msecs
estimation for ping pong: 137.592316 msecs
max number of ping pong pairs = 218
max client pings = max server pongs = 14
stride for latency = 1
Message Length: 2000000
Latency min / avg / max: 2.486944 / 16.136306 / 17.204523 msecs
Bandwidth min / avg / max: 116.248 / 164.200 / 804.200 MByte/s

Message Size: 8 Byte
Natural Order Latency: 0.031996 msec
Natural Order Bandwidth: 0.250033 MB/s
Avg Random Order Latency: 0.033675 msec
Avg Random Order Bandwidth: 0.237565 MB/s

Message Size: 2000000 Byte
Natural Order Latency: 47.667980 msec
Natural Order Bandwidth: 41.956886 MB/s
Avg Random Order Latency: 84.244199 msec
Avg Random Order Bandwidth: 23.740507 MB/s

Execution time (wall clock) = 53.291 sec on 8 processes
 - for cross ping_pong latency = 0.280 sec
 - for cross ping_pong bandwidth = 7.327 sec
 - for ring latency = 0.340 sec
 - for ring bandwidth = 45.344 sec

------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany

Major Benchmark results:
------------------------

Max Ping Pong Latency: 0.047684 msecs
Randomly Ordered Ring Latency: 0.033675 msecs
Min Ping Pong Bandwidth: 116.248500 MB/s
Naturally Ordered Ring Bandwidth: 41.956886 MB/s
Randomly Ordered Ring Bandwidth: 23.740507 MB/s

------------------------------------------------------------------

Detailed benchmark results:
Ping Pong:
Latency min / avg / max: 0.004560 / 0.038977 / 0.047684 msecs
Bandwidth min / avg / max: 116.248 / 164.200 / 804.200 MByte/s
Ring:
On naturally ordered ring: latency= 0.031996 msec, bandwidth= 41.956886 MB/s
On randomly ordered ring: latency= 0.033675 msec, bandwidth= 23.740507 MB/s

------------------------------------------------------------------

Benchmark conditions:
 The latency measurements were done with 8 bytes
 The bandwidth measurements were done with 2000000 bytes
 The ring communication was done in both directions on 8 processes
 The Ping Pong measurements were done on
  - 56 pairs of processes for latency benchmarking, and
  - 56 pairs of processes for bandwidth benchmarking,
 out of 8*(8-1) = 56 possible combinations on 8 processes.
 (1 MB/s = 10**6 byte/sec)

------------------------------------------------------------------
Current time (1211744099) is Mon May 26 04:34:59 2008

End of LatencyBandwidth section.
Begin of Summary section.
VersionMajor=1
VersionMinor=2
VersionMicro=0
VersionRelease=f
LANG=C
Success=0
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=8
MPI_Wtick=1.000000e-06
HPL_Tflops=0.0215296
HPL_time=376.789
HPL_eps=1.11022e-16
HPL_RnormI=0.373599
HPL_Anorm1=5836.8
HPL_AnormI=5827.15
HPL_Xnorm1=10686.4
HPL_XnormI=2.56105
HPL_N=23000
HPL_NB=100
HPL_nprow=2
HPL_npcol=4
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=8
HPLMinProcs=8
DGEMM_N=4694
StarDGEMM_Gflops=3.86018
SingleDGEMM_Gflops=4.30475
PTRANS_GBs=0.170024
PTRANS_time=5.82325
PTRANS_residual=0
PTRANS_n=11500
PTRANS_nb=100
PTRANS_nprow=2
PTRANS_npcol=4
MPIRandomAccess_N=268435456
MPIRandomAccess_time=512.216
MPIRandomAccess_CheckTime=154.286
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=1073741824
MPIRandomAccess_GUPs=0.00209627
MPIRandomAccess_TimeBound=-1
MPIRandomAccess_Algorithm=0
RandomAccess_N=33554432
StarRandomAccess_GUPs=0.010115
SingleRandomAccess_GUPs=0.0122412
STREAM_VectorSize=22041666
STREAM_Threads=1
StarSTREAM_Copy=2.27689
StarSTREAM_Scale=2.24234
StarSTREAM_Add=2.6544
StarSTREAM_Triad=2.65063
SingleSTREAM_Copy=2.77568
SingleSTREAM_Scale=2.71765
SingleSTREAM_Add=3.10787
SingleSTREAM_Triad=3.05889
FFT_N=8388608
StarFFT_Gflops=0.547558
SingleFFT_Gflops=0.529298
MPIFFT_N=33554432
MPIFFT_Gflops=0.455911
MPIFFT_maxErr=2.0254e-15
MPIFFT_Procs=8
MaxPingPongLatency_usec=47.6837
RandomlyOrderedRingLatency_usec=33.6749
MinPingPongBandwidth_GBytes=0.116248
NaturallyOrderedRingBandwidth_GBytes=0.0419569
RandomlyOrderedRingBandwidth_GBytes=0.0237405
MinPingPongLatency_usec=4.55976
AvgPingPongLatency_usec=38.9772
MaxPingPongBandwidth_GBytes=0.8042
AvgPingPongBandwidth_GBytes=0.1642
NaturallyOrderedRingLatency_usec=31.9958
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=1.90735e-06
MPIFFT_time1=2.76052
MPIFFT_time2=0.278877
MPIFFT_time3=2.6669
MPIFFT_time4=0.783747
MPIFFT_time5=2.66164
MPIFFT_time6=1.90735e-06
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1211744099) is Mon May 26 04:34:59 2008

########################################################################
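
The three scaled residual checks listed in the HPL section can be recomputed from the norms printed in the same section. The following is a minimal Python sketch and is not part of the HPCC output. The function name hpl_scaled_residuals and its parameter names are hypothetical. The inputs are the rounded values the log prints for the P=2, Q=4 run, so the results should agree with the logged residuals only to roughly six significant digits.

```python
def hpl_scaled_residuals(rnorm_inf, anorm_1, anorm_inf, xnorm_1, xnorm_inf, n, eps):
    """Return the three scaled residuals in the order the log lists them."""
    r1 = rnorm_inf / (eps * anorm_1 * n)                # ||Ax-b||_oo / (eps * ||A||_1 * N)
    r2 = rnorm_inf / (eps * anorm_1 * xnorm_1)          # ||Ax-b||_oo / (eps * ||A||_1 * ||x||_1)
    r3 = rnorm_inf / (eps * anorm_inf * xnorm_inf * n)  # ||Ax-b||_oo / (eps * ||A||_oo * ||x||_oo * N)
    return r1, r2, r3


# Values copied from the first HPL run in the log (P=2, Q=4).
THRESHOLD = 16.0  # "Computational tests pass if scaled residuals are less than 16.0"
residuals = hpl_scaled_residuals(
    rnorm_inf=0.373599,
    anorm_1=5836.795619,
    anorm_inf=5827.145943,
    xnorm_1=10686.375976,
    xnorm_inf=2.561046,
    n=23000,
    eps=1.110223e-16,
)

for index, value in enumerate(residuals, start=1):
    status = "PASSED" if value < THRESHOLD else "FAILED"
    print(f"check {index}: {value:.1f} {status}")
```

All three recomputed values are in the millions, far above the 16.0 threshold, which is consistent with the FAILED status the log reports for this run and with Success=0 in the Summary section.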