########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.2.0 October 2003
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory

See the source files for authors of specific codes.
Compiled on May 11 2008 at 10:53:06
Current time (1211590604) is Sat May 24 09:56:44 2008

Hostname: 'pcc01'
########################################################################
============================================================================
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --  January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 17500
NB     : 100
PMAP   : Row-major process mapping
P      : 2
Q      : 2
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

Begin of PTRANS section.
M: 8750
N: 8750
MB: 100
NB: 100
P: 2
Q: 2

TIME   M     N    MB  NB  P   Q     TIME   CHECK   GB/s   RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL  8750  8750 100 100   2   2     3.00 PASSED    0.204  0.00
CPU   8750  8750 100 100   2   2     1.85 PASSED    0.331  0.00
WALL  8750  8750 100 100   2   2     2.99 PASSED    0.204  0.00
CPU   8750  8750 100 100   2   2     1.82 PASSED    0.337  0.00
WALL  8750  8750 100 100   2   2     3.00 PASSED    0.204  0.00
CPU   8750  8750 100 100   2   2     1.88 PASSED    0.325  0.00
WALL  8750  8750 100 100   2   2     3.00 PASSED    0.204  0.00
CPU   8750  8750 100 100   2   2     1.86 PASSED    0.329  0.00
WALL  8750  8750 100 100   2   2     3.00 PASSED    0.204  0.00
CPU   8750  8750 100 100   2   2     1.83 PASSED    0.335  0.00

Finished 5 tests, with the following results:
    5 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.
Current time (1211590663) is Sat May 24 09:57:43 2008

End of PTRANS section.
Begin of HPL section.
============================================================================
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --  January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 17500
NB     : 100
PMAP   : Row-major process mapping
P      : 2
Q      : 2
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       17500   100     2     2             276.09          1.294e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        )     = 0.1453237 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )     = 0.0234162 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo * N ) = 0.0047088 ...... PASSED
============================================================================

Finished 1 tests with the following results:
    1 tests completed and passed residual checks,
    0 tests completed and failed residual checks,
    0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
Current time (1211590945) is Sat May 24 10:02:25 2008

End of HPL section.
Begin of StarDGEMM section.
Scaled residual: 0.00919694
Node(s) with error 0
Minimum Gflop/s 3.876489
Average Gflop/s 3.876955
Maximum Gflop/s 3.877375
Current time (1211591014) is Sat May 24 10:03:34 2008

End of StarDGEMM section.
Begin of SingleDGEMM section.
Node(s) with error 0
Node selected 2
Single DGEMM Gflop/s 3.877216
Current time (1211591084) is Sat May 24 10:04:44 2008

End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 25520833, Offset = 0
Total memory required = 0.5704 GiB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 108295 microseconds.
   (= 108295 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:            2.8395      0.1439       0.1438       0.1439
Scale:           2.7339      0.1495       0.1494       0.1497
Add:             3.3372      0.1838       0.1835       0.1840
Triad:           3.2587      0.1883       0.1880       0.1886
-------------------------------------------------------------
Results Comparison:
        Expected  : 29433196637050781696.000000 5886639327410156544.000000 7848852436546875392.000000
        Observed  : 29433196652089057280.000000 5886639329604941824.000000 7848852433962343424.000000
Solution Validates
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 2.837244
Average Copy GB/s 2.838591
Maximum Copy GB/s 2.839592
Minimum Scale GB/s 2.732880
Average Scale GB/s 2.733782
Maximum Scale GB/s 2.734639
Minimum Add GB/s 3.337184
Average Add GB/s 3.344543
Maximum Add GB/s 3.348258
Minimum Triad GB/s 3.258653
Average Triad GB/s 3.262779
Maximum Triad GB/s 3.265761
Current time (1211591091) is Sat May 24 10:04:51 2008

End of StarSTREAM section.
Begin of SingleSTREAM section.
Node(s) with error 0
Node selected 1
Single STREAM Copy GB/s 2.839592
Single STREAM Scale GB/s 2.735643
Single STREAM Add GB/s 3.349083
Single STREAM Triad GB/s 3.263670
Current time (1211591099) is Sat May 24 10:04:59 2008

End of SingleSTREAM section.
Begin of MPIRandomAccess section.
Running on 4 processors (PowerofTwo)
Total Main table size = 2^28 = 268435456 words
PE Main table size = 2^26 = 67108864 words/PE
Default number of updates (RECOMMENDED) = 1073741824
CPU time used = 260.952309 seconds
Real time used = 705.186365 seconds
0.001522636 Billion(10^9) Updates per second [GUP/s]
0.000380659 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 41.990624 seconds
Verification: Real time used = 71.417327 seconds
Found 0 errors in 268435456 locations (passed).
Current time (1211591876) is Sat May 24 10:17:56 2008

End of MPIRandomAccess section.
Begin of StarRandomAccess section.
Main table size = 2^26 = 67108864 words
Number of updates = 268435456
CPU time used = 26.117632 seconds
Real time used = 26.120160 seconds
0.010276945 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 67108864 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.010269
Average GUP/s 0.010277
Maximum GUP/s 0.010286
Current time (1211591929) is Sat May 24 10:18:49 2008

End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Node(s) with error 0
Node selected 2
Single GUP/s 0.010270
Current time (1211591982) is Sat May 24 10:19:42 2008

End of SingleRandomAccess section.
Begin of MPIFFT section.
Number of nodes: 4
Vector size:     33554432
Generation time: 0.664
Tuning:          0.907
Computing:       6.422
Inverse FFT:     6.508
max(|x-x0|):     2.025e-15
Gflop/s:         0.653
Current time (1211591998) is Sat May 24 10:19:58 2008

End of MPIFFT section.
Begin of StarFFT section.
Vector size:     16777216
Generation time: 1.327
Tuning:          0.001
Computing:       3.485
Inverse FFT:     3.265
max(|x-x0|):     2.026e-15
Node(s) with error 0
Minimum Gflop/s 0.557732
Average Gflop/s 0.570439
Maximum Gflop/s 0.579821
Current time (1211592007) is Sat May 24 10:20:07 2008

End of StarFFT section.
Begin of SingleFFT section.
Node(s) with error 0
Node selected 3
Single FFT Gflop/s 0.571810
Current time (1211592017) is Sat May 24 10:20:17 2008

End of SingleFFT section.
Begin of LatencyBandwidth section.

------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany

Details - level 2
-----------------

MPI_Wtime granularity.
Max. MPI_Wtick is 0.000001 sec
wtick is set to   0.000001 sec

Message Length: 8
Latency   min / avg / max:   0.047177 /   0.047177 /   0.047177 msecs
Bandwidth min / avg / max:      0.170 /      0.170 /      0.170 MByte/s

MPI_Wtime granularity is ok.
message size:                                  8
max time :                             10.000000 secs
latency for msg:                        0.047177 msecs
estimation for ping pong:               4.245937 msecs
max number of ping pong pairs       =       2355
max client pings = max server pongs =         48
stride for latency                  =          1
Message Length: 8
Latency   min / avg / max:   0.046492 /   0.046899 /   0.047326 msecs
Bandwidth min / avg / max:      0.169 /      0.171 /      0.172 MByte/s

Message Length: 2000000
Latency   min / avg / max:  17.195463 /  17.195463 /  17.195463 msecs
Bandwidth min / avg / max:    116.310 /    116.310 /    116.310 MByte/s

MPI_Wtime granularity is ok.
message size:                            2000000
max time :                             30.000000 secs
latency for msg:                       17.195463 msecs
estimation for ping pong:             137.563705 msecs
max number of ping pong pairs       =        218
max client pings = max server pongs =         14
stride for latency                  =          1
Message Length: 2000000
Latency   min / avg / max:  17.188907 /  17.195404 /  17.200947 msecs
Bandwidth min / avg / max:    116.273 /    116.310 /    116.354 MByte/s

Message Size:                           8 Byte
Natural Order Latency:           0.033212 msec
Natural Order Bandwidth:         0.240879 MB/s
Avg Random Order Latency:        0.032975 msec
Avg Random Order Bandwidth:      0.242610 MB/s

Message Size:                     2000000 Byte
Natural Order Latency:          20.394027 msec
Natural Order Bandwidth:        98.067928 MB/s
Avg Random Order Latency:       20.363225 msec
Avg Random Order Bandwidth:     98.216269 MB/s

Execution time (wall clock)      =    14.662 sec on 4 processes
 - for cross ping_pong latency   =     0.132 sec
 - for cross ping_pong bandwidth =     1.729 sec
 - for ring latency              =     0.333 sec
 - for ring bandwidth            =    12.468 sec

------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany

Major Benchmark results:
------------------------

Max Ping Pong Latency:                 0.047326 msecs
Randomly Ordered Ring Latency:         0.032975 msecs
Min Ping Pong Bandwidth:             116.272669 MB/s
Naturally Ordered Ring Bandwidth:     98.067928 MB/s
Randomly Ordered Ring Bandwidth:      98.216269 MB/s

------------------------------------------------------------------

Detailed benchmark results:
Ping Pong:
Latency   min / avg / max:   0.046492 /   0.046899 /   0.047326 msecs
Bandwidth min / avg / max:    116.273 /    116.310 /    116.354 MByte/s
Ring:
On naturally ordered ring: latency=      0.033212 msec, bandwidth=     98.067928 MB/s
On randomly  ordered ring: latency=      0.032975 msec, bandwidth=     98.216269 MB/s

------------------------------------------------------------------

Benchmark conditions:
 The latency   measurements were done with        8 bytes
 The bandwidth measurements were done with  2000000 bytes
 The ring communication was done in both directions on 4 processes
 The Ping Pong measurements were done on
  -          12 pairs of processes for latency benchmarking, and
  -          12 pairs of processes for bandwidth benchmarking,
 out of 4*(4-1) =         12 possible combinations on 4 processes.
 (1 MB/s = 10**6 byte/sec)

------------------------------------------------------------------
Current time (1211592032) is Sat May 24 10:20:32 2008

End of LatencyBandwidth section.
Begin of Summary section.
VersionMajor=1
VersionMinor=2
VersionMicro=0
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=4
MPI_Wtick=1.000000e-06
HPL_Tflops=0.0129427
HPL_time=276.092
HPL_eps=1.11022e-16
HPL_RnormI=1.25911e-09
HPL_Anorm1=4459.43
HPL_AnormI=4457.21
HPL_Xnorm1=108607
HPL_XnormI=30.8776
HPL_N=17500
HPL_NB=100
HPL_nprow=2
HPL_npcol=2
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=4
HPLMinProcs=4
DGEMM_N=5051
StarDGEMM_Gflops=3.87695
SingleDGEMM_Gflops=3.87722
PTRANS_GBs=0.204233
PTRANS_time=2.99903
PTRANS_residual=0
PTRANS_n=8750
PTRANS_nb=100
PTRANS_nprow=2
PTRANS_npcol=2
MPIRandomAccess_N=268435456
MPIRandomAccess_time=705.186
MPIRandomAccess_CheckTime=71.4173
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=1073741824
MPIRandomAccess_GUPs=0.00152264
MPIRandomAccess_TimeBound=-1
MPIRandomAccess_Algorithm=0
RandomAccess_N=67108864
StarRandomAccess_GUPs=0.0102766
SingleRandomAccess_GUPs=0.01027
STREAM_VectorSize=25520833
STREAM_Threads=1
StarSTREAM_Copy=2.83859
StarSTREAM_Scale=2.73378
StarSTREAM_Add=3.34454
StarSTREAM_Triad=3.26278
SingleSTREAM_Copy=2.83959
SingleSTREAM_Scale=2.73564
SingleSTREAM_Add=3.34908
SingleSTREAM_Triad=3.26367
FFT_N=16777216
StarFFT_Gflops=0.570439
SingleFFT_Gflops=0.57181
MPIFFT_N=33554432
MPIFFT_Gflops=0.653123
MPIFFT_maxErr=2.0254e-15
MPIFFT_Procs=4
MaxPingPongLatency_usec=47.3261
RandomlyOrderedRingLatency_usec=32.9747
MinPingPongBandwidth_GBytes=0.116273
NaturallyOrderedRingBandwidth_GBytes=0.0980679
RandomlyOrderedRingBandwidth_GBytes=0.0982163
MinPingPongLatency_usec=46.4916
AvgPingPongLatency_usec=46.8989
MaxPingPongBandwidth_GBytes=0.116354
AvgPingPongBandwidth_GBytes=0.11631
NaturallyOrderedRingLatency_usec=33.2117
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=9.53674e-07
MPIFFT_time1=1.45007
MPIFFT_time2=0.579417
MPIFFT_time3=1.27775
MPIFFT_time4=1.60457
MPIFFT_time5=1.41079
MPIFFT_time6=2.14577e-06
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1211592032) is Sat May 24 10:20:32 2008

########################################################################
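
The Summary section above is a flat list of key=value fields, so it can be read back mechanically. The sketch below is not part of the benchmark output: `parse_summary` and `hpl_flop_count` are hypothetical helper names, and the operation count 2/3 N^3 + 3/2 N^2 is an assumption about how the HPL rate is derived (it reproduces the HPL_Tflops value reported in this run). Only three of the fields are copied into the sample string.

```python
def parse_summary(text):
    """Return the key=value fields found between the Summary section markers."""
    begin = "Begin of Summary section."
    end = "End of Summary section."
    start = text.index(begin) + len(begin)
    stop = text.index(end, start)
    fields = {}
    # Fields are whitespace-separated, so this works whether the log has
    # one field per line or several fields on one line.
    for token in text[start:stop].split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def hpl_flop_count(n):
    """Assumed HPL operation count for an n x n solve: 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


# Three fields copied from the Summary section of this run.
sample = (
    "Begin of Summary section. "
    "HPL_Tflops=0.0129427 HPL_time=276.092 HPL_N=17500 "
    "End of Summary section."
)

fields = parse_summary(sample)
n = int(fields["HPL_N"])
seconds = float(fields["HPL_time"])
reported_tflops = float(fields["HPL_Tflops"])
recomputed_tflops = hpl_flop_count(n) / seconds / 1e12

print(f"reported   HPL_Tflops = {reported_tflops:.7f}")
print(f"recomputed HPL_Tflops = {recomputed_tflops:.7f}")
```

To check a whole run, pass the full contents of the output file to `parse_summary` instead of `sample`. Values are kept as strings because some fields, such as LANG=C or HPL_cpfact=R, are not numeric.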