A talk about vectorized computing. See the slides.

Method | N = 1 | N = 2 | N = 4 | N = 8 | N = 16 | N = 32 | N = 64 | N = 128 | N = 256 | N = 512 | N = 1024 | N = 2048 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
k_torch_gpu | 0.000072956 | 0.000113964 | 0.000185013 | 0.000362158 | 0.000622988 | 0.001224041 | 0.002696037 | 0.004850149 | 0.009469986 | 0.035413027 | 0.242728949 | 1.889190912 |
k_tf_gpu | 0.000643015 | 0.000817060 | 0.000756979 | 0.000926971 | 0.001060009 | 0.002470016 | 0.002572060 | 0.004024029 | 0.008621931 | 0.039995909 | 0.251790047 | 1.896428108 |
for_c | 0.000000113 | 0.000000167 | 0.000000217 | 0.000000861 | 0.000004889 | 0.000035312 | 0.000230443 | 0.001611961 | 0.012511251 | 0.099369949 | 0.753857444 | 7.459837317 |
k_tf | 0.000280857 | 0.000349045 | 0.000442982 | 0.000526905 | 0.000730991 | 0.001588821 | 0.005894899 | 0.014322042 | 0.057121992 | 0.350581884 | 1.829533100 | N/A |
k_numpy | 0.000006914 | 0.000011921 | 0.000021935 | 0.000036001 | 0.000084877 | 0.000257969 | 0.001464844 | 0.008228064 | 0.049238920 | 0.366661072 | 3.682672977 | N/A |
for_java | 0.000001093 | 0.000001717 | 0.000009855 | 0.000084246 | 0.000318320 | 0.002018801 | 0.002554220 | 0.009146232 | 0.069185611 | 0.494508200 | 3.779792773 | N/A |
k_torch | 0.000030994 | 0.000026941 | 0.000050068 | 0.000090837 | 0.000199080 | 0.000466108 | 0.001630068 | 0.008022070 | 0.081236839 | 0.663809061 | 4.677886009 | N/A |
for_go | 0.000000347 | 0.000000490 | 0.000001089 | 0.000003800 | 0.000030790 | 0.000244552 | 0.001886645 | 0.013611769 | 0.141340720 | 0.867058905 | 7.085346021 | N/A |
for_python | 0.000002861 | 0.000006199 | 0.000025034 | 0.000159979 | 0.001207113 | 0.009351969 | 0.075846910 | 0.586315155 | 4.611817837 | N/A | N/A | N/A |

Method | N = 1 | N = 2 | N = 4 | N = 8 | N = 16 | N = 32 | N = 64 | N = 128 | N = 256 | N = 512 | N = 1024 | N = 2048 | N = 4096 | N = 8192 | N = 16384 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mat_torch_gpu | 0.000042200 | 0.000043154 | 0.000039816 | 0.000041962 | 0.000041962 | 0.000059843 | 0.000072956 | 0.000087023 | 0.000123978 | 0.000519991 | 0.002180099 | 0.014719963 | 0.112880945 | 0.894821167 | 7.136865139 |
mat_tf_gpu | 0.000778913 | 0.000824928 | 0.000689983 | 0.000641108 | 0.000914812 | 0.000649929 | 0.000662088 | 0.000764132 | 0.001091957 | 0.002025843 | 0.006282806 | 0.027856112 | 0.168075085 | 1.094542980 | N/A |
mat_torch | 0.000009060 | 0.000016212 | 0.000008821 | 0.000010014 | 0.000011206 | 0.000020027 | 0.000046015 | 0.000057936 | 0.000475883 | 0.001228809 | 0.012756109 | 0.030473948 | 0.245604038 | 1.761484146 | N/A |
mat_numpy | 0.000002861 | 0.000011921 | 0.000004053 | 0.000004053 | 0.000004053 | 0.000010014 | 0.000032902 | 0.000119925 | 0.000396013 | 0.001561880 | 0.007282019 | 0.057345867 | 0.338021994 | 2.231496096 | N/A |
mat_tf | 0.000308990 | 0.000387907 | 0.000346899 | 0.000319004 | 0.000363111 | 0.000308037 | 0.000550985 | 0.000591993 | 0.001109838 | 0.003041029 | 0.013917923 | 0.098945141 | 0.722618103 | 6.150835991 | N/A |
full_torch_gpu | 0.000050068 | 0.000049829 | 0.000051975 | 0.000051022 | 0.000056982 | 0.000055075 | 0.000074863 | 0.000285149 | 0.001986027 | 0.016274929 | 0.150403023 | N/A | N/A | N/A | N/A |
full_tf_gpu | 0.000661850 | 0.000688076 | 0.000707150 | 0.000804901 | 0.000716925 | 0.000770092 | 0.000781059 | 0.001224041 | 0.005397081 | 0.027981997 | 0.212766171 | N/A | N/A | N/A | N/A |
for_c | 0.000000195 | 0.000000305 | 0.000000363 | 0.000001091 | 0.000007134 | 0.000054733 | 0.000500424 | 0.003967320 | 0.028752040 | 0.241914231 | 1.795387497 | N/A | N/A | N/A | N/A |
for_java | 0.000000745 | 0.000001102 | 0.000003384 | 0.000022905 | 0.000146022 | 0.001175193 | 0.001350706 | 0.002579463 | 0.026038331 | 0.262206680 | 2.097773528 | N/A | N/A | N/A | N/A |
full_numpy | 0.000013113 | 0.000015974 | 0.000019073 | 0.000020981 | 0.000036955 | 0.000179052 | 0.001208067 | 0.006085157 | 0.058477879 | 0.478839159 | 3.935982943 | N/A | N/A | N/A | N/A |
full_torch | 0.000014067 | 0.000014067 | 0.000015020 | 0.000017166 | 0.000028849 | 0.000179052 | 0.001137018 | 0.006094217 | 0.062695980 | 0.513103008 | 4.077303886 | N/A | N/A | N/A | N/A |
full_tf | 0.000339985 | 0.000383854 | 0.000440121 | 0.000437021 | 0.000643969 | 0.000765085 | 0.002549887 | 0.011502981 | 0.087670803 | 0.648772001 | 5.210550070 | N/A | N/A | N/A | N/A |
for_go | 0.000000316 | 0.000000352 | 0.000000655 | 0.000002310 | 0.000018363 | 0.000176084 | 0.001328841 | 0.009536538 | 0.098996241 | 0.682685794 | 7.725187447 | N/A | N/A | N/A | N/A |
ij_numpy | 0.000015020 | 0.000030041 | 0.000103951 | 0.000366926 | 0.001419067 | 0.005738974 | 0.022729158 | 0.083940983 | 0.377239943 | 1.731176138 | N/A | N/A | N/A | N/A | N/A |
ij_torch | 0.000014067 | 0.000030041 | 0.000098944 | 0.000388861 | 0.001510143 | 0.005847216 | 0.024742126 | 0.105310917 | 0.463705063 | 2.087431908 | N/A | N/A | N/A | N/A | N/A |
for_python | 0.000001907 | 0.000005007 | 0.000016928 | 0.000097990 | 0.000771046 | 0.005473852 | 0.041965008 | 0.344128847 | 2.681012154 | N/A | N/A | N/A | N/A | N/A | N/A |
ij_torch_gpu | 0.000072956 | 0.000236034 | 0.000869989 | 0.003289223 | 0.013335943 | 0.053617954 | 0.212279081 | 0.866909027 | 3.467227936 | N/A | N/A | N/A | N/A | N/A | N/A |
ij_tf | 0.000290155 | 0.000429869 | 0.000504971 | 0.000839949 | 0.001430035 | 0.003952980 | 0.012485027 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
ij_tf_gpu | 0.000667095 | 0.000996828 | 0.001271009 | 0.002653122 | 0.006880999 | 0.025680065 | 0.057451963 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |

Component | Environment |
---|---|
CPU | Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz x20 |
Memory | 65860112 KB |
GPU | Tesla K40m, 11439 MiB, CUDA 8.0, cuDNN 6.0 |
OS | CentOS Linux release 7.0.1406 (Core) |
Python | Python 2.7.13 :: Anaconda custom (64-bit) |
NumPy | 1.13.3 |
PyTorch | 0.2.0_3 |
TensorFlow | 1.4.0 |
GCC | 4.8.5 |
Java | 1.8.0_112 |
Go | 1.9.2 |
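
The method names in the tables above contrast explicit loops (`for_*`) with library calls (`mat_*`, `k_*`, and so on). The benchmarked computation itself is not shown here, so the following is only a minimal sketch of how such a comparison could be timed, using matrix multiplication as an assumed stand-in task. The helper names `matmul_loops`, `matmul_vectorized`, and `time_call` are hypothetical, and the sketch uses Python 3's `time.perf_counter` rather than the Python 2.7 setup listed in the environment table.

```python
import time

import numpy as np


def matmul_loops(a, b):
    """Multiply two square matrices (lists of lists) with explicit Python loops."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out


def matmul_vectorized(a, b):
    """Same product, delegated to NumPy's compiled routine."""
    return np.dot(a, b)


def time_call(fn, *args):
    """Return (result, elapsed wall-clock seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed stand-in workload: random N x N matrices, N doubling each step.
    rng = np.random.RandomState(0)
    for n in (1, 2, 4, 8, 16, 32, 64):
        a = rng.rand(n, n)
        b = rng.rand(n, n)
        loop_out, loop_t = time_call(matmul_loops, a.tolist(), b.tolist())
        vec_out, vec_t = time_call(matmul_vectorized, a, b)
        # Both variants should agree up to floating-point rounding.
        assert np.allclose(loop_out, vec_out)
        print("N = %4d  loops: %.9f s  numpy: %.9f s" % (n, loop_t, vec_t))
```

A single timed call per size is noisy, especially for small N where call overhead dominates. Repeating each measurement and taking the minimum or median would give steadier numbers.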