Last week, Google published the results of testing its Tensor Processing Unit (TPU), a custom ASIC, against NVIDIA K80 accelerators and Intel Xeon processors. According to Google's figures, on deep-learning workloads the TPU outperforms the off-the-shelf Intel and NVIDIA products by a factor of 30 to 80 in performance per watt. Yesterday, NVIDIA questioned the validity of Google's findings. More precisely, it suggested comparing Google's TPU with the current P40 accelerators rather than the K80, which is built on the five-year-old Kepler architecture.
According to NVIDIA, the P40 delivers twice the TPU's inference throughput (decisions per second) at latencies below 10 ms. To be precise, inference is the stage at which an already-trained model is applied to new data. NVIDIA is also confident that the combination of roughly 12 teraflops of single-precision performance and a tenfold advantage in memory bandwidth lets the P40 demonstrate superiority over the Google TPU.
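To illustrate the metric NVIDIA cites, here is a minimal sketch in plain Python that measures inference throughput while counting only batches that stay within a 10 ms latency budget. The stand-in `run_inference` function, the batch sizes, and the workload are illustrative assumptions, not NVIDIA's or Google's benchmark code.

```python
import time

def run_inference(batch):
    # Stand-in for a trained model's forward pass; a real benchmark
    # would invoke the deployed network here.
    return [sum(sample) for sample in batch]

def throughput_under_budget(batches, budget_s=0.010):
    """Inferences completed per second, counting only batches whose
    individual latency stayed within the budget (10 ms here)."""
    completed = 0
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        run_inference(batch)
        if time.perf_counter() - t0 <= budget_s:
            completed += len(batch)
    elapsed = time.perf_counter() - start
    return completed / elapsed

# Hypothetical workload: 1,000 batches of 8 tiny "samples" each.
batches = [[[1.0] * 16 for _ in range(8)] for _ in range(1000)]
print(f"{throughput_under_budget(batches):,.0f} inferences/s within 10 ms")
```

Measuring throughput only at a fixed latency ceiling matters because data-center inference is latency-bound: raw peak throughput is meaningless if individual responses arrive too late.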
At the same time, on the 8-bit integer operations typical of tensor workloads, Google's TPU promises the more efficient execution. The TPU delivers 92 trillion INT8 operations per second (TOPS), versus roughly 48 TOPS for the P40. Thanks to this, the TPU shows a strong result when processing tensor arrays while drawing 75 watts, against 250 watts for the NVIDIA P40 (a quick performance-per-watt calculation follows below). It is worth noting that NVIDIA endorsed Google's decision to build a dedicated accelerator for deep learning on tensors. According to experts, this is the right approach for processing large data sets across a wide range of machine-learning tasks. Intel, incidentally, is also building a tensor-processing accelerator (Lake Crest).
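To make the efficiency comparison concrete, the short calculation below works out performance per watt from the figures quoted above (92 INT8 TOPS at 75 W for the TPU versus roughly 48 TOPS at 250 W for the P40). The numbers are the ones cited in this article, not independent measurements.

```python
# Performance-per-watt from the figures quoted in the article.
chips = {
    "Google TPU": {"int8_tops": 92, "watts": 75},
    "NVIDIA P40": {"int8_tops": 48, "watts": 250},
}

for name, c in chips.items():
    efficiency = c["int8_tops"] / c["watts"]
    print(f"{name}: {efficiency:.2f} TOPS/W")

# TPU: ~1.23 TOPS/W; P40: ~0.19 TOPS/W -- roughly a 6x efficiency gap,
# which is why the TPU leads on tensor workloads despite the P40's
# higher single-precision peak.
```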