OctoML has announced machine learning model inference on Apple’s M1 chip that is faster than Apple’s own Core ML 4 inference engine.
OctoML’s results showed lower model latency than any of Apple’s own software stacks, ranging from a roughly 30% improvement over the latest Core ML 4 inference engine to a 13x improvement over the older Core ML 3.
All comparisons were based on the BERT-base model, a common deep learning model used widely for natural language processing tasks, and conducted on both the Mac Mini CPU and GPU.
Apple’s latest Core ML 4 delivered 139 milliseconds of latency on the CPU and 59 milliseconds on the GPU. In contrast, OctoML’s work delivered model latency of 108 milliseconds on the CPU and 42 milliseconds on the GPU.
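The article does not describe how these latency figures were collected, but benchmarks like this are typically produced by timing many repeated forward passes and reporting a summary statistic such as the median. The following is a minimal, hypothetical sketch of that pattern; `dummy_bert_forward` is a stand-in placeholder, not a real BERT-base model.

```python
import time
import statistics

def measure_latency_ms(run_inference, warmup=10, iters=100):
    """Time repeated calls to an inference function; return the median latency in ms."""
    for _ in range(warmup):            # warm-up runs stabilize caches and JIT/compile work
        run_inference()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Hypothetical stand-in for a real BERT-base forward pass.
def dummy_bert_forward():
    sum(i * i for i in range(10_000))

print(round(measure_latency_ms(dummy_bert_forward), 3))
```

Using the median rather than the mean keeps one-off scheduler hiccups from skewing the reported number.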
According to OctoML, these performance improvements represent a 22% improvement on the CPU and nearly 30% improvement on the GPU and are especially notable because they were produced automatically and only weeks after Apple’s public launch of the M1 chip.
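The percentage figures follow directly from the latency numbers above; this short calculation reproduces them:

```python
# Latency figures reported in the article (milliseconds).
core_ml_4 = {"CPU": 139, "GPU": 59}
octoml    = {"CPU": 108, "GPU": 42}

for device in core_ml_4:
    improvement = (core_ml_4[device] - octoml[device]) / core_ml_4[device] * 100
    print(f"{device}: {improvement:.1f}% lower latency")
# CPU: 22.3% lower latency
# GPU: 28.8% lower latency
```

The results match the article’s stated 22% CPU improvement and “nearly 30%” GPU improvement.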
Other performance comparisons included Keras with MLCompute and TensorFlow with GraphDef. For Keras, the Apple M1’s latency was 579 milliseconds on the CPU and 1,767 milliseconds on the GPU.
For TensorFlow, the M1 demonstrated 512 milliseconds of latency on the CPU and 543 milliseconds on the GPU.