GPUs provide the computational power required to deploy large-scale, pre-trained AI models across machine learning areas such as computer vision, natural language processing, and multimodal learning. Currently, AI practitioners have few options for high-performance GPU inference because most solutions are tied to a single platform: a machine learning system built for one vendor's GPU must be completely reimplemented to run on hardware from a different vendor. Hardware dependencies in complex runtime environments also make these codebases difficult to maintain. Additionally, AI production pipelines often demand rapid development. Although proprietary toolkits such as TensorRT provide customization options, they often fall short of this demand, and their closed nature can make code difficult to debug quickly, further slowing development.
To tackle these industry challenges, Meta AI created AITemplate (AIT), a unified open-source inference solution with separate acceleration backends for AMD and NVIDIA GPU technology. On a range of popular AI models, including convolutional neural networks, transformers, and diffusion models, it delivers performance close to that of hardware-native Tensor Core (NVIDIA GPU) and Matrix Core (AMD GPU) execution. Using AIT, the team improved performance by up to 12x on NVIDIA GPUs and 4x on AMD GPUs compared with PyTorch eager mode. Currently, AITemplate is enabled on NVIDIA's A100 and AMD's MI200 GPU systems, both of which are widely used in the data centers of technology companies, research facilities, and cloud computing providers.
AITemplate is a Python framework that transforms AI models into high-performance C++ GPU template code for faster inference. The system consists of a front-end layer that performs various graph transformations to optimize the model graph and a back-end layer that generates C++ kernel templates for the target GPU. The vision behind the framework is to deliver high speed while remaining simple. The project includes several performance innovations, such as advanced kernel fusion, an optimization technique that merges multiple kernels into a single kernel so they run more efficiently, as well as advanced optimizations for the transformer block. These improvements substantially increase utilization of AMD's Matrix Cores and NVIDIA's Tensor Cores, yielding better performance. Additionally, AIT keeps its reliance on external libraries to a minimum.
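The idea behind kernel fusion can be illustrated outside the GPU context with a toy sketch: instead of running two elementwise passes over a buffer (two "kernels", each reading and writing memory), a fused version performs both operations in a single pass. The helper functions below are hypothetical and purely illustrative; they are not AITemplate code.

```python
# Toy illustration of kernel fusion (hypothetical helpers, not AITemplate code).

def unfused(xs):
    """Two passes over the data with an intermediate buffer in between --
    analogous to launching two GPU kernels and round-tripping through memory."""
    scaled = [x * 2.0 for x in xs]    # "kernel" 1: scale
    return [s + 1.0 for s in scaled]  # "kernel" 2: add bias

def fused(xs):
    """One pass, no intermediate buffer -- analogous to a single merged kernel."""
    return [x * 2.0 + 1.0 for x in xs]

data = [0.5, 1.5, -2.0]
assert unfused(data) == fused(data)   # same result, fewer memory round trips
```

On a GPU, the saving comes from eliminating the intermediate buffer's memory traffic and the extra kernel launch, which is where fusion-based systems gain much of their speedup.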
With support for three advanced optimizations – vertical, horizontal, and memory fusions – AITemplate has one of the most advanced kernel fusion systems in the industry. Ease of deployment also makes AITemplate a viable solution: it produces a self-contained, stand-alone binary that contains the compiled AI model. This binary has good backward compatibility because it can work in any environment with the same hardware and newer CUDA 11 / ROCm 5 versions. In addition, AITemplate offers out-of-the-box support for widely used models (e.g., VisionTransformer, BERT, Stable Diffusion, ResNet, MaskRCNN). This simplifies deployment and makes it easier for practitioners to deploy pre-trained PyTorch models. AITemplate's templating system has two layers: Python Jinja2 templates and GPU Tensor Core/Matrix Core C++ templates. After profiling in Python to determine the optimal kernel configuration, the system renders the Jinja2 templates into C++ code. The model's final binary is produced by compiling the generated source with the GPU C++ compiler. Because the front-end design is similar to PyTorch, users can convert their models from a variety of frameworks, including PyTorch, to AITemplate.
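The two-layer template flow described above can be sketched with Python's standard-library `string.Template` standing in for Jinja2. Everything here is an assumption for illustration: the template text, the parameter names (`tile_m`, `tile_n`), and the candidate configurations are hypothetical, and AITemplate's real templates wrap vendor Tensor Core / Matrix Core C++ kernels.

```python
from string import Template

# Hypothetical kernel template (stand-in for AITemplate's Jinja2 templates,
# which wrap vendor Tensor Core / Matrix Core C++ kernel code).
KERNEL_TEMPLATE = Template("""
__global__ void gemm_${tile_m}x${tile_n}(const float* a, const float* b, float* c) {
    // tile sizes chosen by the profiler: ${tile_m} x ${tile_n}
}
""")

def render_kernel(config):
    """Render C++ source for one kernel configuration chosen by profiling."""
    return KERNEL_TEMPLATE.substitute(config)

# Profiling is simulated here: a real system would time each candidate on the
# GPU and keep the fastest, then compile the rendered source with nvcc/hipcc.
candidates = [{"tile_m": 64, "tile_n": 64}, {"tile_m": 128, "tile_n": 64}]
best = candidates[1]                 # stand-in for a timing-based choice
source = render_kernel(best)         # C++ source for the chosen configuration
```

The design point this sketch captures is that templating moves kernel specialization from hand-written per-configuration code to generated code, so profiling can explore many configurations cheaply before a single compile of the winner.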
In addition to broadening the set of platforms available for AI, Meta AI hopes these technologies can help address environmental problems by lowering carbon emissions. Studies show that GPU usage contributes to carbon emissions, and because AITemplate speeds up GPU execution, it can further reduce them. To summarize, AITemplate delivers state-of-the-art performance on current-generation AMD and NVIDIA GPUs with minimal system complexity. However, the researchers note that they are still at the beginning of building a high-performance AI inference engine: they are actively improving AITemplate with new optimizations and full support for dynamic shapes, and their long-term goals include extending AITemplate to more hardware platforms from different vendors. Meta aims to create a greener, more efficient AI inference ecosystem with greater performance, flexibility, and backend choice, and the development of AITemplate is a stepping stone in this direction.
This article is written as a research summary by Marktechpost Staff based on the research article 'Faster, more flexible inference on GPUs using AITemplate, a revolutionary new inference engine'. All credit for this research goes to the researchers on this project. Check out the code and reference article.
Khushbu Gupta is a Consultant Intern at MarktechPost. She is currently pursuing her Bachelor of Technology degree at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by participating in challenges.