A guide to setting up KataGo and KaTrain with TensorRT
Someone asked for a guide to setting up KataGo/KaTrain with TensorRT so I figured it was worth a post capturing my notes. KataGo is significantly faster via TensorRT than OpenCL. Ask if you want me to run a quick benchmark.
To use TensorRT you need the CUDA Toolkit (so yeah, sorry, this only applies to people with relatively modern Nvidia GPUs). Here are the install guides if you want to follow them: CUDA Toolkit, TensorRT. Sadly, you need to make a free Nvidia account.
The steps are:
Install CUDA Toolkit. Technically you need 12.4 but they've gotten better about backward compatibility so I'm guessing all the newer versions will work.
You used to need to set environment variables yourself, but I think the installer now does it for you. I have
CUDA_PATH_V12_4 set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
Download and install TensorRT. You need the zip version, so follow the zip install instructions for Windows.
Add TensorRT to your PATH by setting environment variables as above. I have both
You are mostly done. Download the latest KataGo. Get the release that looks like TRT8.6.1-CUDA12.1 (or higher numbers). Put it somewhere and rename the katago.exe file to something a bit more descriptive.
Download the latest stable model (check out the fancy new b28 models).
Run the KataGo benchmark. Only once it is running properly (it should say it's using TensorRT somewhere, I think), copy everything over to your KaTrain directory (it's usually in your user directory as .katrain).
Enjoy the firepower of your fully armed and operational battlestation
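If you want to sanity-check the steps above before copying anything into KaTrain, here is a minimal Python sketch. It rests on a few assumptions that are mine, not the original poster's: that the TensorRT folder you added to PATH has "tensorrt" somewhere in its name, that the benchmark output mentions TensorRT (the post only guesses that it does), and that the file names katago-trt.exe, model.bin.gz and default_gtp.cfg stand in for whatever you actually downloaded. The benchmark subcommand with -model and -config is KataGo's usual form.

```python
import os


def cuda_path_vars(env):
    """Return the variables whose names start with CUDA_PATH (e.g. CUDA_PATH_V12_4)."""
    return {k: v for k, v in env.items() if k.upper().startswith("CUDA_PATH")}


def path_mentions_tensorrt(env):
    """True if any PATH entry contains 'tensorrt' (a case-insensitive heuristic)."""
    # Windows usually spells the variable 'Path', so match the name case-insensitively.
    path_value = next((v for k, v in env.items() if k.upper() == "PATH"), "")
    return any("tensorrt" in entry.lower() for entry in path_value.split(os.pathsep))


def benchmark_command(exe, model, config):
    """Build the argument list for a KataGo benchmark run."""
    return [exe, "benchmark", "-model", model, "-config", config]


def output_mentions_tensorrt(output):
    """True if the captured benchmark output mentions TensorRT anywhere (assumed check)."""
    return "tensorrt" in output.lower()


if __name__ == "__main__":
    env = dict(os.environ)
    print("CUDA variables:", cuda_path_vars(env))
    print("TensorRT on PATH:", path_mentions_tensorrt(env))
    # Hypothetical file names; substitute your own.
    cmd = benchmark_command("katago-trt.exe", "model.bin.gz", "default_gtp.cfg")
    print("Benchmark command:", " ".join(cmd))
```

The script only prints the benchmark command rather than running it. Run it yourself in a terminal, and if you capture the output, output_mentions_tensorrt gives a rough yes/no on whether the TensorRT build is the one being used.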
Lightvector/icosaplex is amazing and we all owe him!
Watch out - the initialization time with TensorRT is stupidly long vs CPU or even OpenCL (surely this is fixable somehow?)

Author: 大桥英雄  Time: 2024-7-30 13:10
Note: you must have an Nvidia card. The two documents are already very clear, and they complement each other nicely.

Author: 大桥英雄  Time: 2024-7-30 13:24
KataGo is really way faster with TensorRT
I just figured out how to run KataGo with TensorRT instead of OpenCL. I expected slightly better performance, but it turned out to be more than twice as fast: from 290 to 673 visits/s on my laptop (RTX 3050). I want to share this in case someone else is missing out.
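To put a number on "more than twice as fast", here is the arithmetic on the two figures quoted above, as a small Python sketch. The variable names are just for illustration.

```python
def speedup(before, after):
    """Ratio of the new rate to the old rate."""
    return after / before


opencl_visits = 290    # visits/s reported with OpenCL
tensorrt_visits = 673  # visits/s reported with TensorRT

ratio = speedup(opencl_visits, tensorrt_visits)
percent_gain = (ratio - 1) * 100
print(f"{ratio:.2f}x faster, +{percent_gain:.0f}%")  # prints "2.32x faster, +132%"
```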
This only works with nVidia graphics cards, and I only know Linux, not Windows or macOS.
So there are three so-called compute backends: OpenCL, CUDA and TensorRT. This is clearly explained in the KataGo repository; it's nothing new. But there are two cruxes:
Our beloved KaTrain lets you choose between KataGo versions and then automatically downloads and configures them, but as far as I know it never shows the faster CUDA and TensorRT options, only OpenCL.
While KataGo provides precompiled binaries for CUDA and TensorRT, those are compiled for specific versions of CUDA/cuDNN/TensorRT which can make them difficult to run if your Linux distro provides newer versions.
I got hung up on KataGo's documentation saying that it has to be version this and that of the nVidia stuff, and didn't realize that if you just compile it yourself then it works fine with the latest versions of everything.
All I had to do differently was add add_compile_options(-fpermissive) to CMakeLists.txt, because the latest version of gcc is stricter and the build failed otherwise. So it was just cmake . -DUSE_BACKEND=TENSORRT followed by make -j 8
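The post doesn't say where in CMakeLists.txt the extra line went. As a rough sketch, the Python below assumes it can sit right after a single-line project(...) call, so that it applies to the targets defined later in the file. That placement is my assumption, and the helper name add_fpermissive is made up for illustration. The two build commands are the ones quoted above.

```python
import re

FLAG_LINE = "add_compile_options(-fpermissive)"

# The two build commands from the post, as argument lists (e.g. for subprocess.run).
BUILD_COMMANDS = [
    ["cmake", ".", "-DUSE_BACKEND=TENSORRT"],
    ["make", "-j", "8"],
]


def add_fpermissive(cmake_text):
    """Return cmake_text with FLAG_LINE inserted after the first project(...) line.

    Assumes a single-line project(...) call. Leaves the text unchanged if the
    flag line is already present, and raises ValueError if no project line exists.
    """
    if FLAG_LINE in cmake_text:
        return cmake_text
    lines = cmake_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if re.match(r"\s*project\s*\(", line, re.IGNORECASE):
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, FLAG_LINE + "\n")
            return "".join(lines)
    raise ValueError("no project(...) line found")
```

To use it, read CMakeLists.txt into a string, pass it through add_fpermissive, write the result back, then run the two commands in BUILD_COMMANDS from the same directory. Calling it twice is harmless, since the second call returns the text unchanged.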