Installation error (GGEMSOpenCLManager)

Hi!

First of all, thanks a lot for providing this tool! It looks like it could be quite helpful for me, but unfortunately I am having trouble getting it up and running.
I followed the installation guide for Windows and left the OpenGL visualization flag in CMake unchecked for now. Compiling and installing with nmake worked, and so did setting the environment variables.
However, when I verify the installation in a Python console, I get the following error when running the line opencl_manager = GGEMSOpenCLManager():

I installed Cuda 11.6 and have a NVIDIA GeForce RTX 3080 Ti.

Do you have any idea why the installation has not worked for me?

Thank you!
Anneke

Hi Anneke,

Thank you for using GGEMS!
I have never encountered this error. Your installation of GGEMS seems correct, so I think the problem is more likely with the installation of the NVIDIA driver. It looks as if GGEMS couldn’t find any OpenCL devices on your PC.

In your terminal, try the command ‘nvidia-smi’ to check whether your graphics card is listed.

In Python, after the command ‘from ggems import *’, run the command ‘GGEMSVerbosity(3)’; that may give you more information about the error.

Also, I have never used Python with a Conda environment. If you can, try GGEMS in a standard Python console, although I don’t think that will solve the problem.

Kind regards,
Didier

Hi Didier,

thanks a lot for your quick reply!

I can see the NVIDIA graphics card.

… as I cannot put two images in one post as a new user, here is the second part of my answer:

Changing the verbosity level does not bring much new information. And running the code with a standard python console did not solve the problem, unfortunately:

Can I assume that GGEMS tries to run the code on the GPU, or might it also try to use the CPU?

Kind regards,
Anneke

Hi Anneke,

OpenCL initializes all the devices it finds if the drivers are correctly installed.

The problem comes from the GGEMSOpenCLManager::GetOpenCLDevices method in the src/GGEMSOpenCLManager.cc file, but I don’t know which call throws the error because I can’t reproduce it on my PC. If you look in the file, the problem should be with one of the ‘getInfo’ calls.

If you can, try to find out which call fails. As a quick test, comment out these lines (I’ve had trouble with these functions before):

CheckOpenCLError(devices_[i]->getInfo(CL_DEVICE_VERSION, &char_data), "GGEMSOpenCLManager", "GGEMSOpenCLManager");
device_version_.push_back(std::string(char_data));

CheckOpenCLError(devices_[i]->getInfo(CL_DRIVER_VERSION, &char_data), "GGEMSOpenCLManager", "GGEMSOpenCLManager");
device_driver_version_.push_back(std::string(char_data));

CheckOpenCLError(devices_[i]->getInfo(CL_DEVICE_OPENCL_C_VERSION, &char_data), "GGEMSOpenCLManager", "GGEMSOpenCLManager");
device_opencl_c_version_.push_back(std::string(char_data));

To debug, can you print the number of platforms and devices that OpenCL finds? In C++, these lines must be placed in the src/GGEMSOpenCLManager.cc file (line 189):

std::cout << "n platform: " << platforms_.size() << std::endl;
std::cout << "n device: " << devices_.size() << std::endl;

I want to be sure that OpenCL detects at least 1 platform and 1 device.

I will keep trying to figure out where the problem is.

Kind regards
Didier
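
A possible way to narrow down which query fails, without commenting lines out one at a time, is to give each check a label and collect the names of the ones that return an error. The sketch below is only an illustration and uses no OpenCL headers: NamedQuery, FindFailingQueries and MakeMockQueries are hypothetical names, the queries are mocked with lambdas, and which query fails in the mock is chosen arbitrarily. In the real file each callable would wrap the corresponding devices_[i]->getInfo(...) call.

```cpp
#include <functional>
#include <string>
#include <vector>

// Status codes with the same numeric values as CL_SUCCESS and
// CL_INVALID_VALUE in the OpenCL headers.
constexpr int kClSuccess = 0;
constexpr int kClInvalidValue = -30;

// Hypothetical pairing of a query name with a callable returning an
// OpenCL-style status code.
struct NamedQuery {
  std::string name;
  std::function<int()> run;
};

// Runs every query in order and returns the names of those that failed,
// instead of stopping at the first failure.
std::vector<std::string> FindFailingQueries(
    const std::vector<NamedQuery>& queries) {
  std::vector<std::string> failed;
  for (const NamedQuery& query : queries) {
    if (query.run() != kClSuccess) {
      failed.push_back(query.name);
    }
  }
  return failed;
}

// Mock device for illustration only: the failing query is arbitrary and
// does not describe any real driver.
std::vector<NamedQuery> MakeMockQueries() {
  return {
      {"CL_DEVICE_VERSION", [] { return kClSuccess; }},
      {"CL_DRIVER_VERSION", [] { return kClInvalidValue; }},
      {"CL_DEVICE_OPENCL_C_VERSION", [] { return kClSuccess; }},
  };
}
```

Printing the returned names once per device would show directly which ‘getInfo’ call to look at.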

Hi Didier,

indeed, the CheckOpenCLError function caused the problem in my case. If lines 282-285 of GGEMSOpenCLManager.cc are changed to

if (device_native_vector_width_half_[i] != 0) {
  devices_[i]->getInfo(CL_DEVICE_HALF_FP_CONFIG, &device_fp_config);
  device_half_fp_config_.push_back(device_fp_config);
}

I do not get any errors anymore.

For the sake of completeness: OpenCL detected 5 platforms and 5 devices.

Thanks a lot for your help!
Anneke

Thanks Anneke.
You must have 16-bit float support on your PC. I had never entered this condition to test it, because I don’t have a device that supports 16-bit floats.
To fix your problem more generally, I think you have to add 16-bit support (as for 64-bit) directly in the file ‘include/GGEMS/tools/GGEMSTypes.hh’, line 118:

#if defined(cl_khr_fp16)
#pragma OPENCL EXTENSION cl_khr_fp16: enable
#endif

I will make a commit to fix this issue.

Thanks
Didier

Ah, I see. Thank you!
I pulled and built the newest version of the repo but unfortunately, I still get the CL_INVALID_VALUE error. Removing the CheckOpenCLError function again solves the problem.

Kind regards,
Anneke

OK, I removed that part of the code. The CL_DEVICE_HALF_FP_CONFIG parameter can no longer be used with the ‘getInfo’ method with OpenCL version 3; I hadn’t checked.

Thanks again for reporting this bug.
Didier
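
For anyone who wants to keep the half-precision query rather than remove it, one possible guard is to check the device’s extension list first. This is only a sketch under an assumption: that the CL_INVALID_VALUE came from querying a parameter tied to cl_khr_fp16 on a device that does not advertise that extension. HasExtension and ShouldQueryHalfFpConfig are hypothetical helpers, not GGEMS functions; the string they take is the space-separated list a device returns for CL_DEVICE_EXTENSIONS.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list (the format returned for CL_DEVICE_EXTENSIONS).
// Matching whole tokens avoids false positives on longer names that
// merely start with `name`.
bool HasExtension(const std::string& extension_list,
                  const std::string& name) {
  std::istringstream stream(extension_list);
  std::string token;
  while (stream >> token) {
    if (token == name) {
      return true;
    }
  }
  return false;
}

// Hypothetical guard: only query CL_DEVICE_HALF_FP_CONFIG when the
// device advertises cl_khr_fp16 (an assumption, see the text above).
bool ShouldQueryHalfFpConfig(const std::string& extension_list) {
  return HasExtension(extension_list, "cl_khr_fp16");
}
```

With such a guard, the getInfo call for the half-precision configuration would be skipped on devices that do not list the extension, so the error check could stay in place for all the other queries.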