Madgwick.xyz

September 18, 2018 (2018-09-18)
Updated: March 15, 2019 (2019-03-15)

Notes on AMD's (now deprecated) HC API

On this page I have put up some notes and examples on programming with AMD's HC API (based on C++ AMP). This API is now deprecated and this page will no longer be updated. Unfortunately, despite its potential, not much was written about C++ AMP, and AMD's HC derivative barely had even basic documentation. The best sources of guidance covered C++ AMP and were on Microsoft's website.

Examples of two different methods to use array_views globally

When answering an issue I saw on the HCC GitHub [1], I wrote some example code to check that my ideas worked properly, so I thought I would repeat it here.

Direct use of a global array_view is not allowed inside restricted [[hc]] sections but, confusingly, it is allowed on the host (anything outside [[hc]] sections). I was able to find a solution for C++ AMP using pointers [2], but I also found an easier way: using the array_view copy constructor, which performs a shallow copy. The poster on GitHub said "I need global array with some constants computed once"; their problem was that direct access to the global from inside [[hc]] GPU sections wasn't allowed.

The example program below demonstrates a global array_view being used in two functions plus main, by using the shallow copy method:

#include <hc.hpp>
#include <iostream>
#pragma GCC diagnostic warning "-Wall"

//The data is initialised only once, at global scope
float initialVals[] = {2,4,8,16,32}; 
hc::array_view<float> globalView(5,initialVals);

hc::array_view<float> doWorkNo1()//Example 1 of using global array_view
{
	hc::array_view<float> newLocalView(globalView);//create a LOCAL shallow copy of the global
	hc::array_view<float> result(5);//store for results
	hc::parallel_for_each(newLocalView.get_extent(), [=] (hc::index<1> idx) [[hc]]
	{
		result[idx] = newLocalView[idx]*2;//double the value in the array
	});
	return result;
}
hc::array_view<float> doWorkNo2()//Example 2 of using global array_view
{
	hc::array_view<float> newLocalView(globalView);//create a LOCAL shallow copy of the global
	hc::array_view<float> result(5);//store for results
	hc::parallel_for_each(newLocalView.get_extent(), [=] (hc::index<1> idx) [[hc]]
	{
		result[idx] = hc::fast_math::exp(newLocalView[idx]);//exp of the value in the array
	});
	return result;
}
int main()
{
	hc::array_view<float> newLocalView(globalView);//create a LOCAL shallow copy of the global
	std::cout << "GLOBAL array_view Contents: ";
	for(int i = 0; i < 5; i++) {
		std::cout << newLocalView[i] << ","; //print out the global array and check it's all in there OK
	}
	std::cout << std::endl;
	hc::array_view<float> func1Result = doWorkNo1();//get result of Example 1
	hc::array_view<float> func2Result = doWorkNo2();//get result of Example 2
	std::cout << "Example 1: ";
	for(int i = 0; i < 5; i++) {
		std::cout << func1Result[i] << ","; //print Example 1 Results
	}
	std::cout << std::endl << "Example 2: ";
	for(int i = 0; i < 5; i++) {
		std::cout << func2Result[i] << ","; //print Example 2 Results
	}
	std::cout << std::endl;
	return 0;
}

Here is another example program but this one uses the global pointer method instead:

#include <hc.hpp>
#include <iostream>
#pragma GCC diagnostic warning "-Wall"

//Declare & initialise the array with global scope
float initialVals[] = {2,4,8,16,32};
hc::array_view<float> *globalView = new hc::array_view<float>(5,initialVals);

hc::array_view<float> doWorkNo1()//Example 1 of using global array_view
{
	hc::array_view<float> newLocalView = *globalView;//dereference the global pointer to create a LOCAL shallow copy
	hc::array_view<float> result(5);//store for results
	hc::parallel_for_each(newLocalView.get_extent(), [=] (hc::index<1> idx) [[hc]]
	{
		result[idx] = newLocalView[idx]*2;//double the value in the array
	});
	return result;
}
hc::array_view<float> doWorkNo2()//Example 2 of using global array_view
{
	hc::array_view<float> newLocalView = *globalView;//dereference the global pointer to create a LOCAL shallow copy
	hc::array_view<float> result(5);//store for results
	hc::parallel_for_each(newLocalView.get_extent(), [=] (hc::index<1> idx) [[hc]]
	{
		result[idx] = hc::fast_math::exp(newLocalView[idx]);//exp of the value in the array
	});
	return result;
}
int main()
{
	hc::array_view<float> newLocalView = *globalView;//dereference the global pointer to create a LOCAL shallow copy
	std::cout << "GLOBAL array_view Contents: ";
	for(int i = 0; i < 5; i++) {
		std::cout << newLocalView[i] << ","; //print out the global array and check it's all in there OK
	}
	std::cout << std::endl;
	hc::array_view<float> func1Result = doWorkNo1();//get result of Example 1
	hc::array_view<float> func2Result = doWorkNo2();//get result of Example 2
	std::cout << "Example 1: ";
	for(int i = 0; i < 5; i++) {
		std::cout << func1Result[i] << ","; //print Example 1 Results
	}
	std::cout << std::endl << "Example 2: ";
	for(int i = 0; i < 5; i++) {
		std::cout << func2Result[i] << ","; //print Example 2 Results
	}
	std::cout << std::endl;
	return 0;
}
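
Both programs rely on the same step: make a local shallow copy of the global array_view, then let the [=] lambda capture that copy. A likely explanation, offered here as an assumption rather than something confirmed by the HC documentation, is plain C++ lambda behaviour: a lambda only captures local (automatic) variables, so a global named inside the lambda body is accessed directly instead of being copied into the lambda. The host-only sketch below illustrates that mechanism without HC. FloatView is a hypothetical stand-in for hc::array_view<float>, and doubledAt and sharesStorage are illustrative names that do not appear in the original programs.

```cpp
#include <cstddef>

// Hypothetical stand-in for hc::array_view<float>: a non-owning handle.
// Copying it copies the pointer and the length, not the underlying floats.
struct FloatView
{
	float* data;
	std::size_t size;
	float& operator[](std::size_t i) const { return data[i]; }
};

// Global data and global handle, mirroring initialVals and globalView above.
float initialVals[] = {2, 4, 8, 16, 32};
FloatView globalView{initialVals, 5};

// Mirrors doWorkNo1 for a single element, entirely on the host.
float doubledAt(std::size_t i)
{
	FloatView localView(globalView); // LOCAL shallow copy of the global handle
	// localView is a local variable, so [=] copies it into the lambda.
	// Writing globalView here instead would not capture anything: the body
	// would refer to the global object directly.
	auto kernel = [=](std::size_t idx) { return localView[idx] * 2; };
	return kernel(i);
}

// The copy is shallow: both handles point at the same storage.
bool sharesStorage()
{
	FloatView localView(globalView);
	return localView.data == globalView.data;
}
```

Calling doubledAt(2) reads initialVals[2] through the captured copy and returns 16, and sharesStorage() returns true because only the handle was copied. The pointer method works the same way, since dereferencing the global pointer also produces a local copy for the lambda to capture.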

Why you should always initialise variables when using HC

When writing HC code you must make sure that all variables are initialised, or you will find that the code behaves strangely.

It is bad practice, and will not work correctly, to do this:

int i,j;
j = i + 1;

i is uninitialised, so its value is indeterminate and reading it is undefined behaviour in C++. On the CPU it often happens to hold zero, in which case j becomes 1, but nothing guarantees that. When the same code is compiled for the GPU you cannot rely on i being 0 either. If compiled as part of a [[hc]] block, the example above gave me j = 0: it looks as if j = i + 1 is treated as j = undefined + 1, the whole expression is treated as undefined, and HCC seems to represent that undefined result as the integer 0.

Normally, if the value doesn’t happen to be 0, you will get a seemingly random number: whatever was already in the memory that i occupies. This doesn’t seem to be the case for GPU code in HC, presumably due to different memory management.

It seems like an easy error to avoid, but I spent a long time wondering why code which worked on the CPU didn’t work on the GPU. The reason was that I had a for loop which started like this:

for (int i; i < 5; i++)

In the CPU build i happened to start at 0, so the loop worked fine. The GPU build did not behave that way and didn’t run the loop at all, skipping straight over it, so my results differed from those I got running the exact same code on the CPU.
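
For comparison, here is a minimal sketch of both snippets with every variable given a value before it is read. It is plain C++ with no HC calls, and the function names nextValue and countIterations are hypothetical, chosen only for this illustration.

```cpp
// Fixed version of the two-line snippet: i has a definite value before it is read.
int nextValue()
{
	int i = 0;     // explicitly initialised
	int j = i + 1; // well defined: j is 1
	return j;
}

// Fixed version of the loop: the counter explicitly starts at 0,
// so the body runs exactly five times.
int countIterations()
{
	int iterations = 0;
	for (int i = 0; i < 5; i++)
	{
		iterations++;
	}
	return iterations;
}
```

With the initialisers in place nextValue() returns 1 and countIterations() returns 5, and neither result depends on what a particular compiler or device does with uninitialised storage.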


References