## Memory Allocators

OpenMP memory allocators can be used to allocate memory with specific allocator traits. In the following example an OpenMP allocator is used to specify an alignment for arrays _x_ and _y_. The general approach for attributing traits to variables allocated by OpenMP is to create or specify a pre-defined _memory space_, create an array of _traits_, and then form an _allocator_ from the memory space and the traits. The allocator is then specified in an OpenMP allocation (the _omp_alloc()_ API function in C/C++ code and an __allocators__ directive in Fortran code in the _allocators.1_ example).

In the example below the  _xy_memspace_  variable is declared and assigned the default memory space ( _omp_default_mem_space_ ). Next, an array for  _traits_  is created. Since only one trait will be used, the array size is  _1_ . A trait is a structure in C/C++ and a derived type in Fortran, containing 2 components: a key and a corresponding value (key-value pair). The trait key used here is  _omp_atk_alignment_  (an enum for C/C++ and a parameter for Fortran) and the trait value of 64 is specified in the  _xy_traits_  declaration. These declarations are followed by a call to the  _omp_init_allocator()_  function to combine the memory space ( _xy_memspace_ ) and the traits ( _xy_traits_ ) to form an allocator ( _xy_alloc_ ).

In the C/C++ code the API _omp_alloc()_ function is used to allocate space, similar to _malloc_, except that the allocator is specified as the second argument. In Fortran an __allocators__ directive is used to specify an allocator for the following Fortran _allocate_ statement. A variable list in the __allocate__ clause may be supplied if the allocator is to be applied to a subset of variables in the Fortran _allocate_ statement. Here, the _xy_alloc_ allocator is specified in the __allocator__ modifier of the __allocate__ clause, and the set of all variables used in the _allocate_ statement is specified in the list.

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: allocators.1
* type: C
* version: omp_5.0
*/
#include    <omp.h>
#include  <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#define N 1000

int main()
{
   float  *x, *y;
   float s=2.0;

   omp_memspace_handle_t  xy_memspace = omp_default_mem_space;
   omp_alloctrait_t       xy_traits[1]= {{omp_atk_alignment, 64}};
   omp_allocator_handle_t xy_alloc    =
                           omp_init_allocator(xy_memspace,1,xy_traits);


   x=(float *)omp_alloc(N*sizeof(float), xy_alloc);
   y=(float *)omp_alloc(N*sizeof(float), xy_alloc);

   if( ((intptr_t)(y))%64 != 0 || ((intptr_t)(x))%64 != 0 )
   { printf("ERROR: x|y not 64-Byte aligned\n"); exit(1); }

   #pragma omp parallel
   {
      #pragma omp for simd simdlen(16) aligned(x,y:64)
      for(int i=0; i<N; i++){ x[i]=i+1; y[i]=i+1; } // initialize

      #pragma omp for simd simdlen(16) aligned(x,y:64)
      for(int i=0; i<N; i++) y[i] = s*x[i] + y[i];
    }

   printf("y[0],y[N-1]: %5.0f %5.0f\n",y[0],y[N-1]);
   // output y[0],y[N-1]: 3 3000

   omp_free(x, xy_alloc);
   omp_free(y, xy_alloc);
   omp_destroy_allocator(xy_alloc);

   return 0;
}
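
The alignment test in the C code above can be factored into a small helper. The following is a minimal sketch, not part of the _allocators.1_ example or of the OpenMP API: the name _is_aligned_ is hypothetical, and the alignment is assumed to be a nonzero power of two. It uses _uintptr_t_, the unsigned counterpart of the _intptr_t_ type used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenMP routine): returns true when the
   address held in p is a multiple of alignment bytes.
   Assumption: alignment is a nonzero power of two. */
bool is_aligned(const void *p, size_t alignment)
{
   /* Check the stated precondition in debug builds. */
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

   /* For a power of two, the low bits of an aligned address are all zero. */
   return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

With such a helper, the check in the example could be written as _if( !is_aligned(x,64) || !is_aligned(y,64) )_.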

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: allocators.1
! type: F-free
! version: omp_5.2
program main
 use omp_lib

 integer, parameter :: N=1000
 real, allocatable  :: x(:),y(:)
 real               :: s = 2.0e0
 integer            :: i

 integer(omp_memspace_handle_kind ) :: xy_memspace = omp_default_mem_space
 type(   omp_alloctrait           ) :: xy_traits(1) = &
                                    [omp_alloctrait(omp_atk_alignment,64)]
 integer(omp_allocator_handle_kind) :: xy_alloc

   xy_alloc   =    omp_init_allocator(   xy_memspace, 1, xy_traits)

   !$omp allocators allocate(allocator(xy_alloc): x, y)
   allocate(x(N),y(N))
                         !! loc is non-standard, but found everywhere
                         !! remove these lines if not available
   if(modulo(loc(x),64) /= 0 .or. modulo(loc(y),64) /=0 ) then
      print*,"ERROR: x|y not 64-byte aligned"; stop
   endif

   !$omp parallel

      !$omp do simd simdlen(16) aligned(x,y: 64) !! 64B aligned
      do i=1,N  !! initialize
        x(i)=i
        y(i)=i
      end do

      !$omp do simd simdlen(16) aligned(x,y: 64) !! 64B aligned
      do i = 1,N
         y(i) = s*x(i) + y(i)
      end do

   !$omp end parallel

   write(*,'("y(1),y(N):",2f6.0)') y(1),y(N) !!output: y... 3. 3000.

   deallocate(x,y)
   call omp_destroy_allocator(xy_alloc)

end program

When using the __allocators__ construct with optional clauses in Fortran code,  users should be aware of the behavior of a reallocation.

In the following example, the _a_ variable is allocated with 64-byte alignment through the __align__ modifier of the __allocate__ clause on the __allocators__ construct. The (reallocation) assignment _a = b_ does not preserve this alignment: the newly allocated object _a_ receives the 32-byte alignment prescribed by the trait of the _my_alloctr_ allocator, not the 64-byte alignment. It is best to avoid this problem by constructing and using an allocator (not the __align__ modifier) with the required alignment in the __allocators__ construct. Note that the deallocation of _a_ must precede the destruction of the allocator used in the allocation of _a_.

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: allocators.2
! type: F-free
! version: omp_5.2
program main
   use omp_lib
   implicit none

   integer, parameter :: align_32=32
   real, allocatable  :: a(:,:)
   real               :: b(10,10)

   integer(omp_memspace_handle_kind ) :: my_memspace
   type(   omp_alloctrait           ) :: my_traits(1)
   integer(omp_allocator_handle_kind) :: my_alloctr

   my_memspace  =  omp_default_mem_space
   my_traits    = [omp_alloctrait(omp_atk_alignment,align_32)]
!                                     allocator alignment ^^
   my_alloctr   =  omp_init_allocator(my_memspace, 1, my_traits)

   !$omp allocators allocate(allocator(my_alloctr), align(64): a)
   allocate(a(5,5)) ! 64-byte aligned by clause <---------^^

   a = b  ! reallocation occurs with 32-byte alignment
          ! uses just my_alloctr (32-byte align from allocator)

   deallocate(a)  ! Uses my_alloctr in deallocation.
   call omp_destroy_allocator(my_alloctr)

end program main

When creating and using an __allocators__ construct within a Fortran procedure for allocating storage (and subsequently destroying the allocator with the __omp_destroy_allocator__ routine), users should be aware of the necessity of using an explicit Fortran deallocation instead of relying on auto-deallocation.

In the following example, a user-defined allocator is used in the allocation of the _c_ variable, and then the allocator is destroyed. Auto-deallocation at the end of the _broken_auto_deallocation_ procedure will fail without the allocator, hence an explicit deallocation should be used (before the call to the __omp_destroy_allocator__ routine). Note that an allocator may be specified directly in the __allocate__ clause without using the __allocator__ complex modifier, so long as no other modifier is specified in the clause.

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: allocators.3
! type: F-free
! version: omp_5.2
subroutine broken_auto_deallocation
   use omp_lib
   implicit none
   integer, parameter :: align_32=32
   real, allocatable  :: c(:)

   integer(omp_memspace_handle_kind ) :: my_memspace
   type(   omp_alloctrait           ) :: my_traits(1)
   integer(omp_allocator_handle_kind) :: my_alloctr

   my_memspace  =  omp_default_mem_space
   my_traits    = [omp_alloctrait(omp_atk_alignment,align_32)]
   my_alloctr   =  omp_init_allocator(my_memspace, 1, my_traits)

   !$omp allocators allocate(my_alloctr: c)
   allocate(c(100))

   !...

   call omp_destroy_allocator(my_alloctr)
   ! Auto-deallocation of c fails,
   ! because my_alloctr is no longer available.

end subroutine

The __allocate__ directive is a convenient way to apply an OpenMP  allocator to the allocation of declared variables.

This example illustrates the allocation of specific types of storage in a program  for use in libraries, privatized variables, and with offloading.

Two groups of variables, { _v1, v2_ } and { _v3, v4_ }, are used with the __allocate__  directive, and the { _v5, v6_ } pair is used with the __allocate__ clause.  Here we explicitly use predefined allocators __omp_high_bw_mem_alloc__ and __omp_default_mem_alloc__ with the __allocate__ directive in CASE 1. Similar effects are achieved for private variables of a task by using the __allocate__ clause, as shown in CASE 2.

Note, when the __allocate__ directive does not specify an __allocator__ clause, an implementation-defined default, stored in the  _def-allocator-var_  ICV, is used (not illustrated here). Users can set and get the default allocator with the __omp_set_default_allocator__ and __omp_get_default_allocator__ API routines.

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: allocators.4
* type: C
* version: omp_5.1
*/
#include <omp.h>
#include <stdio.h>

void my_init(double *,double *,int, double *,double *,int, \
             double *,double *,int);
void lib_saxpy(double *,double *,double,int);
void my_gather(double *,double *,int);

#pragma omp begin declare target
void my_gpu_vxv(double *, double *, int);
#pragma omp end  declare target

#define Nhb (1024*1024)      // high bandwidth
#define Nbg (1024*1024*64)   // big memory, default
#define Nll (1024*1024)      // low latency memory

void test_allocate() {

  double  v1[Nhb], v2[Nhb];
  double  v3[Nbg], v4[Nbg];
  double  v5[Nll], v6[Nll];

/*** CASE 1: USING ALLOCATE DIRECTIVE ***/
  #pragma omp allocate(v1,v2) allocator(omp_high_bw_mem_alloc)
  #pragma omp allocate(v3,v4) allocator(omp_default_mem_alloc)

  my_init(v1,v2,Nhb, v3,v4,Nbg, v5,v6,Nll);

  lib_saxpy(v1,v2,5.0,Nhb);

  #pragma omp target map(to: v3[0:Nbg], v4[0:Nbg]) map(from:v3[0:Nbg])
  my_gpu_vxv(v3,v4,Nbg);

/*** CASE 2: USING ALLOCATE CLAUSE ***/
  #pragma omp task private(v5,v6) \
                   allocate(allocator(omp_low_lat_mem_alloc): v5,v6)
  {
    my_gather(v5,v6,Nll);
  }

}

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: allocators.4
! type: F-free
! version: omp_5.1
subroutine test_allocate
   use omp_lib

   interface
     subroutine my_gpu_vxv(va,vb,n)
     !$omp declare target
     integer :: n
     double precision  :: va(n), vb(n)
     end subroutine
   end interface

   integer,parameter :: Nhb=1024*1024,   & !! high bandwidth
                        Nbg=1024*1024*64,& !! big memory, default
                        Nll=1024*1024      !! low latency memory

   double precision  ::  v1(Nhb), v2(Nhb)
   double precision  ::  v3(Nbg), v4(Nbg)
   double precision  ::  v5(Nll), v6(Nll)

 !*** CASE 1: USING ALLOCATE DIRECTIVE ***!
   !$omp allocate(v1,v2) allocator(omp_high_bw_mem_alloc)
   !$omp allocate(v3,v4) allocator(omp_default_mem_alloc)

   call my_init(v1,v2,Nhb, v3,v4,Nbg, v5,v6,Nll)

   call lib_saxpy(v1,v2,5.0,Nhb)

   !$omp target map(to: v3, v4) map(from:v3)
      call my_gpu_vxv(v3,v4,Nbg)
   !$omp end target

 !*** CASE 2: USING ALLOCATE CLAUSE ***!
   !$omp task private(v5,v6) &
   !$omp&     allocate(allocator(omp_low_lat_mem_alloc): v5,v6)
      call my_gather(v5,v6,Nll)
   !$omp end task

end subroutine test_allocate