8 releases (4 major breaking)

5.0.0 Sep 8, 2024
4.0.1 Sep 2, 2024
3.0.0 Sep 1, 2024
2.1.1 Aug 30, 2024
1.0.0 Aug 29, 2024

#166 in Concurrency

Download history 224/week @ 2024-08-24 376/week @ 2024-08-31 138/week @ 2024-09-07

379 downloads per month

MIT license

2.5MB
520 lines

Contains (ELF lib, 8MB) examples/libEMCompute.so

EMCompute


Why EMCompute?

  • it's fast
  • it's simple
  • it's cross-platform
  • it's configurable
  • it supports shading languages

Getting started

Tutorial

  • this tour is written for v2.1.1; later versions made small changes to the API, which you can follow in the NEWS section of this file
  • the first thing you have to create is a struct of type CKernel, which stands for Computing Kernel; this struct acts as a manifest of the task that the GPU must run
typedef struct CKernel {
  uint32_t x;
  uint32_t y;
  uint32_t z;
  const char *code;
  const char *code_entry_point;
  struct GPUComputingConfig config;
} CKernel;
  • the x, y and z fields specify the max number of workgroups in each dimension (in your kernel code you specify how many threads each workgroup has)
  • the code field must contain your WGSL compute shader code (other shading languages will be supported soon)
  • the code_entry_point field must be set to the name of the function that the GPU should call for your task
  • the config field tells the API how the GPU should treat your task
  • to set it manually, check the comments in the header files or https://docs.rs/EMCompute/latest/EMCompute/ ; to make it easier you can use:
void set_kernel_default_config(struct CKernel *kernel);
  • you pass a pointer to your CKernel variable and its config will be set; this is useful because the defaults meet the needs of most tasks
  • now it's time to gather the data for the GPU; for that you use the DataBinder and GroupOfBinders structs
typedef struct DataBinder {
  uint32_t bind;
  uintptr_t data_len;
  uint8_t *data;
} DataBinder;
  • in the bind field you provide the binding index that the data goes to in your kernel code
  • the data_len field must be the size of your data in bytes: sizeof(your type) * real_len_of_your_array / sizeof(uint8_t)
  • the data field must be a pointer to your data array
  • now in GroupOfBinders you set the group index and a pointer to all the DataBinders that are in the same group
typedef struct GroupOfBinders {
  uint32_t group;
  struct DataBinder *datas;
  uintptr_t datas_len;
} GroupOfBinders;
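The two structs above can be filled in with small helpers. The following is a minimal sketch, not part of EMCompute: `bind_u32_array` and `make_group` are hypothetical names introduced here for illustration, and the struct layouts are copied from the v2.1.1 definitions shown in this tutorial (in real code they come from EMCompute.h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts copied from the tutorial (v2.1.1); normally provided by EMCompute.h. */
typedef struct DataBinder {
  uint32_t bind;
  uintptr_t data_len;
  uint8_t *data;
} DataBinder;

typedef struct GroupOfBinders {
  uint32_t group;
  struct DataBinder *datas;
  uintptr_t datas_len;
} GroupOfBinders;

/* Hypothetical helper (not an EMCompute function): wrap a uint32_t array
   in a DataBinder for the given binding index. */
static DataBinder bind_u32_array(uint32_t bind, uint32_t *values, size_t count) {
  assert(values != NULL);
  DataBinder binder;
  binder.bind = bind;
  binder.data = (uint8_t *)values;
  /* data_len is a byte count: element size times element count. */
  binder.data_len = (uintptr_t)(sizeof(uint32_t) * count);
  return binder;
}

/* Hypothetical helper: collect the binders that share one group index. */
static GroupOfBinders make_group(uint32_t group, DataBinder *binders, size_t count) {
  GroupOfBinders g;
  g.group = group;
  g.datas = binders;
  g.datas_len = (uintptr_t)count;
  return g;
}
```

Keeping the byte-size calculation in one place avoids repeating the sizeof expression for every buffer. The arrays passed in must stay alive for as long as the binders are used, since the binders only store pointers.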
  • now we create an array of GroupOfBinders and pass it to the compute function, and we are done
int32_t compute(struct CKernel kernel,
                struct GroupOfBinders *data_for_gpu,
                uintptr_t gpu_data_len);
  • it returns a number; a non-zero value means an error happened
  • as an example:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>  
#include "EMCompute.h"

int main() {
  CKernel kernel;
  kernel.x = 60000;  
  kernel.y = 1000;
  kernel.z = 100;

  kernel.code = 
    "@group(0)@binding(0) var<storage, read_write> v_indices: array<u32>; "
    "@group(0)@binding(1) var<storage, read_write> v_indices0: array<u32>; "
    "@compute @workgroup_size(10 , 1 , 1)" 
    "fn main(@builtin(global_invocation_id) global_id: vec3<u32>) { "
    "  let idx = global_id.x % 60000; "
    "   "
    "v_indices[idx] = v_indices[idx] + v_indices0[idx]; "
    "  "
    "}";

  set_kernel_default_config(&kernel);
  kernel.code_entry_point = "main";

  // Initialize data
  uint32_t example_data[60000];
  uint32_t example_data0[60000];

  for (int i = 0; i < 60000; ++i) {
    example_data[i] = 1;
    example_data0[i] = 1;
  }

  // Bind data
  DataBinder data;
  data.bind = 0;
  data.data = (uint8_t *)example_data;
  data.data_len = sizeof(uint32_t)*60000/sizeof(uint8_t);

  DataBinder data0;
  data0.bind = 1;
  data0.data = (uint8_t *)example_data0;
  data0.data_len = sizeof(uint32_t)*60000/sizeof(uint8_t);

  DataBinder group0[] = {data, data0};
  GroupOfBinders wrapper;
  wrapper.group = 0;
  wrapper.datas = group0;
  wrapper.datas_len = 2;

  GroupOfBinders groups[] = {wrapper};

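  // Run the kernel; compute returns 0 on success
  int32_t status = compute(kernel, groups, 1);
  if (status != 0) {
    fprintf(stderr, "compute failed with code %d\n", (int)status);
    return EXIT_FAILURE;
  }

  // Check results
  printf("example_data[4]: %u\n", (unsigned)example_data[4]);
  printf("example_data0[4]: %u\n", (unsigned)example_data0[4]);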

  return 0;
}
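
The example declares @workgroup_size(10 , 1 , 1) and wraps the index with % 60000, so more invocations are launched than there are elements. If you prefer one invocation per element along x, the workgroup count can be derived from the element count. This is a minimal sketch under that assumption; `workgroups_needed` is a hypothetical helper, not an EMCompute function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest number of workgroups whose invocations
   cover element_count elements (ceiling division, written to avoid
   overflow near UINT32_MAX). */
static uint32_t workgroups_needed(uint32_t element_count, uint32_t workgroup_size) {
  assert(workgroup_size > 0);
  return element_count / workgroup_size +
         (element_count % workgroup_size != 0 ? 1u : 0u);
}
```

For the 60000-element arrays and a workgroup size of 10, this gives 6000 for x, with y and z left at 1. When the element count is not a multiple of the workgroup size, the last workgroup has spare invocations, so the shader needs a bounds check (or a wrap like the % used above).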

NEWS

  • since version 2.0.0 the API caches GPU resources to avoid reallocating them and to improve performance
  • since version 2.1.0 you can deallocate the API caches by calling the free_compute_cache() function
  • since version 3.0.0 the customize and setting_cache_index fields were added to the CKernel struct, and the compute function now takes a pointer to a CKernel variable
  • since version 3.0.0 GPUCustomSettings, GPUMemoryCustom and GPUSpeedCustom were added to the API to customize the custom_speed and custom_memory settings (for more details read the comments in the .h, .hpp and .pyx files)
  • since version 3.0.0, if custom_speed or custom_memory is set in the config field of CKernel, the equivalent fields of customize will be checked
  • since version 3.0.0 the caching method changed and is controlled by setting_cache_index; if you set it to a negative value, which the set_kernel_default_config function does, the API will allocate new GPU resources, so if you have used a config before you must keep track of its index, otherwise you may run out of memory
  • since version 4.0.0 the structure of CKernel changed; kernel code and configs must be registered before using the compute function. for the changes in detail refer to https://github.com/SkillfulElectro/EMCompute/tree/main/header_files and read the comments there, or see https://docs.rs/EMCompute/latest/EMCompute/
  • since version 5.0.0 not many changes happened; you only must pass uint8_t** to the data field of DataBinder
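
The 5.0.0 note can be illustrated with a small sketch. Only the uint8_t** type of the data field is taken from the note above; the struct name `DataBinderV5Sketch`, the helper `bind_bytes_v5`, and the assumption that the other fields are unchanged are all hypothetical, so check the header files for the real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of DataBinder for v5.0.0: only the type of `data`
   comes from the NEWS note; the other fields are assumed unchanged. */
typedef struct DataBinderV5Sketch {
  uint32_t bind;
  uintptr_t data_len;
  uint8_t **data;
} DataBinderV5Sketch;

/* Hypothetical helper: the binder stores the address of the caller's
   pointer, so `bytes_slot` must outlive the binder. */
static DataBinderV5Sketch bind_bytes_v5(uint32_t bind, uint8_t **bytes_slot,
                                        size_t byte_count) {
  assert(bytes_slot != NULL && *bytes_slot != NULL);
  DataBinderV5Sketch binder;
  binder.bind = bind;
  binder.data_len = (uintptr_t)byte_count;
  binder.data = bytes_slot;
  return binder;
}
```

In this sketch the caller keeps a uint8_t* variable pointing at the buffer and passes its address, which adds one level of indirection compared with the v2.1.1 tutorial.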

Contribution

  • if you find any problem or bug, I'll be happy to receive your pull request or issue report

Dependencies

~4–35MB
~542K SLoC