I have many structs (classes) and standalone functions that I would like to compile separately and then link against the CUDA kernel, but I am getting the "External calls are not supported" error while compiling (not linking) the kernel. nvcc forces me to inline every function called from the kernel. This is very frustrating!! If somebody has figured ...
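A minimal sketch of the usual workaround, with hypothetical file and function names: define the __device__ functions in a header and include it from the kernel's .cu file so everything lands in one translation unit (newer toolkits also offer true separate device compilation via nvcc -dc / -rdc=true):

// scale_ops.cuh (hypothetical header): the __device__ function is defined
// here so any .cu file that includes it can compile the call without
// needing external device linkage.
__device__ float scale(float x, float factor)
{
    return x * factor;
}

// kernel.cu
#include "scale_ops.cuh"

__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scale(data[i], factor);   // resolved within this translation unit
}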
Hi,
as far as I know, I can use C++ templates in CUDA device code. So if I use a map to create a dictionary, will the operation of inserting new values be atomic?
I want to count the number of appearances of certain values, i.e. create a code dictionary with probabilities of the codes.
Thanks
Macs
...
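std::map cannot be used inside device code, so counting is usually done with an atomic histogram instead; a minimal sketch, assuming the codes are integers in a known range [0, NUM_BINS):

#include <cuda_runtime.h>

#define NUM_BINS 256

// counts must be zero-initialised (e.g. with cudaMemset) before the launch.
__global__ void countCodes(const int *codes, int n, unsigned int *counts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&counts[codes[i]], 1u);   // atomic increment of this code's bin
}

// Launch example: countCodes<<<(n + 255) / 256, 256>>>(d_codes, n, d_counts);

The probabilities then follow on the host by dividing each count by the total number of samples.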
OK, so far I can create an array on the host computer (of type float), copy it to the GPU, then bring it back to the host as another array (to test whether the copy was successful by comparing it to the original).
I then create a CUDA array from the array on the GPU. Then I bind that array to a CUDA texture.
I now want to read that text...
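A minimal sketch of the whole round trip using the (now legacy, pre-CUDA 12) texture reference API, with hypothetical names and error checking omitted:

#include <cuda_runtime.h>
#include <stdio.h>

// Legacy texture reference bound to a 1D cudaArray of floats.
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void readFromTexture(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D(texRef, i + 0.5f);   // +0.5f samples the centre of texel i
}

int main(void)
{
    const int N = 256;
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    // CUDA array, copy, bind, as described above.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *cuArray;
    cudaMallocArray(&cuArray, &desc, N);
    cudaMemcpyToArray(cuArray, 0, 0, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texRef, cuArray, desc);

    float *d_out;
    cudaMalloc(&d_out, N * sizeof(float));
    readFromTexture<<<(N + 127) / 128, 128>>>(d_out, N);
    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h_out[10] = %f\n", h_out[10]);

    cudaUnbindTexture(texRef);
    cudaFreeArray(cuArray);
    cudaFree(d_out);
    return 0;
}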
I'm trying to reduce the number of instructions and constant memory reads for a CUDA kernel.
As a result, I have realised that I can pull out the tile sizes from constant memory and turn them into macros. How do I define macros that evaluate to constants during preprocessing so that I can simply adjust three values and reduce the number...
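A minimal sketch of that approach, shown with two tile dimensions (a third works the same way); all names are hypothetical and the image size is assumed to be a multiple of the tile size:

// Tile sizes as macros; adjusting these values is the only change needed.
#define TILE_W 16
#define TILE_H 16

__global__ void tiledCopy(const float *in, float *out, int width)
{
    // The preprocessor substitutes the literals, so the shared-memory size
    // and the index arithmetic below are compile-time constants.
    __shared__ float tile[TILE_H][TILE_W];

    int x = blockIdx.x * TILE_W + threadIdx.x;
    int y = blockIdx.y * TILE_H + threadIdx.y;

    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();
    out[y * width + x] = tile[threadIdx.y][threadIdx.x];
}

// Host side, using the same macros for the launch configuration:
//   dim3 block(TILE_W, TILE_H);
//   dim3 grid(width / TILE_W, height / TILE_H);
//   tiledCopy<<<grid, block>>>(d_in, d_out, width);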
The CUDA programming guide states that
"Bandwidth is one of the most important gating factors for performance. Almost all changes to code should be made in the context of how they affect bandwidth."
It goes on to calculate theoretical bandwidth, which is on the order of hundreds of gigabytes per second. I am at a loss as to why ho...
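For reference, a worked version of both formulas; the theoretical figure uses the GTX 280 numbers often quoted (1107 MHz DDR memory clock, 512-bit bus), and the effective-bandwidth inputs are made up purely for illustration:

#include <stdio.h>

int main(void)
{
    /* Theoretical bandwidth = memory clock (Hz) x bus width (bytes) x 2 (DDR).
       GTX 280 example: 1107 MHz memory clock, 512-bit interface.             */
    double theoretical = (1107.0e6 * (512.0 / 8.0) * 2.0) / 1.0e9;   /* ~141.6 GB/s */

    /* Effective bandwidth of a kernel = (bytes read + bytes written) / time.
       Hypothetical example: reading and writing 32M floats in 3 ms.          */
    double bytes = 2.0 * 32.0e6 * sizeof(float);
    double effective = (bytes / 1.0e9) / 3.0e-3;                     /* ~85 GB/s */

    printf("theoretical: %.1f GB/s, effective: %.1f GB/s\n", theoretical, effective);
    return 0;
}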
Should I learn OpenCL if I only want to program NVIDIA GPUs?
...
Steps for using textures and arrays in CUDA?
...
How are threads organized to be executed by a GPU?
...
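In short: threads are grouped into blocks and blocks into a grid, each block is assigned to one multiprocessor and executed in warps of 32 threads, and every thread derives its global index from the built-in variables; a minimal sketch:

__global__ void addOne(float *data, int n)
{
    // blockIdx selects the block within the grid, threadIdx the thread within
    // the block; together they give a unique global index per thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Launch: 256 threads per block, enough blocks to cover n elements.
//   addOne<<<(n + 255) / 256, 256>>>(d_data, n);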
The standard convention seems to be to give CUDA source-code files a .cu extension, to distinguish them from C files with a .c extension. What's the corresponding convention for CUDA-specific header files? Is there one?
...
I created a VS project using the CUDA VS Wizard, and I'm trying to build a CUDA program using Thrust. The test program is quite simple:
#include <thrust/device_vector.h>   // the only header the example needs

int main(void)
{
    thrust::device_vector<double> X;
    X.resize(100);
}
I get compile errors like:
1>C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/tmpxft_00003cc0_00000000-3_sample.cudafe1.stub.c(2...
Hi,
since I needed to sort large arrays of numbers with CUDA, I ended up using Thrust. So far, so good... but what about when I want to call a "handwritten" kernel with the data held in a thrust::host_vector?
My approach was (the copy back is missing):
int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples, thrust::host_vect...
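One common pattern for this (a sketch with hypothetical names): keep the data in a thrust::device_vector and hand the kernel a raw device pointer via thrust::raw_pointer_cast, rather than passing the host_vector itself:

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

__global__ void countAndAdd(float *samples, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        samples[i] += 0.0f;   // placeholder for the real per-sample work
}

void launch(const thrust::host_vector<float> &h_samples)
{
    // Copy to the device; the device_vector owns the device memory.
    thrust::device_vector<float> d_samples = h_samples;

    // Raw device pointer that an ordinary kernel can take as an argument.
    float *raw = thrust::raw_pointer_cast(d_samples.data());

    int n = (int)d_samples.size();
    countAndAdd<<<(n + 255) / 256, 256>>>(raw, n);
}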
Given the following piece of code, generating a kind of code dictionary with CUDA using thrust (C++ template library for CUDA):
thrust::device_vector<float> dCodes(codes->begin(), codes->end());
thrust::device_vector<int> dCounts(counts->begin(), counts->end());
thrust::device_vector<int> newCounts(counts->size());
for (int i = 0; i < ...
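For context, the usual Thrust pattern for this kind of code dictionary, not necessarily what the snippet above is doing, is to sort the codes and then reduce_by_key, which avoids a host-side loop; a sketch:

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/iterator/constant_iterator.h>

// Sorting makes equal codes adjacent; reduce_by_key then collapses each run
// of equal codes into a (code, count) pair.
void buildDictionary(thrust::device_vector<float> &dCodes,
                     thrust::device_vector<float> &uniqueCodes,
                     thrust::device_vector<int>   &counts)
{
    thrust::sort(dCodes.begin(), dCodes.end());

    uniqueCodes.resize(dCodes.size());
    counts.resize(dCodes.size());

    auto ends = thrust::reduce_by_key(dCodes.begin(), dCodes.end(),
                                      thrust::constant_iterator<int>(1),
                                      uniqueCodes.begin(),
                                      counts.begin());

    uniqueCodes.resize(ends.first - uniqueCodes.begin());
    counts.resize(ends.second - counts.begin());
}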
As I would like my GPU to do some of the calculation for me, I am interested in measuring the speed of 'texture' upload and download, because my 'textures' are the data that the GPU should crunch.
I know that transferring from main memory to GPU memory is the preferred way to go, so I expect such an application to be efficient only if ther...
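A minimal sketch of how such transfers are usually timed, wrapping cudaMemcpy in CUDA events; the buffer size is arbitrary, and pinned host memory (cudaMallocHost) would typically raise the measured figure:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t bytes = 64 << 20;               // 64 MB test buffer
    float *h_data = (float*)malloc(bytes);
    float *d_data;
    cudaMalloc(&d_data, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);   // upload
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("upload: %.2f GB/s\n", (bytes / 1.0e9) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    free(h_data);
    return 0;
}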
I have read that there is a 100X acceleration on certain problems when you use an NVIDIA GPU instead of a CPU.
What are the best acceleration figures obtained with CUDA on different problems?
Please state the problem and the acceleration factor, along with links to papers if possible.
...
Current GPU threads are somewhat limited (memory limits, limits on data structures, no recursion...).
Do you think it would be feasible to implement a graph theory problem on the GPU? For example vertex cover? Dominating set? Independent set? Max clique?
Is it also feasible to have branch-and-bound algorithms on GPUs? Recursive bac...
Hi.
I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU.
The most basic kernel given in most tutorials is one that takes two arrays of numbers, adds the values at corresponding indexes, and writes the result to a third array, like so:
__ker...
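For comparison, the CUDA version of that element-wise add is essentially the sketch below; a kernel like this does so little arithmetic per element that it is bound by memory bandwidth and transfer time rather than compute:

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];   // one output element per thread
}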
I encountered a strange problem where increasing my occupancy by increasing the number of threads reduced performance.
I created the following program to illustrate the problem:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cutil.h>
__global__ void less_threads(float * d_out) {
int num_inliers;
for...
I'm trying to set my simulation parameters in constant memory, but without luck (CUDA.NET).
The cudaMemcpyToSymbol function returns cudaErrorInvalidSymbol. The first parameter of cudaMemcpyToSymbol is a string... Is it the symbol name? Actually, I don't understand how it gets resolved. Any help appreciated.
//init, load .cubin
float[] arr = new f...
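For reference, this is what the call looks like in plain CUDA C, where the first argument is (or names) a __constant__ symbol declared in the device code; the driver API and older runtimes identify the symbol by its string name, which has to match the name in the loaded .cubin exactly. A sketch with hypothetical names:

#include <cuda_runtime.h>

// Simulation parameters living in constant memory on the device.
__constant__ float simParams[16];

int main(void)
{
    float h_params[16] = { 0 };

    // Copies h_params into the __constant__ array; the symbol must exist in
    // the compiled device code, otherwise the call fails with
    // cudaErrorInvalidSymbol.
    cudaMemcpyToSymbol(simParams, h_params, sizeof(h_params));
    return 0;
}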
I am playing around with CUDA.
At the moment I have a problem: I am testing a large array for particular responses, and when I get a response, I have to copy the data into another array.
For example, my test array of 5 elements looks like this:
[ ][ ][v1][ ][ ][v2]
Result must look like this:
[v1][v2]
The problem is how do I calc...
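A sketch of one standard way to do this kind of compaction with thrust::copy_if; the predicate here just treats non-zero as a response, so adapt it to whatever actually marks a match. Internally, the output position of each kept element comes from an exclusive prefix sum over the flags:

#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Predicate: keep only elements that represent a "response".
struct is_response
{
    __host__ __device__ bool operator()(float x) const { return x != 0.0f; }
};

void compact(const thrust::device_vector<float> &input,
             thrust::device_vector<float> &output)
{
    output.resize(input.size());
    // copy_if performs the scan and scatter internally and returns the end of
    // the compacted range, so the valid elements end up contiguous.
    thrust::device_vector<float>::iterator end =
        thrust::copy_if(input.begin(), input.end(), output.begin(), is_response());
    output.resize(end - output.begin());
}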
I've written the following code:
N_Vector_cuda v;
if (N <= 0) return(NULL);
v = (N_Vector_cuda) malloc(sizeof *v);
if (v == NULL) return(NULL);
v->inc = 1;
v->elemsize = sizeof(real);
v->status = cublasAlloc(N, v->elemsize, (void**)&(v->data));
if (v->status != CUBLAS_STATUS_SUCCESS)
{
    free(v);
    return(NULL);
}
v->length = ...