I'm fairly new to OpenCL so please bear with me.
In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. However, now that I'm looking to improve the timing, I want to use texture memory for this. In the CUDA version, we use cudaBindTexture and tex1Dfetch to obtain the data for a larg...
Hi, I have the following code...
int *a, *b;
int *d;
int N = 2000;
size_t size = N*sizeof(int);
a = (int *) malloc(size);
b = (int *) malloc(size);
...
cudaMalloc((void **) &d, size);
It works just fine. Now assume I have the following:
char **t = malloc(2000* sizeof *t);
for(...)
{
...
t[i] = (char *)mal...
I am trying to test some typical CUDA functions during the configure process. How can I write this in my configure.ac? Something like:
AC_TRY_COMPILE([],
[
__global__ static void test_cuda() {
const int tid = threadIdx.x;
const int bid = blockIdx.x;
__syncthreads();
}
],
[cuda_comp=ok],[cuda_comp=no])
But nvcc is not defined...
I have a CUDA program that works fine, but that is currently all written in one file. I'd like to split this big file into several smaller ones, in order to make it easier to maintain and navigate.
The new structure is:
foo.cuh
foo.cu
bar.cuh
bar.cu
main.cu
The .cuh header files contain structs and function prototypes, and the .cu f...
Problem
I'm trying to create a CUDA application that is well integrated with .NET. The design goal is to have several CUDA functions that can be called from managed code. Data should also be able to persist on a device between function calls, so that it can be passed to multiple CUDA functions.
It is of importance that each individual...
I am doing research on GPU programming and want to learn more about CUDA.
I have read a lot about it (from Wikipedia, Nvidia, and other references), but I still have some questions:
1- Is the following true: a GPU has multiprocessors, every multiprocessor has streaming processors, and every streaming processor can run blocks of threads at ...
I am running Windows 7 64-bit with Visual Studio 2008. I installed the CUDA drivers and SDK. The SDK comes with quite a few examples, including compiled executables and source code. The compiled executables run wonderfully. When I open the vc90 solutions and build in the Win32 configuration, I get this error:
Error 1 fatal error...
I'm trying to implement a critical section in CUDA using atomic instructions, but I ran into some trouble. I have created a test program to show the problem:
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <stdio.h>
__global__ void k_testLocking(unsigned int* locks, int n) {
int id = threadIdx.x % n;
while (atomi...
I am looking for a good beginner's tutorial for learning the basics of CUDA.
...
I want to start learning how to program in CUDA, not just the language, but program-design -- things like -- from what I've heard -- writing kernels without conditionals so that all the threads run the same instructions and there's minimal synchronization overhead.
And from what I've heard, the python wrapper is a lot more intuitive to ...
I'm writing my own graphics library (yep, it's homework :) and use CUDA to do all rendering and calculations fast.
I have a problem with drawing filled triangles. I wrote it in such a way that one process draws one triangle. It works fine when there are a lot of small triangles in the scene, but it breaks performance totally when triangl...
This is an incredibly basic question, but how do I start a new CUDA app in Visual Studio 2008? I have found tons and tons of documentation about CUDA-related matters, but nothing about how to start a new project. I am working with Windows 7 x64 Visual Studio 2008 C++. I would really like to find some sort of really really basic Hello ...
I create a new Win32 Console App as an empty project
I am running Windows 7 64bit with Visual Studio 2008 C++. I am trying to get the sample code from the bottom of this article to build: http://www.ddj.com/architect/207200659
I add CUDA Build Rule v2.3.0 to the project's custom build rules. It is the only thing with a checkbox in th...
Is there a convenient way to use asserts within kernel invocations in device mode?
Thanks in advance.
...
I am trying to separate a CUDA program into two separate .cu files in an effort to edge closer to writing a real app in C++. I have a simple little program that:
Allocates memory on the host and the device.
Initializes the host array to a series of numbers.
Copies the host array to a device array.
Finds the square of all the elements in the...
This post closely resembles my earlier post: http://stackoverflow.com/questions/2090974/how-to-separate-cuda-code-into-multiple-files/2092091#2092091 I am afraid I made such a blunder of what I was actually asking that it will be too confusing to try and correct it there.
I am basing this code loosely off the cppIntegration example fro...
I have the bare bones of a GLUT app. When I compile it for Win32 it works fine, but if I compile it for x64 I get this error:
The application was unable to start correctly (0xc000007b). Click OK to close the application.
I have glut64.lib as an input for the Linker, which comes from the NVIDIA CUDA SDK at "C:\ProgramData\NVIDIA Corp...
Hello,
My problem is the following:
I need to generate a lot of random numbers in parallel using the binomial distribution on CUDA. All the random number generators on CUDA are based on the uniform distribution (as far as I know), which is also useful, since all the algorithms for the binomial distribution need uniform variates.
Is there any...
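To illustrate how a binomial variate can be built from uniform variates, here is a minimal host-side C sketch that counts successes over n Bernoulli trials. The names `Lcg`, `lcg_uniform`, and `binomial_sample` are hypothetical, and the small linear congruential generator only stands in for whatever per-thread uniform generator is actually used; it is not a recommendation for production-quality randomness.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state: a 32-bit linear congruential
 * generator. It is only a placeholder for a real uniform RNG. */
typedef struct {
    uint32_t state;
} Lcg;

/* Advance the generator and return a uniform variate in [0, 1),
 * built from the top 24 bits of the state. */
static double lcg_uniform(Lcg *g)
{
    g->state = g->state * 1664525u + 1013904223u;
    return (double)(g->state >> 8) * (1.0 / 16777216.0);
}

/* Draw one Binomial(n, p) variate by counting how many of n uniform
 * variates fall below p. Cost is O(n) uniform draws per sample. */
unsigned binomial_sample(Lcg *g, unsigned n, double p)
{
    unsigned successes = 0;
    for (unsigned i = 0; i < n; ++i) {
        if (lcg_uniform(g) < p) {
            ++successes;
        }
    }
    return successes;
}
```

Each thread would hold its own generator state with a distinct seed and call `binomial_sample` independently. Because the cost grows linearly with n, this direct approach is mainly reasonable for small n; faster methods exist for large n.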
Hi, I want to calculate the angle (in degrees) between two triangles where every point has a 3D coordinate, i.e.
triangle 1: point1(x1,y1,z1), point2(x2,y2,z2), point3(x3,y3,z3).
triangle 2: point1(x1,y1,z1), point2(x2,y2,z2), point4(x4,y4,z4).
Yes, the triangles always share exactly the same two points. Is there a way to calculate the degr...
I am allocating some float arrays (pretty large, i.e., 9,000,000 elements) on the GPU using cudaMalloc((void**)&(storage->data), size * sizeof(float)). At the end of my program, I free this memory using cudaFree(storage->data);.
The problem is that the first deallocation is really slow, around 10 seconds, whereas the others are nearly inst...