I've noticed that CUDA applications tend to have a rough maximum run time of 5-15 seconds before they fail and exit. I realize that ideally a CUDA application would not run that long, but assuming that CUDA is the correct choice and that the amount of sequential work per thread means it must run that long, is there any way t...
Normally, when I use Visual Studio to do a build, I see warnings and errors shown in the output pane, e.g.
1>------ Build started: Project: pdcuda, Configuration: Release x64 ------
Compiling...
foo.cpp
Linking...
foo.obj : error LNK2001: unresolved external symbol "foo"
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped =...
Is there a predefined macro for the CUDA compiler (nvcc) that I can test with #ifdef? (Like _WIN32 for Windows and so on.)
I need this for header code that will be common between nvcc and VC++ compilers. I know I can go ahead and define my own and pass it as an argument to the nvcc compiler (-D), but it would be great if there is one already defined.
...
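For the shared-header case, here is a minimal sketch. It relies on `__CUDACC__`, which nvcc predefines when it compiles CUDA source files; `HOST_DEVICE`, `mix_values`, and `compiled_as_cuda` are illustrative names, not part of any toolkit.

```cpp
// Shared header sketch: builds under nvcc and under a plain host compiler
// such as VC++ or g++. Only __CUDACC__ comes from the toolchain; every other
// name here is hypothetical.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // CUDA qualifiers, visible to nvcc only
#else
#define HOST_DEVICE                      // expands to nothing for host compilers
#endif

// Callable from host code everywhere, and from device code when built by nvcc.
HOST_DEVICE inline float mix_values(float a, float b, float t) {
    return a + t * (b - a);
}

// Reports which branch of the #ifdef was taken for this translation unit.
inline bool compiled_as_cuda() {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
}
```

With this pattern the CUDA-specific qualifiers never reach VC++, so the same header can be included from both .cu and .cpp files.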
Hello everyone,
I want to write an algorithm that can take parts of a picture and match them to another picture of the same object.
For example, if I gave the computer a picture of a vase and a picture of a scene with the vase in it, I'd expect it to determine where in the image the vase is.
How would I begin to develop an algorithm li...
CUDA vs DirectX 10 for parallel mathematics. Any thoughts on it?
...
I'm a business major, two-thirds of the way through my degree program, with a little PHP experience, having taken one introductory C++ class, and now regretting my choice of business over programming/computer science.
I am interested in learning more advanced programming; specifically C, and eventually progressing to using the CUDA arch...
Hey people
I've struggled with this all day: I am trying to get a random number generator for threads in my CUDA code. I have looked through all the forums, and yes, this topic comes up a fair bit, but I've spent hours trying to unravel all sorts of code to no avail. If anyone knows of a simple method, probably a device kernel that can be c...
For the longest time I've been interested in building a cluster of heterogeneous nodes in an attempt to have a home supercomputer, since I am very interested in doing AI research.
However, the issue is that even though I have a myriad of hardware (2x dual quad rack-mount servers, 8 GTX 285 GPUs, 6x PS3s, 2x hacked 360s (they can run Linux) a...
Hi,
I'm a CS undergrad student and want to finalize my project idea soon. I am mostly interested in graphics-based projects that work with the help of GPUs, such as GPGPU (http://en.wikipedia.org/wiki/GPGPU), or actual graphics processing using GPUs. My supervisor suggested that I look for topics related to parallel computing like in GPGPUs a...
How do I allocate 2D arrays in device memory and transfer them (to and from the host) in CUDA?
...
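In the CUDA runtime, 2D device arrays are usually allocated with `cudaMallocPitch` and transferred with `cudaMemcpy2D`, both of which work on padded ("pitched") rows. The sketch below is a host-only emulation of that layout, so it runs without a GPU; `PitchedBuffer`, `make_pitched`, `copy_2d`, `read_float`, and the 32-byte alignment are all assumptions made for illustration (the real pitch is chosen by the runtime).

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for a pitched 2D allocation (hypothetical type).
struct PitchedBuffer {
    std::vector<unsigned char> bytes;  // backing storage, pitch * height bytes
    std::size_t pitch;                 // bytes per padded row
    std::size_t width_bytes;           // bytes of real data per row
    std::size_t height;                // number of rows
};

// Rounds each row up to a multiple of `alignment`, similar in spirit to what
// cudaMallocPitch does; the alignment value itself is an assumption.
inline PitchedBuffer make_pitched(std::size_t width_bytes, std::size_t height,
                                  std::size_t alignment) {
    std::size_t pitch = (width_bytes + alignment - 1) / alignment * alignment;
    return PitchedBuffer{std::vector<unsigned char>(pitch * height), pitch,
                         width_bytes, height};
}

// Row-by-row copy between two layouts with different pitches. The parameter
// order mirrors cudaMemcpy2D (dst, dpitch, src, spitch, width, height).
inline void copy_2d(void* dst, std::size_t dpitch, const void* src,
                    std::size_t spitch, std::size_t width_bytes,
                    std::size_t height) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, width_bytes);
    }
}

// Element (row, col) lives at base + row * pitch + col * sizeof(float).
inline float read_float(const PitchedBuffer& buf, std::size_t row,
                        std::size_t col) {
    float value;
    std::memcpy(&value,
                buf.bytes.data() + row * buf.pitch + col * sizeof(float),
                sizeof(float));
    return value;
}
```

On a real device, the same indexing applies inside a kernel: step to the row using the pitch in bytes, then index the column within that row.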
In a CUDA kernel, I have code similar to the following. I am trying to calculate one numerator per thread, and accumulate the numerators over the block to calculate a denominator, and then return the ratio. However, CUDA is setting the value of denom to whatever value is calculated for numer by the thread in the block with the largest th...
Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do when it comes to graphics. Now they can be used for non-graphical tasks as well, and in my opinion the results are nothing short of amazing. An algorithm that lends itself wel...
How do I get started with CUDA development on Ubuntu 9.04? Are there any prebuilt binaries? Are the default accelerated drivers sufficient?
My thought is to actually work with OpenCL, but that seems to be hard to do right now, so I thought that I would start with CUDA and then port my application to OpenCL when that is more readily avail...
I want to use constant memory which will be accessed by all threads across all of my kernels.
The declaration is something like this
extern __constant__ float smooth [8 * 1024];
I am copying data to this variable using
cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice);
smooth_size = 7K bytes
It was givi...
I'm looking for some introductory OpenCL examples that illustrate the types of applications that can experience large (e.g., 50x-1000x) increases in speed. CUDA has lots of nice examples, but I haven't found the same thing for OpenCL.
A nice example might be global optimization of complex functions via particle swarms, simulated an...
I have a class (see example below) which acts as a .NET wrapper for a CUDA memory structure,
allocated using cudaMalloc() and referenced using a member field of type IntPtr.
(The class uses DllImport of a native C DLL which wraps various CUDA functionality.)
The dispose method checks if the pointer is IntPtr.Zero and if not calls cuda...
Some peers and I are working on a game (Rigs of Rods) and are trying to integrate OpenCL for physics calculation. At the same time we are trying to do some much-needed cleanup of our data structures. I guess I should say we are trying to clean up our data structures while being mindful of OpenCL requirements.
One of the problems with using op...
I am trying to run CUDA in emulation mode in Visual Studio 2008.
It shows this problem at runtime:
cudaSafeCall() Runtime API error in file , line abc : feature is not implemented
For example, in one case it turned out to be this call:
cutilSafeCall(cudaGLRegisterBufferObject(pbo));
and if I commented that one out, then:
cutilSafeC...
Hi guys,
So it looks like multicore and all its associated complications are here to stay. I am planning a software project that will definitely benefit from parallelism. The problem is that I have very little experience writing concurrent software. I studied it at university and understand the concepts and theory very well but have zer...
I am trying to call cudppSort to sort a set of keys/values. I'm using the following code to set up the sort algorithm:
CUDPPConfiguration config;
config.op = CUDPP_ADD;
config.datatype = CUDPP_UINT;
config.algorithm = CUDPP_SORT_RADIX;
config.options = CUDPP_OPTION_KEY_VALUE_PAIRS | CUDPP_OPTION_FORWARD | CUDPP_OPTION_EXCLUSIVE;
CUDPPH...