gpgpu

How to optimize a CUDA program to get better performance?

Hi, I wrote a MATLAB program (with CUDA) that generates keys. How can I optimize the CUDA program to get better performance? ...

OpenCL and CUDA

Should I learn OpenCL if I only want to program NVIDIA GPUs? ...

Generating bitmap on PixelShader

Hi, I have 16-bit grayscale data on which I would like to perform the following operations. For every pixel: 1) compute a sample 's' by downsampling 16-bit to 8-bit through a LUT; 2) store the sample in an RGB (24-bit, 8 bits per sample) texture with R=s, G=s, B=s. At the end I would like to have data that I could use in a Windows DIB directly (unsigned short 8bit per sample RGB...
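
A minimal sketch of that per-pixel pipeline, written here as a CUDA kernel for consistency with the rest of this page (the question asks about a pixel shader, but the logic is identical); the kernel and buffer names are illustrative, and the LUT is assumed to be a precomputed 65536-entry table:

    __global__ void gray16ToRgb24(const unsigned short *src, unsigned char *dst,
                                  const unsigned char *lut, int numPixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numPixels) return;

        unsigned char s = lut[src[i]];  // step 1: 16-bit -> 8-bit via the LUT
        dst[3 * i + 0] = s;             // step 2: replicate into R...
        dst[3 * i + 1] = s;             // ...G...
        dst[3 * i + 2] = s;             // ...and B of the 24-bit output
    }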

What's the most trivial function that would benefit from being computed on a GPU?

Hi. I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU. The most basic kernel given in most tutorials is one that takes two arrays of numbers, sums the values at corresponding indexes, and stores the results in a third array, like so: __ker...
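
The kernel the excerpt starts to quote is the canonical element-wise vector add; its CUDA form (shown for consistency with the rest of this page) is below. Note that it is memory-bound, so it is a poor showcase for GPU speedups, which is precisely what the question is probing:

    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                  // guard against out-of-range threads
            c[i] = a[i] + b[i];     // one addition per thread
    }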

GPU YUV to RGB. Worth the effort?

Hello, I have to convert several full-PAL videos (720x576@25) from YUV 4:2:2 to RGB in real time, and probably apply a custom resize to each. I have thought of using the GPU, as I have seen an example that does just this (except that it's 4:4:4, so the bpp is the same in source and destination)-- http://www.fourcc.org/source/YUV420P-OpenGL-GLS...
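
A hedged sketch of the conversion as a CUDA kernel, assuming YUY2 (Y0 U Y1 V) byte packing and the BT.601 full-range equations; other 4:2:2 packings (e.g. UYVY) only change the byte order:

    __device__ unsigned char clamp8(float v)
    {
        return (unsigned char)fminf(fmaxf(v, 0.0f), 255.0f);
    }

    __global__ void yuv422ToRgb(const unsigned char *yuy2, unsigned char *rgb,
                                int numMacropixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numMacropixels) return;

        float y0 = yuy2[4 * i + 0];
        float u  = yuy2[4 * i + 1] - 128.0f;   // chroma is shared by...
        float y1 = yuy2[4 * i + 2];
        float v  = yuy2[4 * i + 3] - 128.0f;   // ...two horizontal pixels

        for (int p = 0; p < 2; ++p) {
            float y = (p == 0) ? y0 : y1;
            unsigned char *out = rgb + 6 * i + 3 * p;
            out[0] = clamp8(y + 1.402f * v);               // R
            out[1] = clamp8(y - 0.344f * u - 0.714f * v);  // G
            out[2] = clamp8(y + 1.772f * u);               // B
        }
    }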

Why aren't we programming on the GPU?

So I finally took the time to learn CUDA and get it installed and configured on my computer, and I have to say, I'm quite impressed! Here's how it performs rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the...
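
For reference, the escape-time loop behind such a benchmark is embarrassingly parallel, one thread per pixel, which is why the GPU wins so decisively. A rough CUDA sketch (parameter names are illustrative):

    __global__ void mandelbrot(int *iters, int width, int height, float x0,
                               float y0, float dx, float dy, int maxIter)
    {
        int px = blockIdx.x * blockDim.x + threadIdx.x;
        int py = blockIdx.y * blockDim.y + threadIdx.y;
        if (px >= width || py >= height) return;

        float cr = x0 + px * dx, ci = y0 + py * dy;   // c for this pixel
        float zr = 0.0f, zi = 0.0f;
        int n = 0;
        while (n < maxIter && zr * zr + zi * zi < 4.0f) {
            float t = zr * zr - zi * zi + cr;         // z = z^2 + c
            zi = 2.0f * zr * zi + ci;
            zr = t;
            ++n;
        }
        iters[py * width + px] = n;   // iteration count drives the coloring
    }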

Best approach for GPGPU/CUDA/OpenCL in Java?

General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing. I'd love to use GPGPU for image processing, particles, and fast geometric operations. Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know: Is Op...

NVIDIA GPUs and PhysX engine

How is the NVIDIA PhysX engine implemented on NVIDIA GPUs: is it a co-processor, or are the physics algorithms implemented as fragment programs to be executed in the GPU pipeline? ...

Fastest sort of fixed length 6 int array

Answering another Stack Overflow question (this one) I stumbled upon an interesting sub-problem. What is the fastest way to sort an array of 6 ints? As the question is very low level (it will be executed by a GPU): we can't assume libraries are available (and the call itself has its cost), only plain C to avoid emptying instruction ...
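
A common answer to this kind of question is a branch-free sorting network; 12 compare-exchanges is the minimum known comparator count for 6 elements. One valid 12-exchange network as a plain-C sketch (no library calls, so it drops straight into a GPU kernel; the macro assumes an array named d in scope and is purely illustrative):

    #define CMP_SWAP(x, y) do {                     \
            int lo = d[x] < d[y] ? d[x] : d[y];     \
            int hi = d[x] < d[y] ? d[y] : d[x];     \
            d[x] = lo; d[y] = hi;                   \
        } while (0)

    static void sort6(int d[6])
    {
        CMP_SWAP(1, 2); CMP_SWAP(4, 5);   /* sort the half (0,1,2)... */
        CMP_SWAP(0, 2); CMP_SWAP(3, 5);
        CMP_SWAP(0, 1); CMP_SWAP(3, 4);   /* ...and the half (3,4,5)... */
        CMP_SWAP(1, 4); CMP_SWAP(0, 3);
        CMP_SWAP(2, 5); CMP_SWAP(1, 3);
        CMP_SWAP(2, 4); CMP_SWAP(2, 3);   /* ...then merge the two halves */
    }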

GLSL shader render to texture not saving alpha value

UPDATE: Danvil solved it in a comment below. My texture format was GL_RGB not GL_RGBA which of course means that the alpha values aren't kept. Don't know why I didn't realize... Thanks Danvil. I am rendering to a texture using a GLSL shader and then sending that texture as input to a second shader. For the first texture I am using RGB c...

Financial applications on GPGPU

I want to know what sort of financial applications can be implemented using a GPGPU. I'm aware of option pricing / stock price estimation using Monte Carlo simulation on a GPGPU with CUDA. Can someone enumerate the various possibilities for utilizing GPGPU for applications in the finance domain? ...
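
The Monte Carlo option pricing the question already mentions is the canonical example. A hedged CUDA sketch using the cuRAND device API, one simulated terminal price per thread under geometric Brownian motion (the averaging that yields the final price is left to the host):

    #include <curand_kernel.h>

    __global__ void mcEuropeanCall(float *payoffs, int nPaths,
                                   unsigned long long seed, float S0, float K,
                                   float r, float sigma, float T)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nPaths) return;

        curandState state;
        curand_init(seed, i, 0, &state);      // one RNG stream per path

        float z  = curand_normal(&state);     // standard normal draw
        float ST = S0 * expf((r - 0.5f * sigma * sigma) * T
                             + sigma * sqrtf(T) * z);
        payoffs[i] = expf(-r * T) * fmaxf(ST - K, 0.0f);  // discounted payoff
        // host side: average payoffs[] (e.g. via a parallel reduction)
    }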

Double precision floating point in CUDA

Does CUDA support double-precision floating-point numbers? I also need the reasons for this. ...
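
Short answer: yes, on devices of compute capability 1.3 or higher (the GTX 200 series onward), and only when compiling with -arch=sm_13 or later; for older targets nvcc demotes double to float with a warning. A runtime check, assuming device 0:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // double precision requires compute capability >= 1.3
        bool hasDouble = prop.major > 1 || (prop.major == 1 && prop.minor >= 3);
        printf("compute capability %d.%d: doubles %s\n", prop.major, prop.minor,
               hasDouble ? "supported" : "not supported");
        return 0;
    }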

DirectCompute versus OpenCL for GPU programming?

I have some (financial) tasks which should map well to GPU computing, but I'm not really sure if I should go with OpenCL or DirectCompute. I did some GPU computing, but it was a long time ago (3 years). I did it through OpenGL since there was not really any alternative back then. I've seen some OpenCL presentations and it looks really n...

Easiest way to sign/certify text file in C++?

I want to verify that the text log files created by my program, which runs at my customer's site, have not been tampered with. How do you suggest I go about doing this? I searched a bunch here and on Google but couldn't find an answer. Thanks! Edit: After reading all the suggestions so far, here are my thoughts. I want to keep it simple, and since...
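
One simple approach is an HMAC over the file contents with a key embedded in the program; a hedged C++ sketch using OpenSSL (the key and log text below are placeholders, and the program links against -lcrypto). Caveat: anyone who extracts the key from the binary can re-sign a tampered file, so this raises the bar rather than providing a true digital signature:

    #include <cstdio>
    #include <cstring>
    #include <openssl/hmac.h>

    int main()
    {
        const unsigned char key[] = "replace-with-a-secret-key";  // placeholder
        const char *log = "2010-06-01 12:00:00 application started\n";

        unsigned char mac[EVP_MAX_MD_SIZE];
        unsigned int macLen = 0;
        HMAC(EVP_sha256(), key, sizeof(key) - 1,
             (const unsigned char *)log, strlen(log), mac, &macLen);

        for (unsigned int i = 0; i < macLen; ++i)
            printf("%02x", mac[i]);   // hex digest stored alongside the log
        printf("\n");
        return 0;
    }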

Does the parallel-for in .NET 4.0 take advantage of GPU computing automatically?

Does the parallel-for in .NET 4.0 take advantage of GPU computing automatically? Or do I have to configure some drivers so that it uses the GPU? ...

Learning GPGPU programming

My hands have been itching to learn GPGPU programming for some time. I finally have some time on my hands, so I want to use it as wisely as possible. I'm really interested in your experiences with GPGPU programming: any pointers, references to good literature, links to sites, interesting projects, etc. My interests lie mainly in scie...

CUDA optimization techniques

I have written CUDA code to solve an NP-complete problem, but the performance was not what I expected. I know about "some" optimization techniques (using shared memory, textures, zero-copy...). What are the most important optimization techniques CUDA programmers should know about? ...
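
Of the techniques listed, coalesced global memory access staged through shared memory usually matters most. The classic illustration is a tiled matrix transpose; a sketch with a 16x16 tile (launch with 16x16 thread blocks), suited to the hardware of the era:

    #define TILE 16

    __global__ void transpose(const float *in, float *out, int width, int height)
    {
        __shared__ float tile[TILE][TILE + 1];  // +1 pad avoids bank conflicts

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < width && y < height)
            tile[threadIdx.y][threadIdx.x] = in[y * width + x];  // coalesced read

        __syncthreads();

        // swap block indices so the write is coalesced too
        x = blockIdx.y * TILE + threadIdx.x;
        y = blockIdx.x * TILE + threadIdx.y;
        if (x < height && y < width)
            out[y * height + x] = tile[threadIdx.x][threadIdx.y];
    }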

Does Global Work Size Need to be Multiple of Work Group Size in OpenCL?

Hello: does the global work size (dimensions) need to be a multiple of the work-group size (dimensions) in OpenCL? If so, is there a standard way of handling matrices that are not a multiple of the work-group dimensions? I can think of two possibilities: Dynamically set the size of the work-group dimensions to a factor of the global work dimensions. (thi...
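
In OpenCL 1.x the global size must indeed be a multiple of the work-group size in each dimension. A standard workaround is to round the global size up to the next multiple and have out-of-range work-items exit early; the same pattern in CUDA terms (used here for consistency with the rest of the page):

    __global__ void process(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;     // guard for the padded tail
        data[i] *= 2.0f;        // illustrative per-element work
    }

    void launch(float *d_data, int n)
    {
        int block = 256;
        int grid = (n + block - 1) / block;   // ceil(n / block)
        process<<<grid, block>>>(d_data, n);
    }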

GPGPU before CUDA and OpenCL

I've been reading about CUDA and OpenCL and have learned that before these frameworks, developers could only use low-level graphics APIs like OpenGL and D3D. Unfortunately, I haven't been able to find much information about it. Was it a widespread or commercial practice, or was it just something used in research and military labs? I'm sure so...

How can I programmatically determine a GPU's memory bus width and clock rate?

How can I programmatically determine a GPU's memory bus width and memory clock rate? I want to use these numbers to compute the maximum theoretical memory bandwidth. I'm mostly interested in NVIDIA GPUs. ...
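
With the CUDA runtime this is a direct device-properties query: memoryClockRate is reported in kHz and memoryBusWidth in bits, and the factor 2 below assumes double-data-rate memory:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // device 0, for illustration

        double peakGBs = 2.0 * prop.memoryClockRate * 1000.0
                       * (prop.memoryBusWidth / 8.0) / 1.0e9;
        printf("bus width: %d bits, memory clock: %d kHz, peak: %.1f GB/s\n",
               prop.memoryBusWidth, prop.memoryClockRate, peakGBs);
        return 0;
    }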