parallel-processing

MPI4Py Scatter sendbuf Argument Type?

I'm having trouble with the Scatter function in the MPI4Py Python module. My assumption is that I should be able to pass it a single list for the sendbuffer. However, I'm getting a consistent error message when I do that, or indeed add the other two arguments, recvbuf and root: File "code/step3.py", line 682, in subbox_grid i = m...

Where to find beta testers for an auto-parallelization tool?

I'm helping a client set up a beta test program for a new auto-parallelization tool, but it's been tricky finding a large enough set of good testers. We're not looking for QA folks here. We're looking for developers who have an interest in the problem of parallelizing legacy code, in particular in the face of multicore, and who can giv...

Immutability and static variables

I am designing some immutable classes, but I need some variables, such as .Count, to hold the total count of the instances. Would having a static variable affect multi-threading, given that methods like Add, Remove, etc. have to update the .Count value? Maybe I should make it a lazy property? ...

Embarrassingly parallelizable tasks in .NET

I am working on a problem where I need to perform a lot of embarrassingly parallelizable tasks. The task is created by reading data from the database but a collection of all tasks would exceed the amount of memory on the machine so tasks have to be created, processed and disposed. I am wondering what would be a good approach to solve thi...
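The pattern described above (create, process, and dispose tasks without ever holding all of them in memory) can be sketched as a bounded pipeline. This is a minimal sketch in Python rather than .NET, and it is not the asker's code: `read_tasks`, `process`, `run_bounded`, and `max_in_flight` are hypothetical names, and the generator stands in for rows streamed from the database. Threads are used only to keep the sketch short; for CPU-bound work in Python a process pool would be the likelier choice.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def read_tasks(n):
    """Hypothetical stand-in for tasks streamed lazily from a database."""
    for i in range(n):
        yield i


def process(task):
    """Hypothetical stand-in for the real per-task work."""
    return task * task


def run_bounded(tasks, worker, max_in_flight=4):
    """Apply worker to every task, keeping at most max_in_flight tasks alive."""
    results = []
    tasks = iter(tasks)
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Prime the pool with the first few tasks only.
        pending = {pool.submit(worker, t) for t in islice(tasks, max_in_flight)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                # Pull one new task from the source for each finished one.
                for t in islice(tasks, 1):
                    pending.add(pool.submit(worker, t))
    return results


total = sum(run_bounded(read_tasks(100), process))
```

Because a new task is read from the source only when a running one finishes, memory use is bounded by `max_in_flight` tasks rather than by the size of the whole collection. Results arrive in completion order, not input order.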

Too many calls to mprotect

I am working on a parallel app (C, pthread). I traced the system calls because at some point I get poor parallel performance. My traces showed that my program calls mprotect() many, many times ... enough to significantly slow down my program. I do allocate a lot of memory (with malloc()) but there is only a reasonable number of calls to ...

OpenMP - terrible performance - a simple issue of overhead, or is there a program flaw? (C)

I have here what I understand to be a relatively simple OpenMP construct. The issue is that the program runs about 100-300x faster with 1 thread when compared to 2 threads. 87% of the program is spent in gomp_send_wait() and another 9.5% in gomp_send_post. The program gives correct results, but I wonder if there is a flaw in the cod...

"Beginner" distributed processing project.

For the longest time I've been interested in building a cluster of heterogeneous nodes in an attempt to have a home supercomputer, since I am very interested in doing AI research. However, the issue is that even though I have a myriad of hardware (2x dual quad rack mount servers, 8 285GTX GPUs, 6x PS3s, 2x hacked 360s (they can run Linux) a...

Mutable vs Immutable for parallel applications

In the application I am writing, I need to write lots of base types, which will most likely be immutable. But I am wondering how mutable types compare in parallel applications to immutable ones. You can use locks with mutable objects, right? How does it compare to other techniques used with immutable types in parallel applications? You...

Which apps are too slow? or: Is multi-core needed?

Just like cars, speed is cool, but: "is speed needed? Will people pay for it?" Word processing, email and spreadsheets are fast enough, even on underpowered netbooks (they've been fast enough for a decade). Provided you can play HD video and sound, do people need it to be faster? It seems that games can always use more power, and i...

Maximum Increase in Processing Speed via Parallelism

Are there any cases in which anything more than a linear speed increase comes from parallelising an algorithm? ...

Package for distributing calculations

Do you know of any package for distributing calculations on several computers and/or several cores on each computer? The calculation code is in C++, and the package needs to be able to cope with data >2GB and work on a Windows x64 machine. Shareware would be nice, but isn't a requirement. ...

Cost/benefit of parallelization based on code size?

How do you figure out whether it's worth parallelizing a particular code block based on its code size? Is the following calculation correct? Assume: Thread pool consisting of one thread per CPU. CPU-bound code block with execution time of X milliseconds. Y = min(number of CPUs, number of concurrent requests) Therefore: Cost: code ...

Fast Interleaving of Data

I'm working with a piece of hardware (the hardware itself is not important) and I need to split a block of data into separate pieces in order to make the thing run faster. So I have, for example, a contiguous block of memory X words long. For visualization, I'm arranging it into 50 word lines below: 001 002 003 004 005 006 007 ....

Is there a good podcast about concurrency?

Hi All, Concurrency is one of the hot topics on quite a few technology podcasts. Yet I couldn't find a podcast dedicated to concurrent programming fundamentals, techniques, etc. If there's no podcast that specializes in concurrency, which technology podcast covers this topic best? ...

Better multithreading: single functions or collection functions

I don't know if I worded it correctly, but for a simple example let's say we have a collection of Point3 values (say 1M). We have a method called Offset that adds another Point3 value to these values, returning new Point3 values. Let's say the method is static. The Point3 type is immutable. The question is, should I have a method like...

Directory walker on modern operating systems slower when it's multi-threaded?

Hello! I once had the theory that on modern operating systems, multithreaded read access to the HDD should perform better. I thought that the operating system queues all read requests and rearranges them in such a way that it could read from the HDD more sequentially. The more requests it would get, the better it could rearrange the...

Efficient way to save data to disk while running a computationally intensive task

Hi, I'm working on a piece of scientific software that is very CPU-intensive (it's processor-bound), but it needs to write data to disk fairly often (I/O-bound). I'm adding parallelization to this (OpenMP) and I'm wondering what the best way is to address the write-to-disk needs. There's no reason the simulation should wait on the HDD (which is...
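One common way to keep a computation from waiting on the disk is to hand results to a dedicated writer thread through a bounded queue. The following is a minimal sketch in Python rather than C/OpenMP, under stated assumptions: `writer`, `simulate`, `max_queued`, and the trivial `state += step` update are hypothetical stand-ins, not the asker's code.

```python
import os
import queue
import tempfile
import threading


def writer(q, path):
    """Drain the queue to disk until a None sentinel arrives."""
    with open(path, "w") as f:
        while True:
            item = q.get()
            if item is None:
                break
            f.write(f"{item}\n")


def simulate(steps, path, max_queued=8):
    """Run a stand-in computation while a background thread does the writes."""
    # The bounded queue caps memory use if the disk falls behind.
    q = queue.Queue(maxsize=max_queued)
    t = threading.Thread(target=writer, args=(q, path))
    t.start()
    state = 0
    for step in range(steps):
        state += step  # hypothetical stand-in for the expensive computation
        q.put(state)   # blocks only when max_queued results are waiting
    q.put(None)        # tell the writer there is nothing more to write
    t.join()
    return state


out_path = os.path.join(tempfile.mkdtemp(), "results.txt")
final_state = simulate(5, out_path)
```

The compute loop stalls only when the queue is full, so the choice of `max_queued` trades memory for tolerance of slow writes.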

Parallel algorithms and data structures

In keeping with my interests in algorithms (see here), I would like to know if there are (contrary to my previous question) algorithms and data structures that are mainstream in parallel programming. It is probably early to ask about mainstream parallel algos and ds, but some of the gurus here may have had good experiences/bad experience...

Parallel API for C/C++ on Windows

We're developing something for the Windows platform and we'd like to harness the multiple cores present in PCs nowadays. I know that in VS2010, there is the Concurrency Runtime. It's still in beta, though. In the meantime, since we need to release quality code now, what is a good option for an API that will allow smooth transition la...

Ideas for graphics-based projects using GPUs?

Hi, I'm a CS undergrad student and wanted to finalize my project idea soon. I am mostly interested in graphics-based projects which work with the help of GPUs, like GPGPUs (http://en.wikipedia.org/wiki/GPGPU), or actual graphics processing using GPUs. My supervisor suggested that I look for topics related to parallel computing like in GPGPUs a...