reliability

How reliable are Unix domain sockets?

I'm trying to figure out a protocol to use with domain sockets and can't find information on how blindly the domain sockets can be trusted. Can data be lost? Are messages always received in the same order as sent? Even when using datagram sockets? Are transfers atomic? When reading the socket, can I trust that I get the whole message o...

What will it take for Transactional Memory to be viable?

I've been doing some work on transactional memory and its viability for systems programming (databases, operating systems, servers, etc.). My own experience employing transactions, together with having seen how few communities use transactions in real code, has raised a question: What would convince you, a developer writing production co...

Is NoSQL reliable for a small business app?

I'm deciding between a NoSQL engine and a regular SQL one for a document management system for small businesses. I have experience with Firebird and SQL Server and have found them to have a good track record of reliability (especially Firebird). This market is full of crappy "servers" (mostly clone PCs), cheap hard disks, and rare use of RAID or any...

Watchdog built into the same process as the program it controls

I run a Visual C++ console test program inside the daily build. Every now and then the test calls some function that other developers have changed improperly, descends into an infinite loop, and hangs, blocking the build. I need a watchdog solution that is as simple as possible. Here's what I came up with. In the test program entry poin...

How to improve email sending and delivery reliability?

The current application uses plain JavaMail to send a couple of emails a day, but some of the emails never make it to the client. According to the application server logs, there have been a couple of mail server timeouts, but that does not explain all the cases of missing emails. Adding a retry feature would help with the timeout problem but are ...
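The retry idea mentioned in this excerpt can be sketched roughly as follows. This is a minimal illustration in Python rather than JavaMail, and the names `send_with_retry`, `send`, `attempts`, and `base_delay` are hypothetical, not part of any mail API. It only covers the timeout case; as the question notes, timeouts do not explain every missing email, and retrying a send that actually succeeded can produce duplicates.

```python
import time


def send_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts are used up.

    `send` is a hypothetical zero-argument callable that raises
    TimeoutError when the mail server times out. Returns the number
    of the attempt that succeeded (1-based).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            send()
            return attempt
        except TimeoutError as err:
            last_error = err
            if attempt < attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt timed out; surface the last error to the caller.
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; any other exception type is deliberately not retried and propagates immediately.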

Fail Fast vs. Robustness

Our product is a distributed system. The modules I work on are fairly new, quite rigorous, well tested. They were developed with recent best practices in mind. Other modules can be considered as legacy software. While I'm vigilant about everything that happens within modules I'm responsible for, I'm under constant pressure to work with ...

How do you efficiently handle reference counts between a server and clients?

Say a server stores objects with a reference count. Clients, when they connect (over sockets), send messages to increment and decrement the counts on those objects. The only behavior pattern that's guaranteed is that if a client works with a particular object, it will increment it, some time will pass, and it will decrement it (clients n...

When running a shell script, how can you protect it from overwriting or truncating files?

If while an application is running one of the shared libraries it uses is written to or truncated, then the application will crash. Moving the file or removing it wholesale with 'rm' will not cause a crash, because the OS (Solaris in this case but I assume this is true on Linux and other *nix as well) is smart enough to not delete the in...

What is the practical difference between transport and message reliability in WCF?

I am looking at differences between using WPF in .NET or using Silverlight 4 for the GUI front end of an app that connects to WCF services. I have read that net.tcp binding in Silverlight 4 only supports transport level reliability. With a WPF desktop app we can use message level reliability. What is the actual difference? If transport...

Design considerations for high-reliability service

I am writing a C# Windows service which will perform some background processing - basically it is a consumer for a work queue. It needs to not go down (stop processing new items), and if it does go down I need to be notified. What are some design guidelines and considerations for a) ensuring that such a service is as reliable as possi...

When does ExecuteCodeWithGuaranteedCleanup actually guarantee cleanup?

I have been reading about Reliability Features in .NET and have written the following class to explore ExecuteCodeWithGuaranteedCleanup: class Failing { public void Fail() { RuntimeHelpers.PrepareConstrainedRegions(); try { } finally { RuntimeHelpers.ExecuteCodeWithGuara...

How can the reliability of software be checked through analysis?

How can we analyze software reliability? How can the reliability of any application or product be checked? ...

CA2000 passing object reference to base constructor in C#

I receive a warning when I run some code through Visual Studio's Code Analysis utility which I'm not sure how to resolve. Perhaps someone here has come across a similar issue, resolved it, and is willing to share their insight. I'm programming a custom-painted cell used in a DataGridView control. The code resembles: public class DataGr...

Methodology/template for calculating application reliability (five nines/six nines)?

Any concrete suggestions for computing application/system reliability? ...

Ensuring application reliability using IIS 6/7

I have web-service applications running on Windows Server 2003. These hosts (each in a separate appPool) contain multiple operations (consulting services). Is there an approach to ensuring reliability on these hosts in terms of appPools (such as customizing the pools)? If a worker process fails, another will be started in its pl...

Can you safely rely upon Yahoo Pipes to offload ETL for your application?

Yahoo Pipes are a very intriguing choice for sort of a poor-man's server-free ETL solution, but would it be a good idea to build an application around one or many Pipes? I've really only used them for toy things here and there, with the only thing I've used longer than a week or two being one amalgamated and filtered RSS feed that I've ...

How may I teach that SOAP is not a reliable transport?

I need to teach that an HTTP SOAP call may be received even though the caller never gets the response, due to a network failure (among other problems). (This problem is what led to the development of WS-ReliableMessaging.) How would you guys show this problem to a web service developer so they can develop taking into account that duplicate messages may be receiv...
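One way to picture the duplicate-message problem this excerpt describes is a receiver that remembers which message IDs it has already processed. The sketch below is a minimal illustration, not SOAP or WS-ReliableMessaging code; `IdempotentReceiver`, `message_id`, and `handler` are hypothetical names, and it assumes the caller reuses the same ID when it resends after a lost response.

```python
class IdempotentReceiver:
    """Processes each message ID at most once and replays the cached
    response when a duplicate arrives (for example, a client retry
    after the original response was lost on the network)."""

    def __init__(self, handler):
        self._handler = handler   # hypothetical business-logic callable
        self._responses = {}      # message_id -> response already produced

    def receive(self, message_id, payload):
        if message_id in self._responses:
            # Duplicate delivery: do not run the handler again.
            return self._responses[message_id]
        response = self._handler(payload)
        self._responses[message_id] = response
        return response
```

The cache here is in memory and unbounded, which is enough to demonstrate the idea; a real service would need to persist it and expire old entries.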

Linux HA / cluster: what are the differences between Pacemaker, Heartbeat, Corosync, wackamole?

Can you help me understand Linux HA? Pacemaker, Heartbeat, Corosync seem to be part of a whole HA stack, but how do they fit together? How does wackamole differ from Pacemaker/Heartbeat/Corosync? I've seen opinions that wackamole is better than Heartbeat because it's peer-based. Is that valid? The last release of wackamole was 2.5 ye...

How to test the reliability of my own (small) embedded operating system?

I've written a small operating system for an embedded project running on small to medium targets. I added some automated unit tests with high code coverage (>95%), but the scope is only the static part. I collected some code metrics such as complexity and readability. I'm testing my code with a rule checker with MISRA support, and of course fixe...

Force memcached to write to all servers in pool

Hi everyone, I have thought a bit on how to make sure that a particular key is distributed to ALL memcached servers in a pool. My current, untested solution is to make another instance of memcached, something like this: $cluster[] = array('host' => '192.168.1.1', 'port' => '11211', 'weight' => 50); $this->tempMemcached = new Memcache...