optimization

Is there a standard way to detect the bit width of the hardware?

Variables of type int are allegedly "one machine-type word in length", but in embedded systems, C compilers for 8-bit micros tend to make int 16 bits wide (unsigned char is 8 bits). On wider micros, int behaves as described: on 16-bit micros int is 16 bits, on 32-bit micros int is 32 bits, and so on. So, is there a standard way to test it,...
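Standard C does not define a "machine word" width. It only requires int to cover at least the range -32767 to 32767 (INT_MAX >= 32767), which is why compilers for 8-bit micros pick a 16-bit int. What you can portably test is the width of particular types through <limits.h>, either at run time or in the preprocessor. A minimal sketch; the helper names and the INT_IS_16_BITS macro are hypothetical, and the result describes int, not the CPU's register width:

```c
#include <limits.h>
#include <stddef.h>

/* Counts the value bits of unsigned int by shifting UINT_MAX down to zero.
   Unlike sizeof(unsigned int) * CHAR_BIT, this ignores any padding bits. */
int uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    int bits = 0;
    while (max != 0u) {
        max >>= 1;
        ++bits;
    }
    return bits;
}

/* Storage width of int in bits, including any padding bits. */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Compile-time check: the preprocessor can compare <limits.h> macros,
   so code can select a path for 16-bit int without running anything.
   INT_IS_16_BITS is a hypothetical macro name. */
#if UINT_MAX == 0xFFFFu
#define INT_IS_16_BITS 1
#else
#define INT_IS_16_BITS 0
#endif
```

The preprocessor form is usually what embedded code wants, since it costs nothing at run time.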

What compilers can detect pure mathematical functions and optimize them (without telling you so)?

I have seen that GCC is not able to detect pure mathematical functions and it needs you to provide the attribute "const" to indicate that. What compilers can detect pure mathematical functions and optimize them (without telling you so)? ...
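For reference, this is how the GCC "const" attribute mentioned above is applied. It is a GCC extension (Clang accepts it too), not standard C. GCC can also infer pure/const-ness by itself for functions whose bodies are visible in the same translation unit (the -fipa-pure-const pass, enabled at -O1 and above), so the attribute matters mainly for functions defined elsewhere. A minimal sketch with hypothetical function names:

```c
/* GCC/Clang extension: __attribute__((const)) promises that the result
   depends only on the arguments and that the function reads no global
   memory, so the compiler may merge repeated calls with equal arguments. */
__attribute__((const)) int square(int x);

int square(int x)
{
    return x * x;
}

/* With the attribute visible, an optimizing compiler may evaluate
   square(x) once here; the result is the same either way. */
int sum_of_two_squares_calls(int x)
{
    return square(x) + square(x);
}
```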

.NET optimizations for the absolute, no-holds-barred fastest running code possible?

First off, code readability goes out the window for this question. I'm all for code readability, but speed comes first here. When your code absolutely, without a doubt, no exceptions, has to run as fast as possible in the .NET Framework, what are some optimizations that can be done? I know there are flags for the compiler to optimize it a...

What's faster/more efficient: Continuous Deleting OR Continuous Updating + Intermittent Deleting?

I have a cron job that runs through many rows, deleting the "bad" ones (according to my criteria). I'm just wondering what would be the best way to optimize the script. I can do one of the following: Have the same cron job instantly delete the "bad" rows upon finding them. Have the same cron job instantly update the "bad" rows to status "1", meaning b...

Parallelized resource loading vs DNS lookup speed

A common technique for reducing page loading times is to parallelize multiple static resource downloads by retrieving them from different hostnames (even if they all resolve to the same server). However, the browser needs to issue a DNS lookup request for each of these hostnames, which could take a significant time. Can you propose a met...

CSS Sprites Repeating Images

Hi, I was wondering if there is any way to use just one image for repeating and non-repeating images using CSS sprites. So in this case I would like to combine all the images on a page, regardless of their width and height and of whether they will be used as repeating or non-repeating images. I know the standard is to create 1 image using all the non...

SSE2 intrinsics: access memory directly

Many SSE instructions allow the source operand to be a 16-byte aligned memory address, for example the various (un)pack instructions. PUNPCKLBW has the following signature: PUNPCKLBW xmm1, xmm2/m128. Now this doesn't seem to be possible at all with intrinsics. It looks like it's mandatory to use _mm_load* intrinsics to read anything...
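With intrinsics, the memory form is normally obtained implicitly: you load with an aligned intrinsic such as _mm_load_si128, and an optimizing compiler is free to fold that load into the m128 operand of PUNPCKLBW instead of emitting a separate move. Whether it actually does so depends on the compiler and flags, so check the generated assembly. A minimal sketch; unpack_lo_bytes is a hypothetical name, and all three buffers are assumed to be 16-byte aligned:

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Interleaves the low 8 bytes of a and b (PUNPCKLBW):
   out = a0, b0, a1, b1, ..., a7, b7.
   a, b and out must be 16-byte aligned: _mm_load_si128 and
   _mm_store_si128 are aligned operations. With optimization enabled,
   a compiler may fold the load of b into PUNPCKLBW's memory operand. */
void unpack_lo_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_unpacklo_epi8(va, vb));
}
```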

Optimize managed to native calls

What can be done to speed up calling native methods from managed code? I'm writing a program which needs to be able to manage arbitrarily-sized lists of objects and retrieve information from them at high speed, which it feeds into scripts. Scripts are bits of compiled C# code. I'm writing a basic interface layer from the C++ (native) DL...

javascript/jquery - $(document).ready() and script locations

I'd like to know how $(document).ready() works, along with scripts in general. Say I have scripts that are at the bottom of the page (for performance reasons I'm told?). As an example: say you have a link and you need to prevent its default action (preventDefault()). If the script is at the bottom of the page, isn't it possible that the...

Optimize SQL Query

Hi, I have the following query on a MySQL DB: SELECT * , r.id, x.real_name AS u_real_name, u.real_name AS v_real_name, y.real_name AS v_real_name2 FROM url_urlaube r LEFT JOIN g_users u ON ( r.v_id = u.id ) LEFT JOIN g_users x ON ( r.u_id = x.id ) LEFT JOIN g_users y ON ( r.v_id2 = y.id ) WHERE ( ( FROM_UNIXTIME( 1283205600 ) >= r.from AN...

Flipping sign on packed SSE floats.

I'm looking for the most efficient method of flipping the sign on all four floats packed in an SSE register. I have not found an intrinsic for doing this in the Intel Architecture software dev manual. Below are the things I've already tried. For each case I looped over the code 10 billion times and got the wall-time indicated. I'm ...
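One commonly used approach is to XOR every lane with -0.0f, whose bit pattern (0x80000000) has only the sign bit set, so the sign toggles while exponent and mantissa stay untouched. A minimal sketch, not a benchmarked answer; negate_ps is a hypothetical name:

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Flips the sign of all four packed floats by XOR-ing the sign bit.
   _mm_set1_ps(-0.0f) puts 0x80000000 in every lane. */
static inline __m128 negate_ps(__m128 v)
{
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}
```

Unlike subtracting from zero (_mm_sub_ps(_mm_setzero_ps(), v)), the XOR form also turns +0.0f into -0.0f, so it is an exact negation.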

How can I optimize a calculation-intensive C++ program with a known bottleneck?

I am developing some scientific software for my university. It is being written in C++ on Windows (VS2008). The algorithm must calculate some values for a large number of matrix pairs, that is, at the core resides a loop iterating over the matrices, collecting some data, e.g.: sumA = sumAsq = sumB = sumBsq = diffsum = diffsumsq = return...

C/C++ optimization: negate doubles fast

I need to negate a very large number of doubles quickly. If bit_generator generates 0, then the sign must be changed. If bit_generator generates 1, then nothing happens. The loop is run many times over and bit_generator is extremely fast. On my platform case 2 is noticeably faster than case 1. It looks like my CPU doesn't like branching. Is ...
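A common branch-free technique is to flip the IEEE 754 sign bit with an XOR whose mask is derived from the generated bit, so the loop body is identical whether or not a value is negated. A minimal sketch, assuming double is a 64-bit IEEE 754 type and that the generator's output has already been stored as 0/1 values in an array (a hypothetical stand-in for bit_generator; conditional_negate is also a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Branch-free conditional negation: bits[i] == 0 negates values[i],
   bits[i] == 1 leaves it alone. Bit 63 is the sign bit of a 64-bit
   IEEE 754 double, so XOR-ing it with (bit ^ 1) << 63 negates exactly
   when bit == 0. */
void conditional_negate(double *values, const unsigned char *bits, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint64_t u;
        memcpy(&u, &values[i], sizeof u);       /* well-defined type pun */
        u ^= (uint64_t)(bits[i] ^ 1u) << 63;    /* mask is 0 when bit is 1 */
        memcpy(&values[i], &u, sizeof u);
    }
}
```

The memcpy calls are typically compiled to plain register moves, so they do not add real copies.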

What is a good expiry time to set in the Expires header for a website?

"Web pages are becoming increasingly complex with more scripts, style sheets, images, and Flash on them. A first-time visit to a page may require several HTTP requests to load all the components. By using Expires headers these components become cacheable, which avoids unnecessary HTTP requests on subsequent page views....

How to implement Twitter's 'friends' timeline' function

I'm trying to learn database design by creating a Twitter clone, and I was wondering what's the most efficient way of creating the friends' timeline function. I am implementing this in Google App Engine, which uses Bigtable to store the data. IIRC, this means very fast reads (gets), but considerably slower page queries, and this al...

Inline speed and compiler optimization

Hey guys, I'm doing a bit of hands-on research on the speed benefits of making a function inline. I don't have the book with me, but one text I was reading suggested a fairly large overhead cost to making function calls, and that whenever executable size is either negligible or can be spared, a function should be declared inl...

See any problems with this C# implementation of a stack?

I wrote this quickly under interview conditions and wanted to post it to the community to see if there was a better/faster/cleaner way to go about it. How could this be optimized? using System; using System.Collections.Generic; using System.Linq; using System.Text; namespace Stack { class StackElement<T> { publi...

C# Linked List, Tracks Head + Tail with APIs InsertAfter + Remove. See any flaws or optimizations?

Another data structure I wrote under interview conditions. It is essentially a generic linked list that tracks the head and tail of the list (probably just an academic exercise; in real life you'd just use List). Does anyone see any possible flaws or optimizations? using System; using System.Collections.Generic; using System.Linq; using ...

Why is < slower than >=?

Hi all, I am using the following code to do the test, and it seems like < is slower than >=. Does anyone know why? import timeit s = """ x=5 if x<0: pass """ t = timeit.Timer(stmt=s) print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000) #0.21 usec/pass z = """ x=5 if x>=0: pass """ t2 = timeit.Timer(stmt=z) pr...

Optimizing the compiler for iPhone apps

The "Build" section of project info in Xcode offers lots of compiler settings. I'm seeing good improvements in performance (up to about 20%) when I choose the LLVM GCC 4.2 compiler with the "FASTEST-O3" setting. Are there other settings that also improve performance when compiling for the iPhone? ...