It isn't easy and it does take practice and art, but here's the theory.
Tools like Photoshop and Picasa put automatic color correction on one button. They have to make assumptions about what the mean color distributions ought to be, and they probably work in the Lab color space rather than the RGB color space that you are familiar with. Since the approach is heuristic, it will get some images wrong. For example, if you take pictures in a bright yet shrouded forest, the ambient light has a decided green cast, and you simply cannot jiggle the colors to make a white object white because you'd have to push too hard into the red, thus screwing up, for example, a green shirt. Similarly, images taken in the orange light of the late afternoon sun are yellow-biased, and correcting that pushes too hard into the blue. There may be stop values in the auto-color modes to avoid overcompensation.
Now the Lab color space is a strange beast and there are literally entire books about it. It is a three-channel space with lightness on one channel, L (that's the easy one), and two channels that have so little connection to the way we think of color that they are simply called "a" and "b". The a and b channels encode all the chromaticity data (everything that isn't lightness) in dimensions that could be roughly called green-red (a) and blue-yellow (b). Here's another weirdness: the gamut of Lab is far bigger than our eyes can handle (RGB and CMYK are both smaller than our visual gamut), so it can describe colors that are impossible, for example a deeply saturated red with almost no lightness. We can describe it, but our perception drops color out as lightness decreases (which is why nighttime gives everything a blue-grey appearance).
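To make the a and b axes a little less abstract, here is a minimal sketch of the standard sRGB to Lab conversion, assuming sRGB input in the 0..1 range and a D65 white point. The function names are my own, and it sticks to plain Python so it runs anywhere; real code would normally lean on an imaging library.

```python
# Sketch only: sRGB (0..1 floats) -> CIE Lab, assuming a D65 reference white.
D65 = (0.95047, 1.0, 1.08883)  # reference white as (X, Y, Z)

def _linearize(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t):
    """Cube-root-like function used by Lab, linear near zero."""
    d = 6 / 29
    return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

def srgb_to_lab(rgb):
    """Convert one (r, g, b) pixel to (L, a, b)."""
    r, g, b = (_linearize(c) for c in rgb)
    # Linear RGB -> XYZ using the sRGB primaries.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / D65[0]), _f(y / D65[1]), _f(z / D65[2])
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

if __name__ == "__main__":
    samples = [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("green", (0, 1, 0)),
               ("blue", (0, 0, 1)), ("yellow", (1, 1, 0))]
    for name, rgb in samples:
        L, a, b = srgb_to_lab(rgb)
        print(f"{name:6s} L={L:6.1f} a={a:7.1f} b={b:7.1f}")
```

White comes out with a and b at essentially zero; red lands on the positive side of a and green on the negative side, while yellow is positive on b and blue is negative.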
So how would you do this algorithmically? First, you need to really understand the perceptual models. Then you transform the image into a perceptual space, adjust the distribution along the a and b axes according to pretty complicated expectations of normal, and cast the result back into an RGB space so it can be rendered. Yes, this can be implemented in a pocket camera, but it is non-trivial and often needs hints (e.g. setting the expected color temperature to sunny, shaded, tungsten, fluorescent, etc.). Absent human guidance, algorithms will be wrong more often, and without hand masking, some color casts like the green forest can't be corrected pleasingly across an image as a whole.
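The pipeline above (into a perceptual space, adjust a and b, back out to RGB) can be sketched in a few dozen lines. This is a toy, not what Photoshop or any camera actually does: in place of the "complicated expectations of normal" it swaps in the simple gray-world assumption (the average of a and b over the image should be zero), and `max_shift` is a made-up stand-in for the stop values mentioned earlier. All names are my own, input is assumed to be a non-empty list of sRGB pixels in 0..1, and a D65 white point is assumed.

```python
# Toy auto-color: gray-world correction on the a/b channels of Lab.
D65 = (0.95047, 1.0, 1.08883)  # reference white as (X, Y, Z)
_D = 6 / 29

def _f(t):
    return t ** (1 / 3) if t > _D ** 3 else t / (3 * _D * _D) + 4 / 29

def _f_inv(t):
    return t ** 3 if t > _D else 3 * _D * _D * (t - 4 / 29)

def srgb_to_lab(rgb):
    r, g, b = (c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / D65[0]), _f(y / D65[1]), _f(z / D65[2])
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def lab_to_srgb(lab):
    L, a, b = lab
    fy = (L + 16) / 116
    x = D65[0] * _f_inv(fy + a / 500)
    y = D65[1] * _f_inv(fy)
    z = D65[2] * _f_inv(fy - b / 200)
    lin = (3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
           -0.9692660 * x + 1.8760108 * y + 0.0415560 * z,
           0.0556434 * x - 0.2040259 * y + 1.0572252 * z)
    out = []
    for c in lin:
        # Lab can describe colors RGB cannot show, so clip before encoding.
        c = min(1.0, max(0.0, c))
        out.append(12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055)
    return tuple(out)

def cast_shift(lab_pixels, max_shift=20.0):
    """Return (da, db) that would zero the mean a and b, limited to +/- max_shift."""
    n = len(lab_pixels)
    mean_a = sum(p[1] for p in lab_pixels) / n
    mean_b = sum(p[2] for p in lab_pixels) / n
    clamp = lambda v: max(-max_shift, min(max_shift, v))
    return clamp(-mean_a), clamp(-mean_b)

def auto_color(rgb_pixels, max_shift=20.0):
    """Shift every pixel's a and b by the same amount; lightness is untouched."""
    lab = [srgb_to_lab(p) for p in rgb_pixels]
    da, db = cast_shift(lab, max_shift)
    return [lab_to_srgb((L, a + da, b + db)) for L, a, b in lab]

if __name__ == "__main__":
    forest = [(0.40, 0.60, 0.40), (0.20, 0.35, 0.20), (0.70, 0.90, 0.70)]
    for before, after in zip(forest, auto_color(forest)):
        print(before, "->", tuple(round(c, 3) for c in after))
```

Note what the toy gets wrong: a photo that really is mostly green (a lawn, that green shirt) violates the gray-world assumption and gets pushed toward magenta, which is exactly the kind of failure that hints and hand masking are there to prevent. The clamp only limits how bad it gets.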
tl;dr