You need to think of this as a tokenization issue so SW1A 1AA should tokenize to:
(although you could break the inward part down into 1 and AA if you wanted to)
and G12 8QT should tokenize to:
Once you have broken the postcode down into those component parts then sorting should be easy enough. There is an exception with the GIR 0AA postcode but you can just hardcode a test for that one
edit: some more thoughts on tokenization
For the sample postcode SW1A 1AA, SW is the postcode area, 1A is the postcode district (which we'll break into two parts for sorting purposes), 1 is the postcode sector and AA is the unit postcode.
These are the valid postcode formats (source: Royal Mail PAF user guide page 8 - link at bottom of this page):
AN NAA
AAN NAA
ANN NAA
ANA NAA
AAA NAA (only for GIR 0AA code)
AANN NAA
AANA NAA
So a rough algorithm would be (assuming we want to separate the sector and unit postcode):
- code = GIR 0AA? Tokenize to GI/R/ /0/AA (treating R as the district simplifies things)
- code 5 letters long e.g G1 3AF? Tokenize to G/1/ /3/AF
- code 6 letters long with 3rd character being a letter e.g. W1P 1HQ? Tokenize to W/1/P/1/HQ
- code 6 letters long with 2nd character being a letter e.g. CR2 6XH? Tokenize to CR/2/ /6/XH
- code 7 letters long with 4th character being a letter e.g. EC1A 1BB? Tokenize to EC/1/A/1/BB
- otherwise e.g. TW14 2ZZ, tokenize to TW/14/ /2/ZZ
If the purpose is to display a list of postcodes for the user to choose from then I would adopt Neil Butterworth's suggestion of storing a 'sortable' version of the postcode in the database. The easiest way to create a sortable version is to pad all entries to nine characters:
- two characters for the area (right-pad if shorter)
- two for the district number (left-pad if shorter)
- one for the district letter (pad if missing)
- space
- one for the sector
- two for the unit
and GIR 0AA is a slight exception again. If you pad with spaces then the sort order should be correct. Examples using # to represent a space:
- W1#1AA => W##1##1AA
- WC1#1AA => WC#1##1AA
- W10#1AA => W#10##1AA
- W1W#1AA => W##1W#1AA
- GIR#0AA => GI#R##0AA
- WC10#1AA => WC10##1AA
- WC1W#1AA => WC#1W#1AA
You need to right-pad the area if it's too short: left-padding produces the wrong sort order. All of the single letter areas - B, E, G, L, M, N, S, W - would sort before all of the two-letter areas - AB, AL, ..., ZE - if you left-padded
The district number needs to be left padded to ensure that the natural W1, W2, ..., W9, W10 order remains intact