I am making a django site to showcase children's clothing. You start with an overview page where you see a listing with all the clothes. In a side bar you have the following options to refine your search:

clothes for:

  • boys
  • girls

clothes in:

  • cotton
  • wool

clothes in size:

  • 56
  • 62
  • 68
  • 74
  • 80
  • 86
  • 92
  • 98
  • 104
  • 110
  • 116

clothes with the color:

  • white
  • black
  • red
  • green
  • blue
  • yellow

So, suppose the user is interested in boys' clothes. She clicks on 'boys' and gets served the clothes_boys view at the URL websitename.com/clothes/boys/. On that page the sidebar lists the options for fabric, size and color. The user can then drill down further, for example to /clothes/boys/cotton/56/white, to get a listing of all the available white cotton boys' clothes in size 56.

I have the regexes and the views for the above scenario. But the user can of course take hundreds of different paths, like /clothes/red/wool/girls/92/ and so on.

How do I go about catching all those different cases without having to manually write each of that multitude of regexes and views?

+3  A: 

Solution 1:

Use /gender/type/size/color/ and reserve a value for "unspecified", let's say na. So if the user first clicks "red", they go to /na/na/na/red/. This way you need only one regex, and your URLs are consistent.
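
A minimal sketch of that single pattern, written with plain `re` so it runs outside Django. The names `LISTING_URL` and `parse_filters` are made up for illustration, and the allowed values are taken from the question. In a Django URLconf the same named groups would reach the view as keyword arguments.

```python
import re

# One pattern for /<gender>/<type>/<size>/<color>/.
# "na" is the reserved value meaning "not specified".
LISTING_URL = re.compile(
    r"^/(?P<gender>boys|girls|na)"
    r"/(?P<type>cotton|wool|na)"
    r"/(?P<size>\d+|na)"
    r"/(?P<color>[a-z]+)/$"
)


def parse_filters(path):
    """Return the filters named in the path, dropping 'na' placeholders."""
    match = LISTING_URL.match(path)
    if match is None:
        return None  # not a listing URL
    return {key: value for key, value in match.groupdict().items() if value != "na"}
```

With this, `/na/na/na/red/` yields only a color filter, and `/boys/cotton/56/white/` yields all four.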

Solution 2:

Use GET params for this. Everything lives at the URL /clothes/, but you can specify /clothes/?gender=boys&size=56&color=red and so on. It's easy enough to parse these values in the view (request.GET['gender']). In this solution unspecified values are simply left out (like type in my example).
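
To illustrate, here is a rough standalone sketch that pulls the filters out of such a URL with the standard library. `ALLOWED_KEYS` and `filters_from_url` are hypothetical names. In a Django view you would read the same values from request.GET instead, and `request.GET.get('gender')` returns None for a missing key rather than raising.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed parameter names; anything else in the query string is ignored.
ALLOWED_KEYS = ("gender", "type", "size", "color")


def filters_from_url(url):
    """Collect the known filter parameters that are present in the query string."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    return {key: query[key][0] for key in ALLOWED_KEYS if key in query}
```

Parameters that are absent never appear in the result, so the view only filters on what the user actually picked.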

Solution 3:

Use django-filter. It's a pluggable app that implements solution 2.

Ofri Raviv
django-filter does not allow you to actually drill down; it uses filter(foo).filter(bar) constructions, not filter(foo, bar).
Dmitry Shevchenko
+1  A: 

One disadvantage of having multiple paths like you've specified is that search engines are going to see each page as a distinct permutation - which might hurt SEO.

I've also seen bad spiders essentially DOS-attack a site in situations like this.

This is a yucky problem, and you might be best served by implementing the simplest solution possible. To my eyes, Ofri's first solution is that, except that NA is kind of an ugly placeholder. Something easier on the eyes might be "ALL_GENDERS", "ALL_SIZES", "ALL_TYPES". That way, you can grok things from the URL, instead of having it look like it's in some kind of error state.

Koobz
+1  A: 

My first reaction to this problem would be a middleware solution combined with some common SEO practices. Because you have a fairly narrow field of options in your URL schema, this can be a viable option.

The middleware would be responsible for performing two actions on each request.

  1. Parse request.path looking for the pieces of your url.
  2. Create a URL that is specific for the gender/size/color/material.

Quickly hacking something together, it may look something like this:

import re

class ProductFilterMiddleware:
    GENDERS = ("girls", "boys")
    MATERIALS = ("cotton", "wool", "silk")

    def proc_url(self, path):
        """ Process a path looking for gender, color, material and size. """
        # Drop the empty strings produced by leading, trailing and doubled slashes.
        pieces = [x for x in path.split('/') if x != '']
        prod_details = {}
        for piece in pieces:
            if piece in self.GENDERS:
                prod_details['gender'] = piece
            elif piece in self.MATERIALS:
                prod_details['material'] = piece
            elif re.match(r'\d+$', piece):
                # Sizes are purely numeric; the anchor keeps "56abc" from counting as a size.
                prod_details['size'] = piece
            else:
                # Anything unrecognized is assumed to be a color.
                prod_details['color'] = piece
        return prod_details

    def get_url(self, prod_details):
        """ Parse the output of proc_url() to create the correct URL. """
        # Always emit the pieces in one fixed order: gender, material, size, color.
        pieces = []
        if 'gender' in prod_details:
            pieces.append(prod_details['gender'])
        if 'material' in prod_details:
            pieces.append(prod_details['material'])
        if 'size' in prod_details:
            pieces.append(prod_details['size'])
        if 'color' in prod_details:
            pieces.append(prod_details['color'])
        # With no pieces at all, return the root instead of '//'.
        return '/%s/' % '/'.join(pieces) if pieces else '/'

    def process_view(self, request, view_func, args, options):
        # Attach the parsed details and canonical URL to the request for the view to use.
        request.product_details = self.proc_url(request.path)
        request.product_url = self.get_url(request.product_details)
        # Returning None (implicitly) lets Django carry on to the view.

This would allow arbitrary links to be created to your products without your advance knowledge, allowing the system to be flexible with its URLs. This also includes partial URLs (just a size and material: show me all the color choices). When a URL names two values of the same kind, as the third one below does with its colors, the later value wins. Any of the following should be parsed without incident:

  • /56/cotton/red/boys/
  • /cotton/56/
  • /green/cotton/red/girls/
  • /cotton/

From here, your view can then create a list of products to return using request.product_details as its guide.
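
As a rough sketch of that last step, the snippet below filters a small in-memory list by whatever keys are present in the details dict. `PRODUCTS` and `select_products` are hypothetical stand-ins for your real product table and view logic. With the Django ORM, and assuming the model fields carry the same names as the dict keys, the equivalent would be something like passing `**request.product_details` to a queryset's filter().

```python
# Hypothetical in-memory catalogue standing in for the product table.
PRODUCTS = [
    {"name": "t-shirt", "gender": "boys", "material": "cotton", "size": "56", "color": "white"},
    {"name": "sweater", "gender": "girls", "material": "wool", "size": "92", "color": "red"},
    {"name": "trousers", "gender": "boys", "material": "cotton", "size": "62", "color": "blue"},
]


def select_products(product_details, products=PRODUCTS):
    """Keep the products that match every detail found in the URL."""
    return [
        product
        for product in products
        if all(product.get(key) == value for key, value in product_details.items())
    ]
```

An empty details dict matches everything, which gives you the unfiltered overview page for free.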

Part two of this solution is to then include canonical tags in each of the pages you output. This should prevent duplicate content from adversely affecting your SEO.

<link rel="canonical" href="http://www.example.com{{ request.product_url }}" />

Warning: Google and other search engines may still hammer your site, requesting every URL they can find. This can create a nasty load on your server very quickly. Because the content is available from so many different locations, a spider may dig around quite a bit, even though it knows that only one copy of each page is the real deal.

Jack M.