What are the weaknesses and strengths of the Brill Tagger? Can you suggest some possible improvements for the tagger?
The biggest weakness of a Brill tagger is the time needed for the training phase (take a look at the timestamps for ACOPOST here, or try to implement one with NLTK to get an idea). Remember that a Brill tagger should always be treated as the last tagger in a sequence of tagging systems (for simple tagging I usually train a Brill tagger on the output of an HMM tagger). Besides making the training phase even longer, using a Brill tagger by itself generally results in a very large, usually overlapping and sometimes "incorrect" set of rules (i.e., rules that, in "true" tagging contexts, break many correct tags).
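To make the cost and the "incorrect rules" point concrete, here is a minimal sketch of how a single candidate rule could be scored against a gold standard. It is not NLTK's or ACOPOST's code: the rule shape `(old, new, prev)`, the function names and the toy tags are all assumptions made for illustration. A trainer has to repeat something like this for a large number of candidate rules over the whole corpus in every round, which is where the training time goes.

```python
from typing import List, Tuple

# Hypothetical rule shape: change tag `old` to `new` when the previous tag is `prev`.
Rule = Tuple[str, str, str]

def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    old, new, prev = rule
    out = list(tags)
    # Conditions are checked against the input tags, so a change made by
    # this rule cannot trigger another change in the same pass.
    for i in range(1, len(tags)):
        if tags[i] == old and tags[i - 1] == prev:
            out[i] = new
    return out

def score_rule(rule: Rule, predicted: List[str], gold: List[str]) -> Tuple[int, int]:
    """Return (fixed, broken): wrong tags the rule corrects and correct tags it breaks."""
    after = apply_rule(rule, predicted)
    triples = list(zip(predicted, after, gold))
    fixed = sum(1 for p, a, g in triples if p != g and a == g)
    broken = sum(1 for p, a, g in triples if p == g and a != g)
    return fixed, broken

# Toy data (made up): the initial tagger missed the verb after the first "TO".
gold      = ["DT", "NN", "TO", "VB", "TO", "NN"]
predicted = ["DT", "NN", "TO", "NN", "TO", "NN"]

rule = ("NN", "VB", "TO")  # "NN becomes VB after TO"
fixed, broken = score_rule(rule, predicted, gold)
print(f"fixed={fixed} broken={broken} net={fixed - broken}")
```

On this toy input the rule repairs one tag and breaks one that was already correct, so its net gain is zero. This is the kind of rule that piles up when the initial tagging is poor.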
The biggest strength of a Brill tagger is that its model makes sense, particularly when the rules are stored in a human-readable format, as is generally done. Manually inspecting the model of a statistical tagger is tedious, error-prone and not very useful, while a set of transformation rules can not only be understood and tweaked by hand, but this can be done even by people with no previous experience in NLP (I did exactly this years ago, when some undergraduates of a language program evaluated the rules generated on a Brazilian Portuguese corpus). In fact, you can even write the set of rules entirely by yourself.
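As an illustration of what "human-readable" can mean in practice, the sketch below keeps the rules in a plain text block, one rule per line, and applies them in order to an already tagged sentence. The four-column format, the two condition names (`PREVTAG`, `PREVWORD`) and the rules themselves are invented for this example and are not the file format of any particular tagger.

```python
from typing import List, Tuple

# Hand-written rules in a made-up format: from-tag, to-tag, condition, value.
RULES_TEXT = """
# from  to   condition  value
NN      VB   PREVTAG    TO
VBD     VBN  PREVWORD   has
"""

Rule = Tuple[str, str, str, str]
Token = Tuple[str, str]  # (word, tag)

def parse_rules(text: str) -> List[Rule]:
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        old, new, cond, value = line.split()
        if cond not in ("PREVTAG", "PREVWORD"):
            raise ValueError(f"unknown condition: {cond}")
        rules.append((old, new, cond, value))
    return rules

def apply_rules(rules: List[Rule], tagged: List[Token]) -> List[Token]:
    tagged = list(tagged)
    for old, new, cond, value in rules:  # rules are applied in order
        snapshot = list(tagged)          # test conditions on the state before this rule
        for i in range(1, len(snapshot)):
            word, tag = snapshot[i]
            prev_word, prev_tag = snapshot[i - 1]
            context = prev_tag if cond == "PREVTAG" else prev_word
            if tag == old and context == value:
                tagged[i] = (word, new)
    return tagged

# Toy output of some initial tagger, with two mistakes (made-up data).
sentence = [("She", "PRP"), ("has", "VBZ"), ("decided", "VBD"), ("to", "TO"), ("run", "NN")]
corrected = apply_rules(parse_rules(RULES_TEXT), sentence)
print(corrected)
```

Anyone who can read the two rule lines can judge whether they are sensible for the language at hand, and editing the model is just editing the text.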
In short, while a Brill tagger is useful as the last step in a robust system of cascading taggers, it is generally not the best choice to use by itself (if you want a single tagger, I would suggest going with an HMM one). My suggestion is to train and use a Brill tagger on the tagged output of another tagger, preferably a combined system such as a voting one (i.e., you set up three or four different taggers, use a voting system to select the best tag for each token, and only then feed these results to a Brill tagger that will hopefully correct the most common mistakes of the previous system).
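The voting step can be sketched in a few lines. This is only one possible scheme (plain majority vote per token, with ties going to the tagger listed first), and the three tag sequences are made-up stand-ins for the output of real taggers. The voted sequence is what would then be handed to the Brill tagger as its initial tagging.

```python
from collections import Counter
from typing import List, Sequence

def vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Majority vote per token; on a tie, the earliest-listed tagger wins."""
    voted = []
    for candidates in zip(*tag_sequences):
        counts = Counter(candidates)
        best = max(counts.values())
        # First candidate, in tagger order, that reaches the top count.
        voted.append(next(tag for tag in candidates if counts[tag] == best))
    return voted

# Hypothetical outputs of three different taggers for the same four tokens.
tagger_a = ["DT", "NN", "VBZ", "JJ"]
tagger_b = ["DT", "NN", "NNS", "JJ"]
tagger_c = ["DT", "VB", "VBZ", "JJ"]

initial_tags = vote([tagger_a, tagger_b, tagger_c])
print(initial_tags)
```

Listing the most trusted tagger first (for instance the HMM one) makes the tie-break fall back on it, which is a design choice rather than a requirement.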