This is the key problem with Agile, and I don't think anyone has solved it yet. Initial architectural decisions are critical to success and, as Kent Beck says, ideally you would defer them until you have enough information.
Of course, he can say that largely because he can choose his clients and demand that degree of freedom. Changing the implementation language three months into a project might be OK for him, but for most of us it's not an option. We have to make some decisions quite early, and they have to be right. We must work with insufficient information, and use our experience and nous as effectively as possible.
Most architectural texts describe a process that starts with English sentences expressing required functionality, then decomposes the nouns, and eventually the verbs, into semantic representations of classifiers that finally get turned into actual lines of code.
We can't easily do this in Agile: user stories don't lend themselves to decomposition because they are insufficiently detailed, and we don't have any other source of functional requirements.
I'd suggest avoiding anything like UML (except for keeping your own notes) until you have at least written your Release Plan and have some idea of which stories are likely to be implemented in which Iteration. At this point you can start detailed architecture work.
Before this you must make some decisions, and I think the best you can manage is to pin down what detail you can:
- what the nonfunctional requirements are likely to be,
- which standards you must conform to,
- which existing systems you must interface with and their APIs,
- what the deployment platform will be,
- what skills the target operational team have.
That sort of thing. Often these constraints box in your expected delivery enough that you can be confident of the high-level components and where they will reside.
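One way to keep track of these early constraints is to record them as data, so it is obvious which ones nobody has pinned down yet. This is only a minimal sketch: the `EarlyConstraints` class, its field names and the sample values are all hypothetical, chosen to mirror the list above.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class EarlyConstraints:
    """Hypothetical checklist mirroring the constraints listed above."""
    nonfunctional_requirements: List[str] = field(default_factory=list)
    mandatory_standards: List[str] = field(default_factory=list)
    external_systems: Dict[str, str] = field(default_factory=dict)  # system name -> API style
    deployment_platform: str = ""
    operations_team_skills: List[str] = field(default_factory=list)

    def unresolved(self) -> List[str]:
        """Return the names of the constraints that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; a real project would fill these in from stakeholder answers.
constraints = EarlyConstraints(
    mandatory_standards=["PCI DSS"],
    external_systems={"billing": "SOAP"},
    deployment_platform="Linux VMs",
)
print(constraints.unresolved())
# ['nonfunctional_requirements', 'operations_team_skills']
```

Anything still listed by `unresolved()` is a question to take back to stakeholders before you commit to high-level components.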
Something I do strongly recommend is doing Information Architecture work on user interfaces. UIs are fragile and expensive to change, and a wireframe is close enough to the finished article that you can discuss it with stakeholders and get valuable answers.
You need a good information architect though, and you need to regularly mine changes in the IA for user stories, to ensure everything in them gets appropriately estimated.
For the non-UI work, think carefully about some of the core concepts in the solution - often things like transaction boundaries and requirements for transaction safety can be quite clearly identified and stated, even if you don't know what specifically is contained in each type of transaction. Make some statements about constraints and success criteria specifically for transactions and data safety, and get them agreed by stakeholders.
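A transaction success criterion can often be written down as an executable check long before you know what the real transactions contain. The sketch below assumes one such criterion, "a failed transaction leaves no partial writes", and uses SQLite from the Python standard library purely as a stand-in; the `ledger` table and both function names are hypothetical.

```python
import sqlite3


def run_transaction(conn, statements):
    """Apply every statement or none of them."""
    # Used as a context manager, a sqlite3 connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        for sql, params in statements:
            conn.execute(sql, params)


def rows_after_failed_transaction():
    """Run a transaction whose second write fails and count what survived."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, note TEXT NOT NULL)")
    conn.commit()
    try:
        run_transaction(conn, [
            ("INSERT INTO ledger (id, note) VALUES (?, ?)", (1, "first write")),
            # Same primary key again, so this statement raises IntegrityError.
            ("INSERT INTO ledger (id, note) VALUES (?, ?)", (1, "duplicate id")),
        ])
    except sqlite3.IntegrityError:
        pass
    (count,) = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()
    conn.close()
    return count


# Agreed criterion: the first write must not survive the failed transaction.
assert rows_after_failed_transaction() == 0
```

The point is less the database than the shape: the criterion is stated once, agreed with stakeholders, and can be rerun against whatever storage you end up choosing.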
Finally, at the beginning of each iteration you can do some detailed modelling and architecture work, since this is where real functional decomposition happens. I strongly recommend putting in the time pre-iteration for this work. At the start of actual coding for the iteration you should have a clear idea of every class you are going to create, where it will live and what it will talk to. If you don't have this, it's impossible to coordinate the development team.
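If it helps, the pre-iteration plan can be kept as a small data structure and checked for gaps before coding starts. This is a rough sketch rather than a prescribed tool: `PlannedClass`, `dangling_collaborators` and the example class and module names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PlannedClass:
    """One class planned for the iteration."""
    name: str                       # the class you are going to create
    module: str                     # where it will live
    talks_to: Tuple[str, ...] = ()  # what it will talk to


def dangling_collaborators(plan: List[PlannedClass]) -> Dict[str, List[str]]:
    """Map each planned class to the collaborators that nobody has planned yet."""
    known = {c.name for c in plan}
    gaps: Dict[str, List[str]] = {}
    for c in plan:
        missing = [t for t in c.talks_to if t not in known]
        if missing:
            gaps[c.name] = missing
    return gaps


# Hypothetical plan for one iteration.
plan = [
    PlannedClass("OrderService", "shop.orders", ("OrderRepository", "PaymentGateway")),
    PlannedClass("OrderRepository", "shop.persistence"),
]
print(dangling_collaborators(plan))
# {'OrderService': ['PaymentGateway']}
```

A non-empty result means someone on the team is about to depend on a class that has no owner or location yet, which is exactly the coordination problem described above.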