I may be disagreeing with commonly accepted wisdom, but I think you do have to think about performance all the time. The important thing, though, is how you think about it.
Often, if you start talking about performance, other people will start talking about optimisation (rather prematurely, you might say), but performance and optimisation are not the same thing.
Paying too little attention to performance is prematurely optimising for the kudos that accrues from not optimising prematurely.
Optimise is a verb that takes an object. Until you have that object, you cannot optimise it. You can optimise code once it has been written (though it may only be a single method). You can optimise a class model or a functional spec once they have been written.
Working on performance is a matter of aiming for the optimal, whether a priori or a posteriori. Some performance work is only appropriate a posteriori; that is what should be considered optimisation, and it is premature to do it a priori. Other performance work is as appropriate, or more appropriate, a priori.
The main thing to try to get right a priori is whether your basic approach is reasonable. One example often given here is the time complexity of algorithms, but I have to disagree: it is not always better to do something in O(1) or O(log n) than in O(n). O(n) time is the same as O(1) time when n is 1, and cheaper still when n is 0, and datasets with 0 or 1 items are common in plenty of cases. Moreover, time-complexity notation deliberately ignores lower-order terms and constants: O(n) time really means kn + c (possibly with other lower-order terms) while O(1) time means k + c, for different values of k and c. If the operation is itself inside a loop, it may well be that the O(n) version massively beats the O(1) one.
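To make the constants concrete, here's a throwaway micro-benchmark in Python (purely a sketch; the absolute numbers are machine-dependent and the collection values are invented). When the dataset is tiny and built on the fly, the "worse" linear scan often wins, because the hash-based set pays its hashing costs up front:

```python
import timeit

items = (3, 1, 4)  # a tiny dataset, the kind that is genuinely common

def via_list(x):
    # O(n): copy into a list and scan it linearly
    return x in list(items)

def via_set(x):
    # O(1): build a hash-based set, then probe it
    return x in set(items)

# With n this small, the scan's kn + c often undercuts the hash
# approach's k + c, because k includes hashing every element.
print("list:", timeit.timeit(lambda: via_list(4)))
print("set: ", timeit.timeit(lambda: via_set(4)))
```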
So time complexity isn't the thing to consider here; the thing to consider is whether time complexity is the thing to consider. If it is, then it's time to look at whether this is a case where O(n) beats O(1) for lack of overhead, or the more common case where O(1) beats O(n), or whether time complexity should be ignored here in favour of whatever reads most naturally. E.g. a common piece of time-complexity competition is whether to use a list and search it, or a hash-based set and query it by hash. With most libraries, though, the code for each will look different, so one of them will better describe the intent, and that's the one to go for when the code isn't performance-critical.
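As a hypothetical illustration of going with the one that describes the intent (the names here are invented): both snippets below find a record by id, but the dict version says keyed access while the list version says search, and what the code communicates is worth more than the big-O when the code isn't hot:

```python
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
users_by_id = {u["id"]: u for u in users}

# Reads as "search the collection for a match":
ann = next(u for u in users if u["id"] == 1)

# Reads as "look this up by its key":
ann = users_by_id[1]
```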
The important a priori thought about performance here, then, was whether it was worth thinking about performance at this stage.
Another case of basic approach is how remote resources are handled. If you are going to access the same rarely-changing remote resource several times a second, you need to make sure either that your access code has some degree of caching, or at least that it will be easy to add that caching later. Locking yourself into a particular approach to caching may or may not be premature, but tightly mixing your access code in with other concerns, so that it's hard to add caching later, is almost certainly premature pessimisation!
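A minimal sketch of that separation, assuming a hypothetical rates feed (the URL and names are placeholders): all access goes through one function, so a cache can be slotted in later without touching any caller:

```python
import time
import urllib.request

RATES_URL = "https://example.com/rates.json"  # placeholder

def fetch_rates():
    """The single point of access to the remote resource."""
    with urllib.request.urlopen(RATES_URL) as response:
        return response.read()

_cache = None  # becomes (timestamp, payload) once populated

def fetch_rates_cached(ttl_seconds=60.0):
    """Later, if justified: same contract, now with a simple TTL cache."""
    global _cache
    now = time.monotonic()
    if _cache is None or now - _cache[0] > ttl_seconds:
        _cache = (now, fetch_rates())
    return _cache[1]
```

Callers written against fetch_rates can be pointed at fetch_rates_cached without any other change, which is exactly the flexibility that tightly mixed-in access code throws away.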
So we need some thought from the beginning, though we don't need to solve everything at this stage.
Another reasonable thought right at the beginning is: I don't need to think about the performance of this part right now. This agrees with the people who say not to pay attention to performance a priori, but at a finer-grained level: you put a small amount of thought into being reasonably confident that it is indeed okay not to think about performance in more detail.
Another reasonable thought is: I am pretty sure the performance of this section will be critical, but I'm not yet able to measure the impact of different approaches. Here you've decided that optimisation will probably be necessary, but that aiming for the optimal now really would be premature. You can, however, lay the groundwork: draw your functional boundaries so that you can easily change the implementation of the suspected-critical part, or put time-logging code into that function, especially on debug builds, and especially if it's quite far from the calling public method (so the logging isn't equivalent to timing from an external test). You haven't done anything a priori to make the code faster, but you have worked to aid later optimisation.
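One way that groundwork can look in Python (a sketch; DEBUG and the logger name are assumptions for illustration): a timing decorator that costs nothing in release configurations but gives you measurements where you suspected you'd need them:

```python
import functools
import logging
import time

DEBUG = True  # in real code, taken from build or runtime configuration
log = logging.getLogger("perf")

def timed(fn):
    if not DEBUG:
        return fn  # no wrapper at all outside debug builds
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            log.debug("%s took %.6fs", fn.__name__,
                      time.perf_counter() - start)
    return wrapper

@timed
def suspected_critical_part(data):
    # stand-in for the code whose cost you want visibility into
    return sorted(data)
```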
Another thing it is reasonable to think about is whether something should be done in a multi-threaded way, though note that there are three reasonable thoughts here: as well as this will need to be multi-threaded and this will not need to be multi-threaded, there is also this may need to be multi-threaded. Again, you can define functional boundaries in such a way that it is easier to later rewrite the code so that a particular method is called in parallel with other work.
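A sketch of what that boundary might look like (names invented): process_item is self-contained with no shared mutable state, so the caller can move between the sequential and parallel versions without rewriting it:

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item):
    # stand-in for real, independent per-item work
    return item * item

def run_sequential(items):
    return [process_item(i) for i in items]

def run_parallel(items):
    # the "may need to be multi-threaded" case, enabled but not forced
    with ThreadPoolExecutor() as pool:
        return list(pool.map(process_item, items))
```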
Finally, you need to think about how able you will be to measure performance after the fact, and how often you will have to do so.
One important case here is where the underlying data will change over the lifetime of the project. With your DBA background you will be well aware that the optimal solution for a particular set of data with a particular balance of operations is not the optimal solution for different data with different operation frequencies, even when the data fits the same schema (e.g. a small table with heavy reads and writes will benefit less from heavy indexing than the same table schema with many more rows, few writes and many reads). The same applies to applications, so you need to consider whether you can do your optimisation in a single later optimisation phase, or whether you will have to return to it repeatedly as conditions change. In the latter case it's worth making sure now that the parts likely to change are easy to change.
Another important case is where you will not easily be able to obtain information about how the code is being used. If you are writing an application that is deployed at a single site, you will be able to measure this very well. If you are writing an application that will be distributed, it becomes harder. If you are writing a library that will be used in several different applications, harder still. In that last case the YAGNI argument becomes much weaker: maybe someone out there really does need that slow method to be much improved and you don't know about it. Again, the approach you need isn't always the same: one approach is to put in work up-front to make it more performant (not quite a priori, as it's a posteriori to your library being written but a priori to its being used); another is simply to document that the particular method is necessarily expensive and that calling code should memoise or otherwise optimise its own use of it where appropriate.
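A sketch of that second approach (expensive_summary is a hypothetical library function): the library documents the cost honestly, and a caller whose usage pattern warrants it memoises on its own side:

```python
import functools

def expensive_summary(dataset_id):
    """Summarise the named dataset.

    Note: necessarily expensive, as it re-reads the whole dataset on
    every call. Callers invoking it repeatedly for the same id should
    cache or memoise the result themselves.
    """
    return f"summary of {dataset_id}"  # stand-in for the real work

# In the calling application, if and where appropriate:
cached_summary = functools.lru_cache(maxsize=128)(expensive_summary)
```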
Constantly, if mostly subliminally, thinking about performance is important; it's just that the appropriate response to that thinking isn't always make it go faster now.