Generally speaking, do the joining in the DBMS. If you do it in the application server, you are betting that you can do a better job of optimizing joins than the people who wrote the DBMS, and (further) that you can out-perform their best efforts by enough to offset the cost of transferring the unjoined data across the wire.
Now, if you are going to do a cross-product of two wide tables (lets say that they are T1, with N1 rows of width W1 and T2 with N2 rows of width W2) with no filtering, then the DBMS is obliged to create and send N1 * N2 * (W1 + W2) bytes of data over the wire, whereas you could suck down the tables separately as N1 * W1 + N2 * W2 bytes of data. If N1 = N2 = 1M and W1 = W2 = 100, then that's 200 TB vs 200 MB of data transfer in favour of doing the cross-product in the app server. But that's not exactly fair to the DBMS. Most queries are not that silly - they join on columns and apply conditions, and the DBMS optimizer will struggle mightily (and automatically) to minimize the work done. Further, it will only send the pertinent data back to you; it does not have to send all the rows that don't match your criteria.
To show an alternative scenario (in favour of the DBMS) consider a case where T1 has N1 = 1M rows of width W1 = 100, but T2 has N2 = 100K rows of width W2 = 50. There is a join between the two tables on an integer column, and there are, therefore, 10 rows in T1 for each on in T2. Suppose that you suck down all of T1 and T2 to the app server: that requires N1 * W1 + N2 * W2 = 105 MB of data. But the filter conditions limit the data to 1/10 of the rows in T2 and for each row in T1 that matches a row in T2, there is in fact only 2 rows that match the filter conditions. Now the DBMS is only going to transfer
N2 * (W1 + W2) / 5 = 3 MB, a saving of over 100 MB of data transfer by the DBMS. Now, if you manage to be clever and download just the N2 * W2 / 10 = 500 KB of data that corresponds to the values in T2, you still have to get the DBMS to do the 'semi-join' of T1 on the values you want to get the right rows from T1 to the app server. If you only need a subset of the columns, there can be another set of savings. And DBMS tend to have rather clever sort packages; you'll need a good sort package in your app server to present data in the correct order.
It should normally be a hands-down win for joins in the DBMS. If it isn't, it is because you are asking the server to do more work than it can handle. In that case, you need to look at whether replicating the database server makes sense, or whether adding more cores, or more network bandwidth, or more main memory will do the job.