When you do profiling you should generally try to reproduce the production environment as closely as possible. Differences in hardware (# of cores, memory, etc) and software (OS, JVM version) can make your profiling results as unique as the runtime environment.
For example, what looks like a CPU bottleneck worth optimizing on your local machine might disappear entirely or turn into a disk bottleneck on your production server depending on the differences in the CPU.
All modern profilers will allow to attach to a remotely running JVM so you don't need to worry about only having console access.
What profiler you decide to use will depend on your needs and preferences. Certain profilers will show you "hotspots" where your code is spending the majority of the time and these are often good candidates for optimization.
I prefer to use JProfiler for its extensive features and good performance. I previously used YourKit but switched to JProfiler for its memory and thread profiling features.