I would say both are significant, but it is not really as simple as that:
The actual context switch time is simply a matter of the number of instruction cycles required to perform the switch, like anything in software it may be coded efficiently or it may not. On the other hand, all other things being equal, a processor with a large register set will require more instruction cycles to save the context; but having a large register set may make other code far more efficient.
A processor may also have an architecture that directly supports fast context switching. For example the lowly 8bit 8051 has eight duplicate register banks; so a context switch is little more that a register bank switch (so long as you have not more that eight threads), and given that Silicon Labs produce 8051 based devices at 100MIPS, that could be very fast indeed!
More sophisticated processors and operating systems may use an MMU to provide thread memory protection, this is an additional context switch overhead but with benefits that may override that. Also of course such processors generally also have high clock rates which helps.
So all in all, the processor speed, the processor architecture, the quality of the RTOS implementation, and the functionality provided by the RTOS may all affect context switch time. But in the end the easiest way to improve switch time is almost certainly to increase the clock rate.
Although it is nice to have more headroom, if context switch time is a make or break issue for your project on any reputable RTOS you should consider the suitability of either your hardware or your design. You should aim toward a design that minimises context switches. For example, if an ADC conversion takes 6us and a context switch takes 20us, then you would do better to busy-wait than to use a conversion-complete interrupt; better yet use DMA transfers to avoid context switches on single data items where possible.