A basic problem with using a PC for motion control without dedicated hardware is the data rate of the various outputs built into most PCs. If your customers don't have the budget for the high-end motion controller you mention, then they simply aren't going to have access to 100 kHz signals.
The I/O controller for every port provides some kind of buffering so that you get smooth results out of it. This largely serves to relieve software of the need to 'bit-bang' those signals, and it also makes the I/O frequency much more stable.
The downside is that the buffering will probably introduce some latency, which may hurt the performance of a closed-loop control system whose controller runs in software. Another downside is that it limits your frequency options. One of the fastest I/O options on a PC that doesn't involve a high-level, flow-controlled protocol like USB or SATA is actually the audio port, but that is limited to 44.1 kHz or 48 kHz on most systems (some sound cards go to 96 or 192 kHz).
If the audio port is an option for you, then you don't need to worry about timing at all. Just make sure you set the sample rate properly, and produce your signal as you would otherwise.
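As a rough illustration of "producing the signal" for the audio port, here is a minimal sketch that builds a square pulse train as 16-bit PCM samples and packs it into a WAV buffer. The function names, the 44.1 kHz rate, and the amplitude are assumptions made for the example; actually playing the buffer is left to whatever audio API you use.

```python
import array
import io
import sys
import wave

SAMPLE_RATE = 44100  # Hz; assumed, use whatever rate your sound card supports


def pulse_train(freq_hz, duration_s, sample_rate=SAMPLE_RATE, amplitude=30000):
    """Return 16-bit samples of a square wave at freq_hz (integer Hz)."""
    n = int(duration_s * sample_rate)
    samples = array.array("h")
    for i in range(n):
        # Index of the current half period; even halves are high, odd are low.
        half_period = (2 * i * freq_hz) // sample_rate
        samples.append(amplitude if half_period % 2 == 0 else -amplitude)
    return samples


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Pack mono 16-bit samples into an in-memory WAV file."""
    data = array.array("h", samples)  # copy, so the caller's samples are untouched
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(data.tobytes())
    return buf.getvalue()
```

Note that the hardware clocks the samples out, so the software only has to keep the buffer filled. The trade-off is resolution: edges can only fall on sample boundaries (about 22.7 µs apart at 44.1 kHz), and the highest square wave you can represent is half the sample rate.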
If you need a digital signal, the parallel and serial ports work in a similar fashion, but at a lower maximum speed.
EDIT Hmm... It seems parallel ports are a bit faster than they used to be. An Enhanced Parallel Port (EPP) capable machine can give you up to 2 Mbit/s of raw bandwidth, which should put your goal well within reach. However, due to the way the parallel interface works, the data rate depends on the peripheral rather than on some predefined buffer speed. Essentially, the peripheral acknowledges that it has received data, or that it is ready to send data, for each byte transferred.
This isn't too bad, though, because a simple, clock driven peripheral is easy to set up.
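To make the dependence on the peripheral concrete, here is a back-of-the-envelope sketch. It assumes a simplified model that is not taken from any port specification: the peripheral acknowledges one byte per clock tick, and if the host needs longer than one tick to have the next byte ready, it has to wait for the following tick. The function name and the overhead figures are hypothetical.

```python
import math


def effective_byte_rate(peripheral_clock_hz, host_overhead_s):
    """Bytes per second when the peripheral accepts one byte per clock tick.

    Simplified model (an assumption): the host needs host_overhead_s to
    prepare each byte, and a byte that misses a tick goes out on the next one.
    """
    ticks_per_byte = max(1, math.ceil(host_overhead_s * peripheral_clock_hz))
    return peripheral_clock_hz / ticks_per_byte


# With a 100 kHz peripheral clock, the host has 10 microseconds per byte.
print(effective_byte_rate(100_000, 1e-6))   # host keeps up with every tick
print(effective_byte_rate(100_000, 15e-6))  # host misses every other tick
```

In this model the peripheral clock sets the ceiling, so a 100 kHz target only holds as long as the host can reliably deliver each byte in under 10 µs.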
Additionally, data cannot flow in both directions at the same time: data coming from the peripheral will cause the stream of data from the host PC to block. One way around that is to have two parallel ports available, or to have the peripheral limit its own transmissions to every other cycle.
I/O on such a port is normally interrupt-driven. Besides a few error states, the most useful interrupts are generated when the input or output FIFOs reach a certain fill level. You can use this interrupt to write new data to the port (or read data from it). The threshold can be set anywhere from 1 to 16 bytes. You should carefully test, with a logic analyzer or oscilloscope, that data is actually flowing at the expected rate. You can do this almost as well in software by timing the transfers with QueryPerformanceCounter.
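The software check could be sketched roughly as follows. Python's `time.perf_counter` is backed by QueryPerformanceCounter on Windows, so it stands in for the raw call here. `write_chunk` is a hypothetical callable representing whatever driver call blocks until the port has accepted a chunk; it is not a real API.

```python
import time


def measure_rate(write_chunk, chunk, n_chunks, clock=time.perf_counter):
    """Return the achieved throughput in bytes per second.

    write_chunk is assumed to block until the port has accepted the data.
    """
    start = clock()
    for _ in range(n_chunks):
        write_chunk(chunk)
    elapsed = clock() - start
    total_bytes = len(chunk) * n_chunks
    return total_bytes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a real port: pretend each 16-byte chunk takes about 1 ms.
    def fake_port_write(data):
        time.sleep(0.001)

    rate = measure_rate(fake_port_write, bytes(16), 200)
    print(f"achieved about {rate:.0f} bytes/s")
```

This only measures the average rate as seen by the host. It will not show jitter between individual bytes on the wire, which is why the scope or logic analyzer is still the better check.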