You can either create and compile several programs (and create kernel objects from those), or you can put all kernels into the same program (clCreateProgramWithSource takes several strings, after all) and create all your kernels from that one. Either should work fine using the same command queue. Using more than one command queue to execute kernels that should run serially on the same device is not a good idea anyway: commands in different queues are not ordered with respect to each other, so you have to synchronize manually by waiting on events. With a single in-order queue (the default) you can enqueue all kernels asynchronously and only wait for the final result. At least some operations can then overlap on device and host, so waiting at the last possible moment is generally faster and easier.