I haven't read the paper, so I am making a big assumption here: the paper uses CAS (compare-and-swap) to achieve its concurrency.
Lock-free does not mean wait-free. A thread whose CAS fails has to retry, so individual threads can be held up, but at least one thread will be moving 'forwards' at all times.
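To make the retry behaviour concrete, here is a minimal sketch of a CAS loop in C++. It is not taken from the paper; the function name `cas_increment_demo` and the shared counter are purely illustrative assumptions. Several threads bump one counter, and a thread whose CAS fails simply tries again with the fresh value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): increments a shared counter
// from several threads using only compare-and-swap, with no mutex.
int cas_increment_demo(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};

    auto worker = [&counter, increments_per_thread] {
        for (int i = 0; i < increments_per_thread; ++i) {
            int seen = counter.load();
            // compare_exchange_strong only fails if another thread changed
            // the counter after we read it. On failure it stores the current
            // value into `seen`, so the loop retries with fresh data.
            while (!counter.compare_exchange_strong(seen, seen + 1)) {
                // Our failure implies some other thread's update succeeded,
                // which is the "at least one thread progresses" guarantee.
            }
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return counter.load();
}
```

Calling `cas_increment_demo(4, 10000)` should return the full count, because no increment is ever lost. Under contention a thread may loop several times before its own update lands.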
Multiple producers all writing to the same queue isn't a problem. The tricky part is the multiple consumers. If every consumer must access every item, I would implement that with multiple queues: the data falls through to the next queue after being processed in one. If you mean multiple consumer threads sharing the work, then that would work with the CAS method above.
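For the second case (consumer threads sharing the work), a rough sketch of how CAS could hand each item to exactly one consumer might look like the following. Again, this is an assumption on my part and not the paper's design. The helper name `consume_with_cas` is made up, and a pre-filled `std::vector` stands in for the queue to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: several consumer threads drain a pre-filled buffer.
// Each slot is claimed by exactly one consumer via CAS on a shared index.
// Returns the sum of all consumed items.
long long consume_with_cas(const std::vector<int>& items, int num_consumers) {
    assert(num_consumers > 0);
    std::atomic<std::size_t> head{0};
    std::vector<long long> partial(num_consumers, 0);  // one slot per thread

    auto consumer = [&items, &head, &partial](int id) {
        std::size_t h = head.load();
        while (h < items.size()) {
            // Try to move head from h to h + 1. Success means slot h is ours.
            // On failure, h is updated to the current head and we retry.
            if (head.compare_exchange_strong(h, h + 1)) {
                partial[id] += items[h];  // "process" the claimed item
                h = head.load();
            }
        }
    };

    std::vector<std::thread> threads;
    for (int c = 0; c < num_consumers; ++c) threads.emplace_back(consumer, c);
    for (auto& th : threads) th.join();

    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

If every item is claimed exactly once, the returned total equals the plain sum of `items` regardless of how many consumers run. A real queue also has to deal with producers appending concurrently, which this sketch deliberately leaves out.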