Given the answer to my previous question, it sounds like you want to implement some sort of Twenty Questions approach, so here is what I'd do.
With Twenty Questions the answers are yes/no, so a binary tree works best. You could layer in multiple-choice questions as well, as long as the user picks exactly one choice. The algorithm below assumes you've trained your tree ahead of time, building it from the dataset you wish to use.
Say, for example, we're trying to do a medical diagnosis, so our data might look like the following:
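| Disease Name | Head Ache | Fever | Back Pain | Leg Pain | Blurry Vision | Hearing Loss |
|--------------|-----------|-------|-----------|----------|---------------|--------------|
| Common Cold  | Yes       | Yes   | No        | No       | No            | No           |
| Migraine     | Yes       | No    | No        | No       | Yes           | No           |
| Herpes       | No        | Yes   | No        | No       | No            | No           |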
In this example, Head Ache, Fever, Back Pain, Leg Pain, etc. are the influencers, and Disease Name is the target. Each row is an actual diagnosis of a single patient, so a disease can appear in the data more than once.
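To make the "trained ahead of time" part concrete, here is a minimal Python sketch of building a yes/no tree from the rows in the table above. The splitting rule (pick the influencer that leaves the lowest weighted entropy) is my assumption; any decision-tree trainer would do. The names `INFLUENCERS`, `ROWS`, `entropy`, and `build` are made up for illustration.

```python
from collections import Counter
from math import log2

INFLUENCERS = ["Head Ache", "Fever", "Back Pain", "Leg Pain",
               "Blurry Vision", "Hearing Loss"]

# One (target, answers) pair per patient row, in the same column order.
ROWS = [
    ("Common Cold", [True, True, False, False, False, False]),
    ("Migraine",    [True, False, False, False, True, False]),
    ("Herpes",      [False, True, False, False, False, False]),
]

def entropy(rows):
    """Shannon entropy of the disease names in rows (0 when all agree)."""
    counts = Counter(name for name, _ in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def build(rows, unused):
    """Return a leaf holding the rows' diseases, or a node split on one influencer."""
    if entropy(rows) == 0:
        return {"leaf": [name for name, _ in rows]}
    best = None
    for i in unused:
        yes = [r for r in rows if r[1][i]]
        no = [r for r in rows if not r[1][i]]
        if not yes or not no:
            continue  # this influencer does not separate the rows
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
        if best is None or remainder < best[0]:
            best = (remainder, i, yes, no)
    if best is None:
        # No influencer can split these rows further; keep them all in one leaf.
        return {"leaf": [name for name, _ in rows]}
    _, i, yes, no = best
    rest = [j for j in unused if j != i]
    return {"question": INFLUENCERS[i],
            "yes": build(yes, rest),
            "no": build(no, rest)}

tree = build(ROWS, list(range(len(INFLUENCERS))))
print(tree)
```

With only three rows the tree is tiny, but the same recursion applies to a real dataset where diseases repeat and a leaf can hold several rows.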
1. Modify a tree-walk algorithm so that it starts at the root.
2. If you've reached a leaf, tell the user the potential answers and stop.
3. Otherwise, take the influencer used to split this node and ask the user the yes/no question for it ("Do you have a Head Ache?").
4. Go left if the user answers Yes.
5. Go right if the user answers No.
6. Go to step 2.
In the leaf nodes you'll have the actual rows that made it to that location, so you can display them to the user, saying you might have one of these:
Head ache
Migraine
Severed Head
Prescription is: blah blah blah.
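The walk and the leaf message can be sketched in Python roughly as follows. The tree here is hand-built just for illustration (normally it comes out of training), the "yes" branch plays the role of "left" and "no" the role of "right", and `TREE`, `walk`, and `ask` are names I made up.

```python
# Hand-built example tree; a real one would come from training on your data.
TREE = {
    "question": "Head Ache",
    "yes": {
        "question": "Fever",
        "yes": {"leaf": ["Common Cold"]},
        "no": {"leaf": ["Migraine"]},
    },
    "no": {"leaf": ["Herpes"]},
}

def walk(node, ask):
    """Start at the root and follow answers until a leaf is reached.

    ask(influencer) must return True for Yes and False for No.
    """
    while "leaf" not in node:
        # Yes goes to the "yes" (left) child, No to the "no" (right) child.
        node = node["yes"] if ask(node["question"]) else node["no"]
    return node["leaf"]

# Scripted answers stand in for a real user here.
scripted = {"Head Ache": True, "Fever": False}
result = walk(TREE, lambda influencer: scripted[influencer])
print("You might have one of these:", ", ".join(result))
```

For an interactive version you would pass an `ask` that wraps `input()`, for example one that prompts "Do you have a Head Ache?" and returns whether the reply starts with "y".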
With 1 million influencers it will take a while to build the tree. If you wanted to cut that down, it might be possible to use multi-valued influencers instead of yes/no ones. That said, it's really hard to think of 1 million unique yes/no questions, even across every medical condition. Once you've built the tree, it can offer as many diagnoses as you want.
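As a rough idea of what a multi-valued influencer could look like, here is a small Python sketch where a node has one child per choice instead of a yes/no pair. The question "Where is the pain?" and its choices are hypothetical: they fold the Head Ache, Back Pain, and Leg Pain columns into a single influencer. Choices that no sample row matches simply end in an empty leaf.

```python
# Hypothetical multi-valued node: one child per choice the user can pick.
MULTI_TREE = {
    "question": "Where is the pain?",
    "children": {
        "head": {"leaf": ["Common Cold", "Migraine"]},
        "back": {"leaf": []},   # no sample rows with back pain
        "leg":  {"leaf": []},   # no sample rows with leg pain
        "none": {"leaf": ["Herpes"]},
    },
}

def walk_multi(node, choose):
    """Walk to a leaf; choose(question, options) returns exactly one option."""
    while "leaf" not in node:
        options = list(node["children"])
        node = node["children"][choose(node["question"], options)]
    return node["leaf"]

# The user picks exactly one choice per question.
answers = walk_multi(MULTI_TREE, lambda question, options: "head")
print(answers)
```

One multi-valued question replaces several yes/no ones, so there are fewer influencers to consider when building the tree, at the cost of wider nodes.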