Just as a refresher of terminology, in Q-learning, you are trying to learn the Q-functions, which depend on the state and action:
Q(S,A) = ????
The standard version of Q-learning as taught in most classes tells you that you for each S and A, you need to learn a separate value in a table and tells you how to perform Bellman updates in order to converge to the optimal values.
Now, lets say that instead of table you use a different function approximator. For example, lets try linear functions. Take your (S,A) pair and think of a bunch of features you can extract from them. One example of a feature is "Am I next to a wall," another is "Will the action place the object next to a wall," etc. Number these features f1(S,A), f2(S,A), ...
Now, try to learn the Q function as a linear function of those features
Q(S,A) = w1 * f1(S,A) + w2*f2(S,A) ... + wN*fN(S,A)
How should you learn the weights w? Well, since this is a homework, I'll let you think about it on your own.
However, as a hint, lets say that you have K possible states and M possible actions in each state. Lets say you define K*M features, each of which is an indicator of whether you are in a particular state and are going to take a particular action. So
Q(S,A) = w11 * (S==1 && A == 1) + w12 * (S == 1 && A == 2) + w21 * (S==2 && A==3) ...
Now, notice that for any state/action pair, only one feature will be 1 and the rest will be 0, so Q(S,A) will be equal to the corresponding w and you are essentially learning a table. So, you can think of the standard, table Q-learning as a special case of learning with these linear functions. So, think of what the normal Q-learning algorithm does, and what you should do.
Hopefully you can find a small basis of features, much fewer than K*M, that will allow you to represent your space well.