So let's say we have a function such as 6wn^2 - 6wn + 6w, would the big-o notation be O(n^2) or O(wn^2)? The problem set question asks me to write this function in Big-O form in terms of w and n, so I think it's the second one but I've never seen a big O notation like that involving 2 terms so I'm a little confused.
Technically, both are correct, but if the question asks for it in terms of both w and n, then the second one is what's wanted.
If a function f(n) is O(g(n)) for some function g(n), it means that f(n) is bounded above by g(n) asymptotically. Basically this means that for large enough n, g(n) (up to a constant factor) will be bigger than f(n).
(More formally, f(n) is O(g(n)) if and only if there exist constants c > 0 and N such that f(n) <= c*g(n) for all n > N.)
Now in your case, let f(n) = 6wn^2 - 6wn + 6w. Treating w as a constant, f(n) is O(n^2); keeping w as a variable, it is O(wn^2). Both are asymptotic upper bounds for f(n). In fact, f(n) is also O(n^100), since that is an upper bound too, just a far looser one.
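To see concretely that O(wn^2) really is an upper bound, here is a quick numeric check (a sketch, with c = 6 and N = 1 chosen as one valid pair of witnesses; since f(n) = 6w(n^2 - n + 1) and n^2 - n + 1 <= n^2 for n >= 1, the bound holds exactly):

```python
# Check that f(n) = 6wn^2 - 6wn + 6w is bounded by c * w * n^2
# for the witnesses c = 6, N = 1 (one valid choice among many).

def f(w, n):
    return 6 * w * n**2 - 6 * w * n + 6 * w

c, N = 6, 1
for w in (1, 3, 10):
    for n in range(N, 1000):
        assert f(w, n) <= c * w * n**2  # never fails for n >= 1
```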
However the best answer for you to give would likely be that f(n) is O(wn^2), since that includes w, which is what the question asked for.
Note that in practice we usually drop constant coefficients and lower-order terms from g(n), and we prefer the tightest bound we can justify. The reason is that a low upper bound tells you more about a function than a high one. For example, if I tell you that my speedy search algorithm is O(n^(1000!)), I'm not telling you very much at all. If I instead tell you it's O(n^2), I'm giving you far more information - and both could be correct.
Since the question asks you to write it in terms of both, the second form is the intended answer.
In general it is okay to have two variables in big-O notation, as long as the input size is described by two variables that do not depend on one another.
An example of this is Dijkstra's algorithm, which has a worst-case performance of O(|E| + |V| log |V|). The size of the input graph is described by the number of vertices (|V|) and the number of edges (|E|). With an adjacency-matrix representation you effectively get |E| = |V|^2, so if we used Dijkstra's algorithm with an adjacency matrix (don't do it, it's a bad idea) the more informative bound would be O(|V|^2 + |V| log |V|), which is the same as simply O(|V|^2).
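For reference, here is a minimal Dijkstra sketch over an adjacency list using Python's heapq (names and the dict-based graph format are my own choices, not from the question). With a binary heap this runs in O((|E| + |V|) log |V|); the tighter O(|E| + |V| log |V|) bound quoted above assumes a Fibonacci heap:

```python
import heapq

def dijkstra(adj, source):
    # adj maps each vertex to a list of (neighbor, edge_weight) pairs.
    dist = {source: 0}
    heap = [(0, source)]          # (distance-so-far, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue              # stale heap entry, already relaxed
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist                   # shortest distances from source

# Example: shortest paths from 'a' in a small directed graph.
graph = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
print(dijkstra(graph, 'a'))  # {'a': 0, 'b': 1, 'c': 3}
```

Either way, the point stands: both |V| and |E| appear in the bound because neither determines the other for a general graph.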