One of the main characteristics of functional programming is the use of side-effectless functions. However, this can be done in an imperative language also. The same is true for recursion and lambda functions (for example C++0x). Therefore I wonder whether imperative programming languages are a superset of functional ones.
It is possible to implement a certain programming paradigm in a language which doesn't support the programming paradigm natively. For example its possible to write Object Oriented Code in C while it is not designed for this purpose.
Functional programming is a well developed programming paradigm of its own and is best learnt through languages like Haskell, LISP etc. And after you have learnt them well, even though you don't use these languages regularly, you may start using those principles in the day to day language you use on regular basis.
Some people may like to Google for Object oriented programming in C
I can't really say whether they are subset of one another. What I can tell, though, that (except for really esoteric languages) they are all Turing-complete, which means that in the end they're all equally powerful, but not neccesarily equally expressive.
A paradigm is a way of doing things, and there are two main programming paradigms: imperative and declarative. The fact that some languages allow to mix both paradigms doesn't mean that one is included in the other, but that the languages are multi-paradigm.
To clarify it a little bit more, let me continue with your analogy: if Lisp and OCaml (for example) are considered functional languages, and both of them allow imperative style... then should imperative be considered a subset of functional?
Generally speaking, no; functional programming is a subset of declarative programming (which includes logic programming languages, like Prolog). Many imperative languages borrow elements from functional programming languages, but simply having lambdas or referentially-transparent functions does not make an imperative language functional; functional programming is about more than just these elements.
Most imperative languages don't have functions as first-order types, whereas most functionald o. (As does C++, via boost::function.)
By first-order type, this meas a value/variable can be of any type, an int, a bool, a function from int->bool. It usually also includes closures or bound values as well, where you have the same function, but some arguments are already filled in.
Those two are what functional programming is mostly about, IMHO.
Pattern mapping like
f:: [int] -> int
f [] = 0
f (x:xs) = 1 + f(xs)
is something that is for instance one thing that is not available in imperative languages. Also constructs like curried functions:
add2 :: int -> int
add2 = (2 +)
is not available in most imperative languages
I think it might be helpful to draw a distinction between paradigm and language.
To me, paradigms represent "ways of thinking" (concepts and abstractions such as functions, objects, recursion), whereas languages offer "ways of doing" (syntax, variables, evaluations).
All true programming languages are equivalent in the sense that they are Turing-complete and able, in theory, to compute any Turing-computable function as well as simulate or be simulated by a universal Turing machine.
The interesting thing is how difficult it is to accomplish certain tasks in certain languages or paradigms, how appropriate the tool is to the task. Even Conway's Game of Life is Turing-complete, but that does not make me want to program with it.
Many languages support a number of paradigms. C++ was designed as an object-oriented extension for C, but it is possible to write purely procedural code in it.
Some languages borrow/acquire features from other languages or paradigms over time (just look at the evolution of Java).
A few languages, like Common Lisp, are impressively multi-paradigm languages. It is possible to write code that is functional, object oriented or procedural in Lisp. Arguably, aspect-orientation is already part of the common lisp object system, and therefore "nothing special". In Lisp, it is easy to extend the language itself to do whatever you need it to do, thus it is sometimes called the "programmable programming language". (I'll point out here that Lisp describes a family of languages of which Common Lisp is only one dialect).
I think it doesn't matter which of the terms, declarative, imperative, functional or procedural, is a subset of which. What matters more is to understand the tools languages you're working with, and how those are different from other tools. Even more important is to understand the different ways of thinking that the paradigms represent, since those are your thought-tools. As with most other things in life, the more you understand, the more effective you become.
One way to look at it (not saying it's the right way 'cos I'm not a lang designer or theorist by any means) is that if the language is essentially converted to something else then that 'something else' must be the superset of the source. So bytecode is necessarily a superset of Java. .NET IL is a superset of C# and of F#. The functional constructs in C# (i.e. LINQ) are thus a subset of the imperative constructs of IL.
Since machine language is imperative, you could take the position that, therefore, all languages are imperative, because they are just abstractions useful for humans that are then boiled away by the compiler to procedural, imperative machine code.