The nomenclature can be confusing. Instead of "first responder" think of it as "initial event target" i.e. the object that is the first responder becomes the initial target for all events. In some APIs this is also called the "focus" although in the Apple APIs that is usually reserved to describe windows.
At any given time, there is only one first-responder/intial-event-target in the app. Only individual objects/instances can become a first-responder/intial-event-target. Classes can merely define if their instance have the ability to become a first-responder/intial-event-target. A class need only provide the ability to become the app's first-responder/intial-event-target if it make sense to do so. For example, a textfield obviously needs the ability to trap events so that it can use those event to edit itself. By contrast, a static label needs no such capability.
Whether a particular class inherits from NSResonder has no bearing on whether the class (or a specific instance of the class) will let itself be set as the first-responder/intial-event-target. That ability comes solely from the instances' response to the canBecomeFirstResponder
message. The same instance can refuse to be the first-responder/intial-event-target under one set of conditions and then allow it later when conditions change. Classes can of course hardwire the status if they wish.
In other words, first-responder/intial-event-target is a status of a particular instance at a particular time. The first-responder/intial-event-target is like a hot potato or a token that is handed off from instance to instance in the UI. Some classes refuse to grab the hot potato at all. Some always do and others grab it sometimes and ignore it others.