Because in the original conceptions of Computer Science theory and practice, Functions and Subroutines had virtually nothing to do with each other.
FORTRAN is usually credited as the first language that implemented both of these and demonstrated the distinctions. (Early LISP had a somewhat opposing role in this also, but it had little impact outside of academia).
Following from the traditions of mathematics (which CS was still part of in the 1960s), functions were seen only as the encapsulation of parametrized mathematical calculations, solely intended to return a value into a larger expression. That you could call one "bare" (F = AZIMUTH(SECONDS)) was merely a trivial use case.
Subroutines, on the other hand, were seen as a way to name a group of statements meant to have some effect. Parameters were a huge boost to their usability, and the only reason they were allowed to return modified parameter values was so that they could report their status without having to rely on global variables.
So, they really had no conceptual connection, other than encapsulation and parameters.
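To make the distinction concrete, here is a rough sketch in C of the two ideas as they were originally conceived. The names azimuth and clear_buffer and the arcseconds-to-degrees formula are hypothetical placeholders, not taken from any real FORTRAN program.

```c
/* Function in the original sense: a parametrized calculation whose only
   job is to hand a value back into a larger expression.
   (Hypothetical name and formula: arcseconds to degrees.) */
double azimuth(double seconds)
{
    return seconds / 3600.0;
}

/* Subroutine in the original sense: a named group of statements run for
   their effect. It yields no value; it reports its status through a
   modified parameter instead of a global variable.
   (Hypothetical name; status 0 means success, 1 means a bad length.) */
void clear_buffer(double *buf, int n, int *status)
{
    int i;

    if (n < 0) {
        *status = 1;
        return;
    }
    for (i = 0; i < n; i++)
        buf[i] = 0.0;
    *status = 0;
}
```

The first would normally appear inside an expression, such as f = 2.0 * azimuth(s), while the second only ever appears as a statement of its own.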
The real question is: "How did so many developers come to see them as the same?"
And the answer to that is C.
When K&R originally designed their high-level, macro-assembler-style language for the PDP-11 (its predecessor, B, had started on the PDP-7), they had no illusions of hardware independence. Virtually every "unique" feature of the language was a reflection of the PDP machine language and architecture (see i++ and --i). One of these was the realization that functions and subroutines could be (and always were) implemented identically on the PDP, except that for subroutines the caller just ignored the return value (in R0 [, R1]).
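As a small illustration of that point, the sketch below calls one C routine both ways. The names copy_bytes and call_both_ways are hypothetical, and the register remark in the comment simply restates the PDP-11 convention described above; the C language itself says nothing about registers.

```c
/* Hypothetical helper: copies n bytes and also returns how many it copied. */
int copy_bytes(char *dst, const char *src, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return i;   /* by convention, left where the caller can find it (R0 on the PDP-11) */
}

/* Calls the same routine both ways; a and b must each hold at least 4 bytes. */
int call_both_ways(char *a, char *b)
{
    int copied;

    copied = copy_bytes(a, "abc", 4);   /* used as a function: value consumed */
    copy_bytes(b, "abc", 4);            /* used as a subroutine: value ignored */
    return copied;
}
```

Both calls run exactly the same code; the only difference is whether the caller bothers to look at the result.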
Thus was born the void function, and, after the C language had taken over the whole world of programming, the misperception that this HW/OS implementation artifact (though true on almost every subsequent platform) was the same as the language semantics.