Padding with zeros at the front should not change the result. Computing the CRC is essentially binary long division. Unfortunately this involves splitting each byte. This is easy to with shift operators and bitwise or.
Zero padding at the end is, much easier, and depending on your reason for computing the CRC, a completely reasonable thing to do. For example, if you are using CRC for an integrity check.
Edit Taking my example from my comment. If you have 11 bits 11101110111 and want to compute the CRC, pad them to get 00000111 01110111 = 0x777, do not pad them to get 0x7770 as this will have a different CRC.
The reason that this works is that CRC is essentially binary long division
1 0 1 = 5
-------------
1 0 0 1 1 / 1 1 0 1 1 0 1
1 0 0 1 1 | |
--------- | |
1 0 0 0 0 |
0 0 0 0 0 |
--------- |
1 0 0 0 0 1
1 0 0 1 1
---------
1 1 1 0 = 14 = remainder
Has exactly the same result as
1 0 1 = 5
---------------
1 0 0 1 1 / 0 1 1 0 1 1 0 1
1 0 0 1 1 | |
--------- | |
1 0 0 0 0 |
0 0 0 0 0 |
--------- |
1 0 0 0 0 1
1 0 0 1 1
---------
1 1 1 0 = 14 = remainder
and similarly for any number of leading zeros.
Note at this point, unless you are a psychiatrist looking for field work, want to become one, or secretly desire to need to see one, it may be worth your while to skip to the Super double secret probationary edit
Further Edit Due to question change
If you have a nontrivial initial vector, you can do the following. Say we want to compute the CRC-CCITT CRC of the above string with an initializer of FFFF. We pad the string to get 0x0FFF compute the CRC-CCIT with initializer 0 to get 0x0ECE, then compute the CRC-CCIT with initializer 0xFFFF of 0x0000 to get 0x1D0F, and xor them 0x0ECE xor 0x1D0F = 0x13C1.
The CRC of an arbitrary string of 0's and a nonzero initializer can be computed quickly if the polynomial is primitive (I think they all are), but it gets complicated and I do not have nearly enough time.
The essence of the technique is that we can consider the state of the shift register as a polynomial. If we initialize it with n ones this is the same as considering the initial polynomial as p(x) = x^(n - 1) + x^(n - 2) ... + x + 1. Computing the CRC of a string of k zeros is equivalent to finding p(x) x^k mod CRC. x^k mod CRC is easily found by repeated squaring and reduction. Any library for polynomial arithmetic over GF(2) should do this.
Even further Edit It probably makes more sense in the case of nonzero initializers to pad with zeros and change the initializer to a value such that after reading |pad| number of zeros the shift register contains FFFF (or whatever the value you wanted was. These can be precomputed, and you only need to store 16 or 32 of them (or howver many bits are in your crc polynomial.
For example with CRC-CCIT with initializer 0xFFFF and a single bit 0 padding we will want to use an initializer of 0xF7EF. These can be computed by finding x^(-1) mod CRC using the extended euclidean algorithm and then computing initializer * x^(-k) mod CRC for the various padding lengths. Again any GF(2) polynomail package should make this easy. I have used NTL in the past and found it quite flexible, but it is probably overkill here. Even for 32 bit crcs exhjaustive search will probably find the initializers faster than you can write the code.
Super double secret probationary edit
Ok, Things are actually considerably simpler than I thought they were. The general idea above is correct, we want to pad the string with 0's at the front to extend the size to a multiple of 8, 16 or 32 depending on what our software implementation wants, and we want to change our initial vector to set our state to something that after reading the padding zeros will the LFSR will be set to the initial vector that we wanted. We certainly could use galois field arithmetic to do this, but there is an easier way: just run the LFSR backwards.
For example if we want to compute the CRC-CCITT (0xFFFF) of the 11 bits 11 bits 11101110111 we pad them with 5 0's to get 00000111 01110111 and then back the LFSR up five spaces to get an initial vector of 0xF060. (I've done the computation by hand, so beware).
So if you start an LSFR (or a software implementation) with IV of 0xF060 and run it on 0x0fff, you should get the same result as running an LFSR with IV 0xFFFF on the original 11 bits.