I've read that ZIP files start with the following bytes:
50 4B 03 04
Reference: http://www.garykessler.net/library/file_sigs.html
Question: Is there a certain sequence of bytes that indicate a ZIP file has been password-protected?
It's the underlying files within the zip archive that are password-protected, not the archive as a whole. A single archive can mix password-protected and unprotected files (e.g. an unprotected readme file alongside the protected contents).
If you followed the links describing ZIP files in the URL you reference, you'd find that this one discusses the bit that indicates whether a file in the ZIP archive is encrypted or not. It seems that each file in the archive can be independently encrypted or not.
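As a rough sketch of that per-file check, Python's standard zipfile module exposes each entry's general purpose flags as ZipInfo.flag_bits, so bit 0 can be tested entry by entry. The function name encrypted_entries is made up for this illustration, and the demo archive is built in memory and contains no encrypted entries (the zipfile module cannot create them).

```python
import io
import zipfile


def encrypted_entries(zip_source):
    """Map each entry name to True if bit 0 (encrypted) is set in its flags.

    zip_source may be a path or a binary file-like object.
    The flags are read from the archive's central directory.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return {
            info.filename: bool(info.flag_bits & 0x1)
            for info in archive.infolist()
        }


# Demo: an in-memory archive with a single unencrypted entry.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("readme.txt", "hello")

print(encrypted_entries(io.BytesIO(demo.getvalue())))
```

An archive is "password-protected" in the everyday sense if any value in the returned mapping is True.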
It's not true that ZIP files must start with 50 4B 03 04.
Entries within zip files start with 50 4B 03 04, and often, pure zip files start with a zip entry as the first thing in the file. But there is no requirement that zip files start with those bytes. All files that start with those bytes are probably zip files, but not all zip files start with those bytes.
For example, you can create a self-extracting archive, which is a PE-COFF file (a regular EXE) with its own file signature, 4D 5A .... Later in the exe file, you can store zip entries, each beginning with 50 4B 03 04. The file is both an .exe and a .zip.
A self-extracting archive is not the only class of zip file that does not start with 50 4B 03 04. You can "hide" arbitrary data in a zip file this way. WinZip and other tools should have no problems reading a zip file formatted this way.
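A small experiment can illustrate this, assuming Python's standard zipfile module as the reading tool. The helper name build_zip_with_prefix and the 64-byte dummy prefix are invented for the example; the prefix merely starts with 4D 5A and is not a real executable. The stored offsets are not rewritten after prepending, so stricter tools may behave differently from zipfile here.

```python
import io
import zipfile


def build_zip_with_prefix(prefix: bytes) -> bytes:
    """Build a one-entry zip in memory and prepend arbitrary bytes to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hello")
    return prefix + buf.getvalue()


# Dummy prefix that begins with the EXE signature 4D 5A ("MZ").
blob = build_zip_with_prefix(b"MZ" + b"\x00" * 62)

# The file no longer starts with the zip entry signature...
starts_with_entry_signature = blob[:4] == b"PK\x03\x04"

# ...but zipfile still finds the entries, because it locates the
# central directory from the end of the file rather than the start.
with zipfile.ZipFile(io.BytesIO(blob)) as archive:
    names = archive.namelist()
    content = archive.read("hello.txt")

print(starts_with_entry_signature, names, content)
```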
If you find the 50 4B 03 04 signature within a file, either at the start of the file or somewhere else, you can look at the next few bytes to determine whether that particular entry is encrypted. Normally it looks something like this:
50 4B 03 04 14 00 01 00 08 00 ...
The first four bytes are the entry signature. The next two bytes are the "version needed to extract", stored little-endian. In this case it is 0x0014, which is 20 decimal. According to the PKWARE spec, that means version 2.0 of the pkzip spec is required to extract the entry. (The latest zip "feature" used by the entry is described by v2.0 of the spec.) You can find higher numbers there if more advanced features are used in the zip file. AES encryption requires v5.1 of the spec, hence you should find 0x0033 (51 decimal) in that header. (Not all zip tools respect this.)
The next two bytes represent the general purpose bit flag (the spec calls it a "bit flag" even though it is a bit field), in this case 0x0001. This has bit 0 set, which indicates that the entry is encrypted.
Other bits in that bit flag have meaning and may also be set. For example bit 6 indicates that strong encryption was used - either AES or some other stronger encryption. Bit 11 says that the entry uses UTF-8 encoding for the filename and the comment.
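The header fields described above can be decoded with a few lines of Python as a minimal sketch. The function name parse_local_header_prefix and the dictionary keys are made up for this illustration; the sample bytes are the ones shown earlier. The final two bytes (08 00) are the compression method field, where 8 means deflate.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"  # 50 4B 03 04


def parse_local_header_prefix(data: bytes, offset: int = 0) -> dict:
    """Decode the first 10 bytes of a zip entry header found at offset.

    Layout (all little-endian): 4-byte signature, 2-byte version needed
    to extract, 2-byte general purpose bit flag, 2-byte compression method.
    """
    signature, version_needed, flags, method = struct.unpack_from(
        "<4sHHH", data, offset
    )
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no zip entry signature at this offset")
    return {
        "version_needed": version_needed,            # 20 means v2.0
        "encrypted": bool(flags & 0x0001),           # bit 0
        "strong_encryption": bool(flags & 0x0040),   # bit 6
        "utf8_names": bool(flags & 0x0800),          # bit 11
        "compression_method": method,
    }


sample = bytes.fromhex("504b0304140001000800")
print(parse_local_header_prefix(sample))
```

To check entries that are not at the start of the file, one could search for the signature with data.find(LOCAL_HEADER_SIGNATURE) and pass the resulting position as offset. The same four bytes can also occur by chance inside compressed data, so a match is only a hint.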
All this information is available in the PKWare AppNote.txt spec.