In situations where I've had to do this, I like to look at the entry/exit points first and label them. From there, I look for the common underlying functions/methods that support the public items.
If documentation is sparse or missing, it can be useful to think of this as breaking into a building.
Is it a large building with lots of entrances/exits?
Is this an application or component that spans multiple processes or subsumes an entire server?
What powers the building?
What frameworks are visible that were used to construct it?
Is there public access, say on a ground floor?
What public interfaces or interactions are possible?
From here, I would label all the public methods, structures, and classes available. This might entail labeling everything, though, if you are in a language or environment where everything is public. Particularly interesting items are those with paired prefixes such as start/stop, begin/end, or pause/resume. These usually hint at high-powered control of specific items. Naming conventions such as Pascal casing or m_ markers for member variables can hint at internal operations too.
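If the component happens to be loadable from Python, the labeling pass can be partly automated. This is only a minimal sketch: the helper names public_names and paired_controls are made up for illustration, and "public" is assumed to mean "does not start with an underscore", which is a Python convention rather than a rule.

```python
# Prefix pairs mentioned above; extend the list as you discover more.
PAIRS = [("start", "stop"), ("begin", "end"), ("pause", "resume")]


def public_names(obj):
    """Return the attribute names on obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def paired_controls(names):
    """Find names that differ only by a paired prefix, e.g. start_x / stop_x."""
    # Compare case-insensitively so StartJob/StopJob and start_job/stop_job both match.
    lowered = {name.lower(): name for name in names}
    found = []
    for first, second in PAIRS:
        for low, original in lowered.items():
            if low.startswith(first):
                partner = second + low[len(first):]
                if partner in lowered:
                    found.append((original, lowered[partner]))
    return sorted(found)
```

Calling paired_controls(public_names(some_module_or_class)) gives a short list of candidate control pairs to read first. It only matches on names, so every hit still needs to be checked against what the code actually does.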
Continuing, it may be important to know about its file formats, communication activity (network? pipes? etc?), and security. Each of these can be broken down with similar analogies if you'd like. Rather than tire everyone with metaphors, here are some outlines.
File Formats
- Is text visible in files generated by the component? Does it resemble any popular format for text or text-transport? (XML, JSON, HTML, SGML, CSV, Tab-Delimited, etc.) This may reveal the intended recipient or source of communications with the app as well.
- Is there an offset or size stored in the first 4 bytes of the file? First 8 bytes? First 4 bytes after the very first 4? This can hint at binary layout or raw packets.
- Is there a well-known four-character marker present in the file? Media types frequently have FOURCC codes to label content. Container formats such as AVI frequently contain multiple markers and embedded items. If you have the source, also note whether there are consistently named items.
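The checks in this list can be roughed out in a few lines. The sketch below is an illustration, not a real format detector: the function name sniff and the keys it returns are invented here, it only looks at 32-bit fields at offsets 0 and 4, and every hint it produces is a guess to confirm by hand.

```python
import struct


def sniff(data: bytes) -> dict:
    """Collect quick hints about a blob of file contents."""
    hints = {}

    # FOURCC-style marker: four ASCII letters/digits, possibly space-padded.
    head = data[:4]
    if len(head) == 4 and head.replace(b" ", b"").isalnum():
        hints["fourcc"] = head.decode("ascii")

    # 32-bit fields at offsets 0 and 4 that equal the total length or the
    # number of bytes remaining after the field look like size fields.
    size_fields = []
    for offset in (0, 4):
        if len(data) < offset + 4:
            continue
        remaining = len(data) - (offset + 4)
        for label, fmt in (("little", "<I"), ("big", ">I")):
            (value,) = struct.unpack_from(fmt, data, offset)
            if value in (len(data), remaining):
                size_fields.append((offset, label, value))
    hints["size_fields"] = size_fields

    # Text formats: inspect the first non-whitespace byte.
    first = data.lstrip()[:1]
    if first == b"<":
        hints["text"] = "XML/HTML/SGML-like"
    elif first in (b"{", b"["):
        hints["text"] = "JSON-like"
    return hints
```

Run against the first few kilobytes of each file the component produces. A RIFF container such as an AVI, for instance, starts with the marker RIFF followed by a little-endian size, so both the marker and the size hint should show up.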
Communications
- Do new socket connections show up (netstat -a on Windows) when the component is active?
- Are there sockets that are now listening as a result? This hints at a server/recipient component.
- Are there outbound connections? Where do the outbound connections go? What server ports do they attempt to contact? HTTP tends to be the most interesting item you may find here.
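When netstat is not convenient, the "is something listening now?" question can be approximated by probing local TCP ports before and after starting the component. This is a rough sketch using only the standard socket module; listening_ports and newly_listening are hypothetical helper names, it only sees TCP listeners reachable on the given host, and it says nothing about outbound connections.

```python
import socket


def listening_ports(ports, host="127.0.0.1", timeout=0.2):
    """Return the subset of ports on host that accept a TCP connection."""
    open_ports = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if probe.connect_ex((host, port)) == 0:
                open_ports.add(port)
    return open_ports


def newly_listening(before, after):
    """Ports that were closed in the first snapshot and open in the second."""
    return sorted(set(after) - set(before))
```

Take one snapshot with the component stopped and one with it running, then pass both to newly_listening. Anything in the result is a candidate server endpoint worth tracing back into the code.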
Security & Operation Access
- Does the component fail when run as the lowest-privileged user available? Does it require elevated permissions to run?
- Does the component read or write to the registry, user profile, or any temp directories? Does it attempt to access locations for shared users?
- Does the component generate exceptions? What kind of exceptions (.NET managed? Native? POSIX signals?)? It can be very useful to locate specific exception declarations if the source code is written in a language/framework that allows them.
Finally, if none of this increases your working knowledge of the code/library in question, consider the motive behind the creation or use of the code/library.
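If the component can be launched as a separate process, a small wrapper can record how it ends under different accounts or permissions. The sketch below assumes a POSIX-style host, where Python's subprocess module reports death by signal as a negative return code; describe_exit and run_and_describe are illustrative names, not part of any library.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(command):
    """Run command (a list of arguments) and return (label, captured stderr)."""
    result = subprocess.run(command, capture_output=True)
    return describe_exit(result.returncode), result.stderr.decode(errors="replace")
```

Running the same command as a low-privileged user and as an elevated one, then comparing the labels and the stderr text, is a cheap way to see which permissions the component really depends on.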
- What problems does it attempt to solve?
- What usage benefits might be gained by using it?
- Is this component actually intended to operate as part of a larger component or system?
These questions can lead to clues about which entry/exit points are the most interesting.