I'm planning to code a library that should be usable by a large number of people on a wide spectrum of platforms. What do I have to consider to design it right? To make this question more specific, there are four "subquestions" at the end.

Choice of language

Considering all the known requirements and details, I concluded that a library written in C or C++ was the way to go. I think the primary usage of my library will be in programs written in C, C++ and Java SE, but I can also think of reasons to use it from Java ME, PHP, .NET, Objective-C, Python, Ruby, bash scripts, etc. Maybe I cannot target all of them, but if it's possible, I'll do it.

Requirements

It would be too much to describe the full purpose of my library here, but there are some aspects that might be important to this question:

  • The library itself will start out small, but definitely will grow to enormous complexity, so it is not an option to maintain several versions in parallel.
  • Most of the complexity will be hidden inside the library, though
  • The library will construct an object graph that is used heavily inside. Some clients of the library will only be interested in specific attributes of specific objects, while other clients must traverse the object graph in some way
  • Clients may change the objects, and the library must be notified thereof
  • The library may change the objects, and the client must be notified thereof, if it already has a handle to that object
  • The library must be multi-threaded, because it will maintain network connections to several other hosts
  • While some requests to the library may be handled synchronously, many of them will take too long and must be processed in the background, and notify the client on success (or failure)
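
To make the last two requirements more concrete, here is a minimal sketch in C of a request that completes in the background and reports back through a callback. All names (mylib_request_async, mylib_poll_events, count_failures) are hypothetical. As an assumption made to keep the sketch free of platform-specific threading code, the completion is queued immediately; in a real library a worker thread would queue it later while holding a lock.

```c
#include <stddef.h>

/* Hypothetical API sketch, not an existing library. */
typedef void (*mylib_callback)(int request_id, int status, void *user_data);

enum { MYLIB_MAX_PENDING = 16 };

typedef struct {
    int request_id;
    int status;              /* 0 = success, nonzero = failure */
    mylib_callback callback;
    void *user_data;
} mylib_event;

static mylib_event mylib_queue[MYLIB_MAX_PENDING];
static size_t mylib_queue_len = 0;
static int mylib_next_request_id = 1;

/* Starts a request and returns its id (> 0), or -1 on error.
 * Assumption: the result is queued right away. A worker thread would
 * normally do this later, protected by a mutex. */
int mylib_request_async(int simulated_status, mylib_callback callback,
                        void *user_data)
{
    mylib_event *ev;
    if (callback == NULL || mylib_queue_len == MYLIB_MAX_PENDING)
        return -1;
    ev = &mylib_queue[mylib_queue_len++];
    ev->request_id = mylib_next_request_id++;
    ev->status = simulated_status;
    ev->callback = callback;
    ev->user_data = user_data;
    return ev->request_id;
}

/* Called by the client on a thread of its own choosing. All callbacks run
 * here, so the client never sees a library thread. Returns the number of
 * callbacks delivered. */
int mylib_poll_events(void)
{
    int delivered = 0;
    size_t i;
    for (i = 0; i < mylib_queue_len; ++i) {
        mylib_event *ev = &mylib_queue[i];
        ev->callback(ev->request_id, ev->status, ev->user_data);
        ++delivered;
    }
    mylib_queue_len = 0;
    return delivered;
}

/* Example client callback: counts failed requests via user_data. */
static void count_failures(int request_id, int status, void *user_data)
{
    (void)request_id;
    if (status != 0)
        ++*(int *)user_data;
}
```

A client would start requests, then call mylib_poll_events() from its main loop. Delivering callbacks only from that call is one possible way to let a single thread talk to the client, whatever the library does internally.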

Of course, answers are welcome no matter if they address my specific requirements, or if they answer the question in a general way that matters to a wider audience!

My assumptions, so far

So here are some of my assumptions and conclusions, which I gathered in the past months:

  • Internally I can use whatever I want, e.g. C++ with operator overloading, multiple inheritance, template meta programming... as long as there is a portable compiler which handles it (think of gcc / g++)
  • But my interface has to be a clean C interface that does not involve name mangling
  • Also, I think my interface should only consist of functions, with basic/primitive data types (and maybe pointers) passed as parameters and return values
  • If I use pointers, I think I should only use them to pass them back to the library, not to operate directly on the referenced memory
  • For usage in a C++ application, I might also offer an object oriented interface (Which is also prone to name mangling, so the App must either use the same compiler, or include the library in source form)
  • Is this also true for usage in C# ?
  • For usage in Java SE / Java EE, the Java native interface (JNI) applies. I have some basic knowledge about it, but I should definitely double check it.
  • Not all client languages handle multithreading well, so there should be a single thread talking to the client
  • For usage on Java ME, there is no such thing as JNI, but I might go with Nested VM
  • For usage in Bash scripts, there must be an executable with a command line interface
  • For the other client languages, I have no idea
  • For most client languages, it would be nice to have kind of an adapter interface written in that language. I think there are tools to automatically generate this for Java and some others
  • For object oriented languages, it might be possible to create an object oriented adapter which hides the fact that the interface to the library is function based - but I don't know if it's worth the effort
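
The assumptions about a function-only interface with primitive types and pass-back-only pointers could be sketched roughly as follows. The names (mylib_node and its functions) and the 0 / -1 return code convention are hypothetical, chosen only for illustration. The extern "C" guard is what keeps the symbols unmangled when the header is included from C++.

```c
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Public part: clients only ever see a pointer to an incomplete type,
 * so they can pass it back but cannot touch the memory behind it. */
typedef struct mylib_node mylib_node;

mylib_node *mylib_node_create(int initial_value);
int  mylib_node_get_value(const mylib_node *node, int *out_value);
int  mylib_node_set_value(mylib_node *node, int value);
void mylib_node_destroy(mylib_node *node);

#ifdef __cplusplus
}
#endif

/* Private part: would live in the library source, not in the header. */
struct mylib_node {
    int value;
    int change_count;   /* the library notices every client change */
};

mylib_node *mylib_node_create(int initial_value)
{
    mylib_node *node = malloc(sizeof *node);
    if (node != NULL) {
        node->value = initial_value;
        node->change_count = 0;
    }
    return node;
}

/* Returns 0 on success, -1 on invalid arguments. */
int mylib_node_get_value(const mylib_node *node, int *out_value)
{
    if (node == NULL || out_value == NULL)
        return -1;
    *out_value = node->value;
    return 0;
}

/* Changes go through the library, so it is always "notified". */
int mylib_node_set_value(mylib_node *node, int value)
{
    if (node == NULL)
        return -1;
    node->value = value;
    node->change_count++;
    return 0;
}

void mylib_node_destroy(mylib_node *node)
{
    free(node);
}
```

Because the struct layout never appears in the header, fields can be added later without breaking existing clients, and wrappers in other languages only need to handle an address-sized value plus plain integers.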

Possible subquestions

  • is this possible with manageable effort, or is it just too much portability?
  • are there any good books / websites about this kind of design criteria?
  • are any of my assumptions wrong?
  • which open source libraries are worth studying to learn from their design / interface / source?
  • meta: This question is rather long, do you see any way to split it into several smaller ones? (If you reply to this, do it as a comment, not as an answer)
+4  A: 

I don't know but if it's for Windows then you might try either a straight C-like API (similar to the WINAPI), or packaging your code as a COM component: because I'd guess that programming languages might want to be able to invoke the Windows API, and/or use COM objects.

ChrisW
Windows headers are very good, especially the older ones. They are so strict and complete that it is fairly easy to write a semi-automated header converter. I did that for e.g. FPC's commctrl headers.
Marco van de Voort
+17  A: 

Mostly correct. Straight procedural interface is the best. (which is not entirely the same as C btw(**), but close enough)

I interface DLLs a lot(*), both open source and commercial, so here are some points that I remember from daily practice, note that these are more recommended areas to research, and not cardinal truths:

  • Watch out for decoration and similar "minor" mangling schemes, especially if you use an MS compiler. Most notably, the stdcall convention sometimes leads to decorated names being generated for VB's sake (decoration is stuff like @6 after the function symbol name)
  • Not all compilers can actually layout all kinds of structures:
    • so avoid overusing unions.
    • avoid bitpacking
    • and preferably pack the records. While slower, at least all compilers can access packed records afaik
  • On Windows use stdcall. This is the default for Windows DLLs. Avoid fastcall, it is not entirely standardized (especially how small records are passed)
  • Some tips to make automated header translation easier:
    • macros are hard to autoconvert because they are untyped. Avoid them, use functions
    • Define a separate type for each pointer type, and don't use composite types (xtype **) in function declarations.
    • follow the "define before use" mantra as much as possible. This spares users who translate headers from rearranging them if their language requires definition before use, and it makes life easier for one-pass parsers and for tools that need context info to auto-translate.
  • Don't expose more than necessary. Leave handle types opaque if possible. Exposing them will only cause versioning troubles later.
  • always have a version check function (easier to make a distinction).
  • be careful with enums and booleans. Other languages might have slightly different assumptions. You can use them, but document well how they behave and how large they are. Also think ahead, and make sure that enums don't become larger when you add a few values, because that would break the interface. (e.g. in Delphi/Pascal booleans are 0 or 1 by default, and other values are undefined. There are special types for C-like booleans (byte, 16-bit or 32-bit word size), though they were originally introduced for COM, not C interfacing)
  • I prefer string types that are a pointer to char plus the length as a separate field (COM also does this), preferably without having to rely on zero termination. This is not just for security (overflow) reasons, but also because it is easier/cheaper to interface them to Delphi native types that way.
  • Memory: always create the API in a way that encourages a total separation of memory management. IOW don't assume anything about the client's memory management. This means that all structures in your lib are allocated via your own memory manager, and if a function passes a struct to you, copy it instead of storing a pointer allocated by the client's memory manager. Otherwise you will sooner or later accidentally call free or realloc on it :-)
  • (implementation language, not interface) be reluctant to change the coprocessor exception mask. Some languages change this as part of conforming to their standard's floating-point error (exception) handling.
  • be careful with the coprocessor status word. It might be changed by others and break your code, and if you change it, other code might stop working. The status word is generally not saved/restored as part of calling conventions. At least not in practice.
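
The points about pointer-plus-length strings and separated memory management can be sketched together in C. This is only an illustration under assumptions: the mylib_label type and its functions are hypothetical names, and plain malloc/free stand in for whatever memory manager the library would really use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque type; the text is owned by the library and is
 * deliberately not zero-terminated. */
typedef struct mylib_label mylib_label;

struct mylib_label {
    char  *text;
    size_t length;
};

/* Copies the client's bytes instead of keeping the client's pointer, so
 * the library never frees or reallocs memory it did not allocate. */
mylib_label *mylib_label_create(const char *text, size_t length)
{
    mylib_label *label;
    if (text == NULL && length != 0)
        return NULL;
    label = malloc(sizeof *label);
    if (label == NULL)
        return NULL;
    label->text = NULL;
    label->length = length;
    if (length > 0) {
        label->text = malloc(length);
        if (label->text == NULL) {
            free(label);
            return NULL;
        }
        memcpy(label->text, text, length);
    }
    return label;
}

/* Copies at most `capacity` bytes into the client's own buffer and
 * returns the full length, so the client can detect truncation. */
size_t mylib_label_read(const mylib_label *label, char *buffer,
                        size_t capacity)
{
    size_t n;
    if (label == NULL)
        return 0;
    n = label->length < capacity ? label->length : capacity;
    if (n > 0 && buffer != NULL)
        memcpy(buffer, label->text, n);
    return label->length;
}

/* Whatever the library allocated is released by the library. */
void mylib_label_destroy(mylib_label *label)
{
    if (label != NULL) {
        free(label->text);
        free(label);
    }
}
```

The client keeps ownership of every buffer it passes in, and data only crosses the boundary by copying, which is the separation described above.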

(*) Delphi programmer by day, a job that involves interfacing a lot of hardware and thus translating vendor SDK headers. By night Free Pascal developer, in charge of, among others, the Windows headers.

(**) This is because what "C" means at the binary level still depends on the C compiler used, especially if there is no real universal system ABI. Think of stuff like:

  • C adding an underscore prefix on some binary formats (a.out, Coff?)
  • sometimes different C compilers have different opinions on what to do with small structures passed by value. Officially they shouldn't support it at all afaik, but most do.
  • structure packing sometimes varies, as do details of calling conventions (like skipping integer registers or not if a parameter is registerable in a FPU register)
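
One way to sidestep the struct-by-value and packing differences is sketched below in C: fixed-width fields, explicit padding, a size field for versioning, and passing the struct by pointer only. The mylib_point_v1 type, the mylib_point_sum function and the return codes are hypothetical, and the 16-byte layout is an assumption that the compile-time check enforces rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. Fixed-width fields and explicit padding leave the
 * compiler no room to choose a different layout. */
typedef struct {
    uint32_t struct_size;  /* caller sets this to sizeof(mylib_point_v1) */
    int32_t  x;
    int32_t  y;
    uint8_t  visible;      /* boolean stored as 0 or 1 in one byte */
    uint8_t  reserved[3];  /* explicit padding, keep zeroed */
} mylib_point_v1;

/* Compile-time layout check that also works before C11: the array size
 * becomes negative, and compilation fails, if the struct is not 16 bytes. */
typedef char mylib_point_v1_size_check[(sizeof(mylib_point_v1) == 16) ? 1 : -1];

/* The struct is passed by pointer, never by value, and the result comes
 * back through an out parameter. Returns 0 on success, -1 on NULL
 * arguments, -2 if the caller was built against a different layout. */
int32_t mylib_point_sum(const mylib_point_v1 *point, int64_t *out_sum)
{
    if (point == NULL || out_sum == NULL)
        return -1;
    if (point->struct_size != sizeof(mylib_point_v1))
        return -2;
    *out_sum = (int64_t)point->x + point->y;
    return 0;
}
```

The struct_size field also gives later versions a way to append fields: the library can tell from the size which layout the caller knows about.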

===== automated header conversions ====

While I don't know SWIG that well, I know and use some delphi specific header tools( h2pas, Darth/headconv etc).

However I never use them in fully automatic mode, since more often than not the output sucks. Comments move to a different line or are stripped, and formatting is not retained.

I usually make a small script (in Pascal, but you can use anything with decent string support) that splits a header up, and then try a tool on relatively homogeneous parts (e.g. only structures, or only defines etc).

Then I check if I like the automated conversion output, and either use it, or try to make a specific converter myself. Since it is for a subset (like only structures) it is often way easier than making a complete header converter. Of course it depends a bit what my target is. (nice, readable headers or quick and dirty). At each step I might do a few substitutions (with sed or an editor).

The most complicated scheme I did was for the Winapi commctrl and ActiveX/comctl headers. There I combined IDL and the C header (IDL for the interfaces, which are a bunch of unparsable macros in C, the C header for the rest), and managed to get about 80% of the macros typed (by propagating the typecasts in sendmessage macros back to the macro declaration, with reasonable (wparam,lparam,lresult) defaults)

The semi automated way has the disadvantage that the order of declarations is different (e.g. first constants, then structures then function declarations), which sometimes makes maintenance a pain. I therefore always keep the original headers/sdk to compare with.

The Jedi winapi conversion project might have more info, they translated about half of the windows headers to Delphi, and thus have enormous experience.

Marco van de Voort
(**) I meant "C-like, as opposed to using **C++**-specific functionality ... unless, that is, the C++-specific functionality is packaged as a COM object (i.e. you can use C++ to implement COM objects)".
ChrisW
I understand. Definitely C then; (**) was just a side remark to distinguish C-like from real C. Anyway, both C and COM then. COM can be used by VB, Delphi and .NET. C can be used by languages that prefer in-process for speed, like other C++ compilers, and non-lazy .NET, Delphi and FPC programmers.
Marco van de Voort
Could you elaborate on this point please: "always have a version check function (easier to make a distinction)"?
sbk
version check - if you have a fn that tells you the version of the library, you'll have more options for a) debugging problems, b) allowing backward-compatibility (eg your new app can still use the old version of a library, just not newer features), c) disallow using old versions of the lib if necessary.
gbjbaanb
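A version check along these lines might look like the following C sketch. The function names, the version numbers and the 0x00MMmmpp packing are hypothetical choices for illustration. The macros would stay inside the library source, so header translators only ever see the two functions.

```c
#include <stdint.h>

/* Hypothetical version of this library build (internal, not in the header). */
#define MYLIB_VERSION_MAJOR 2u
#define MYLIB_VERSION_MINOR 3u
#define MYLIB_VERSION_PATCH 1u

/* Returns the version packed as 0x00MMmmpp, e.g. 2.3.1 -> 0x020301. */
uint32_t mylib_get_version(void)
{
    return (MYLIB_VERSION_MAJOR << 16)
         | (MYLIB_VERSION_MINOR << 8)
         |  MYLIB_VERSION_PATCH;
}

/* Returns 1 if a client that needs at least required_major.required_minor
 * can use this build, 0 otherwise. Assumed policy: the major version must
 * match exactly, and the library's minor version must not be older. */
int mylib_is_compatible(uint32_t required_major, uint32_t required_minor)
{
    return required_major == MYLIB_VERSION_MAJOR
        && required_minor <= MYLIB_VERSION_MINOR;
}
```

A client that loads the library dynamically could call mylib_is_compatible() first and then either refuse to run or disable the features that need a newer version.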
gbjbaanb says it all. This means a user can switch to dynamically loading the DLL if too many versions confuse the picture, and select at runtime what is available. Typical for stuff like mysqlclient, which seems to break its own interface every minor version
Marco van de Voort
Even though I don't agree with (or understand?) each and every one of your points, there's much insight in there. Also, I'm impressed by your constant effort to update and expand your answer, so it is definitely the accepted one for me.
Brian Schimmel
+3  A: 

Regarding automatic wrapper generation, consider using SWIG. For Java, it will do all the JNI work. Also, it is able to translate complex OO-C++-interfaces properly (provided you follow some basic guidelines, i.e. no nested classes, no over-use of templates, plus the ones mentioned by Marco van de Voort).

Alexander Gessler
Wow, I've heard of SWIG before, but I thought it was just for Java and one or two other languages. But now I see it supports 18 target languages (even if "only" 6 of them have a large relevance IMHO), so it solves many problems at once. Anyway, SWIG claims to be agnostic about design style, which is good, but still leaves me with the question of how to design the API well.
Brian Schimmel
Don't forget to check if the generated output has a comparable quality. In a lot of pluggable programs, the plugins that are non standard are only there for quantity.
Marco van de Voort
+2  A: 

NestedVM I think is going to be slower than pure Java because of the array bounds checking on the int[][] that represents the MIPS virtual machine memory. It is such a good concept but might not perform well enough right now (until phone manufacturers add NestedVM support (if they do!), most stuff is going to be SLOW for now, n'est-ce pas)? Whilst it may be able to unpack JPEGs without error, speed is of no small concern! :)

Nothing else in what you've written sticks out, which isn't to say that it's right or wrong! The principles sound (mainly just listening to choice of words and language to be honest) like roughly standard best practice but I haven't thought through the details of everything you've said. As you said yourself, this really ought to be several questions. But of course doing this kind of thing is not automatically easy just because you're fixed on perhaps a slightly different architecture to the last code base you've worked on...! ;)

My thoughts:

All your comments on C interface compatibility sound sensible to me, pretty much best practice, except you don't seem to properly address memory management policy - some sentences are a bit ambiguous/vague/wrong-sounding. The design of the memory management will be to a large extent determined by the access patterns made in your application, rather than the functionality per se. I suggest you carefully study others' attempts at making portable interfaces like the standard ANSI C API, Unix API, Win32 API, Cocoa, J2SE, etc.

If it was me, I'd write the library in a carefully chosen subset of the common elements of regular Java and Dalvik virtual machine Java and also write my own custom parser that translates the code to C for platforms that support C, which would of course be most of them. I would suggest that if you restrict yourself to data types of various size ints, bools, Strings, Dictionaries and Arrays and make careful use of them, that will help with cross-platform issues without affecting performance much most of the time.

martinr
Plus NestedVM might not be good at balancing its memory needs with those of the rest of the software running on a phone, if the NestedVM virtual machine memory usage varies greatly because of lots of frees and reallocs, or because of heap fragmentation in the virtual MIPS machine (I suggest).
martinr
Oh yeah and I forgot to add Byte Arrays and composite Objects of the all the other types to (my) list of data types.
martinr
Thanks a lot for your detailed answer. I didn't expect that one of my readers would know Nested VM at all (that's why I put in the link), but you seem to have a solid knowledge of its pros and cons. But J2ME is only of limited importance to me, so I can't let it determine large parts of my design, nor decide the use of Java instead of C / C++. Your comment on memory management is very much appreciated, because I totally forgot this important aspect. Right now, I don't see any specific problems arising, but I will keep an eye on it.
Brian Schimmel
I think this is the more general solution...! (I would go with C and NestedVM IF it happened to meet the needs of my application...)
martinr
OOps you pre-empted me - EDIT: Replace "I think this is the more general solution" with "I think mine is the more general solution." **** But hey, I'm not precious about which designs get used! And perhaps more important is what you are happy with / work best with / can describe to others most easily. Happy computing Brian.
martinr
Key in the quick development of such systems is the good conceptualisation and naming (I suggest you avoid OO-inheritance type concepts in your C-like APIs except where it would be silly not to have inheritance - existing C APIs do this) and what may possibly speed development is abstraction of handles/object pointers in a way that makes it easy to switch between a debug mode where the handle is an index into a table that can be easily checked for validity and a release mode where the handle is a simple memory pointer to a structure.
martinr
@martinr I use pointers for handles; if I want to check whether a pointer/handle is valid, then I keep a set or map which remembers my list of valid, allocated pointer/handle values.
ChrisW
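The checkable-handle idea from the two comments above could be sketched in C roughly like this. All names are hypothetical, the table has a fixed size, and the generation counter that catches stale handles is an addition for the example rather than something either comment proposes.

```c
#include <stddef.h>
#include <stdint.h>

enum { MYLIB_OBJ_TABLE_SIZE = 8 };

/* Handle layout (assumption): generation in the high 16 bits, slot index
 * plus one in the low 16 bits. The value 0 is never a valid handle. */
typedef uint32_t mylib_obj_handle;

typedef struct {
    int      in_use;
    uint16_t generation;   /* bumped every time the slot is reused */
    int      payload;
} mylib_obj_slot;

static mylib_obj_slot mylib_obj_table[MYLIB_OBJ_TABLE_SIZE];

/* Returns a new handle, or 0 if the table is full. */
mylib_obj_handle mylib_obj_create(int payload)
{
    size_t i;
    for (i = 0; i < MYLIB_OBJ_TABLE_SIZE; ++i) {
        mylib_obj_slot *slot = &mylib_obj_table[i];
        if (!slot->in_use) {
            slot->in_use = 1;
            slot->generation++;
            slot->payload = payload;
            return ((mylib_obj_handle)slot->generation << 16)
                 | (mylib_obj_handle)(i + 1);
        }
    }
    return 0;
}

/* Validates a handle: index in range, slot live, generation matches. */
static mylib_obj_slot *mylib_obj_lookup(mylib_obj_handle handle)
{
    uint32_t index = handle & 0xFFFFu;
    mylib_obj_slot *slot;
    if (index == 0 || index > (uint32_t)MYLIB_OBJ_TABLE_SIZE)
        return NULL;
    slot = &mylib_obj_table[index - 1];
    if (!slot->in_use || slot->generation != (uint16_t)(handle >> 16))
        return NULL;
    return slot;
}

/* Returns 0 on success, -1 for an invalid or stale handle. */
int mylib_obj_get(mylib_obj_handle handle, int *out_payload)
{
    mylib_obj_slot *slot = mylib_obj_lookup(handle);
    if (slot == NULL || out_payload == NULL)
        return -1;
    *out_payload = slot->payload;
    return 0;
}

/* Returns 0 on success, -1 for an invalid or stale handle. */
int mylib_obj_destroy(mylib_obj_handle handle)
{
    mylib_obj_slot *slot = mylib_obj_lookup(handle);
    if (slot == NULL)
        return -1;
    slot->in_use = 0;
    return 0;
}
```

A release build could keep the same function signatures and swap the handle for a plain pointer, which is the debug/release switch suggested above. A set or map of valid pointers would give a comparable validity check without the index indirection.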
+2  A: 

Hi, your assumptions seem OK, but I see trouble ahead, much of which you have already spotted in your assumptions. As you said, you can't really export C++ classes and methods; you will need to provide a function-based C interface. Whatever facade you build around that, it will remain a function-based interface at heart.

The basic problem i see with that is that people choose a specific language and its runtime because their way of thinking (functional or object oriented) or the problem they address (web programming, database,...) corresponds to that language in some way or other. A library implemented in c will probably never feel like the libraries they are used to, unless they program in c themselves. Personally, I would always prefer a library that "feels like python" when I use python, and one that feels like java when I do Java EE, even though I know c and c++.

So your effort might be of little actual use (other than your gain in experience), because people will probably want to stick with their mindset, and rather re-implement the functionality than use a library that does the job, but does not fit.

I also fear the desired portability will seriously hamper development. Just think of the infinite build settings needed, and tests for that. I have worked on a project that tried to maintain compatibility for 5 operating systems (all posix-like, but still) and about 10 compilers, the builds were a nightmare to test and maintain.

tabdamage
For the "not fitting to the language": Good point there. But I think this is not the biggest problem. If my lib is usable in Python, some people will use it, even if it feels odd. And if there is enough interest, it should be possible to write a wrapper that makes it feel more like Python, and that would be easier than reimplementing - but this would be none of my core concerns. As for the general difficulty of portability, you're also completely right. I know some of those pains, but this time, I'm willing to take them. Of course, if there's a way to minimize them, I'll take it :)
Brian Schimmel
+4  A: 

Think C, nothing else. C is one of the most popular programming languages. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. All popular high-level languages provide an interface to C. That makes your library accessible from almost all platforms in existence. Don't worry too much about providing an object-oriented interface. Once you have the library done in C, an OOP, functional or any other style of interface can be created in the appropriate client languages. No other systems programming language will give you C's flexibility and portability.

Vijay Mathew
I totally understand that C gives you the most universal interface to other languages. But what is your opinion about programming in C++ but providing a C interface? I know, there are some architectures for which there is a C compiler but not C++, but I think those are generally not suited for my lib. Is there anything why I shouldn't use C++ internally?
Brian Schimmel
There's no reason why you can't write your C library in C++. The only thing that matters to clients of your app is the interface, once past that doorway it could be written in basic! The interface, or 'contract', is all that matters to your users. The internals of your library are none of their business, so write it in C++ (which fortunately allows an easy way to expose C functions, cool)
gbjbaanb
@Brian Schimmel If all your target platforms have C++ compilers, you can use C++. Still keep in mind that, while most systems (especially the Unix variants) have a C compiler installed by default, users may have to download and install the C++ compiler to use your library.
Vijay Mathew
I would love if this were always the case, (I love C), and it usually is if you have access to the raw machine metal, but in a lot of mobile platforms, if you are writing sandbox embedded apps, you must write in Java, or suffer a usually unacceptable performance hit from having a C virtual machine within a Java virtual machine. On some previously C-programmable mobile app sandbox platforms like Symbian C is no longer the base language.
martinr
+1  A: 

Give it an XML interface, whether passed as a parameter and return value or as files through a command-line invocation. This may not seem as direct as a normal function interface, but is the most practical way to access an executable from, e.g., Java.

Joshua Fox
Could you clarify (by explanation or by pointing to some examples) what you mean by an XML interface? Do you mean something like AJAX / SOAP / WebServices, or am I missing the point?
Brian Schimmel
I think he means to stream all data as XML. But that is only a solution for the data part (and an expensive bulky one, both the streamed data as the libraries you need to write and read validatable XML).
Marco van de Voort
I have reasons not to involve XML, and don't think it is generally a good idea, but anyway: your answer provided some valuable insight about command line interfaces for me and made me think about things I forgot before. Thus: +1
Brian Schimmel
@Brian I mean that the Java (or other) program generates an XML file, calls an executable with the input filename as well as an output filename (of a file that does not yet exist) as command-line arguments; the executable writes an XML file with the given output filename. I agree that this is clumsy, but it may be better than some alternatives. @Marco Yes, it is bulky. But as always, you should check if this is really the bottleneck. I don't understand what it means to say "solution for the data part". You can invoke behavior on the command
Joshua Fox