views:

806

answers:

7

The log4j network adapter sends events as a serialised java object. I would like to be able to capture this object and deserialise it in a different language (python). Is this possible?

NOTE The network capturing is easy; its just a TCP socket and reading in a stream. The difficulty is the deserialising part

+3  A: 

Generally, no.

The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.

(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)

It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily-deserialized form (for example you could use XStream to turn the object into an XML representation)

Charles Miller
+1  A: 
lubos hasko
+1  A: 

I would recommend moving to a third-party format (by creating your own log4j adapters etc) that both languages understand and can easily marshal / unmarshal, e.g. XML.

SCdF
JSON would work well too.
Gregg Lind
+1  A: 

Theoretically, it's possible. The Java Serialization, like pretty much everything in Javaland, is standardized. So, you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use, the serialization format is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:

There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).

Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).

ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 is self-describing, meaning all the information needed to decode a stream is encoded within the stream itself.

And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.

So, to make a long story short: your best bet is to either try to coerce log4j into logging in one of the above-mentioned formats, replace log4j with something that does that or try to somehow intercept the objects before they are sent over the wire and convert them before leaving Javaland.

Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).

Jörg W Mittag
+1  A: 

Well I am not Python expert so I can't comment on how to solve your problem but if you have program in .NET you may use IKVM.NET to deserialize Java objects easily. I have experimented this by creating .NET Client for Log4J log messages written to Socket appender and it worked really well.

I am sorry, if this answer does not make sense here.

jatanp
A: 

Thank you Charles and lubos hasko, really appreciated. Going to make my own log4j adapter.

Hyposaurus
A: 

If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:

  • you would deserialize what you received using the correct Java methods
  • and then you process what you get with you Python code
penpen