Friday, February 22, 2008

Abstract vs. Opaque data types

So, the problem boils down to the ancient abstract vs. opaque data type debate, which was ultimately responsible for the Emacs/XEmacs split, and which is one major reason why the Lisp/Python/Ruby folks don't understand the Java/C# folks. I'm jotting down ideas to clarify the issues in my own mind, and on the off chance that folks may find this useful if I de-lock it.

First, definitions, as I'm using the terms here. An abstract data type is a plain old data structure - a dict or list or integer - along with functions to operate on it as a domain object. You can still reach in and twiddle its innards as a normal hash or list though, and as far as the programming language is concerned, it's just a built-in data type. An opaque data type is a special class with methods to manipulate it, and the language either prevents you from accessing the guts (private members in Java or C++) or strongly discourages it (Python's leading-underscore _members).
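To make the two definitions concrete, here's a minimal Python sketch. The point example and all of its names (make_point, point_norm_sq, Point) are made up for illustration; they aren't from GameClay.

```python
# "Abstract" style, in this post's sense: a plain dict plus free functions
# that treat it as a domain object.
def make_point(x, y):
    return {"x": x, "y": y}

def point_norm_sq(p):
    return p["x"] ** 2 + p["y"] ** 2

# "Opaque" style: a class whose fields are only meant to be reached via methods.
class Point:
    def __init__(self, x, y):
        self._x = x  # leading underscore: discouraged, not enforced, in Python
        self._y = y

    def norm_sq(self):
        return self._x ** 2 + self._y ** 2

p = make_point(3, 4)
p["x"] = 0          # nothing stops you from twiddling the dict directly
q = Point(3, 4)     # callers are expected to go through q.norm_sq()
```

The dict version is shorter and needs no declaration, but its interface is whatever happens to work on a dict; the class version makes the interface explicit.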

Some languages - like Arc, JavaScript or pure Scheme - provide only abstract data types, and you need to fake opaque types with conventions. Other languages - like Java - provide only opaque types, and the basic concrete data types are special cases of that. The languages I tend to use - like Python - provide both, and it's up to you which is appropriate.

When prototyping, I've almost always found abstract types to be better (or sometimes even concrete types). This is because they don't need to be declared: the interface to an abstract data type is whatever functions you provide to manipulate it, and if you're missing something, you can just use it as a dict or literal. This lets you change things around very quickly, which is incredibly important when you don't really know what you're doing. They also tend to be less code.

In production code, I've found opaque types to be generally better, because they provide additional contractual guarantees that are really important when you're building stuff on top of these classes. You don't want the interface to change with every revision, because it'll break everything you've built on top. Moreover, because the interface is stable (and presumably tested), you can treat the type as a solid black box, which reduces the complexity you need to keep inside your head at once.

Unfortunately, GameClay is currently in that awkward stage where I don't yet know what I'm doing (in the sense of having detailed interface specifications for each object), and yet I need stronger specification guarantees on the base types in order to move forwards. I've sorta got a hybrid architecture now, where classes wrap raw data structures and provide accessors. However, the problem there is that I've gotta remember whether I'm dealing with raw unwrapped structures or wrapped structures with utility methods.

The cleanest solution in terms of remembering stuff is to bite the bullet, convert the raw JSON structure to language objects when read in, and have all accessors return language objects. Then I'd need a method to convert it back to JSON data structures. We can assume that anywhere within the system, once the objects have been constructed, it's all objects, and they all have the appropriate utility methods.
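A rough sketch of what that could look like, assuming made-up Game and Sprite classes and made-up from_props/to_props method names (the real GameClay objects surely look different):

```python
import json

class Sprite:
    """Hypothetical domain object; name and fields are illustrative only."""
    def __init__(self, name, x, y):
        self._name = name
        self._x = x
        self._y = y

    @classmethod
    def from_props(cls, props):
        # Convert the raw JSON-style dict into a language object.
        return cls(props["name"], props["x"], props["y"])

    def to_props(self):
        # Convert back to plain JSON-compatible data.
        return {"name": self._name, "x": self._x, "y": self._y}

    def name(self):
        return self._name

class Game:
    """Hypothetical container whose accessors return objects, never raw dicts."""
    def __init__(self, title, sprites):
        self._title = title
        self._sprites = tuple(sprites)

    @classmethod
    def from_props(cls, props):
        sprites = [Sprite.from_props(s) for s in props["sprites"]]
        return cls(props["title"], sprites)

    def sprites(self):
        return self._sprites

    def to_props(self):
        return {"title": self._title,
                "sprites": [s.to_props() for s in self._sprites]}

raw = json.loads('{"title": "demo", "sprites": [{"name": "hero", "x": 1, "y": 2}]}')
game = Game.from_props(raw)          # everything past this point is objects
round_tripped = game.to_props()      # back to plain JSON-style data
```

The conversion happens once at the boundary, so code inside the system never has to ask whether it's holding a wrapped or unwrapped structure.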

One pitfall I ran into when I thought of doing this yesterday was that the conversion is somewhat lossy. For example, expressions are represented by strings in the props structure, but get parsed into an internal data structure. Printing them back out loses all whitespace and parenthesization. I suppose I could store the initial prop for expressions, and just omit accessors to change parts of the expression (all our data structures are immutable anyway). If it's changed by code, it'll have to be changed all at once. I don't think I have any other places where the representation is lossy.
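Here's a sketch of the "store the initial prop" idea. It uses Python's own ast module purely as a stand-in for the real expression parser, and the Expression class and its method names are assumptions for illustration. ast.unparse needs Python 3.9 or later.

```python
import ast

class Expression:
    """Hypothetical immutable expression that remembers its source text."""
    def __init__(self, source):
        self._source = source
        # Stand-in parser: the real system has its own expression grammar.
        self._tree = ast.parse(source, mode="eval")

    def to_props(self):
        # Serialize the original string, so whitespace and parens survive.
        return self._source

    def normalized(self):
        # Printing the parsed tree back out is the lossy path.
        return ast.unparse(self._tree)

    def with_source(self, new_source):
        # No accessors for changing parts: replace the whole thing at once.
        return Expression(new_source)

src = "( a+b )*2"
expr = Expression(src)
```

Round-tripping through to_props() gives back the exact original string, while normalized() shows what would be lost if only the parsed tree were kept.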

Another problem is that this is pretty significant code-bloat. This is unfortunate, but I'm not sure it's avoidable. Currently, we're using a decorator to reach inside the props structure and return the appropriate part, but this isn't really correct: it doesn't wrap the props structure with the appropriate class. If we were to make it correct, we'd need conversion logic, and conversion logic is probably simpler when all in one place.
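For illustration, here's roughly what a props-reading decorator might look like, with an optional wrapping step. This is a guess at the shape, not the actual GameClay decorator: prop, wrap, Position and Sprite are all hypothetical names.

```python
import functools

def prop(key, wrap=None):
    """Hypothetical decorator: turn a stub method into an accessor for
    self._props[key], optionally wrapping the raw value in a class."""
    def decorate(method):
        @functools.wraps(method)
        def accessor(self):
            value = self._props[key]
            return wrap(value) if wrap is not None else value
        return accessor
    return decorate

class Position:
    def __init__(self, props):
        self._props = props

    @prop("x")
    def x(self):
        """Horizontal coordinate."""

class Sprite:
    def __init__(self, props):
        self._props = props

    @prop("name")
    def name(self):
        """Returns the raw value: fine for strings and numbers."""

    @prop("position", wrap=Position)
    def position(self):
        """Returns a wrapped object instead of the raw nested dict."""

sprite = Sprite({"name": "hero", "position": {"x": 3}})
```

Without wrap, position() would hand back a bare dict, which is the "not really correct" behaviour described above. With it, the wrapping logic ends up scattered across decorators, which is one argument for doing the conversion in one place instead.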

A third downside is that we need certain validation state to properly validate objects, and we need to validate before we can safely convert. The easiest way to do this is probably to pass the state in to the constructor along with the props data structure; the constructor will throw an exception if the props are invalid. This also fixes my uneasiness about having some constructs throw in the constructor (e.g. parsing expressions, actions) while others don't throw until validate is called. A downside is that validate can't be called standalone, but this shouldn't be necessary: if the props are invalid, you shouldn't be able to create the object, and if you try to call a setter with an invalid value, it should fail in the setter. Since objects are immutable and setters create new ones, the setters can just re-use the validation from the constructors. However, each object needs to save a copy of the validation state so that mutators can invoke the constructor.
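A small sketch of that constructor-validates design. The "validation state" here is just a dict holding a set of known image names, which is an assumption made up for the example, as are Sprite, with_image and ValidationError.

```python
class ValidationError(ValueError):
    pass

class Sprite:
    """Hypothetical immutable object that validates in its constructor."""
    def __init__(self, props, state):
        # state: made-up validation context, e.g. which images exist.
        if props.get("image") not in state["images"]:
            raise ValidationError("unknown image: %r" % (props.get("image"),))
        self._props = dict(props)  # private copy, so callers can't mutate us
        self._state = state        # saved so mutators can re-validate

    def image(self):
        return self._props["image"]

    def with_image(self, image):
        # "Setter" on an immutable object: build a new one via the
        # constructor, which re-runs the same validation.
        return Sprite(dict(self._props, image=image), self._state)

state = {"images": {"hero.png", "rock.png"}}
hero = Sprite({"image": "hero.png"}, state)
rock = hero.with_image("rock.png")   # hero itself is unchanged

try:
    hero.with_image("missing.png")
    failed = False
except ValidationError:
    failed = True                    # invalid values fail in the setter
```

There's no standalone validate here: an object that exists is valid by construction, and every mutator goes back through the one validation path.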

Arc, offers, and hiccups

It's weird how I forget about this blog for long stretches at a time. Quick update on what I've done in the 3 weeks since the end of January:

After getting collisions and keybindings to display, I took a break for a week and ported Arc to JavaScript. It came out in early February, and I was initially going to stay away, but it was just so tempting. I needed a break from UI widgets anyway, and an interpreter is just the sort of quick project that can provide some variety.

'Course, that got some attention, which is good for me but bad for Diffle. In that week, I got 3 job offers - one from Drew at DropBox, one from my friend Doug out in SV, and one from Paul Buchheit at FriendFeed. Drew also put us in touch with the FuzzWich guys, and we had a long e-mail exchange. Apparently they were working on a game-creation system before they did FuzzWich, and they have some really impressive credentials in game development. No wonder YC didn't go for us. The problem sphere is apparently a tough nut to crack: they kept running up against the problem of making the games playable enough to be impressive yet easy enough to create. Also, apparently my architecture is insane.

That was probably the closest I've come to quitting so far, between the existence of very attractive alternative offers, the confirmed difficulty of the problem sphere (confirmed by people with way more experience in this than us), and the prospect of rewriting all our software in Flex. I mentioned the job offers to Mike, and he said to do whatever I felt was best, so I'm not really tied down to Diffle.

But I'm going to stick it out for now. I'm more convinced than I've been in a while about how this is really a prime space to be in, and that the market's heating up. The big question is whether we can execute on it, and I'm not sure we can, but the challenge of trying is kinda fun.

Anyway, I was kinda burnt out for about a week or so after releasing ArcLite (this was when the e-mails were flying back and forth). When I got back to work, I decided I'd move back to the compiler and update it to meet the full requirements, instead of just the skeleton we've been using to prototype the editor UI. I also wanted to correct some mistakes in data representations that have really been hurting us on the JavaScript side - we can simplify the editor widgets a lot by making the game data structure more sensible. This work has been moving along quite well; I initially estimated 2-3 months until full launch when I first set out, but it now looks like we may manage in half that time, barring extraordinary hiccups.

I'll do a separate post with the current technical hiccup that's been stalling me, for the last few hours at least.