2011-08-25

scala instead of perl

I've used perl for years as a "better shell script", and found it perfectly natural and easy to express the things I've asked of it.
The other day I did something that seemed obvious to me: I passed a list to a function that returned another list.
I expected this to work, as it would in any language.
I was testing each function separately and everything seemed fine.
So I started testing the script as a whole; that's when strange things started happening.
Lists started containing too many values, and of the wrong kind.
So I did some googling without really knowing what kind of problem it was, assuming the whole time that I'd made some simple semantic error.
What I didn't expect, though, was the real problem:
Perl doesn't support lists properly.

You can't have a list of lists, and you can't pass a list to a function or return one from a function without it being flattened into the surrounding list.
You can, however, have a list of array references, pass array references to a function, or return an array reference.
So this whole time, I'd just happened to be passing scalars around, thinking perl was a pretty straightforward language. The reality is that it isn't actually much higher-level than C.
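For contrast, here is a minimal Java sketch (Java 9 or later, for `List.of`) of the thing that tripped me up: a function that takes two separate lists and returns a list of lists. The names `NestedLists` and `pairUp` are hypothetical, made up for illustration. Java passes the list objects themselves, so the two arguments stay separate and the result keeps its nesting; in perl, calling a sub with two arrays flattens both into a single `@_` unless you pass array references explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: pair up two lists element by element.
class NestedLists {
    // Takes two separate lists and returns a list of two-element lists.
    // Neither argument is merged into the other, and the result stays nested.
    static List<List<String>> pairUp(List<String> keys, List<String> values) {
        int n = Math.min(keys.size(), values.size());
        List<List<String>> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pairs.add(List.of(keys.get(i), values.get(i)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<List<String>> pairs = pairUp(List.of("a", "b"), List.of("1", "2"));
        System.out.println(pairs.size()); // prints 2
        System.out.println(pairs);        // prints [[a, 1], [b, 2]]
    }
}
```

Extra elements in the longer list are simply dropped here; that's a choice for the sketch, not something perl or Java dictates.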

It was originally designed in the late eighties and early nineties, so it makes sense that its OO and functional aspects aren't as well thought out as those of newer languages. I guess the idea of collections being fundamental wasn't that big at the time either.

I've recently had a chance to use ruby again, and this time I actually enjoyed it, especially the metaprogramming. As ruby was designed a fair bit later, it feels a lot more modern, and lists work as I expect!

I think from now on (assuming I get a choice), I'll use perl _only_ for very simple shell scripts, or not at all, and use more competent languages (like ruby) for anything of substance.

For this specific task, I've reimplemented it as a Scala script; and in doing so I've fixed a bunch of bugs in my original perl code.
It also runs a bit faster and is easier to change.
Scala's regex and system exec libraries aren't as succinct as perl's, but matching perl there would mean making them first-class features of the language, as perl does.
It's still run as a script, so it won't be any different from the user's point of view.
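To give a feel for the succinctness gap, here is a minimal sketch of a regex capture at the JVM level, in Java (Scala's `Regex` is built on top of `java.util.regex`). What perl does inline with `=~` and `$1` takes a compiled `Pattern`, a `Matcher` and an explicit `find()` call. The `key=number` format and the names `RegexSketch` and `valueOf` are hypothetical, chosen only for illustration; the system exec side isn't shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Compiled once and reused. The "key=number" format is a made-up example.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Returns the number if the line contains key=number anywhere, otherwise empty.
    // Note: very long digit runs would overflow Integer.parseInt and throw.
    static Optional<Integer> valueOf(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        if (m.find()) {
            return Optional.of(Integer.parseInt(m.group(2)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(valueOf("timeout=30"));    // prints Optional[30]
        System.out.println(valueOf("no match here")); // prints Optional.empty
    }
}
```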

learning breadth vs depth

This is the list of languages that I learned and used at university, when studying Computer Science:
  1. C
  2. C++
  3. Java
  4. Python
  5. Perl
  6. Common Lisp
  7. Prolog
  8. Shell scripting
  9. Motorola 68k assembler
  10. MIPS assembler
This is the list of languages that I learned and/or used at work, when developing corporate software:
  1. Java
  2. Perl
  3. C++
  4. C
  5. Ruby
  6. BPEL? (this doesn't really count)
The above lists are ordered by frequency of use.
Now it may look like I learned a hell of a lot more languages at uni than I have at work, but really we only had time to learn the surface of those languages: just enough to get the subject projects completed.
At work we may only use a small number of languages, but I've had to study them in greater depth. I've also had to learn a lot of different frameworks and tools (especially in Java).

It's a matter of:
  • breadth vs depth
  • core-language vs tools and frameworks.

Is it a framework or a library?

Is it a framework or a library?
Paul Chiusano expresses a very similar sentiment in push-libraries-vs-pull-libraries
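One common rule of thumb for telling the two apart (not necessarily Chiusano's exact framing) is who owns the control flow: with a library, your code is in charge and calls it; with a framework, the framework is in charge and calls your code. A minimal Java sketch of that distinction, where `TextLib`, `TinyFramework` and `LibraryVsFramework` are all hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Library style: a plain function; the caller decides when and how to use it.
class TextLib {
    static String shout(String s) {
        return s.toUpperCase(Locale.ROOT) + "!";
    }
}

// Framework style: the framework owns the loop and calls back into code you register.
class TinyFramework {
    private final List<Function<String, String>> handlers = new ArrayList<>();

    void register(Function<String, String> handler) {
        handlers.add(handler);
    }

    // The framework decides the control flow: each input passes through every handler in order.
    List<String> run(List<String> inputs) {
        List<String> out = new ArrayList<>();
        for (String in : inputs) {
            String result = in;
            for (Function<String, String> h : handlers) {
                result = h.apply(result);
            }
            out.add(result);
        }
        return out;
    }
}

class LibraryVsFramework {
    public static void main(String[] args) {
        // Library: we call it.
        System.out.println(TextLib.shout("hello")); // prints HELLO!

        // Framework: it calls us.
        TinyFramework fw = new TinyFramework();
        fw.register(String::trim);
        fw.register(TextLib::shout);
        System.out.println(fw.run(List.of(" a ", "b"))); // prints [A!, B!]
    }
}
```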

correct or configurable?

I would rather work on software that changes easily and does the correct thing than on the kind of overly configurable software described in http://www.thoughtclusters.com/2009/09/software-analysis-paralysis/ and http://www.thoughtclusters.com/2007/08/hard-coding-and-soft-coding/

I've worked on projects where the software was incredibly configurable. One system supported over 11 different versions of over 30 different interfaces to external systems, and you could reconfigure its core behaviour to a ridiculous degree.
The system was so configurable that finding a working configuration became a problem in development, and getting your hands on the actual correct configuration was practically impossible.
The users had the one gold configuration that they modified for each new version; development did the same.
Creating a complete configuration from scratch would have been impossible.
Given the way it was managed and updated, the config was effectively part of the code base, just written in a different language (an external DSL). The only real benefit was being able to change parts of a live system without compiling, and an interpreted language would have given the same benefit.
So maybe this type of system should really be built from two languages: a compiled language for the performance-critical sections, and an interpreted language for the more dynamic behaviours of the application.
The dynamic behaviour would not be something the average user would modify.
The third part of the application would be the user-defined configuration.
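A minimal sketch of those three layers on the JVM, in Java. In the design I'm describing, the middle layer would be an interpreted script (e.g. Groovy or JRuby); here a plain Java interface and lambda stand in for it, which is an assumption to keep the example self-contained. The config layer is key/value text read with `java.util.Properties`. The names `Rules`, `Engine`, `ThreeLayers` and the `difficulty`/`bonus` keys are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Middle layer: stands in for an interpreted script in the proposed design.
interface Rules {
    int damage(Engine engine, int baseDamage);
}

// Core engine: the compiled, performance-critical layer.
class Engine {
    private final Properties config; // user config layer
    private final Rules rules;       // dynamic behaviour layer

    Engine(Properties config, Rules rules) {
        this.config = config;
        this.rules = rules;
    }

    // Lets the middle layer call back into the engine to read user config.
    int configInt(String key, int fallback) {
        String v = config.getProperty(key);
        return v == null ? fallback : Integer.parseInt(v.trim());
    }

    // The engine calls into the middle layer.
    int hit(int baseDamage) {
        return rules.damage(this, baseDamage);
    }
}

class ThreeLayers {
    // Parses key/value text the same way a property file would be read.
    static Properties parseConfig(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader won't normally fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties userConfig = parseConfig("difficulty=2\n");
        Rules rules = (engine, base) -> base * engine.configInt("difficulty", 1);
        Engine engine = new Engine(userConfig, rules);
        System.out.println(engine.hit(10)); // prints 20
    }
}
```

The point of the shape is that the engine and the rules can call each other, and both reach the same config, so swapping the rules layer for a real script wouldn't change the engine.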

Games based on the Unreal engine (one of the most widely used 3D game engines) follow the above layering scheme almost exactly:
  • The core engine is written in C++.
  • The content of the game (and mods) is written in UnrealScript. (In this case it's compiled, rather than interpreted as I proposed.)
  • The user configuration is stored in key/value text property files.
If the script layer were interpreted, users would be able to change the game rules themselves, so it's understandable that they went with compiled.

Having this kind of layering would really help with the average Java programmer's obsession with being able to make changes without compiling. This obsession is part of the reason there are so many frameworks that require tonnes of xml config. To me it all stems from the fact that Java is compiled, and for most purposes its dynamic features require too much overhead to bother with.
On the JVM, Java could still be used for the compiled layer, with Groovy, JRuby or Jython for the middle layer, and finally xml and property files for the config layer. The key is that the engine and the middle layer should be able to call each other, and both should have access to the config layer.
If reloading the middle layer at runtime is required, I'm sure it's possible in most interpreted languages; I recently did exactly that in ruby.
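Whatever language the middle layer is in, the engine side of runtime reloading can be as simple as holding the current behaviour in one swappable reference. A minimal Java sketch, assuming the new behaviour has already been produced somehow (in the real design, by re-evaluating an edited Groovy or JRuby script; that loading step is not shown). `ReloadableRule` and `ReloadDemo` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

// Holds the current middle-layer behaviour and allows swapping it while the engine runs.
class ReloadableRule {
    private final AtomicReference<IntUnaryOperator> current;

    ReloadableRule(IntUnaryOperator initial) {
        current = new AtomicReference<>(initial);
    }

    // Called by the engine on every use, so each call sees the latest version.
    int apply(int input) {
        return current.get().applyAsInt(input);
    }

    // Called when the script changes; a call already in progress finishes with the old version.
    void reload(IntUnaryOperator replacement) {
        current.set(replacement);
    }
}

class ReloadDemo {
    public static void main(String[] args) {
        ReloadableRule score = new ReloadableRule(x -> x * 2);
        System.out.println(score.apply(10)); // prints 20
        score.reload(x -> x * 3);            // e.g. after the script file was edited
        System.out.println(score.apply(10)); // prints 30
    }
}
```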

Another option would be to use a language such as Scala or a lisp, which can operate at all three layers:
compiled, statically typed code for the engine; interpreted scripts, possibly in an internal DSL, for the middle layer; and a simple DSL for storing user config.

With this kind of architecture, it would be possible to have different base configurations available for different individual users, while having different middle layers for different classes of users.
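A minimal Java sketch of that idea: each user's config stores only overrides and falls back to a shared base through the defaults chain of `java.util.Properties`, while a map picks a middle layer per class of user. The keys, user classes and names (`PerUserSetup`, `BEHAVIOUR_BY_CLASS`) are hypothetical, and a plain function stands in for a middle-layer script. Requires Java 9 or later for `Map.of`.

```java
import java.util.Map;
import java.util.Properties;
import java.util.function.UnaryOperator;

class PerUserSetup {
    // Shared base configuration.
    static Properties base() {
        Properties p = new Properties();
        p.setProperty("theme", "light");
        p.setProperty("pageSize", "20");
        return p;
    }

    // A user's config only stores overrides. Note that getProperty (not get)
    // is what falls back to the defaults passed to the constructor.
    static Properties forUser(Properties base, Map<String, String> overrides) {
        Properties p = new Properties(base);
        overrides.forEach(p::setProperty);
        return p;
    }

    // Hypothetical middle layers, one per class of user.
    static final Map<String, UnaryOperator<String>> BEHAVIOUR_BY_CLASS = Map.of(
        "admin", s -> "[admin] " + s,
        "guest", s -> s
    );

    public static void main(String[] args) {
        Properties alice = forUser(base(), Map.of("theme", "dark"));
        System.out.println(alice.getProperty("theme"));    // prints dark (override)
        System.out.println(alice.getProperty("pageSize")); // prints 20 (from base)
        System.out.println(BEHAVIOUR_BY_CLASS.get("admin").apply("report")); // prints [admin] report
    }
}
```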

hacker howto

http://catb.org/~esr/faqs/hacker-howto.html

Teach Yourself Programming in Ten Years