Tomer Gabel's annoying spot on the 'net
# Monday, 11 February 2013

I just spent a few days migrating our codebase from Scala 2.9.2 to Scala 2.10. A quick search found very little in the way of migration guides and studies, so it seemed appropriate to document my experiences and offer what tips I managed to collect on the way.

General Observations

This process should not be undertaken lightly, and its implementation inevitably depends on your resources and team composition. The group (or person) charged with migrating the codebase should, at the very least, consist of an experienced Scala developer not afraid to get his or her hands dirty.

The migration is not an entirely smooth process - there are plenty of (fortunately fairly minor) breaking changes in the Scala APIs, and since Scala does not yet feature binary backwards compatibility between major versions (2.9→2.10→2.11...), you should expect some inevitable library upgrades and the potential complexities inherent in any such update. The specific issues I encountered are documented in their respective sections below.

Unfortunately there is not much I can offer in the way of preparation; these are the obvious steps:

  • Familiarize yourself with Scala 2.10. Make sure you know what you’re getting in exchange for investing time and effort on the migration process and early-bird issues;
  • Work in an isolated branch and pull changes from the master branch often;
  • Commit early, commit often. Try to keep your commits small and well-documented, so you have good revert checkpoints for when things go wrong;
  • Keep the affected code to a minimum. The fewer changes, the easier it will be to isolate problems down the road.

If you’ve read so far and still can’t wait for 2.10.1, I hope the next few sections will save you some time and pain.

Toolchain

To start things off I switched the build to use Scala 2.10. Our project runs on Maven in lieu of SBT (topic for another post), and we use David Bernard’s excellent scala-maven-plugin, which deduces the Scala version from the scala-library dependency.

The convention for the Scala version suffix has changed between 2.9 and 2.10: for example, argot_2.9.2 is now argot_2.10 - no revision for the 2.10 series. This rendered the suffix macro we employ useless, because annoyingly the Scala libraries themselves (e.g. scalap or scala-compiler) actually use the full version number, so we ended up needing two macros (a “raw” Scala version and a suffix).
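
The naming rule itself can be sketched in a few lines of Scala. This is only an illustration of the convention described above, not our actual Maven configuration: `ScalaVersionSuffix`, `binarySuffix` and `artifactId` are hypothetical names, the `name_suffix` artifact layout is assumed, and the rule only covers final (non-milestone) releases.

```scala
object ScalaVersionSuffix {
  /** Suffix carried by third-party artifact names for a full Scala version.
    * Assumption: 2.9.x artifacts use the full version, 2.10 and later
    * use only major.minor. Anything unrecognised is returned unchanged. */
  def binarySuffix(fullVersion: String): String =
    fullVersion.split('.').toList match {
      case "2" :: minor :: _
          if minor.nonEmpty && minor.forall(_.isDigit) && minor.toInt >= 10 =>
        "2." + minor
      case _ =>
        fullVersion
    }

  /** Artifact name for a third-party library, e.g. argot. */
  def artifactId(name: String, fullVersion: String): String =
    name + "_" + binarySuffix(fullVersion)
}
```

The Scala libraries themselves would keep using the full version string, which is why two separate values end up being needed.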

Fortunately, at least as far as Maven is concerned, the rest was smooth sailing. I next turned my attention to IntelliJ IDEA - I wanted to make sure an existing workspace could be reused with the migrated codebase, otherwise my entire team would have to undergo the inconvenience of starting from scratch on a clean working copy. The process turned out to be quite painless (on IDEA 12.0.3 with Scala plugin build 0.7.121), with the following caveats:

  • Code analysis appears to miss some cases of stricter compilation compared to 2.9.x (see below);
  • Code analysis occasionally identifies relative import statements as erroneous (e.g. import util.Random);
  • Project FSC settings had to be manually changed to use the 2.10 compiler bundle (it remained on the default 2.9.2). This proved to be a moot point, because:
  • The much-touted IDEA external build mode finally works consistently for the first time (and it uses the significantly better SBT compiler)!
  • On the negative side, IDEA does not seem to handle compiler failures (as opposed to compilation errors) gracefully, missing a lot of detail in the output. As I ran into quite a few of these (details below), I ended up doing most of the compilation tests with Maven directly.

Beyond setting up a build job for the new branch, Jenkins posed no issues.
To summarize, from a toolchain standpoint, this was actually a fairly smooth process.

Language/Compiler

After some initial compilation attempts, I was surprised and somewhat disheartened to find that I had missed every single mark in my code risk predictions. Where I expected most language-level headaches I encountered none, and seemingly simple and risk-free code ended up taking most of the time I spent on this process.

First off, the Scala 2.10 compiler is much stricter than the earlier 2.9.x series. It has a few new heuristics that ended up triggering very often on our codebase; some are simply the result of messy code or design, but most are new best practices adopted by the compiler team at Typesafe:

  • “warning: This catches all Throwables. If this is really intended, use `case _ : Throwable` to clear this warning.”
    We wrap-and-throw or swallow a lot of exceptions, and for brevity’s sake often use the shorter catch { case _ => }. A catch block is analogous to a partial function from Throwable, and there are undoubtedly cases where the distinction matters. We’ll clean these up iteratively by qualifying that we’re only catching Exceptions.

  • “non-variable type argument B[T] in type A[B[T]] is unchecked since it is eliminated by erasure”
    The 2.10 compiler is actually quite helpful in warning you when generic code may not behave as expected. This is just one example of such a warning.

  • “error: overloaded method methodName needs result type”
    The 2.10 compiler requires you to specify result types on most (all?) types of method overloading. Not much work here.

  • Case classes can no longer derive from other case classes, which makes a whole lot of sense if you consider the type system; 2.9.x had a whole variety of bugs and edge-cases relating to case class inheritance and companion objects, including some unwieldy constructor syntax. 2.10 simply does away with the complexity and potential ambiguity, which will break existing code. Since such class hierarchies are inherently problematic, I see this as an opportunity to refactor them properly; extractor objects were often useful for that task.

  • In one case we ran into stricter cyclic dependency analysis on the part of 2.10 - an object extended a class and passed a member of another object as a constructor parameter. The second object referenced the first, which compiled fine under 2.9.2 but resulted in a cyclic dependency error with 2.10. As the original design was hard to understand I refactored the code to be a bit simpler, which resolved the problem satisfactorily.

  • Manifests are deprecated in favor of TypeTags and the new reflection library built into 2.10 (certainly one of its most anticipated and celebrated features), but are still functional. We haven’t migrated to the new APIs yet, and that process will likely deserve a post all on its own; read this excellent answer on Stack Overflow for an introduction to TypeTags.
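
To make a few of the items above concrete, here is a minimal sketch of the kind of changes involved. It is illustrative rather than lifted from our codebase: `MigrationFixes` and everything inside it are hypothetical names, and each numbered comment maps to one of the diagnostics listed above. It relies only on scala-library (`NonFatal` and `ClassTag` both ship with 2.10).

```scala
import scala.reflect.ClassTag
import scala.util.control.NonFatal

object MigrationFixes {
  // 1. Qualify the catch instead of using a bare `case _ =>`.
  def parsePort(raw: String): Option[Int] =
    try Some(raw.trim.toInt)
    catch { case _: NumberFormatException => None }

  // NonFatal matches ordinary exceptions but lets fatal ones, such as
  // OutOfMemoryError and InterruptedException, propagate.
  def swallow(body: => Unit): Boolean =
    try { body; true }
    catch { case NonFatal(_) => false }

  // 2. An ClassTag context bound lets the pattern below test the element
  //    class at runtime instead of being silently unchecked after erasure.
  def onlyOf[T: ClassTag](items: List[Any]): List[T] =
    items.collect { case t: T => t }

  // 3. Overloads that call each other get explicit result types.
  def describe(n: Int): String = describe(n.toString)
  def describe(s: String): String = "value: " + s

  // 4. Instead of a case class extending a case class, share a trait
  //    and use an extractor object to match on the common fields.
  trait Shape { def name: String }
  case class Circle(name: String, radius: Double) extends Shape
  case class Square(name: String, side: Double) extends Shape

  object Named {
    def unapply(shape: Shape): Option[String] = Some(shape.name)
  }

  def label(shape: Shape): String = shape match {
    case Named(n) => "shape " + n
  }
}
```

Whether to catch a specific exception type, `Exception`, or `NonFatal` depends on the call site; the point is only that the choice is now explicit.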

Beyond these and the occasional minor syntax change, the 2.10 compiler proved a somewhat difficult beast to tame:

  • Ran into crazily-detailed compiler failures that clearly (though long-windedly) indicated that I need to manually add scala-reflect to the classpath;

  • Missing or conflicting dependencies, typically due to mishandling POMs, resulted in what I ended up dubbing the Spontaneous Compiler Combustion phenomenon: a cryptic compilation failure, complete with what appears to be an AST dump and full-blown debug information. Tracking the occasional familiar type name in the compiler log can be helpful in tracking down the dependency at fault (this was the case in all but one of the occurrences), but the error itself is completely inscrutable.

  • The one exception was, unfortunately, a proper compiler bug logged as SI-7109, which has to do with consuming a package-protected (or -private) trait/class from another trait/class with the same accessibility. Jason Zaugg of Typesafe (@retronym) was extremely helpful in analysing the compiler output and producing a reproduction case, which I hadn’t been able to do on my own. Until a fix is included and released, I’ve worked around this by temporarily commenting out the problematic qualifiers.

  • Update (2013-02-12): Ran into another compiler bug that causes a runtime ClassFormatError. We've managed to identify, reproduce and work around the problem; see SI-7120 on the Typesafe JIRA.

Lastly, it’s worth noting that the 2.10 compiler is significantly heavier than the older 2.9.x series: it appears to require 50-100% more permgen space, and compile times are up 2x on average and up to 10x on a clean, fresh build. This seems consistent on both my laptop and the build server, which use different CPUs and operating systems. A quick check (top and jstat -gcutil) showed the compiler process to be single-threaded and CPU-bound, as it consistently utilizes 100% of a single core. GC activity was low to the point of negligible, so it appears the new compiler is actually a step back in terms of compilation throughput. I hope subsequent 2.10.x releases focus primarily on compilation stability and performance.

 

That’s it for today; the next post is up and focuses on the Scala library, dependencies and miscellanea.

Monday, 11 February 2013 02:57:09 (Jerusalem Standard Time, UTC+02:00)
Development | Scala
All Content © 2024, Tomer Gabel