Tomer Gabel's annoying spot on the 'net RSS 2.0
# Monday, February 11, 2013

This is the second post of a two-parter. Please read part 1 for context and additional information.

3rd Party Libraries

As mentioned in part 1, Scala is not binary backwards-compatible between major releases. Although code compiled with 2.10 may work properly with libraries compiled against 2.9.x, this is hardly recommended and may result in a lot of subtle error conditions at runtime that are hard to track down. As a result, the recommended practice is to upgrade all Scala dependencies to versions compiled against 2.10. Fortunately the Scala ecosystem is very active, so most common libraries had 2.10 builds on announcement day, but you are still likely to encounter libraries that need to be upgraded because:

  • They're lagging behind the Scala release schedule and simply haven't released 2.10-compatible binaries (fortunately, this seems to be extremely rare; in practice all our dependencies were otherwise resolved);
  • The specific version you're using does not have a 2.10-compatible release and likely never will (Squeryl 0.9.5-2);
  • A 2.10-compatible official release is out but hasn't yet been published to the central repository (Scalatra 2.0.5).

My experience shows that with a bit of dilligence, practically all our (myriad) dependencies were fairly easy to migrate to 2.10. In most cases this simply entails switching to the appropriate artifact ID suffix (_2.10 instead of _2.9.2, see part 1 for associated rant), in other cases I had to upgrade to a later official release and in rare cases I had to dig in and replace a dependency with an appropriate fork or snapshot version. Here is a list of migration notes for some of our dependencies:

LibraryPreviousNew versionComments
Scala-Time0.6nscala-time 0.2 The original library appears to be abandoned and was forked on GitHub.
The package changed from org.scala-tools.time to com.github.nscala-time.time 
ScalaMock2.43.0.1The 3.x series runs natively on Scala 2.10 without proxy factories and the like. Syntax is a bit odd but works like a charm.
The ScalaTest integration is compiled against ScalaTest 2.0-M5b, which forced us to upgrade. 
ScalaTest1.82.0-M5b Minor API changes: SpecFunSpec, parameters changed in Suite.run()
Squeryl 0.9.5-2 0.9.5-6  
Argot 0.4 1.0.0  
Scala I/O 0.4.1-seq 0.4.2  
Lift JSON (+ext)2.4 2.5-M4  
Lift Facebook 2.4 2.5-SNAPSHOT Had to exclude net.liftweb:lift-webkit_2.10 to avoid dependency conflicts
Akka2.0.4 2.1 Scheduling now requires an implicit ExecutionContext. See migration guide for details. 
Scalatra2.0.4 2.0.5-SNAPSHOT Official 2.0.5 is out, but has not yet shown up on the central repository, so I ended up going with the (frozen) 2.0.5 snapshot artifacts
Scalatra 2.1.1 2.2.0 scalatra-akka module has been folded back into the core artifact as of 2.2.0.
Graph for Scala1.5.11.6.1  

 

Scala Library Changes

The Scala class library itself has a number of changes you ought to be aware of before migrating to 2.10. The really big change is that Scala actors are deprecated in favor of Akka. You can still use them by importing the scala-actors artifact from the Scala 2.10 distribution, but it is recommended to migrate fully to the new actor system as this is also likely to be obsoleted by 2.10.1. The gentle folk at Typesafe have provided a very comprehensive migration guide to assist your efforts. 

The less prevasive API changes we ran into include:

  • List.elements is deprecated in favor of List.iterator;
  • TraversableOnce.toIndexedSeq no longer takes a type argument. This was actually quite pervasive in our codebase, causing plenty of compilation errors, and is easily worked around by removing the type parameter (which is extraneous to begin with);
  • Scala actors' Actor.receive method is now public (previously protected). This had to be rectified in pretty much all of our existing actors by removing the protected modifer;
  • Occasional subtle API changes requiring minor code fixes. For example, see this question on StackOverflow.


Summary

Opinions differ among members of our team - some predicted that the migration process will be much more complex whereas personally, given the relatively high level of maturity I've come to depend on in the 2.9 series, the migration process actually ended up being significantly harder than I anticipated. Particularly disconcerting were the occasional compiler failures which took a lot of time to track down. Practically speaking, though, the whole migration process took less than 3 days (including documentation), did not involve additional teammates, and all problems were either resolved or worked around rather quickly. The Typesafe compiler team has been very helpful in analyzing and resolving the single bona-fide compiler bug we've run into, and the community as a whole (on Stack Overflow, Google Groups and elsewhere) was also extremely supportive and helpful.

On the whole, am I happy with the process? So-so. There is nothing here an experienced Scala developer should have serious trouble with, but it will take a lot more stability and predictability for Scala to gain mainstream acceptance in the industry, and that includes a much easier and more robust migration path to upcoming releases (yes, that includes migrating from 2.10 to 2.11 when it comes along). That being said, Scala has been a terrific language to work with in the last couple of years, and the new features in 2.10 (particularly reflection, macros and string interpolation) should make this an extremely worthwhile upgrade. We still have a lot of regression testing to do on the new version, and if anything interesting pops up I'll be sure to post about it separately (or bitch about it on Twitter...)

Monday, February 11, 2013 4:33:11 PM (Jerusalem Standard Time, UTC+02:00)  #    -
Development | Scala

I just spent a few days migrating our codebase from Scala 2.9.2 to Scala 2.10. A quick search found very little in the way of migration guides and studies, so it seemed appropriate to document my experiences and offer what tips I managed to collect on the way.

General Observations

This process should not be undertaken lightly, and its implementation inevitably depends on your resources and team composition. The group (or person) charged with migrating the codebase should, at the very least, consist of an experienced Scala developer not afraid to get his or her hands dirty.

The migration is not an entirely smooth process - there are plenty of (fortunately fairly minor) breaking changes in the Scala APIs, and since Scala does not yet feature binary backwards compatibility between major versions (2.9→2.10→2.11...), so expect some inevitable library upgrades and the potential complexities inherent in any such update. The specific issues I encountered are documented in their respective sections below.

Unfortunately there is not much I can offer in the way of preparation; these are the obvious steps:

  • Familiarize yourself with Scala 2.10. Make sure you know what you’re getting in exchange for investing time and effort on the migration process and early-bird issues;
  • Work in an isolated branch and pull changes from the master branch often;
  • Commit early, commit often. Try to keep your commits small and well-documented, so you have good revert checkpoints for when things go wrong;
  • Keep the affected code to a minimum. The fewer changes, the easier it will be to isolate problems down the road.

If you’ve read so far and still can’t wait for 2.10.1, I hope the next few sections will save you some time and pain.

Toolchain

To start things off I switched the build to use Scala 2.10. Our project runs on Maven in lieu of SBT (topic for another post), and we use David Bernard’s excellent scala-maven-plugin, which deduces the Scala version from the scala-library dependency.

The convention for the Scala version suffix has changed between 2.9 and 2.10: for example, argot2.9.2 is now argot2.10 - no revision for the 2.10 series. This rendered the suffix macro we employ useless, because annoyingly the Scala librariesthemselves (e.g. scalap or scala-compiler) actually use the full version number, so we ended up needing two macros (a “raw” Scala version and a suffix).

Fortunately, at least as far as Maven is concerned the rest was smooth sailing. I next turned my attention to IntelliJ IDEA - I wanted to make sure an existing workspace can be reused with the migrated codebase, otherwise my entire team will have to undergo the inconvenience of starting from scratch on a clean working copy. Fortunately the process turned out to be quite painless (on IDEA 12.0.3 with Scala plugin build 0.7.121), with the following caveats:

  • Code analysis appears to miss some cases of stricter compilation compared to 2.9.x (see below);
  • Code analysis occasionally identifies relative import statements as erroneous (e.g. import util.Random);
  • Project FSC settings had to be manually changed to use the 2.10 compiler bundle (it remained on the default 2.9.2). This proved to be a moot point, because:
  • The much-touted IDEA external build mode finally works consistently for the first time (and it uses the significantly better SBT compiler)!
  • On the negative side, IDEA does not seem to handle compiler failures (as opposed to compilation errors) gracefully, missing a lot of detail in the output. As I ran into quite a few of these (details below), I ended up doing most of the compilation tests with Maven directly.

Beyond setting up a build job for the new branch, Jenkins posed no issues.
To summarize, from a toolchain standpoint, this was actually a fairly smooth process.

Language/Compiler

I was surprised and somewhat disheartened to find that, after some initial compilation attempts, it became evident that I missed every single mark in my code risk predictions. Where I expected most language-level headaches I encountered none, and seemingly simple and risk-free code ended up taking most of the time working on this process.

First off, the Scala 2.10 compiler is much stricter than the earlier 2.9.x series. It has a few new heuristics that ended up triggering very often on our codebase; some are simply the result of messy code or design, but most are new best practices adopted by the compiler team at Typesafe:

  • “warning: This catches all Throwables. If this is really intended, use `case _ : Throwable` to clear this warning.”
    We wrap-and-throw or swallow a lot of exceptions, and for brevity’s sake often use the shorter catch { case _ => }. A catch block is analogous to a partial function from Throwable, and there are undoubtedly cases where the distinction matters. We’ll clean these up iteratively by qualifying that we’re only catching Exceptions.

  • “non-variable type argument B[T] in type A[B[T]] is unchecked since it is eliminated by erasure”
    The 2.10 compiler is actually quite helpful in warning you when generic code may not behave as expected. This is just one example of such a warning.

  • “error: overloaded method methodName needs result type”
    The 2.10 compiler requires you to specify result types on most (all?) types of method overloading. Not much work here.
  • Case classes can no longer derive from other case classes, which makes a whole lot of sense if you consider the typesystem; 2.9.x had a whole variety of bugs and edge-cases relating to case class inheritance and companion objects, including some unwieldy constructor syntax. 2.10 simply does away with the complexity and potential ambiguity, which will break existing code. Since such class hierarchies are inherently problematic, I see this as an opportunity to refactor them properly; extractor objects were often useful for that task.

  • In one case we ran into stricter cyclic dependency analysis on the part of 2.10 - an object extended a class and passed a member of another object as a constructor parameter. The second object referenced the first, which compiled fine under 2.9.2 but resulted in a cyclic dependency error with 2.10. As the original design was hard to understand I refactored the code to be a bit simpler, which resolved the problem satisfactorily.

  • Manifests are deprecated in favor of TypeTags and the new reflection library built into 2.10 (certainly one if its most anticipated and celebrated features), but are still functional. We haven’t migrated to the new APIs yet, and that process will likely deserve a post all on its own; read this excellent answer on Stack Overflow for an introduction to TypeTags.

Beyond these and the occasional minor syntax change, the 2.10 compiler proved a somewhat difficult beast to tame:

  • Ran into crazily-detailed compiler failures that clearly (though long-windedly) indicated that I need to manually add scala-reflect to the classpath;

  • Missing or conflicting dependencies, typically due to mishandling POMs, resulted in what I ended up dubbing the Spontaneous Compiler Combustion phenomenon: a cryptic compilation failure, complete with what appears to be an AST dump and full-blown debug information. Tracking the occasional familiar type name in the compiler log can be helpful in tracking down the dependency at fault (this was the case in all but one of the occurrences), but the error itself is completely inscrutable.

  • The one case was, unfortunately, a proper compiler bug logged as SI-7109, which has to do with consuming a package-protected (or -private) trait/class from another trait/class with the same accessibility. Jason Zaugg of Typesafe (@retronym) was extremely helpful in analysing the compiler output and producing a reproduction case, which I haven’t been able to do on my own. Until a fix is included and released, I’ve worked around this by temporarily commenting out the problematic qualifiers.

  • Update (2013-02-12): Ran into another compiler bug that causes a runtime ClassFormatError. We've managed to identify, reproduce and work around the problem; see SI-7120 on the Typesafe JIRA.

Lastly, it’s worth noting that the 2.10 compiler is significantly heavier than the older 2.9.x series: it appears to require 50-100% more permgen space, and compile times are up x2 on average and up to x10 on a clean, fresh build. This seems consistent on both my laptop and the build server, which use different CPUs and OS. A quick check (top and jstat -gcutil) showed the compiler process to be single-threaded and CPU-bound, as it consistently utilizes 100% of a single core. GC activity was low to the point of negligible, so it appears the new compiler is actually a step back in terms of compilation throughput. I hope subsequent 2.10.x releases focus primarily on compilation stability and performance.

 

That’s it for today; next post will be up in a day or two is up and focuses on the Scala library, dependencies and miscellanea.

Monday, February 11, 2013 2:57:09 AM (Jerusalem Standard Time, UTC+02:00)  #    -
Development | Scala
Me!
Send mail to the author(s) Be afraid.
Archive
<December 2024>
SunMonTueWedThuFriSat
24252627282930
1234567
891011121314
15161718192021
22232425262728
2930311234
All Content © 2024, Tomer Gabel
Based on the Business theme for dasBlog created by Christoph De Baene (delarou)