My brother was going on and on about a coding challenge he found as part of the "Lisp as an Alternative to Java" study (the challenge instructions are outlined here, and you can find the sample dictionary and input data by following the study link); eventually I agreed to give it a try.
Spoiler alert: Do not read on if you want to take on this challenge!
Although I had to stop in the middle because I had company, the initial (hacked-together, very inefficient and completely uncommented C# 2.0 code) version took me 1 hour and 20 minutes. While technically correct this version performed abysmally, but it served as a good basis for further optimizations; a second, optimized version took another 30 minutes and with another 20 or so minutes to clean it up, the final version took a grand total of 2 hours and 20 minutes. Total length is 193 lines of code (including comments and white-space), which you can find here.
The crux of the solution is a very simple tree structure to hold the word dictionary. Each node has a 10-entry branch table, each entry representing a phone digit; a node can have an arbitrary number of leaves, each leaf representing a dictionary word. This allows for a very simple (and hopefully efficient) inner search loop:
- Use successive digits as indices for traversal into the tree;
- When a node with leaves is encountered, the leaves (dictionary words) are added to the potential result set. The rest of the input is then recursively encoded;
- If no leaves are encountered for the entire input set, a digit is added to the encoded string and the rest of the input is again encoded. A boolean flag is passed along with the recursion to make sure two consecutive digits are never encoded.
This is basically a prefix tree (or trie). With the provided dictionary and input sets, runtime on my 1.7GHz Pentium M-equipped laptop is approximately three seconds. I haven't properly profiled the application for CPU and memory utilization because, frankly, it's good enough. There's also a lot of leeway for optimizations: the current tree implementation is woefully inefficient, as tree traversal can result in additional memory pressure.