Tomer Gabel's annoying spot on the 'net RSS 2.0
# Sunday, 24 February 2008

Among a lot of other tools and technologies we use Lucene here at Delver. Lucene is a very mature, high-performance and full-featured information retrieval library, which in simple terms means: it allows you to add search functionality to your applications with ease. I could spend an entire day talking about Lucene, but instead I'd like to show you how to write scripts for Lucene with Python on Windows (and thus differs from Sujit Pal's excellent instructions on the same subject matter).

First, make sure you have your Python environment all set up -- at the time of writing this I use the Python 2.5.1 Windows installer. Next you'll need Python bindings for the library, so go and grab a copy of PyLucene; to make a long story short I suggest you use the JCC version for Windows which is downloadable here. Grab it and install it, no fussing necessary. Finally you'll need the Java VM DLL in your path, and this depends on which type of JRE you're using:

  • If you're using the Java JDK, add the mental equivalent of C:\Program Files\Java\jdk1.6.0_03\jre\bin\client to your path;
  • If you're using a JRE, add C:\Program Files\Java\jre1.6.0_03\bin\client to your path.

(As an aside, you should probably be using %JAVA_HOME% and %JRE_HOME% and indirecting through those.)

Now you can quickly whip down scripts in Python, like this one which took about two minutes to write:

#!/usr/bin/python
#
# extract.py -- Extracts term from an index
# Tomer Gabel, Delver, 2008
#
# Usage: extract.py <field_name> <index_url>
#

import sys
import string
import lucene
from lucene import IndexReader, StandardAnalyzer, FSDirectory

def usage():
	print "Usage:\n"
	print sys.argv[ 0 ] + " <field_name> <index_url>"
	sys.exit( -1 )

def main():
	if ( len( sys.argv ) < 3 ):
		usage()

	lucene.initVM( lucene.CLASSPATH )

	term = sys.argv[ 1 ]
	index_location = sys.argv[ 2 ]
	
	reader = IndexReader.open( FSDirectory.getDirectory( index_location, False ) )
	try:
		for i in range( reader.maxDoc() ):
			if ( not reader.isDeleted( i ) ):
				doc = reader.document( i )
				print doc.get( term )
	finally:
		reader.close()

main()

Note the VM initialization line at the beginning -- for the JCC version of PyLucene this is a must, but even like so using the library is extremely painless. Kudos to the developers responsible!

Sunday, 24 February 2008 12:26:41 (Jerusalem Standard Time, UTC+02:00)  #    -
Development
# Thursday, 21 February 2008

I just encountered a really weird issue with Windows Vista, where an external Western Digital hard drive (an older My Book 250GB) would show up as a mass storage device, but was not allocated a drive letter and was basically inaccessible. The weirdest thing is that the USB "Safely remove USB Mass Storage Device" icon did show, except with no drive letter.

Anyway the way to deal with it was to fire up the Device Manager (Start->type in Device Manager) and double-click the external hard drive:

externalstorage_devmgr

Then go to the Volumes tab and click on Populate:

externalstorage_volumes

If the volumes show up, you're good to go.

Thursday, 21 February 2008 15:55:13 (Jerusalem Standard Time, UTC+02:00)  #    -
Software
# Monday, 21 January 2008

Comments are working again. I no longer have the time to muck about with applications (blogging or otherwise), and so I'm going to move this blog into a hosted blog service. If you can recommend one that doesn't suck (and preferably supports code highlighting, user extensions etc.) I'd be delighted to hear :-)

Monday, 21 January 2008 18:39:49 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal
# Wednesday, 16 January 2008

I take pride in being one of the few people I know who actually buy their media: I have a sizable collection of CDs, DVDs, computer games and software that I've bought over the years, and I always feel good about having paid the people responsible for these efforts.

Until recently, that is.

It is commonly said that one of the most obvious traits of Israelis is that they hate to be screwed, and this is as true for me as it is for everyone else. It seems the media companies have taken upon themselves to screw me in every conceivable way, and paying for media is fast becoming an exercise in frustration for me. A most recent example of this is Valve's not-so-new-and-shiny content delivery network which goes by the name of Steam. I don't even know where to begin recounting what's wrong with this thing:

  1. Content delivery speeds are abysmal. I recently downloaded Half Life 2 Episode Two and got 200K/sec maximum transfer rate (more common rates hovering around 50K/sec) on a dedicated line with 5Mb downstream. I consistently get 300K+ rates to even the most busy content delivery servers (Akamai, Microsoft etc.) and it's not like I can use a download manager to better tune the download to my connection.
  2. The download manager is shit. Even ignoring the fact that the only controls it exposes are "pause" and "resume" doesn't help the fact that the error detection code is buggy as all hell: the first time I tried downloading the game it got stuck on 99% without any type of diagnostic or error message, and wouldn't resume. Reading piles of angry forum threads led me to the conclusion that the downloaded content files are simply corrupt; deleting and re-downloading the game solved the problem.
  3. Terminology is all screwed up: telling the game manager not to automatically download updates for a certain game will pause any pending download for that game, including the game content itself.
  4. Although there is no apparent reason for this, playing a game pauses the downloads for all other games. That, at least, has been my observation (Episode Two was downloading when I started on Episode One, and hasn't progressed a single per cent when I quit the game).
  5. The application itself is completely opaque. At no point does it give any indication of what it's doing; you can start the client, nothing happens for two minutes until it finally shows you an "updating Steam client" window. There are no visible clues when it's attempting to access a server (e.g. when clicking on Show News) or when a downloaded upgrade is being installed.
  6. I don't want to connect to a server to play a locally installed, legally bought game. That's just unforgivable, even if it didn't mean I sometimes have to wait for several minutes before the server actually logs me in instead of timing out.
  7. It might shock you, but I still play old games. Sometimes very old games (think Master of Magic). Will Half Life 2 be playable in five- or ten-year's time when the Steam servers have long been cold? I doubt it.

I know Steam probably works well for a lot of people, but for me it's a god-damned affront: I'm a paying customer, there's no reason why I should have so little control over a game that takes up gigabytes on my hard drive. To add insult to injury, the pirated versions often work better: the pirated version of Half Life 2 itself had considerably lower loading times, didn't suffer from the audio stuttering issues that plagued the original, and didn't waste hours of your CPU time on decrypting the game content once it was finally downloaded. If Valve wants to keep my business, here's what they should do:

  1. Switch to an open distribution model (HTTP or, preferably, BitTorrent) so I can use my own software to download their games if I so wish;
  2. Get rid of the dependency on Steam for their games. When I click on the HL2E2 icon I want the game to come up, and I don't give a rat's ass about Steam;
  3. Move to an asynchronous, transparent update mechanism for their games, preferably one that allows me to download game updates and install them on my own.

With the original versions becoming increasingly irritating and pirated versions becoming better than the originals (not to mention less costly), does paying for media still make sense? Remember, that's just one example, I could give a great many more.

Wednesday, 16 January 2008 12:24:04 (Jerusalem Standard Time, UTC+02:00)  #    -
Gaming | Personal | Software
# Sunday, 06 January 2008

The Java implementation for generics is radically different from the C# equivalent; I won't reiterate issues that have been thoroughly discussed before, but suffice to say that Java generics are implemented as a backwards-compatible compiler extension that works on unmodified VMs.

The implications of this are considerable, and I'd like to present one of them. Lets fast forward a bit and consider a relatively new language feature in Java (introduced, I believe, with J2SE 5.0): autoboxing. A thoroughly overdue language feature, autoboxing allows the seamless transition from regular value types (e.g. the ubiquitous int) to object references (Integer); before autoboxing you couldn't simply add a value to an untyped ArrayList, you had to box (wrap) it in a reference type:

ArrayList list = new ArrayList();
list.add( 3 );			// Compile-time error
list.add( new Integer( 3 ) );	// OK

Eventually Java caught up with C# (which introduced autoboxing in 2002), and with a modern compiler the above code would be valid.

With the introductions out of the way, here's a pop-quiz: what does the following code print?

HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put( 4, 2 );
short n = 4;
System.out.println( Integer.toString( map.get( n ) ) );

As a long-time C# programmer I was completely befuddled when the code resulted in a NullPointerException. Huh? Exception? What? Why?

It took me a while to figure it out: because Java generics are compile-time constructs and are not directly supported by the VM, what actually happens is that the underlying container class accepts regular Object instances (reference types); the compile-time check merely asserts that n can be promoted from short to int, whereas the actual object passed to the container class (via autoboxing) is a Short! Since the container doesn't doesn't actually have a runtime generic type per se, the collection merely looks up the reference object in the map, fails to find it (I guess the Object.hashCode implementation for value types simply returns the reference value as the hash code as in C#) and returns null. Doh! *slaps forehead*

Sunday, 06 January 2008 19:15:54 (Jerusalem Standard Time, UTC+02:00)  #    -
Development | Java
# Wednesday, 02 January 2008

A few ReSharper tips for your viewing pleasure (according to rumors it can extend your penis too!):

Alt+Shift+L locates the current file in the Solution Explorer (at least with the IDEA-style keyboard bindings we all know and love). This is particularly useful for extremely file-heavy solutions where you often want to Get Latest (or Undo Check Out) the current file.

Note: you can get this behavior automatically through Tools->Options->Projects and Solutions->Track Active Item in Solution Explorer, but in my opinion it's annoying as hell.

Easier unit testing: you can bind shortcuts to run or debug unit tests by context. Context means that the behavior is exactly as expected: using the shortcut with the cursor on a test will run just that test, using it on a class will run it as a whole fixture, and using it on a directory in the Solution Explorer will run all tests that reside in that directory. This is easily accomplished by going to the Tools->Options->Keyboard menu and binding:

  • ReSharper.ReSharper_UnitTest_ContextDebug to whichever shortcut you wish for debugging purposes (my favourite is Ctrl+T, Ctrl+D)
  • ReSharper.ReSharper_UnitTest_ContextRun to whichever shortcut you'd like to use in order to just run tests (my favourite is Ctrl+T, Ctrl+T)

Note: You can get the same behavior with the ubiquitous TestDriven.net, but it makes no sense to me to pay twice for the same functionality.

Wednesday, 02 January 2008 18:07:15 (Jerusalem Standard Time, UTC+02:00)  #    -
Development

Bits and pieces:

  • Apparently everything is closed on Christmas day, which I suppose is obvious to you unless you live in a predominantly Jewish or Islamic country in which Christmas isn't really celebrated.
  • Spamalot is brilliant.
  • Manchester is a really cool city with terrific pubs and shopping districts.
  • The dominant ethnic group in London is not, in fact, cockney brits, but rather Indians (i.e. immigrants from India).
  • I'll post some pictures as soon as I upload them to Flickr.

Bottom line: As promised. Would buy again.

Wednesday, 02 January 2008 18:06:56 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal
# Monday, 24 December 2007

If you use Executor (a freeware launcher utility), check this out. There's even a video showing it in action!

Monday, 24 December 2007 12:39:40 (Jerusalem Standard Time, UTC+02:00)  #    -
Software
# Sunday, 23 December 2007

I'm flying to England tomorrow for a week's vacation, which will hopefully give me a bunch of ideas what to write about (it's quite difficult for me not to focus on my current area of work, which I doubt would be of much interest to readers...)

If you happen to be in London or Manchester some time within the next week, get in touch and maybe we'll get together for a beer :-)

Sunday, 23 December 2007 11:30:05 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal
# Sunday, 09 December 2007

Quick link: download

In my work at Semingo I often encounter situations where it's impossible to unit- or integration-test a component without accessing the web. This happens in one of two cases: either the component itself is web-centric and makes no sense in any other context, or I simply require an actual web server to test the components against.

Since I firmly believe that tests should be self-contained and rely on external resources as little as possible, a belief which also extends to integration tests, I wrote a quick-and-dirty pluggable web server based on the .NET HttpListener class. The unit-tests for the class itself serve best to demonstrate how it's used; for instance, the built-in HttpNotFoundHandler returns 404 on all requests:

    [Test]
    [ExpectedException( typeof( WebException ) )]
    [Description( "Instantiates an HTTP server that returns 404 on all " +
"
requests, and validates that behavior." )] public void VerifyThatHttpNotFoundHandlerBehavesAsExpected() { using ( LightweightWebServer webserver =
new LightweightWebServer( LightweightWebServer.HttpNotFoundHandler ) ) { WebRequest.Create( webserver.Uri ).GetResponse().Close(); } }
The web server randomizes a listener port (in the range of 40000-41000, although that is easily configurable) and exposes its own URI via the LightweightWebServer.Uri property. By implementing IDisposable the scope in which the server operates is easily defined. Exceptions thrown from within the handler are forwarded to the caller when the server is disposed:
    [Test]
    [ExpectedException( typeof( AssertionException ) )]
    public void VerifyThatExceptionsAreForwardedToTestMethod()
    {
        using ( LightweightWebServer webserver = new LightweightWebServer(
            delegate { Assert.Fail( "Works!" ); } ) )
        {
            WebRequest.Create( webserver.Uri ).GetResponse().Close();
        }
    }
The handlers themselves receive an HttpListenerContext, from which both request and response objects are accessible. This makes anything from asserting on query parameters to serving content trivial:
    [Test]
    public void VerifyThatContentHandlerReturnsValidContent()
    {
        string content = "The quick brown fox jumps over the lazy dog";

        using ( LightweightWebServer webserver = new LightweightWebServer(
            delegate( HttpListenerContext context )
                {
                    using ( StreamWriter sw = new StreamWriter( context.Response.OutputStream ) )
                        sw.Write( content );
                } ) )
        {
            string returned;
            using ( WebResponse resp = WebRequest.Create( webserver.Uri ).GetResponse() )
                returned = new StreamReader( resp.GetResponseStream() ).ReadToEnd();

            Assert.AreEqual( content, returned );
        }
    }

We use this class internally to mock anything from web services to proxy servers. You can grab the class sources here -- it's distributed under a Creative Commons Public Domain license, so you can basically do anything you want with it. If it's useful to anyone, I'd love to hear comments and suggestions!

Sunday, 09 December 2007 23:18:06 (Jerusalem Standard Time, UTC+02:00)  #    -
Development
Me!
Send mail to the author(s) Be afraid.
Archive
<2008 February>
SunMonTueWedThuFriSat
272829303112
3456789
10111213141516
17181920212223
2425262728291
2345678
All Content © 2024, Tomer Gabel
Based on the Business theme for dasBlog created by Christoph De Baene (delarou)