Useless Inc. - Sunday, 11 March 2007

Sunday, 11 March 2007

With my project finally nearing completion, it's nigh time for Microsoft to release yet another update to the .NET Compact Framework 2.0. Service Pack 2 ought to bring it to about "beta 2" level.

Check out these gems:

NETCF deadlocks on exit if native callback delegate has been called on native thread (this is one of the few bugs in the CF I haven't encountered. Ironic, considering we make heavy use of native code in our project.)
Access violation marshaling a class with a string field (there is a dent in the nearby wall on account of this one.)
TypeLoadException using generics with NETCF 2.0 (TypeLoadExceptions in general are a lot of fun in the CF.)

Installing multiple locales of same MSI results in multiple instances of NetCF showing up in Add Remove Programs (we've had some complaints regarding this from our client. They'll be mighty pleased to hear this, I'm sure.)

Now don't get me wrong - I think the CF is an impressive platform, or at the very least could've been. I would venture to say that the people on the CF implementation team are probably skilled professionals just doing the best job they can. But I can't forgive Microsoft - as a company - for shipping a half-baked, half-assed product that even at version 2.0 and after two service packs is still riddled with bugs! It boggles the mind that for any but the most hard-core developers, a third-party extension to the .NET CF is practically a necessity because the class library itself is simply inadequate.

As an aside, I seriously doubt we'll chance regression bugs this close to the launch date, so we'll probably stick with SP1 (we've worked around the issues we've encountered anyway.)

Sunday, 11 March 2007 16:40:49 (Jerusalem Standard Time, UTC+02:00)

-
Development | Compact Framework

Saturday, 17 February 2007

.NET Compact Framework, P/Invoke and MissingMethodException

By far one of the most annoying aspects of the .NET Compact Framework is how heavily it relies on P/Invoke to fill in the gaps. The framework itself is missing huge pieces of functionality, and with the lack of C++/CLI for the Compact Framework a developer is often time left with no choice but to hack around the missing functionality with P/Invoke (or COM interop, if you have the patience to muck about with ATL).

The problem is that, on occasion, a P/Invoke call would result in a MissingMethodException (in fact, if you're really unlucky, the type loader will throw the same exception on loading the method actually making the P/Invoke call). Although a lot of the scenarios have been thoroughly ironed out by now and can be resolved through a Google search or some hacking on the developer's part, there is one scario that is esoteric enough that I couldn't find any references to it on the internet: you can get a MissingMethodException when you are out of memory.

We're working on an extremely large (in mobile proportions) and complex project, involving massive amounts of .NET logic combined with a very large, performance-concious and memory-hungry native codebase. We've had to hack around a lot of missing capabilities in the .NET Compact Framework (as well as some bugs and/or shortcomings in other parts of the OS), and one of our native calls would inconsistently throw a MisingMethodException; having reearched the problem for a day or two I was convinced that the problem was an incorrect function prototype for the exported function and added explicit calling convention declarations. This seemed to have resolved and I was content for a couple of days, until the problem resurfaced.

The exception content itself is next-to-useless, and since the same P/Invoke call would work intermittently I was hoping that enabling the loader log might supply some further information. Alas, all the log would provide was the following line:

Failed to find/load [SomeDll.dll] (even in [\Program Files\Somewhere\])

Not good. I then dug up another article on the subject and proceeded to enable the interop log, which provided some additional information:

JIT ERROR FOR PINVOKE METHOD (Managed -> Native):
[pinvokeimpl][preservesig]
int Workarounds::GetSinkWrapper(out IImageSink , IImageEncoder );
int (I4_VAL) GetSinkWrapper(IImageSink& ** (INTF_REF) , IImageEncoder *(INTF_VAL) );

I was originally interested in the bizarre native signature generation (specifically the IImageSink& ** parameter - where'd the reference come from?), but upon reading some valid log files for working methods I was convinced that it's a dead duck. I then set my attention to the JIT message: what has the JIT to do with a native function? I theorized that the JIT is responsible for the native signature generation for native functions and kept working under that assumption. That narrowed the question down to, "what went wrong with the JIT compiler?"

Eventually it occured to me that this might be yet another manifestation of the memory limitation issue (processes under Windows CE 5.0 are limited to a 32MB address space). The P/Invoke call is made fairly late into the application; I added a dummy function to the library which I called on application initialization. This forced the JIT to take care of the native library before anything was sucking up memory, and the issue was resolved.

The moral of the story? If you get MissingMethodExceptions on your P/Invoke calls, fisrt make sure your DLL is actually deployed; then check the DllImport signature (you can find a lot of useful resources on this FAQ). Finally, make sure you're not out of memory.

Saturday, 17 February 2007 16:26:26 (Jerusalem Standard Time, UTC+02:00)

-
Development | Compact Framework

Tuesday, 06 February 2007

Three words: Oh, My, God

I was screwing around with XmlSerializer and manual recompilations (you can enable raw code output from the XmlSerializer) and somehow this happened:

.NET Framework reinstall time, I guess...

Tuesday, 06 February 2007 16:28:41 (Jerusalem Standard Time, UTC+02:00)

-
Development

Tuesday, 30 January 2007

Sick as a dog

Also had some personal issues to take care of, and my family are all out of the country, which means I have a dog to take care of, which means less free time. All of this results in a lower posting frequency, which I intend to remedy in the coming weeks.

In the pipeline (in case you're wondering, this is also a placeholder so I know what's left):

More posts about .NET Compact Framework (memory and string handling, nonmodal dialog boxes, bizarre method overloads in the BCL, issues with CAB generation)
Issues with Windows Media Player Mobile
Some articles I want to link to and discuss
A couple additional articles on using wikis in the workplace (with additional insight on real-world scalability and usability)

Tuesday, 30 January 2007 12:34:53 (Jerusalem Standard Time, UTC+02:00)

-
Personal

Tuesday, 23 January 2007

No More ICQ For Me

I use Trillian for my IM needs, and a few days ago ICQ abruptly stopped working with an "invalid password" error. Attempting to log in using Icq2Go resulted in the same error; logging in on ICQ's site also indicated that the password is wrong. Oddly enough, the security question remained the same but the system would accept none of my answers; furthermore it wouldn't accept any of my e-mails (current or previous) for resetting my password.

I searched the help system. ICQ's answer?

Please note: If you are unable to get a new password using the password assistance system, then you will have to register a new ICQ number.

So basically, my ICQ number since 1999 or so is useless. I'll update my contact information and try to get in touch with my older ICQ contacts, but if you're looking to contact me please don't use ICQ anymore (I'm not going to register a new number, not much of a point with everyone having an MSN account anyway). I'll be changing passwords on all my accounts in the meantime.

Tuesday, 23 January 2007 09:52:44 (Jerusalem Standard Time, UTC+02:00)

-
Personal

Tuesday, 16 January 2007

When did THIS happen?

Update (1 July 2007): The newer versions no longer exhibit this problem (tested on beta 12.0.1183.516). Just make sure you chose "None" for the image borders.

Looking back at my post on Windows Live Writer dating back to August, it seems that I either missed a very significant fact about the product or perhaps something has changed in one of the betas. The short story is that Windows Live Writer sucks at handling images; whenever I embed an image it goes on to severely reduce the image quality:

You be the judge: on the left, the original. On the right, the "improved" version

Not only does WLW upscale my image for some unknown reason, it also does this via an extremely low quality scaler. In the case of simple images such as the one above, this will also result in larger file sizes (this example demonstrates a 400% increase in file size - from under 21KB to a little over 104KB).

This reminds me of the old image pasting problem with Microsoft Outlook, and what kills me is that I can't think of any conceivable reason why anyone would develop this "feature". To add insult to injury, the only feedback link I could find (via this post in the Windows Live Writer blog) is an MSN Groups page that requires registration. The registration process itself asks you about a hundred completely personal questions that Microsoft has absolutely no business asking.

I'll try and get in contact with the WLW team since I do want to keep using the tool, but this particular issue is starting to cost me a lot of time and effort.

Tuesday, 16 January 2007 16:01:29 (Jerusalem Standard Time, UTC+02:00)

-
Software

On HttpWebRequest and Closing Streams

Footnote: This is fast becoming a series of posts on the woes of using the .NET Compact Framework. In fact, I've added a subcategory for this exact purpose; you can access the category and its RSS feed from the Categories list on the right-hand side of the screen.

Here's another one of those "bang your head repeatedly against the wall and look for a sharp object" issues: closing a stream derived from HttpWebResponse.GetResponseStream takes an abnormally long time. In fact, the (blocking) call to Stream.Close takes approximately the same amount of time as it would to read to the end of the stream. I've only encountered one other reference to this in a newsgroup post from 2004, which (according to Todd, the original author of the post) was never actually answered.

Seeing as I was trying to use HttpWebRequest to create a minimalistic download manager for one of our applications, I had to come up with a solution: one option was to close the stream asynchronously and just let the download run its course, wasting valuable memory, CPU time and bandwidth; another was to never close the stream and make do with a resource leak. Not content with these solutions I decided to dig into the framework code with the ever-useful Reflector. The first hurdle was locating the assemblies; they're not trivial to find, but if you look hard enough you can find actually useful versions in your Visual Studio 2005 installation folder, under SmartDevices\SDK\CompactFramework\2.0\v2.0\Debugger\BCL. These are regular .NET Assembly PEs so it's trivial to load them up in Reflector.

Some digging into the BCL sources proved that HttpWebRequest does, in fact, read until the end of the stream when closing; this is the relevant code excerpt, from HttpWebRequest.doClose:

if (!this.m_doneReading)
{
      byte[] buffer1 = new byte[0x100];
      while (this.ReadInternal(buffer1, 0, buffer1.Length, true) != 0)
      {
      }
}

It only started to make sense when I did some reading on HTTP (a protocol I'm not deeply familiar with). Apparently, HTTP 1.1 connections are persistent by default, which means the connection is maintained even after the request is completed, and further requests are served from the same connection. Technically, this means that the KeepAlive property of HttpWebRequest is true by default, and a Connection="Keep-alive" header is added to the HTTP request. I can only surmise that, with persistent connections, the response must be read in full in order to allow future requests (if the response was cut off, some concurrency issues may apply). Unfortunately, setting the KeepAlive property to false did not resolve the issue and the connection was maintained until closed.

Since I could find no way to resolve the problem, I decided to hack around it:

stream.GetType().BaseType
    .GetField( "m_doneReading", BindingFlags.Instance | BindingFlags.NonPublic )
    .SetValue( stream, true );

Screwing around with BCL internals is obviously NOT a recommended practise and it's something I tried very hard to avoid, however this problem had to be resolved in one way or another. I traced the code a bit deeper to see if this would actually have the expected results; apparently HttpWebRequest keeps an internal "reference counter" (actually a bitfield, but same principle) on the connection. According to the implementation for HttpWebRequest.connectionRemoveRef, if the reference counter reaches 0 the private m_connection field of the request becomes null, and the connection is released. I started out by testing this:

Before and after - you can see that the m_connection field becomes null

Unfortunately I have no idea how to test whether ot not the connection is actually disposed (conceptually, if KeepAlive is set to false then the connection should be immediately closed); when I find the time I'll try and track the HTTP session using Wireshark, but currently this horrible hack will just have to do.

Tuesday, 16 January 2007 15:33:43 (Jerusalem Standard Time, UTC+02:00)

-
Development | Compact Framework

Tuesday, 09 January 2007

Unusual Marshalling Behavior in the .NET Compact Framework

I've been hacking away quite a bit at an application that contains large managed and unmanaged portions, and a necessarily complex interop layer in between. It appears that interop marshalling behaves somewhat differently between the full and Compact framework. Here's a somewhat laconic list of the issues I've encountered and how to resolve them:

You may encounter NotSupportedException on calls to Marshall.SizeOf. Although the documentation does not specify this as a possible exception, experience shows that this is a result of wrong marshalling attributes: for example, although trying to marshal a bool to UnmanagedType.I4 makes sense from a C programmer's perspective, it results in the behavior described above. The article describing this is called "Using the MarshalAsAttribute Attribute," but the version in my locally installed copy of MSDN Library (August 2006) does not contain this information.
Annoyingly, the default marshalling, or even an explicit UnmanagedType.Bool, results in corrupt values (probably some minor framework bug). I worked around this by defining the member as int and manually giving it the value 0 or -1.
It's not obvious, but you can't use UnmanagedType.LPArray from within structures - it only works on P/Invoke method declaration parameters. The only way to do this is to manually call Marshal.StructureToPtr and do some pointer arithmetic with IntPtr (annoying, but at least it's safe code).
The marshaller always frees up memory; although this makes a lot of sense from the .NET perspective, it probably means that you'll have to code in some sort of deep copying mechanism in your native code if you want any of that information in your state. This also has performance repercussions you should consider when architecting your interop layer.

I'll post updates to this as I come across more issues.

Tuesday, 09 January 2007 18:34:43 (Jerusalem Standard Time, UTC+02:00)

-
Development | Compact Framework

Sunday, 07 January 2007

Curious

I was writing some code with Visual Studio 2005, and pressed Alt+Keypad 8 (under ReSharper, this moves the current method up/down). Not only did this produce the ASCII code 8 - backspace (control character) or inverse bullet (printable character) - but it also completely screwed up the font on the same line:

Sunday, 07 January 2007 19:45:26 (Jerusalem Standard Time, UTC+02:00)

-
Development

Thursday, 04 January 2007

Catching Data Abort Exceptions

For some reason the Visual Studio 2005 debugger (on my native Smart Device project) flat out refused to halt on Data Abort errors. Instead it would just show the error information on the debug trace and crash the application:

Data Abort: Thread=81e27040 Proc=804c68c0 'Client.exe'
AKY=00000041 PC=00eb1cc0(emnative.dll+0x00001cc0) RA=00eb6d6c(emnative.dll+0x00006d6c) BVA=0e537510 FSR=00000007

For obvious reasons, I wasn't particularly happy with this. Digging into Google and the documentation didn't help; after some serious headscratching I figured that the Windows CE kernel must be catching these exceptions at some point, so from the Debug->Exceptions... menu I enabled, under Win32 Exceptions, catching of thrown Access Violations: