F# compiler considered too linear
In my continuing efforts to make XPO work fully with F#, I found the next problem to deal with: the extremely linear way the F# compiler thinks.
Basically, the compiler seems to read each source code file from top to bottom. Generally, things that are defined below the current line can't be referred to in the current line. Apart from the order of lines in source code files, the order of source code files in the project is equally important, since the compiler handles them in precisely that order.
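Here is a minimal sketch of what this top-to-bottom rule means in practice. It is written against a current F# compiler, and the function names are made up for illustration. A function can only use names defined above it, and functions that call each other have to be declared together with `let rec ... and ...`.

```fsharp
// Names must be defined above the point where they are used.
let twice x = x * 2
let quadruple x = twice (twice x)   // OK: 'twice' is defined above

// Swapping the order would not compile, because 'helper' is not yet known:
// let useHelper x = helper x
// let helper x = x + 1

// Functions that call each other have to be declared together
// with 'let rec ... and ...'.
let rec isEven n = if n = 0 then true else isOdd (n - 1)
and isOdd n = if n = 0 then false else isEven (n - 1)

printfn "%d %b %b" (quadruple 5) (isEven 10) (isOdd 7)   // prints "20 true true"
```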
Hint for Visual Studio users: While the order of source code files in the project is represented correctly in the Visual Studio Solution Explorer, it can't be changed from there. Instead, it is necessary to edit the project file and swap source code files around manually. Right-click the project in Visual Studio and select "Unload Project" from the context menu. Right-click the project again and select "Edit <projectname>.fsharpp". When you're done making your changes, use the context menu a third time to select "Reload Project". You will see that the order of files in Solution Explorer has changed according to the changes you made in the project file.
In some schools of thought on programming, this linear view of things might seem quite normal, but in .NET it is not. In mainstream .NET programming, it was widely regarded as a great innovation when C# did away with the old problems that C and C++ had in that regard (no more header files!!!). Pascal wasn't any better, and neither were many other languages: most of them had some sort of "pre-declaration" feature that had to be used when, for instance, a reference to a certain type needed to be created before the type itself was declared. Nothing like that in C#: the compiler looks at all the types and namespaces declared anywhere in my current project and figures things out for me. Great, that's how it should be.
In all fairness, in F# there's at least one very obvious reason why the compiler takes that linear approach: type augmentations. They basically mean that depending on the position in code, a class might have a certain member or not. If you're not familiar with the feature, look at this example:
type MyClass() =
    let answer = 42

let mc1 = new MyClass()
// At this point Output can't be found on mc1
//mc1.Output()

let output x = printfn "%d" x

type MyClass with
    member x.Output() = output 52

let mc2 = new MyClass()
mc2.Output()
// Now Output is part of MyClass, so I can even call
// it on my "old" instance
mc1.Output()
Before I get to my particular use case, just generally speaking: the ordering requirements introduced by linear compiling seem like a big and quite unnecessary hassle in the vast majority of cases. F# has a very strong type inference system because it is considered unnecessary for developers to annotate every type explicitly in order to get strong typing. In the same way, the compiler could automatically find types and namespaces in my current project regardless of their location, and it could detect those cases where types change through augmentation.
The particular case I'm dealing with is that of persistent business class hierarchies. These hierarchies are typically interrelated to the point where one or more networks of classes are formed. As an example, consider modelling a hospital. You'd have a whole bunch of different types of people to store, so you'd have classes for People and Addresses, Employees, which might be Nurses, Doctors and cleaning and housekeeping personnel, Patients with relationships to the Nurses and Doctors, Rooms, Floors and OperatingTheatres which are assigned to Doctors or Teams of Doctors. Visitors, CarParks, the whole Accounting and Booking business... the list is endless. It is quite clear that many of these types have references to many other types. A one-to-many relationship is typically modelled with a collection property on one end and a simple reference on the other, so every such relationship results in two classes referring to each other.
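To make this concrete, here is a minimal sketch of one such relationship, using hypothetical `Doctor` and `Patient` classes (not taken from any real XPO model). The collection lives on the doctor and the back reference on the patient. Because each class mentions the other, F# only accepts them when they are declared as a single `type ... and ...` group. The sketch assumes a current compiler, since auto-properties via `member val` need F# 3.0 or later.

```fsharp
// Hypothetical entities: a Doctor holds a collection of Patients (the "many"
// end), and each Patient holds a reference back to its Doctor (the "one" end).
type Doctor(name : string) =
    let patients = ResizeArray<Patient>()
    member this.Name = name
    member this.Patients = List.ofSeq patients
    member this.Admit(p : Patient) =
        patients.Add p
        p.Doctor <- Some this
// 'and' is required: Doctor mentions Patient and vice versa.
and Patient(name : string) =
    member val Doctor : Doctor option = None with get, set
    member this.Name = name

let smith = Doctor("Dr. Smith")
let jones = Patient("Jones")
smith.Admit(jones)

match jones.Doctor with
| Some d -> printfn "%s is treated by %s" jones.Name d.Name
| None -> printfn "%s has no doctor" jones.Name
```

Moving `Patient` into a file compiled after the one containing `Doctor` would break this, because `Doctor` could no longer see it.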
Sure, not all classes interrelate, so with a lot of time and great care it might be possible to separate the classes into groups that are hopelessly tangled internally but have only unidirectional references outside the group. Then again, in the example above it might make sense for almost all classes to have a reference to the Hospital type, since that becomes important as soon as more than one hospital is handled at once. There might be other such "special", high-level objects that make the grouping approach really complicated. In any case, sorting the classes into such groups is extremely tedious, and the grouping breaks easily as soon as any class is changed to include or exclude a property that refers to another class. You might wonder why I'm going on about grouping at all. Well, read on: that's what F# wants me to do.
Have I mentioned that persistent business class hierarchies can be large? Apart from having private fields and public members for each and every piece of data associated with the various entities, the classes typically also contain parts of the business logic. Depending on the architectural approach, validation logic might live in these classes, as well as much of the state handling that many entities need. To give some numbers, a C# project I've worked on myself (really just a medium-sized application) has 75 persistent classes with a total of 11263 lines of code.
Now, why am I going on about these interrelated networks of classes? Quite simple: because F# requires me to declare all interrelated classes in one block of code! Yes, that's right. I can't put some of the classes into other files. I can't put them in different namespaces. The only valid syntax to declare interrelated classes in F# is this:
type ClassA() =
    let foo = new ClassB()
and ClassB() =
    let foo = new ClassA()
As you can see, this uses the "and" keyword to chain the two type declarations together. This applies not only to classes but to all types. One of my first thoughts was that it shouldn't be much of a problem if my application used lots of interfaces and dependency injection throughout to remove the need for direct references from one class to another. But in the end this approach only shifts the problem to the interfaces: at a rough count, the class hierarchy from my old project would require me to declare 75 interfaces with 520 properties and around 300 other members. For those declarations the problem is still the same, and while the volume may be smaller, it's still significant. Plus, of course, it requires my application architecture to work in a very specific way, and I'd need to create all those interfaces for no real reason whatsoever. That doesn't sound like a very good idea.
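A sketch of the interface variant, again with hypothetical names. The concrete classes can now be declared separately, but the cycle has moved into the interface declarations, which still need `and`.

```fsharp
// Hypothetical interfaces: the cycle now lives here, so these still need 'and'.
type IDoctor =
    abstract Name : string
    abstract Patients : IPatient list
and IPatient =
    abstract Name : string
    abstract Doctor : IDoctor

// The concrete classes only depend on the interfaces, so each can be
// declared on its own (and could live in a later file).
type Doctor(name : string) =
    let patients = ResizeArray<IPatient>()
    member this.Admit(p : IPatient) = patients.Add p
    interface IDoctor with
        member this.Name = name
        member this.Patients = List.ofSeq patients

type Patient(name : string, doctor : IDoctor) =
    interface IPatient with
        member this.Name = name
        member this.Doctor = doctor

let smith = Doctor("Dr. Smith")
let jones = Patient("Jones", smith)   // Doctor is coerced to IDoctor here
smith.Admit(jones)                    // Patient is coerced to IPatient here
printfn "%s -> %s" (jones :> IPatient).Name (jones :> IPatient).Doctor.Name
```

With 75 entities, the `IDoctor ... and IPatient ...` group simply grows into one large interface block, which is the volume problem described above.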
In the end, I don't think this problem is specific to my use case. In other class hierarchies, dependencies might typically be somewhat more linear than in the ones I've described, but interrelations are still rather common. So here are the important points I want to make:
1. For this particular use case, we need a change that allows us to declare interrelated classes separately. There's a very similar problem for namespaces: perhaps not quite as severe, but only because there aren't going to be as many namespaces as there are classes. To solve these issues, I guess a "pre-declaration" feature like the one I described above could do the job (perhaps as an attribute), but what we really need is ... see (2).
2. The F# compiler should handle all type resolution matters automatically, independent of declaration order, apart from those cases where order is important due to type augmentation. I believe the compiler could detect such "significant-order" cases automatically, so there shouldn't be a need for any new keywords or decorations to make this work. This intelligent implementation is what I expect from a language compiler in the year 2008, and with the ambitions F# has as a multi-paradigm language, we should expect no less.
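For readers on a current compiler: F# 4.1 later added `module rec` and `namespace rec`, which cover part of point 2 within a single file. Types inside such a module can refer to each other without `and`, but types in different files still have to be ordered. Below is a minimal sketch with hypothetical type names. It has to be a file of its own, since `module rec` must be the file's first declaration.

```fsharp
module rec Hospital

// Hypothetical types: inside a 'module rec' each can refer to the other
// without being chained together by 'and'.
type Doctor(name : string) =
    member this.Name = name
    member this.Examine(p : Patient) = sprintf "%s examines %s" name p.Name

type Patient(name : string) =
    member this.Name = name
    member this.Visit(d : Doctor) = d.Examine this

let jones = Patient("Jones")
let visitResult = jones.Visit(Doctor("Dr. Smith"))
printfn "%s" visitResult   // prints "Dr. Smith examines Jones"
```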
As to the interrelated class problem, I am surprised to read that it is such an issue. We have been using F# for over a year now at my company and have only run into what you describe in a few GUI-related instances. I would recommend taking another look at generics to help you with the large hierarchy. In a language like F#, parametric polymorphism (such as .NET's generics) often proves superior to runtime polymorphism with class hierarchies.
Consider the mutually recursive type definition:
type stuff = Stuff of things
and things = Things of stuff
Rewrite it like this:
type 'a stuff = Stuff of 'a
type 'a things = Things of 'a
This idiomatic functional approach is sometimes called "untying the recursive knot" and it allows you to split any definitions (type or value) across any boundaries you like (e.g. source files or even DLLs).
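Here is a minimal sketch of that idea for the two union types above. The `Chain` type and its `End` case are illustrative additions, not part of the comment, and `End` exists only so that finite values can be built. The generic types are defined independently, and the recursion is reintroduced (the knot re-tied) only by a later type that could sit in another file.

```fsharp
// "Untied" definitions: neither type mentions the other, so they no longer
// need 'and' and could sit in different files or assemblies.
type 'a stuff = Stuff of 'a
type 'a things = Things of 'a

// Tying the knot later: 'Chain things stuff' means stuff<things<Chain>>,
// i.e. a Stuff containing Things containing another Chain.
type Chain =
    | End
    | Link of Chain things stuff

let rec depth chain =
    match chain with
    | End -> 0
    | Link (Stuff (Things rest)) -> 1 + depth rest

let sample = Link (Stuff (Things (Link (Stuff (Things End)))))
printfn "%d" (depth sample)   // prints 2
```

This works for union types. As the discussion further down points out, it does not carry over as easily to F# classes.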
You are right - renaming the files actually does change the order in the project! Unbelievable, that's ridiculous. You know, if the order was entirely unimportant, then this wouldn't matter. But seeing how the order is in fact extremely important for all the reasons I've listed above, this is a major issue.
Thanks for your comment. I have to admit though that I don't have the first clue how your suggestion is supposed to work in reality... I wrote this little piece of code to model what you said:
type ClassA<'b when 'b : (new : unit -> 'b)>() =
    let foo = new 'b()
    let test() =
        foo.DoSomethingInB()

type ClassB<'a when 'a : (new : unit -> 'a)>() =
    let foo = new 'a()
    member x.DoSomethingInB() = printfn "Doing something"

let classA = new ClassA<ClassB>();
Now, there are two major problems with it:
1. The final line doesn't compile, because ClassB on its own isn't accepted: after all, it's ClassB<'a>. So this results in an endless ClassA<ClassB<ClassA<ClassB<.... chain, which I have no idea how to resolve.
2. The line foo.DoSomethingInB() doesn't compile, because it makes assumptions about 'b that the compiler can't check. I would have to use a type annotation to make the compiler understand which type(s) I'm expecting, and that in turn would reintroduce the dependency I'm trying to get rid of. I don't see how there's a way around this if I'm ever going to do anything meaningful with the classes I depend on.
Theoretically, your idea is interesting. While our library doesn't support these generic class structures right now, it would be possible to extend it to do so without introducing a dependency on F# into our code. But I just don't see how this is supposed to work, since (2) above in particular seems to be a general problem with the approach. If you know the solution, perhaps you could remodel my little sample above so that it works?
I agree that the human can work around it. However, I would question why I want a feature in the language that requires a human to rewrite an otherwise correct type hierarchy when introducing a cycle in the type dependency. This makes it much more difficult to predict the cost of making a change--the cost could vary by more than an order of magnitude depending on whether the change introduces a cycle or not.
What does the F# user gain from the ordering requirement?
Good question. The answer is reliability. The linearity that has been observed here allows this family of languages to enforce a well-defined order of evaluation on the entire program and that is absolutely essential for value-based (rather than class based) languages like F#. The ad-hoc evaluation order that Oliver is advocating is even completely prohibited in many languages from the ML family. The F# compiler will actually let you do this but it emits a warning because this is a dangerous practice.
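A small sketch of both points on a current F# compiler (the names are illustrative). Top-level values run strictly in file order. Mutually recursive values are only accepted when the recursive references are delayed, in which case the compiler emits a warning (FS0040 in current versions) and checks initialization at runtime instead of proving it safe.

```fsharp
// Top-level values are evaluated strictly top to bottom, so their side
// effects happen in a predictable order.
let trace = ResizeArray<string>()
let a = (trace.Add "a"; 1)
let b = (trace.Add "b"; a + 1)
printfn "%A" (List.ofSeq trace)   // prints ["a"; "b"]

// Mutually recursive *values*: allowed only because each reference to the
// other value is delayed with 'lazy'. Expect a compiler warning here.
type Node = { Name : string; Next : Lazy<Node> }

let rec ping = { Name = "ping"; Next = lazy pong }
and pong = { Name = "pong"; Next = lazy ping }

printfn "%s -> %s -> %s" ping.Name ping.Next.Value.Name ping.Next.Value.Next.Value.Name
```

Removing the `lazy` wrappers turns this into a compile error, since neither record could be built before the other exists.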
In fact, if Oliver's last code snippet did compile it would be an infinite loop, with the constructors invoking each other indefinitely. I suspect the desired functionality was to construct two mutually-dependent objects simultaneously but that requires the manipulation of partially constructed objects. If either construction sequence contains side-effects (as is the case here because constructing ClassA invokes DoSomethingInB that prints to the console) then the evaluation order of these statements is undefined and the program becomes non-deterministic.
Having said that, my proposed idiomatic-OCaml solution does not work in F# as Oliver noted. The reason is that OCaml has structurally-typed objects so it can infer classes whereas F# requires class types to be explicitly declared beforehand (just like a header file).
Finally, regarding the Solution Explorer in Visual Studio for F#, I agree completely that the current implementation is dreadful. The F# team are completely rewriting the VS mode and, hopefully, the new version will make it into the CTP this summer. My personal opinion is that the VS mode is by far the weakest point of the current F# distro.
Have you found a solution to this problem?
I've tried asking on HubFS (http://cs.hubfs.net/forums/2/11241/ShowThread.aspx), but the responses were more evangelical than helpful.
I thought I finally found a workaround solution with intrinsic type extensions, but for whatever reason, they recently pulled out the ability to create type extensions outside the file or even the module to which the type belongs (http://blogs.msdn.com/dsyme/archive/2009/05/20/detailed-release-notes-for-the-f-may-2009-ctp-update-and-visual-studio-2010-beta1-releases.aspx) -- which would seem to completely defeat the purpose of the feature.
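For context, here is a sketch of how the two kinds of extensions differ in current F#. The `Domain` and `Extras` modules and the `Person` type are hypothetical. An intrinsic extension must sit in the same file and module as the type and becomes part of it. An extension declared elsewhere is an optional extension member, visible only where its module is opened.

```fsharp
module Domain =
    type Person(name : string) =
        member this.Name = name

    // Intrinsic extension: same file and same module as the type, so the
    // member becomes part of Person itself.
    type Person with
        member this.Greeting = "Hello, " + this.Name

module Extras =
    open Domain

    // Optional extension: declared in a different module, compiled as an
    // extension member and visible only where 'Extras' is opened.
    type Person with
        member this.Shout = this.Name.ToUpper()

open Domain
open Extras

let ada = Person("Ada")
printfn "%s / %s" ada.Greeting ada.Shout   // prints "Hello, Ada / ADA"
```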
No, I haven't found anything. But then it's been a while since I last looked at this particular problem, so I may well be missing something at this point. I'll be sure to post to this blog if I find the time to look into it.