Ryan Harrison My blog, portfolio and technology related ramblings

Java - Serialization Constructors

It is a common misconception that classes which implement the Serializable interface must also declare a constructor which takes no arguments.

When deserialization is taking place, the process does not actually use the object’s constructor itself. The object is instantiated without a constructor and is then initialised using the serialized instance data.

The only requirement on the constructor for a class that implements Serializable is that the first non-serializable superclass in its inheritance hierarchy must have an accessible no-argument constructor. This is because when you serialize an object, the serialization process chains its way up the inheritance hierarchy of the class - saving the instance data of each Serializable type it finds along the way. When a class is found that does not implement Serializable, the serialization process halts.

Then when deserialization is taking place, the state of this first non-serializable superclass cannot be restored from the data stream, but is instead initialised by invoking that class’ no-argument constructor. The rest of the instance data of all the Serializable subclasses can then be restored from the stream.
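The behaviour described above can be seen in a small round-trip experiment. This is only a minimal sketch: the Animal and Dog classes and the roundTrip helper are hypothetical names made up for illustration, and the object is serialized to an in-memory byte array rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SuperclassStateDemo {
    // Hypothetical non-serializable superclass with a no-argument constructor.
    static class Animal {
        String name;
        Animal() { this.name = "unnamed"; }
        Animal(String name) { this.name = name; }
    }

    // Serializable subclass: only its own field (age) is written to the stream.
    static class Dog extends Animal implements Serializable {
        private static final long serialVersionUID = 1L;
        int age;
        Dog(String name, int age) { super(name); this.age = age; }
    }

    // Serializes the object to a byte array and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog copy = (Dog) roundTrip(new Dog("Rex", 3));
        System.out.println(copy.name); // prints "unnamed": re-initialised by Animal()
        System.out.println(copy.age);  // prints 3: restored from the stream
    }
}
```

The name field belongs to the non-serializable superclass, so it comes back with whatever the no-argument constructor assigns, while age is restored from the stream.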

For example, consider this class, which does not provide a no-arguments constructor:

public class Foo implements Serializable {  
	public Foo(Bar bar) {  
		...  
	}  
	...
	...  
}  

Although the class does not itself declare a no-arguments constructor, it can still be serialized and deserialized. This is because the first non-serializable superclass of this class, which in this case is Object, provides a no-arguments constructor which can be used to initialize the object during deserialization.
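One way to check this for yourself is to count constructor calls around a round trip. The sketch below is an assumption-laden illustration rather than the post's own code: Bar is given a made-up label field and made Serializable so that Foo can hold it, and the constructorCalls counter and roundTrip helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorSkippedDemo {
    // Counts how many times Foo's constructor has actually run.
    static int constructorCalls = 0;

    // Hypothetical Bar; it must be Serializable because Foo stores it in a field.
    static class Bar implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        Bar(String label) { this.label = label; }
    }

    // Foo declares no no-argument constructor at all.
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        final Bar bar;
        Foo(Bar bar) {
            constructorCalls++;
            this.bar = bar;
        }
    }

    // Serializes the object to a byte array and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Foo original = new Foo(new Bar("hello"));
        Foo copy = (Foo) roundTrip(original);
        System.out.println(copy.bar.label);   // state restored from the stream
        System.out.println(constructorCalls); // still 1: not run again on read
    }
}
```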

If however Foo extended from a Baz class which did not implement Serializable and did not declare a no-arguments constructor:

public class Baz {  
	public Baz(Bar bar) {  
	   ...
	}  
	...
}
public class Foo extends Baz implements Serializable {  
	...
	...
}  

In this case an InvalidClassException (with the message "no valid constructor") would be thrown during the deserialization process, as the state of the Baz class cannot be initialised through the use of a no-arguments constructor. Because the superclass Baz could not be initialised, the subclass also cannot be properly initialised - so the deserialization process cannot complete.
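The failure case can be reproduced with a short experiment. This is a sketch only: Baz and Foo here take a made-up String label instead of a Bar, and tryRoundTrip is a hypothetical helper. It catches ObjectStreamException and reports the concrete class that was raised while reading, so the result can be checked on your own JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class MissingConstructorDemo {
    // Not Serializable and no no-argument constructor.
    static class Baz {
        final String label;
        Baz(String label) { this.label = label; }
    }

    // Serializable subclass whose first non-serializable superclass is Baz.
    static class Foo extends Baz implements Serializable {
        private static final long serialVersionUID = 1L;
        Foo(String label) { super(label); }
    }

    // Writes a Foo, then tries to read it back.
    // Returns the name of the exception class raised by the read,
    // or "no exception" if the read unexpectedly succeeds.
    static String tryRoundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Foo("example"));
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();
            return "no exception";
        } catch (ObjectStreamException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Reading Foo back raised: " + tryRoundTrip());
    }
}
```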


Java Regression Library - Regression Models

Part 1 - Regression Models

In this tutorial series we’ll be going over how to create a simple Regression Analysis library in Java. If you have any prior knowledge of regression analysis you will probably know that this is a very large field with a great many applications. In this tutorial series we won’t be covering any massively advanced techniques. Our final library will be able to produce the same results as you would find in Microsoft Excel (excluding the graph plotting), which is plenty to get good results in most basic circumstances.

Prerequisites -

It’s best if you start this series with a sound knowledge of OOP (object-oriented programming) practices in Java as this series will include the use of abstract classes and polymorphism. You will also need a good knowledge of some of the more basic concepts in Java such as looping, methods and variables. I will do my best to explain the code as much as I can but it is advisable that you have some prior knowledge.

Regression analysis is a mathematical technique, so this series will of course focus on mathematical concepts and you will need a sound knowledge of algebra and graphs. I will again do my best to explain all of the concepts as much as possible to cater for beginners, but people who have a basic algebra or statistics course under their belts will find things a lot easier.

What is Regression Analysis?

So enough of all the introductions, let’s get straight in! If you haven’t heard of regression analysis before you are probably already asking what it is and why it is useful. From the Wikipedia article on regression analysis:

“a statistical process for estimating the relationships among variables. It includes many techniques for modelling and analysing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.”

Well that hasn’t really helped much now has it? It is much simpler to understand if you think about the variables as the X and Y coordinates on a graph.

Consider the case where you have a simple scatter plot diagram. You have a set of X and Y coordinates that are plotted on a graph with two axes - the x and y. Take, for example, this graph, where the data runs up to an X value of 11. Say these values are from a particular stock on the stock exchange (regression analysis has a lot of applications in stocks and shares). Each X value represents a month of the year and the corresponding Y coordinate is the average price of the stock in that particular month. From the graph plot we can see that the price of shares is steadily increasing but we don’t possess any data for the 12th month. Is the price going to increase or decrease in December? How can we find out? For market traders this is very important information that can make them or lose them millions. The answer - regression analysis!

Scatter Plot

So we have data up to November and we want to find out what the Y value is when X is 12. The trouble is it’s not December yet so we don’t know what it is. We need a forecast model. Let’s revisit the situation. We have an X value and we need the Y value. Hopefully this is ringing some bells. It sounds an awful lot like a good use of a function such as Y = aX + b (or it could be any other function). We can insert an X value of 12 and we get back the corresponding Y value, which is the average stock price for December. Sounds great but we have a problem. We don’t know the values of a and b! The function could have any intercept and gradient. We currently don’t have a clue. We could make them up, but someone like a market trader doesn’t want to risk their money on a made-up value. We need a way to find the values of a and b which, when put into the function, will give us back an accurate value for the price in December.

Armed with that knowledge let’s go back to the Wikipedia definition. ‘estimating the relationships among variables’ - this kind of makes more sense now. As X increases what does Y do? This is called the relationship between the two variables. If the Y values are increasing a lot as X increases, our forecast should reflect this relationship. We now need to label X and Y in more formal terms.

Y is the dependent variable. Its value depends on the independent variable X and on the parameters a and b.

We can now again go back to the Wikipedia definition. ‘helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.’ Again this makes more sense now. We want to analyse how the dependent variable Y changes as the independent variable X is varied and the parameters a and b are kept fixed. This is most often done through a function such as Y = aX + b.

So essentially we want to find some function that best fits the data points that we have for the other months. The function models the relationship between X and Y. Once we have this function we can plug in X values and get the Y values that follow the relationship. This has many uses!

Let’s go back to our example. We want to find the forecast of the stock price in December. We therefore need to find some function that relates the month to the price. This is regression analysis in its simplest form. Things get harder when we have to figure out which function is best to use to model the relationship (is it a straight line, an exponential curve etc) and how we can find out how good our model is at describing the relationship, but we will move onto that in later parts of this series.

The most basic form of regression analysis is linear regression - that is, finding a linear function that best models the relationship between the two variables. The basic linear function is Y = aX + b from earlier. Y is the price we want to find and X is the month. We need to find the best values for a and b that produce a line that follows our current data as closely as possible. If the line is accurate, we can use it to forecast other months. Our function becomes PRICE = a * MONTH + b. A huge part of regression analysis is finding the best values of a and b that produce a line that closely models our current data set.
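To make PRICE = a * MONTH + b concrete, here is a minimal sketch of one common way of choosing a and b: ordinary least squares, which picks the line that minimises the sum of squared vertical distances to the data points. This excerpt does not say how the series itself computes a and b, so treat the method, the class and method names, and the price data below as assumptions made purely for illustration. The made-up prices lie exactly on a straight line so that the result is easy to check by hand.

```java
class LinearFitSketch {
    // Returns {a, b} for the least-squares line y = a * x + b.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Gradient a = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        double a = sxy / sxx;
        // The fitted line always passes through the point (meanX, meanY).
        double b = meanY - a * meanX;
        return new double[] {a, b};
    }

    // Evaluates y = a * x + b for the fitted coefficients.
    static double predict(double[] coefficients, double x) {
        return coefficients[0] * x + coefficients[1];
    }

    public static void main(String[] args) {
        // Hypothetical data: months 1 to 11 with prices of exactly 2 * month + 10.
        double[] months = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        double[] prices = {12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32};

        double[] line = fit(months, prices);
        System.out.println("a = " + line[0] + ", b = " + line[1]);
        System.out.println("Forecast for month 12: " + predict(line, 12));
    }
}
```

With this data the fit recovers a = 2 and b = 10, so the forecast for month 12 is 34. Real price data will not sit exactly on a line, which is why measuring how well the line fits matters.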


C# - Re-throwing Exceptions

In C# it is common to re-throw exceptions after some logging has taken place, or perhaps to alter the exception information to be more user friendly. However, there are two different ways of re-throwing exceptions in C# and care needs to be taken when doing so, as one of them will lose the stack trace - making things a lot harder to debug.

Consider the following code:

  
using System;

namespace ScratchPad  
{  
    internal class Throwing  
    {  
        // Some method which can throw an exception  
        static void DoSomething()  
        {  
            throw new Exception("Something bad happened");  
        }

        // Another method which tries calling the other method, catching any exceptions it may throw  
        static void Run()  
        {  
            try  
            {  
                Console.WriteLine("Trying to run DoSomething()");  
                DoSomething();  
            }  
            catch (Exception e)  
            {  
                // We caught the exception. Typically some logging takes place here  
                Console.WriteLine("We caught an exception in DoSomething(). Message = " + e.Message);  
                Console.WriteLine("Re throwing the exception after logging");

                // We then re-throw the exception  
                throw e;  
            }  
        }

        static void Main()  
        {  
            try  
            {  
                // Run the method  
                Run();  
            }  
            catch (Exception e)  
            {  
                // We catch the exception here and examine its stack trace  
                Console.WriteLine("We caught the exception. Stack trace is");  
                Console.WriteLine();  
                Console.WriteLine(e.StackTrace);  
            }

            Console.ReadLine();  
        }  
    }  
}  

This is pretty self-explanatory. We have a method that runs another method and catches any exceptions it may throw (in this case one will be thrown every time). In the catch block we examine the exception, perhaps do some logging and re-throw the exception for the caller to handle. Finally in Main the re-thrown exception is caught again and the stack trace is examined.

At first look there is nothing wrong with this code. It’s all pretty commonplace, with nothing much to see here. However this is the output that we get:

  
Trying to run DoSomething()  
We caught an exception in DoSomething(). Message = Something bad happened  
Re throwing the exception after logging  
We caught the exception. Stack trace is

at ScratchPad.Throwing.Run() in C:\Code\ScratchPad\ScratchPad\Program.cs:line 28  
at ScratchPad.Throwing.Main() in C:\Code\ScratchPad\ScratchPad\Program.cs:line 37  

You may or may not have noticed that this is not the full stack trace. We can see that the exception came from Run(), however we can’t tell that the exception actually originated from DoSomething(). This can cause problems when debugging, as instead of going straight to the root cause you first have to work your way through Run().

We lose the top of the stack trace because we used

  
throw e;  

Which essentially resets the stack trace to now start in that method. As far as the stack trace is concerned, this has much the same effect as doing something like:

  
throw new Exception(e.Message);  

But what if we wanted to see the whole stack trace? Well instead of using throw e; we just use:

  
throw;  

With the updated catch block:

  
catch (Exception e)  
{  
    // We caught the exception. Typically some logging takes place here  
    Console.WriteLine("We caught an exception in DoSomething(). Message = " + e.Message);  
    Console.WriteLine("Re throwing the exception after logging");

    // We then re-throw the exception  
    throw;  
}  

We get the output:

  
Trying to run DoSomething()  
We caught an exception in DoSomething(). Message = Something bad happened  
Re throwing the exception after logging  
We caught the exception. Stack trace is

at ScratchPad.Throwing.DoSomething() in C:\Code\ScratchPad\ScratchPad\Program.cs:line 10  
at ScratchPad.Throwing.Run() in C:\Code\ScratchPad\ScratchPad\Program.cs:line 28  
at ScratchPad.Throwing.Main() in C:\Code\ScratchPad\ScratchPad\Program.cs:line 37  

We now have the full story in the stack trace. We can see that the exception originated from the DoSomething() method and passed through the Run() method into Main() - much more helpful when debugging.

I don’t see any situation where using throw e; would be of any use at all. If you wanted to hide the stack trace then you would typically be throwing a completely new exception anyway - with a new message and perhaps other information to pass to the caller. If you didn’t want to hide the stack trace then throw; is the statement to use. ReSharper even sees throw e; as a problem and tries to replace it with the simple throw;.

Even so I bet this mistake has been made a lot of times by a lot of people. So remember, if you want to re-throw an exception, never use throw e; as it will lose your stack trace. Instead always use throw;.


C# - String concatenation instead of StringBuilders

In C# when you concatenate two strings together you are implicitly creating a lot of strings in memory - more than you would have thought. For example consider the code:

  
List<string> values = new List<string>() {"foo ", "bar ","baz"};
string output = string.Empty;
foreach (var value in values)
{ 
    output += value; 
} 

Behind the scenes, strings in C# are immutable, so each += creates a new string in a different memory location and copies the existing characters into it. So in total this one loop involves the following strings:

  1. "foo "
  2. "bar "
  3. "baz"
  4. "foo bar "
  5. "foo bar baz"

In just one seemingly simple concatenation loop 5 strings have been created, which is wasteful. The problem gets a lot worse when you end up concatenating hundreds of strings together in a loop like this. The solution is to use a StringBuilder. The above code is converted into:

 
List<string> values = new List<string>() {"foo ", "bar ","baz"};
StringBuilder builder = new StringBuilder();
foreach (var value in values)
{
    builder.Append(value);
}
// Build the final string once, after all of the appends
string output = builder.ToString();

Using this method is a lot more efficient because a StringBuilder appends into an internal buffer that grows as needed, rather than creating a new string and copying everything built so far each time a new string is appended (for example number 4 from above would not be created as a separate string). This makes StringBuilders very useful when concatenating many strings at once. But that doesn’t mean you should go and replace all of your string concatenation code with StringBuilders right away. There are some situations where explicitly using a StringBuilder can make things worse. For example:

string result = "foo " + "bar " + "baz";

You might think that this suffers from the same inefficiencies as the first example but in fact it doesn’t at all. The difference is that all three operands here are compile-time constants, so the compiler folds them into the single literal "foo bar baz" and no concatenation happens at runtime. Even when the operands are variables, a single expression such as a + b + c is translated by the compiler into one call to String.Concat(), which creates the result in one go. Adding a StringBuilder in either case would only add overhead. The use of StringBuilder should be reserved for building complex strings at runtime, typically in loops - not for replacing concatenations that the compiler already handles.


C# - Casting with (T) vs. as (T)

In C# there are two methods for casting:

  1. (T) works with both value and reference types. It casts the object to T, and throws an InvalidCastException if the cast isn’t valid.
    e.g. Foo obj = (Foo) bar;
  2. as (T) works only with reference types and nullable value types. It returns the object cast to T if it succeeds, and null if the cast isn’t valid.
    e.g.
Foo obj = bar as Foo;  
if(obj == null)
{  
  // the cast did not succeed so proceed accordingly  
}  

The question therefore is which one should be used where?

Using (T) means that you fully expect the cast to succeed. If it doesn’t succeed then there is an error in the code that needs looking at.

Using as (T) on the other hand means that you do not fully expect the cast to succeed in every case. It is considered normal behaviour if the cast did not succeed and this would be taken care of through a null check afterwards.

The only mistake is when you use as (T) but do not follow it up with a null check. The developer fully expects the cast to succeed so doesn’t write the null check. However later down the line when something goes wrong, no exception is thrown on the invalid cast, no null check is performed, and you have yourself a bug that is hard to track down, typically a NullReferenceException raised somewhere far away from the cast itself. It is best to always use the regular cast (T) unless you intend to check for the invalid cast yourself via as (T) and a null check afterwards.
