Wednesday, January 25, 2006

Binary Serialization of DataSets in .NET 2.0

Last week I mentioned that I would post some test results on the different options we now have for serializing DataSets: binary (new in 2.0), XML, and a custom serialization surrogate. First, let me direct your attention to an updated surrogate class that I used in these tests; it can handle the new DateTimeMode property on columns.

Now before you read these numbers I'd like to be clear about the scenarios I tested. To make it easy on me, I set up the internet test against one of our current product's test beds, already populated with some real test data. There are two servers in this test bed: one is the application server hosting remote components in IIS (HttpChannel) that the client accesses using the BinaryFormatter; the second is the database server. So what I was measuring also included the time it took to select the records from the database through the components (no business logic, just filling an untyped DataSet from a simple stored proc select -- about 19 columns with various data types). The servers are in Florida and I work from home in California, where I was running the client over a cable modem. In the local test I had the exact same code and database running in all tiers on my local development machine to simulate no network latency.
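
For readers who want to picture that setup, here is a minimal sketch of the kind of client-side remoting configuration described above: an HttpChannel with the binary formatter sinks talking to a component hosted in IIS. This is not the actual test bed code; the IOrderService interface, the method name, and the URL are placeholders I made up for illustration.

using System;
using System.Collections;
using System.Data;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Http;

// Placeholder contract for a server-activated component that fills and
// returns an untyped DataSet from the database tier.
public interface IOrderService
{
    DataSet GetOrders();
}

public static class RemotingSetup
{
    public static IOrderService Connect()
    {
        IDictionary props = new Hashtable();
        props["name"] = "httpBinary";

        // HTTP so IIS can host the components, but with the BinaryFormatter
        // sinks instead of the default SOAP formatter to keep payloads smaller.
        HttpChannel channel = new HttpChannel(
            props,
            new BinaryClientFormatterSinkProvider(),
            new BinaryServerFormatterSinkProvider());
        ChannelServices.RegisterChannel(channel, false);

        // Placeholder URL for the real application server endpoint.
        return (IOrderService)Activator.GetObject(
            typeof(IOrderService),
            "http://appserver/DataServices/OrderService.rem");
    }
}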

That said, don't look at the numbers per se -- look at the trend; that is what is important here. There are also a lot of factors that affect serialization over the internet on a cable modem, and you'll notice a few anomalies in the local numbers with small sets of data, probably because my dev machine hiccupped at that moment.


So to recap, this is what we're measuring (a rough code sketch of the surrogate-specific steps follows the list):

1) The time it takes to make the remote call to activate the component,
2) The time it takes to select the records from the database and create the dataset,
3) In the case of the surrogate class, the time it takes to convert the dataset to a series of byte arrays,
4) The time it takes to transmit the data,
5) In the case of the surrogate class, the time it takes to convert the byte arrays back into a dataset.
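
To make steps 3 and 5 concrete, here is a minimal sketch of the surrogate round trip in isolation. It assumes the DataSetSurrogate-style wrapper class from the post linked above (I'm assuming a constructor that takes a DataSet and a ConvertToDataSet method); the remote activation and database fill from steps 1, 2, and 4 are not shown.

using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class SurrogateRoundTrip
{
    static DataSet RoundTrip(DataSet source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        Stopwatch watch = Stopwatch.StartNew();

        // Step 3 (server side): convert the DataSet into the surrogate's
        // byte-array representation and serialize it with the BinaryFormatter.
        byte[] payload;
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, new DataSetSurrogate(source));
            payload = stream.ToArray();
        }

        // Step 5 (client side): deserialize the surrogate and rebuild a DataSet.
        DataSet result;
        using (MemoryStream stream = new MemoryStream(payload))
        {
            DataSetSurrogate surrogate = (DataSetSurrogate)formatter.Deserialize(stream);
            result = surrogate.ConvertToDataSet();
        }

        watch.Stop();
        Console.WriteLine("Surrogate round trip: {0} ms, {1:N0} bytes",
            watch.ElapsedMilliseconds, payload.Length);
        return result;
    }
}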

My conclusion is that native binary serialization of DataSets is only better over a long wire when the number of rows is in the thousands. Our application is 99% data entry forms, so we would never be returning that much data. The surrogate class does have a slight overhead if your network is not congested/not the internet, but it's nothing the user would notice. Therefore, for now, I'm sticking with the surrogate class for our application. I'll let you know if I change my mind later based on more formal load testing.
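
If you want to reproduce the "Size (bytes)" column in the tables below for your own data, something along these lines should do it. Again, DataSetSurrogate stands in for the wrapper class from the linked post (class name and constructor are my assumption); the rest is plain .NET 2.0.

using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class PayloadSizes
{
    static void Report(DataSet ds)
    {
        // Default: the DataSet payload inside the binary formatter stream is XML.
        ds.RemotingFormat = SerializationFormat.Xml;
        Console.WriteLine("XML:       {0:N0} bytes", Measure(ds));

        // New in 2.0: true binary serialization of the DataSet.
        ds.RemotingFormat = SerializationFormat.Binary;
        Console.WriteLine("Binary:    {0:N0} bytes", Measure(ds));

        // Surrogate: serialize the byte-array wrapper instead of the DataSet itself.
        Console.WriteLine("Surrogate: {0:N0} bytes", Measure(new DataSetSurrogate(ds)));
    }

    static long Measure(object graph)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.Length;
        }
    }
}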

Okay here are the performance numbers:

LOCAL TEST (Low Network Latency)

          |         Size (bytes)          |    Transmission Time (ms)
# Records | Surrogate    Binary       XML | Surrogate    Binary       XML
        2 |    13,761    56,198    11,673 |     15.63     31.25     15.63
       10 |    15,261    57,419    15,754 |     15.63     31.25     15.63
       20 |    17,133    58,972    21,018 |     31.25     31.25     15.63
      100 |    32,233    71,626    63,434 |     31.25     31.25     31.25
      500 |   107,336   134,641   276,350 |     62.50     46.88     62.50
     1000 |   203,943   216,285   550,832 |     93.75     78.13    109.38
     2000 |   392,296   374,665 1,087,187 |    171.88    125.00    250.00
     4000 |   772,451   694,776 2,174,403 |    343.75    265.63    515.63
     8000 | 1,521,110 1,323,693 4,383,707 |    734.38    531.25    968.75

INTERNET TEST (High Network Latency)

          |         Size (bytes)          |    Transmission Time (ms)
# Records | Surrogate    Binary       XML | Surrogate    Binary       XML
        2 |    13,761    56,198    11,673 |    312.50    578.13    312.50
       10 |    15,261    57,419    15,754 |    328.13    640.63    390.63
       20 |    17,133    58,972    21,018 |    343.75    655.43    343.75
      100 |    32,233    71,626    63,434 |    421.88    671.88    593.75
      500 |   107,336   134,641   276,350 |    906.25   1078.13   1875.00
     1000 |   203,943   216,285   550,832 |   1484.38   1562.50   3531.25
     2000 |   392,296   374,665 1,087,187 |   2640.63   2515.63   6750.00
     4000 |   772,451   694,776 2,174,403 |   5312.50   4500.00  13312.50
     8000 | 1,521,110 1,323,693 4,383,707 |   9687.50   8406.25  26609.38

12 comments:

foo said...

Beth,

Thanks for this data. The surrogate class looks very promising for most uses. I look forward to your follow-up results.

++Alan

Anonymous said...

Hi Beth,

Try to do something like:

using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

DataSet1 ds = new DataSet1();
new DataSet1TableAdapters.Order_DetailsTableAdapter().FillByOrderId(ds.Order_Details, 10255);

BinaryFormatter bin = new BinaryFormatter();

// Default RemotingFormat: the DataSet payload inside the stream is still XML
using (FileStream xmlStream = new FileStream(@"c:\xml.dat", FileMode.Create))
{
    bin.Serialize(xmlStream, ds);
}

// RemotingFormat.Binary: true binary serialization of the DataSet
using (FileStream binStream = new FileStream(@"c:\bin.dat", FileMode.Create))
{
    ds.RemotingFormat = SerializationFormat.Binary;
    bin.Serialize(binStream, ds);
}

The DataSet was created from the Order Details table in the Northwind database.

If you run it for only one record, the XML one is smaller, unless you set SchemaSerializationMode = ExcludeSchema in the DataSet designer, in which case they are the same size.

For 20 rows (WHERE (OrderID < 10255)), the XML one is twice the size of the binary one.

I'm not sure if this is consistent with your tests ;) -- however, it could be interesting to check what happens when you set the SchemaSerializationMode property. Also make sure you set the RemotingFormat property on the DataSet, because if you don't set it, the DataSet will be serialized as XML even if you are using the BinaryFormatter over remoting.

Regards

Beth Massi said...

Hi Andres,

I was testing untyped DataSets; only typed DataSets can take advantage of the SchemaSerializationMode property ;-). And yes, I definitely was testing the new RemotingFormat.

Thanks for the comment. I'll try to set up a typed dataset test in the near future.

Cheers,
-B

Anonymous said...

Beth,
Great post, thanks for the data. One question though... any guesses on how the new DataTableReader would compare to the surrogate option?

Beth Massi said...

The DataTableReader is not a serializable class. Possibly you meant a DataTable? That would be faster, but you can't return multiple tables/relations.

Anonymous said...

Beth,
Sorry about that. I should have put a little more thought in before I asked that question.

One more question though. What we've been doing for a couple of years with remoting is, in our data access layer, we use DataReaders to get the data and then loop through and shove it into ArrayLists -- one ArrayList per DataTable. Then on the client we do the opposite: loop through the ArrayLists and populate the DataTables.

Of course this complicates things slightly and some boxing and unboxing occurs, but this method always seems to outperform anything else we've tried.

Any thoughts?
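
For anyone curious, here is a minimal sketch of the pattern described in this comment (not the commenter's actual code): flatten a result set into an ArrayList of object arrays on the server, then rebuild the DataTable on the client. The connection string, SQL, and column names are made up for illustration.

using System.Collections;
using System.Data;
using System.Data.SqlClient;

class ArrayListTransport
{
    // Server side: read rows with a DataReader and flatten them into an
    // ArrayList of object arrays (one ArrayList per table).
    static ArrayList ReadRows(string connectionString)
    {
        ArrayList rows = new ArrayList();
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "SELECT OrderID, Quantity FROM [Order Details]", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    object[] values = new object[reader.FieldCount];
                    reader.GetValues(values);   // value types get boxed here
                    rows.Add(values);
                }
            }
        }
        return rows;
    }

    // Client side: rebuild a DataTable from the transported ArrayList.
    static DataTable BuildTable(ArrayList rows)
    {
        DataTable table = new DataTable("OrderDetails");
        table.Columns.Add("OrderID", typeof(int));
        table.Columns.Add("Quantity", typeof(short));
        foreach (object[] values in rows)
        {
            table.Rows.Add(values);   // unboxed back into typed columns
        }
        return table;
    }
}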

Beth Massi said...

Yes, that's basically what we're doing with the surrogate class here; it contains a series of byte arrays. This class was originally developed by a member of the ADO.NET team and I took it and added time zone support. You may want to try it out and see if it works for your scenario.

Anonymous said...

Hi Beth,

Just a quick note to let you know that I ran into the same issue as you with DateTimeMode on DataTable columns.

Do you know of any workarounds?

Beth Massi said...

Take a look at this post for a workaround:
http://bethmassi.blogspot.com/2006/01/serializing-data-across-time-zones-in.html

