Windows Azure Storage Client Library: CloudBlob.DownloadToFile() may not entirely overwrite file contents


Update 3/09/2011: The bug is fixed in the Windows Azure SDK March 2011 release.

Summary

There is an issue in the Windows Azure Storage Client Library that can lead to unexpected behavior when using the CloudBlob.DownloadToFile() methods.

The current implementation of CloudBlob.DownloadToFile() does not erase or clear any preexisting data in the file. Therefore, if you download a blob which is smaller than the existing file, preexisting data at the tail end of the file will still exist.

Example: Say I have a blob named movieblob that contains all the movies I would like to watch in the future, and I want to download it to a local file, moviesToWatch.txt. That file currently contains a long list of romantic comedies that my wife recently watched. When I overwrite it with my list of action movies, which happens to be shorter, the existing text is not completely overwritten, which may lead to a somewhat random movie selection.

moviesToWatch.txt

You've Got Mail;P.S. I Love You.;Gone With The Wind;Sleepless in Seattle;Notting Hill;Pretty Woman;The Runaway Bride;The Holiday;Little Women;When Harry Met Sally

movieblob

The Dark Knight;The Matrix;Braveheart;The Core;Star Trek 2:The Wrath of Khan;The Dirty Dozen;

moviesToWatch.txt (updated)

The Dark Knight;The Matrix;Braveheart;The Core;Star Trek 2:The Wrath of Khan;The Dirty Dozen;Woman;The Runaway Bride;The Holiday;Little Women;When Harry Met Sally

As you can see in the updated local moviesToWatch.txt file, the last section of the previous movie data still exists on the tail end of the file.
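The same effect can be reproduced without any storage account. The sketch below uses only System.IO and compares writing into an existing file without truncating it against writing through File.Create. The class and method names are made up for illustration, and the use of FileMode.OpenOrCreate is an assumption about what a non-truncating write looks like, not the library's actual implementation.

```csharp
using System;
using System.IO;
using System.Text;

public static class OverwriteDemo
{
    // Hypothetical stand-in for a download that does not clear the target file.
    // FileMode.OpenOrCreate opens an existing file and leaves its length alone.
    public static void WriteWithoutTruncating(string path, string content)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(content);
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            stream.Write(bytes, 0, bytes.Length);
        }
    }

    // File.Create truncates an existing file to zero length before writing.
    public static void WriteWithTruncating(string path, string content)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(content);
        using (var stream = File.Create(path))
        {
            stream.Write(bytes, 0, bytes.Length);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "moviesToWatch.txt");
        string oldList = "Notting Hill;Pretty Woman;When Harry Met Sally";
        string newList = "The Matrix;Braveheart;";

        // Case 1: the shorter list is written over the longer one in place,
        // so characters from the old list remain after the new content.
        File.WriteAllText(path, oldList);
        WriteWithoutTruncating(path, newList);
        Console.WriteLine(File.ReadAllText(path));

        // Case 2: the file is truncated first, so only the new list remains.
        File.WriteAllText(path, oldList);
        WriteWithTruncating(path, newList);
        Console.WriteLine(File.ReadAllText(path));

        File.Delete(path);
    }
}
```

The first line printed still carries the tail of the old list, while the second contains only the new list.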

This issue will be addressed in a forthcoming release of the Storage Client Library.

Workaround

To avoid this behavior, use the CloudBlob.DownloadToStream() method and pass in a stream returned by File.Create, which creates the file or truncates it if it already exists:

// File.Create truncates an existing file, so no stale data is left behind.
using (var stream = File.Create("myFile.txt"))
{
    myBlob.DownloadToStream(stream);
}

To reiterate, this issue only affects scenarios where the file already exists and the downloaded blob is shorter than the existing file. If you are using CloudBlob.DownloadToFile() to write to a new file, you are unaffected by this issue. Until the issue is resolved in a future release, we recommend that users follow the pattern above.

Joe Giardino

Comments (2)

  1. Mark Richards says:

    Joe,

    I am using the latest storage client library, and I am using DownloadToFile to download files in excess of 1GB.  I have downloaded tens of thousands of smaller files without fail.  But with these large files, I am getting a lot of errors, maybe 2/3 of the time.  If I re-run my job, they sometimes work and sometimes fail; I can't discern a pattern.

    Here is the meat of the error:

    Unable to read data from the transport connection: The connection was closed., StackTrace:    at OD.CloudCoder.Common.Storage.BlobStore.GetBlobItem(CloudBlobContainer container, String filepath, String blobName)

    Have you run into this?  Any ideas what I can do to fix it?

    Thanks,

    Mark Richards

  2. joegiardino@live.com says:

    @Mark Richards

    This sounds like you are hitting a connection issue, which may be caused by your proxy / edge connection. If you interrogate the InnerException of the StorageClientException that is thrown, it will give more detailed information about the exact error.  Additionally, you can increase the timeout for the transaction via either the BlobRequestOptions or the CloudBlobClient.

    Lastly, the DownloadToFile/ByteArray/Stream/Text() methods perform their entire download in a single streaming GET.  If you use the CloudBlob.OpenRead() method, it will utilize the BlobReadStream abstraction, which downloads the blob one block at a time as it is consumed. If a connection error occurs, only that one block will need to be re-downloaded (according to the configured RetryPolicy). This can also improve performance, as the client may not need to cache a large amount of data locally. For large blobs this can help significantly; however, be aware that you will be performing a higher number of overall transactions against the service.
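
    A rough sketch of the suggestions above, assuming the 1.x Microsoft.WindowsAzure.StorageClient library. The helper name, the buffer size, and the 10 minute timeout are made up for illustration, and this has not been run against a live account, so treat it as a starting point rather than a tested implementation.

    ```csharp
    using System;
    using System.IO;
    using Microsoft.WindowsAzure.StorageClient;

    public static class LargeBlobDownload
    {
        // Hypothetical helper: copies a blob to a local file through OpenRead()
        // so the data is pulled block by block instead of in one streaming GET.
        public static void DownloadInBlocks(CloudBlob blob, string localPath)
        {
            // Assumed value; pick a timeout that suits your blob sizes and network.
            var options = new BlobRequestOptions { Timeout = TimeSpan.FromMinutes(10) };

            try
            {
                using (Stream source = blob.OpenRead(options))
                using (FileStream target = File.Create(localPath)) // truncates any existing file
                {
                    var buffer = new byte[64 * 1024];
                    int read;
                    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        target.Write(buffer, 0, read);
                    }
                }
            }
            catch (StorageClientException ex)
            {
                // The inner exception usually carries the underlying transport error.
                Console.Error.WriteLine("Storage error: {0}", ex.Message);
                if (ex.InnerException != null)
                {
                    Console.Error.WriteLine("Inner exception: {0}", ex.InnerException);
                }
                throw;
            }
        }
    }
    ```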

    I hope this helps,

    Joe