We recently released an update to the Storage Client library in SDK 1.3. We wanted to take this opportunity to go over some breaking changes that we have introduced and also list some of the bugs we have fixed (compatible changes) in this release.
Thanks for all the feedback you have been providing us via forums and this site, and as always please continue to do so as it helps us improve the client library.
Note: We use Storage Client Library v1.2 to indicate the library version shipped with Windows Azure SDK 1.2 and v1.3 to indicate the library shipped with Windows Azure SDK 1.3.
Breaking Changes
1. Bug: FetchAttributes ignores blob type and lease status properties
In Storage Client Library v1.2, a call to FetchAttributes never checks whether the blob instance is of the correct type, since it ignores the BlobType property returned by the service. For example, if a CloudPageBlob instance refers to a block blob in the blob service, FetchAttributes will not throw an exception when called.
In Storage Client Library v1.3, FetchAttributes records the BlobType and LeaseStatus properties. In addition, it will throw an exception when the blob type returned by the service does not match the type of the class being used (i.e. when a CloudPageBlob is used to represent a block blob, or vice versa).
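As a minimal sketch of the new behavior (container is assumed to be an existing CloudBlobContainer and "mydata.txt" is assumed to actually be a block blob; the exact exception type is not shown here):
CloudPageBlob wrongTypeBlob = container.GetPageBlobReference("mydata.txt");
try
{
    // v1.2: succeeds and silently ignores the BlobType returned by the service.
    // v1.3: throws because the service reports BlockBlob for this blob.
    wrongTypeBlob.FetchAttributes();
}
catch (Exception e)
{
    Console.WriteLine("Blob type mismatch: {0}", e.Message);
}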
2. Bug: ListBlobsWithPrefix can display the same blob prefixes multiple times
Let us assume we have the following blobs in a container called photos:
- photos/Brazil/Rio1.jpg
- photos/Brazil/Rio2.jpg
- photos/Michigan/Mackinaw1.jpg
- photos/Michigan/Mackinaw2.jpg
- photos/Seattle/Rainier1.jpg
- photos/Seattle/Rainier2.jpg
Now to list the photos hierarchically, I could use the following code to get the list of folders under the container “photos”. I would then list the photos depending on the folder that is selected.
IEnumerable<IListBlobItem> blobList = client.ListBlobsWithPrefix("photos/");
foreach (IListBlobItem item in blobList)
{
    Console.WriteLine("Item Name = {0}", item.Uri.AbsoluteUri);
}
The expected output is:
- photos/Brazil/
- photos/Michigan/
- photos/Seattle/
However, assume that the blob service returns Rio1.jpg through Mackinaw1.jpg along with a continuation marker that an application can use to continue the listing. The client library then continues the listing with the server using this continuation marker and receives the remaining items. Since the prefix photos/Michigan/ is repeated again for Mackinaw2.jpg, the client library incorrectly duplicates this prefix. When this happens, the result of the above code in Storage Client Library v1.2 is:
- photos/Brazil/
- photos/Michigan/
- photos/Michigan/
- photos/Seattle/
Basically, Michigan would be repeated twice. In Storage Client Library v1.3, we collapse duplicate prefixes so that the above code always produces the same result, irrespective of how the blobs are returned in the listing.
3. Bug: CreateIfNotExist on a table, blob or queue container does not handle container being deleted
In Storage Client Library v1.2, when a deleted container is recreated before the service's garbage collection finishes removing it, the service returns HttpStatusCode.Conflict with StorageErrorCode ResourceAlreadyExists and extended error information indicating that the container is being deleted. Storage Client Library v1.2 does not handle this error and instead returns false, giving the impression that the container already exists.
In Storage Client Library v1.3, we throw a StorageClientException with ErrorCode = ResourceAlreadyExists and the exception's ExtendedErrorInformation error code set to XXXBeingDeleted (ContainerBeingDeleted, TableBeingDeleted or QueueBeingDeleted). The client application should handle this exception and retry after a period of 35 seconds or more.
One approach to avoid this exception while deleting and recreating containers/queues/tables is to use dynamic (new) names when recreating instead of using the same name.
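A rough sketch of handling the exception and retrying (not the definitive pattern; client is an existing CloudBlobClient, and the error-code string and 40-second wait are assumptions based on the description above):
CloudBlobContainer container = client.GetContainerReference("photos");
try
{
    container.CreateIfNotExist();
}
catch (StorageClientException e)
{
    if (e.ExtendedErrorInformation != null
        && e.ExtendedErrorInformation.ErrorCode == "ContainerBeingDeleted")
    {
        // The service needs roughly 35+ seconds to finish deleting the old container.
        System.Threading.Thread.Sleep(TimeSpan.FromSeconds(40));
        container.CreateIfNotExist();
    }
    else
    {
        throw;
    }
}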
4. Bug: CloudTableQuery retrieves up to 100 entities with Take rather than the 1000 limit
In Storage Client Library v1.2, CloudTableQuery limits the query results to 100 entities when Take(N) is used with N > 100. We have fixed this in Storage Client Library v1.3 by setting the limit to Min(N, 1000), where 1000 is the server-side limit.
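For illustration, a query like the following sketch now returns up to 500 entities instead of being capped at 100 (the "Customers" table and CustomerEntity class are hypothetical; account is an existing CloudStorageAccount):
CloudTableClient tableClient = account.CreateCloudTableClient();
TableServiceContext context = tableClient.GetDataServiceContext();
// Take(N) with N > 100 is now honored up to the 1000-entity server-side limit.
CloudTableQuery<CustomerEntity> query =
    (from entity in context.CreateQuery<CustomerEntity>("Customers")
     select entity).Take(500).AsTableServiceQuery();
foreach (CustomerEntity entity in query)
{
    // process entity...
}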
5. Bug: CopyBlob does not copy metadata set on destination blob instance
As noted in this post, Storage Client Library v1.2 has a bug in which metadata set on the destination blob instance in the client is ignored, so it is not recorded with the destination blob in the blob service. We have fixed this in Storage Client Library v1.3, and if the metadata is set, it is stored with the destination blob. If no metadata is set, the blob service will copy the metadata from the source blob to the destination blob.
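As a quick sketch (blob names and the metadata value are illustrative; container is an existing CloudBlobContainer), setting metadata on the destination before the copy now behaves as expected:
CloudBlob sourceBlob = container.GetBlobReference("photos/original.jpg");
CloudBlob destinationBlob = container.GetBlobReference("photos/copy.jpg");
destinationBlob.Metadata["Owner"] = "SomeUser";   // ignored in v1.2, stored with the copy in v1.3
destinationBlob.CopyFromBlob(sourceBlob);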
6. Bug: Operations that return PreconditionFailed and NotModified report BadRequest as the StorageErrorCode
In Storage Client Library v1.2, PreconditionFailed and NotModified errors lead to a StorageClientException with the StorageErrorCode mapped to BadRequest.
In Storage Client Library v1.3, we have correctly mapped the StorageErrorCode to ConditionFailed.
7. Bug: CloudBlobClient.ParallelOperationThreadCount > 64 leads to NotSupportedException
ParallelOperationThreadCount controls the number of concurrent block uploads. In Storage Client Library v1.2, the value can be between 1 and int.MaxValue, but when a value greater than 64 was set, the UploadByteArray, UploadFile and UploadText methods would start uploading blocks in parallel and eventually fail with a NotSupportedException. In Storage Client Library v1.3, we have reduced the maximum to 64. A value greater than 64 now causes an ArgumentOutOfRangeException as soon as the property is set.
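A minimal sketch of the new behavior (account is assumed to be an existing CloudStorageAccount):
CloudBlobClient client = account.CreateCloudBlobClient();
client.ParallelOperationThreadCount = 8;      // valid: 1 to 64 in v1.3
// client.ParallelOperationThreadCount = 128; // would throw ArgumentOutOfRangeException in v1.3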
8. Bug: DownloadToStream always does a range download, resulting in the Content-MD5 header not being returned
In Storage Client Library v1.2, DownloadToStream, which is used by the other variants (DownloadText, DownloadToFile and DownloadByteArray), always does a range GET by passing the entire range in the "x-ms-range" header. However, the service does not return the Content-MD5 header for range GETs, even if the range covers the entire blob.
In Storage Client Library v1.3, we no longer send the "x-ms-range" header in the above-mentioned methods, which allows the Content-MD5 header to be returned.
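As a small sketch (assuming container is an existing CloudBlobContainer and the blob was stored with a Content-MD5 value; the blob name and local path are illustrative):
CloudBlob blob = container.GetBlobReference("photos/seattle.jpg");
blob.DownloadToFile(@"C:\temp\seattle.jpg");   // full (non-range) download in v1.3
Console.WriteLine("Content-MD5: {0}", blob.Properties.ContentMD5);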
9. CloudBlob retrieved using container’s GetBlobReference, GetBlockBlobReference or GetPageBlobReference creates a new instance of the container.
In Storage Client Library v1.2, the blob instance always creates a new container instance that is returned via the Container property of the CloudBlob class. The Container property represents the container that stores the blob.
In Storage Client Library v1.3, we instead use the same container instance that was used to create the blob reference. Let us explain this with an example:
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("blobtypebug");
container.FetchAttributes();
container.Attributes.Metadata.Add("SomeKey", "SomeValue");
CloudBlob blockBlob = container.GetBlockBlobReference("blockblob.txt");
Console.WriteLine("Are instances same={0}", blockBlob.Container == container);
Console.WriteLine("SomeKey metadata value={0}", blockBlob.Container.Attributes.Metadata["SomeKey"]);
For the above code, in Storage Client Library v1.2, the output is:
Are instances same=False
SomeKey metadata value=
This shows that in v1.2 the blob creates a new container instance that is returned when the Container property is referenced. Hence the "SomeKey" metadata that was set is missing on that instance until FetchAttributes is invoked on it.
In Storage Client Library v1.3, the output is:
Are instances same=True
SomeKey metadata value=SomeValue
We return the same container instance that was used to get the blob reference, so the metadata is already present.
Due to this change, any code relying on the instances to be different may break.
10. Bug: CloudQueueMessage is always Base64 encoded allowing less than 8KB of original message data.
In Storage Client Library v1.2, the queue message is always Base64 encoded, which increases the message size by approximately 1/3. The Base64 encoding ensures that the message over the wire is valid XML data. On retrieval, the library decodes this and returns the original data.
However, we wanted to provide an alternative where data that is already valid XML can be transmitted and stored in raw format, so an application can store a full-size 8KB message.
In Storage Client Library v1.3, we have provided a flag "EncodeMessage" on the CloudQueue that indicates whether it should encode the message using Base64 encoding or send it in raw format. By default, we still Base64 encode the message. To store the message without encoding, one would do the following:
CloudQueue queue = client.GetQueueReference("workflow");
queue.EncodeMessage = false;
Be careful when using this flag: make sure your application does not turn off message encoding on an existing queue that still contains encoded messages. Also ensure, when turning off encoding, that the raw message contains only valid XML data, since PutMessage is sent over the wire as XML and the message is delivered as part of that XML.
Note: When turning off message encoding on an existing queue, one option is to prefix each raw message with a fixed version header; when a message is received without the new version header, the application knows it has to decode it. Another option is to start using new queues for the un-encoded messages and drain the old queues of encoded messages with code that does the decoding.
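A rough sketch of the version-prefix idea follows; it assumes queue is a CloudQueue with EncodeMessage set to false, the "v2:" marker is purely illustrative, and the decoding of old messages uses Base64 of UTF-8 text (requires using System.Text;):
const string RawPrefix = "v2:";

// Sending: new messages carry the prefix and are stored un-encoded.
queue.AddMessage(new CloudQueueMessage(RawPrefix + "<order id=\"42\" />"));

// Receiving: messages without the prefix were written by older code and are Base64 encoded.
CloudQueueMessage message = queue.GetMessage();
if (message != null)
{
    string raw = message.AsString;
    string payload = raw.StartsWith(RawPrefix)
        ? raw.Substring(RawPrefix.Length)
        : Encoding.UTF8.GetString(Convert.FromBase64String(raw));
}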
11. Inconsistent escaping of blob name and prefixes when relative URIs are used.
In Storage Client Library v1.2, the rules used to escape a blob name or prefix provided to APIs such as the constructors, GetBlobReference, ListBlobXXX, etc., are inconsistent when relative URIs are used. The relative URI for the blob or prefix passed to these methods is treated as an escaped or un-escaped string depending on the input.
For example:
a) CloudBlob blob1 = new CloudBlob("container/space test", service);
b) CloudBlob blob2 = new CloudBlob("container/space%20test", service);
c) ListBlobsWithPrefix("container/space%20test");
In the above cases, v1.2 treats the first two as "container/space test". However, in the third, the prefix is treated as "container/space%20test". To reiterate, relative URIs are inconsistently evaluated as seen when comparing (b) with (c). (b) is treated as already escaped and stored as "container/space test". However, (c) is escaped again and treated as "container/space%20test".
In Storage Client Library v1.3, we treat relative URIs as literal, basically keeping the exact representation that was passed in. In the above examples, (a) would be treated as "container/space test" as before. The latter two, i.e. (b) and (c), would be treated as "container/space%20test". Here is a table showing how the names are treated in v1.2 compared to v1.3.
Method | BlobName or Prefix | v1.2 | v1.3
CloudBlobClient::GetBlobReference, CloudBlobDirectory::GetBlobReference, CloudBlobContainer::GetBlobReference | space test/blob1 | space test/blob1 | space test/blob1
CloudBlobClient::GetBlobReference, CloudBlobDirectory::GetBlobReference, CloudBlobContainer::GetBlobReference | space%20test/blob1 | space test/blob1 | space%20test/blob1
CloudBlobClient::ListBlobsWithPrefix | space test/ | space test/ | space test/
CloudBlobClient::ListBlobsWithPrefix | space%20test/ | space%20test/ | space%20test/
CloudBlobDirectory::ListBlobs | space test/ | space test/ | space test/
CloudBlobDirectory::ListBlobs | space%20test/ | space test/ | space%20test/
12. Bug: ListBlobsSegmented does not parse the blob names with special characters like ‘ ‘, #, : etc.
In Storage Client Library v1.2, folder names with certain special characters such as #, ' ', :, etc. are not parsed correctly when listing blobs, leading to files not being listed. This gave the impression that blobs were missing. The problem was with the parsing of the response, and we have fixed it in v1.3.
13. Bug: DataServiceContext timeout is in seconds but the value is set as milliseconds
In Storage Client Library v1.2, the timeout is incorrectly set in milliseconds on the DataServiceContext, but DataServiceContext treats the integer as seconds, not milliseconds. We now correctly set the timeout in seconds in v1.3.
14. Validation added for CloudBlobClient.WriteBlockSizeInBytes
WriteBlockSizeInBytes controls the block size when using block blob uploads. In Storage Client Library v1.2, there was no validation done for the value.
In Storage Client Library v1.3, we have restricted the valid range for WriteBlockSizeInBytes to be [1MB-4MB]. An ArgumentOutOfRangeException is thrown if the value is outside this range.
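A minimal sketch (account is assumed to be an existing CloudStorageAccount):
CloudBlobClient client = account.CreateCloudBlobClient();
client.WriteBlockSizeInBytes = 2 * 1024 * 1024;   // 2MB blocks; values outside [1MB, 4MB] throw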
Other Bug Fixes and Improvements
1. Provide overloads to get a snapshot in blob constructors and methods that get references to blobs.
The CloudBlob, CloudBlockBlob and CloudPageBlob constructors, and methods like CloudBlobClient's GetBlobReference, GetBlockBlobReference and GetPageBlobReference, all have overloads that take a snapshot time. Here is an example:
string blobName = "photos/seattle.jpg";
CloudStorageAccount account = CloudStorageAccount.Parse(Program.ConnectionString);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlob baseBlob = client.GetBlockBlobReference(blobName);
CloudBlob snapshotBlob = baseBlob.CreateSnapshot();

// save the snapshot time for later use
DateTime snapshotTime = snapshotBlob.SnapshotTime.Value;

// Use the saved snapshot time to get a reference to the snapshot blob
// by utilizing the new overload and then delete the snapshot
CloudBlob snapshotReference = client.GetBlobReference(blobName, snapshotTime);
snapshotReference.Delete();
2. CloudPageBlob now has a ClearPage method.
We have provided a new API to clear pages in a page blob:
public void ClearPages(long startOffset, long length)
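A minimal usage sketch (the blob name is illustrative and container is an existing CloudBlobContainer; offsets and lengths for page operations must be multiples of 512 bytes):
CloudPageBlob pageBlob = container.GetPageBlobReference("mypageblob.vhd");
pageBlob.ClearPages(0, 512);   // clear the first 512-byte page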
3. Bug: Reads using BlobStream issue a GetBlockList even when the blob is a page blob
In Storage Client Library v1.2, GetBlockList is always invoked, even when reading page blobs, leading to an expected exception that the library then handles internally.
In Storage Client Library v1.3, we issue FetchAttributes when the blob type is not known. This avoids the erroneous GetBlockList call on a page blob instance. If the blob type is known in your application, please use the appropriate blob type to avoid this extra call.
Example: when reading a page blob, the following code will not incur the extra FetchAttributes call since the BlobStream was retrieved using a CloudPageBlob instance.
CloudPageBlob pageBlob = container.GetPageBlobReference("mypageblob.vhd");
BlobStream stream = pageBlob.OpenRead();
// Read from stream…
However, the following code would incur the extra FetchAttributes call to determine the blob type:
CloudBlob pageBlob = container.GetBlobReference("mypageblob.vhd");
BlobStream stream = pageBlob.OpenRead();
// Read from stream…
4. Bug: WritePages does not reset the stream on retries
As we had posted on our site, in Storage Client Library v1.2 the WritePages API does not reset the stream on a retry, leading to exceptions. We have fixed this for seekable streams by resetting the stream back to the beginning when retrying. For non-seekable streams, we throw a NotSupportedException.
5. Bug: Retry logic retries on all 5XX errors
In Storage Client Library v1.2, HttpStatusCode NotImplemented and HttpVersionNotSupported result in a retry. There is no point in retrying such failures. In Storage Client Library v1.3, we throw a StorageServerException and do not retry.
6. Bug: SharedKeyLite authentication scheme for Blob and Queue throws a NullReferenceException
We now support the SharedKeyLite authentication scheme for blobs and queues.
7. Bug: CreateTableIfNotExist has a race condition that can lead to an exception “TableAlreadyExists” being thrown
CreateTableIfNotExist first checks for the table's existence before sending a create request. Concurrent calls to CreateTableIfNotExist can lead to all but one of them failing with TableAlreadyExists; this error is not handled in Storage Client Library v1.2, leading to a StorageException being thrown. In Storage Client Library v1.3, we check for this error and return false, indicating that the table already exists, without throwing the exception.
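For example (a sketch with an illustrative table name; account is an existing CloudStorageAccount), concurrent initialization code no longer needs to catch TableAlreadyExists itself:
CloudTableClient tableClient = account.CreateCloudTableClient();
// Returns false in v1.3 if another caller created the table first, instead of throwing.
bool created = tableClient.CreateTableIfNotExist("Orders");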
8. Provide property for controlling block upload for CloudBlob
In Storage Client Library v1.2, block upload is used only if the upload size is 32MB or greater. Up to 32MB, all blob uploads via UploadByteArray, UploadText, UploadFile are done as a single blob upload via PutBlob.
In Storage Client Library v1.3, we have preserved the default behavior of v1.2 but have provided a property called SingleBlobUploadThresholdInBytes, which can be set to control at what size a blob is uploaded via blocks versus a single put blob. For example, the following setting will upload all blobs up to 8MB as a single put blob and will use block upload for blobs larger than 8MB.
client.SingleBlobUploadThresholdInBytes = 8 * 1024 * 1024;
The valid range for this property is [1MB – 64MB).
Jai Haridas
Thanks for the improvements.
I'm still confused on #3, CreateIfNotExist. It would help me if you would show an example (best practices) for achieving the functionality "CreateIfNotExist", even if that means somehow check to see if it exists, then create. This point has created a bunch of forum questions (including by me) and the answer above (don't do it, create another name) is not very satisfying. The name I use has meaning and I don't want to create another level of redirection in creating a new container name that I have to track someplace else.
Hi Peter,
I understand that this indirection is a little more work than defining the names as constants in a file, but following this pattern allows you to continue purging your data efficiently and, in addition, decouples your application logic from the Windows Azure Storage service's GC process.
There are various scenarios where this fits quite well:
1> Scenario: Container for files that need to be purged periodically
Create a container/table for, say, every month (example: News_Nov_2010). This way, when you want to purge the data, you still need just one call to delete the container. The application always forms the container name using the current month and year and can switch names based on a timer.
2> Scenario: Testing environments
The account/container/table names can come from configuration files or tables. Maintain separate files for test vs. production environments. If a configuration file is used, you can check the new names into your source control. If the list is maintained in a table, one option is to run a script that creates the new tables before updating the configuration table with the new names. During the initialization phase, your application reads this configuration and sets the table names to use.
As for CreateTableIfNotExist, you would need to handle exceptions. Here is some very trivial code:
try
{
    bool isCreated = client.CreateTableIfNotExist(tableName);
}
catch (StorageException e)
{
    Log("Exception: {0}, {1}-{2}",
        e.ErrorCode,
        e.ExtendedErrorInformation.ErrorCode,
        e);

    // We could have some errors here like TableBeingDeleted, and since the table is not created, we cannot continue.
    throw;
}
catch (Exception e)
{
    Log("Table Exception: {0}", e);
    throw;
}
If required, you can wrap the above code in a retry of your own: on an exception, wait for 40 seconds and retry. Retry at most X times (where X is based on how long your application can wait during initialization).
Thanks,
Jai