Blogs  >  Windows Azure Tables: Introducing Upsert and Query Projection

Windows Azure Tables: Introducing Upsert and Query Projection

Windows Azure Tables: Introducing Upsert and Query Projection


As part of the “2011-08-18” version, two Windows Azure Table features were introduced; Upsert, represented by the InsertOrReplace Entity and InserOrMerge Entity APIs, and Query Projection.

In this section, we will first provide an overview of these two features, by defining them and providing use case scenarios. Then, we will illustrate how the Storage Client Library and WCF Data Services can be used to invoke the new APIs by providing sample code based on a usage scenario that highlights both newly introduced features.

Upsert (InsertOrReplace Entity and InsertOrMerge Entity) Feature

The term Upsert is a combination of update and insert. It refers to inserting an entity if the entity does not exist or updating the existing entity if it already exists.

Upsert requests will save an extra call to the storage system when an application wants to insert or update an entity without needing to know if the entity already exists or not. Without Upsert, the application could send up to two requests to get the same behavior i.e. an Update Entity request followed by an Insert Entity request in case the update failed with ‘entity does not exist’. As you can imagine, Upsert can significantly help performance and decrease latencies in certain scenarios (see next section for examples).

The two Upsert APIs provided by Windows Azure Table are InsertOrReplace Entity and InsertOrMerge Entity which are defined as follows:

  •  InsertOrReplace Entity: as the API name implies, InsertOrReplace Entity will insert the entity if the entity does not exist, or if the entity exists, replace the existing one. This means that once the operation successfully completes the table will contain the new entity with properties as defined in the InsertOrReplace Entity request, replacing the prior entity and its properties if it had previously existed.
  • InsertOrMerge Entity: InsertOrMerge Entity will insert the entity if the entity does not exist or, if the entity exists, merges the provided entity properties with the already existing ones. Once the operation successfully completes, the table will contain the provided entity with updated properties provided in the request. In other words, if the entity exists, the API would have the same effect as Merge Entity, where the resultant entity is a union of properties between the existing and the updated ones.

As you can infer from the above, the difference between the two Upsert commands lays in the update behavior if the entity already exists.

Note: Both Upsert APIs can be used in batch operations (Entity Group Transactions).

Usage Examples

Here are some possible scenarios for using Upsert.

InsertOrReplace Usage Example

An example scenario for use of the InsertOrReplace API would be when a component is responsible for creating and updating entities and ensuring that none of the old entity properties are retained in case the entity already exists. In this case, that component will issue an InsertOrReplace request.

As a concrete example, consider an engine as part of an Azure application that constantly updates its data set by pulling the newest version from a data feed that it is subscribed to. That data feed could be a movie listing, real estate listing, weather updates, etc.; provided as an internet service. At a regular interval, the Azure application would pull the latest data from the feed and update its view that is stored in Windows Azure Table. The application could already have an outdated view of the dataset; in this case a replacement for these entities is needed if they already exist. The data feed could also be providing new information and therefore the application would insert those as new entities. Without upsert capabilities the Azure application would first attempt to send an unconditional Update Entity request, and on failure, which means the entity does not exist, the application would then issue an Insert Entity request. This means that the application would end up doing two requests for every entity in the data set if it does not already exist.

This scenario is greatly simplified by the Upsert capability, where the application would issue a single InsertOrReplace Entity request for every entity instead of two requests. Upsert in this scenario doubles the performance when the entity did not already exist. In addition, the application could boost its performance further by batching InsertOrReplace requests together as part of an entity group transaction.

InsertOrMerge Usage Example

The InsertOrMerge API would appeal in situations where two distinct components need to insert/update an entity in a table, while also being responsible for managing properties in the entities. In addition, these components do not want to reset or delete the other’s component specific properties if the entity already exists.

As an example, consider a customer information table that is updated by multiple components of a service; component A is a mobile application and component B is the website of the service accessible through the browser. Based on some user input, assume that component A can detect the mobile phone number of a user called John Smith and wants to insert that information into the table. In this situation, the mobile device will issue an InsertOrMerge command and expects the customer entity to be created with only John Smith’s mobile phone number if the entity does not exist, or merge the properties sent with the existing ones in case John Smith’s entity already exists. Similarly, component B that is running as part of the website which can collect more information from the user such as his address, email address, etc. would do the same if it wants to update John Smith’s information without removing any other previously recorded information.

Without Upsert, both components would have first issued an unconditional Merge Entity request and on failure, they would have sent an Insert Entity request. They would also have to deal with the edge case where both of their Insert requests collide and one of those components had to re-issue an Update Entity request again. Upsert, as you can see, will greatly simplify the code and can provide a significant performance gain. We will highlight this example through code in the subsequent sections.

Query Projection Feature

Projection refers to querying a subset of an entity or entities’ properties. This is analogous to selecting a subset of the columns/properties of a certain table when querying in LINQ. It is a mechanism that would allow an application to reduce the amount of data returned by a query by specifying that only certain properties are returned in the response. For more information, you can also refer to Windows Azure Table: Query Entities, Windows Azure Table: Writing LINQ Queries, WCF Data Services: Query Projections, and OData: Select System Query Option ($select).

Usage Example

The obvious benefit of Query Projection is to reduce the latency of retrieving data from a Windows Azure Table. There are many usage scenarios where you only want to retrieve the needed properties in a table. For example, consider the scenario where we want to retrieve only a few properties in a table for display on a web page for entities that contain hundreds of properties. In this scenario we would save both on bandwidth usage and improve performance by using Projection.

Another usage is for a recurring job that acts on only a few properties and updates them on a regular basis. In this case, Projection would be handy in only retrieving the entities’ properties that needs updating. This latter usage scenario will be highlighted in sample code provided in the subsequent sections.

In addition, the projection feature will be useful in writing code that would count the number of entities in a certain table in a more efficient manner. We are working on providing Count() in the future, but until that is available Projection is useful since it will allow the transfer of a single property back to the client instead of the full row which makes the counting job more efficient. The code that highlights this usage is demonstrated in a later section.

Using Storage Client Library and WCF Data Services to invoke Upsert commands and Query Projection

In this section, we will walk you through sample code that highlights how an application can use the Storage Client Library and WCF Data Services in order to be able to send Upsert commands such as InsertOrReplace Entity or InsertOrMerge Entity and to do Query Projection.

We will use the customer scenario described in the above InsertOrMerge Usage Example section for the below sample code where a system inserts and updates customer information through two different means; a component running as a mobile application and a component running as part of a website service.

Sending new storage version using the Storage Client Library

To unlock the new Windows Azure Storage Table Upsert and Query Projection features, all REST/OData requests should be tagged with the “20011-08-18” storage version as described in the MSDN documents for InsertOrReplace Entity, InsertOrMerge Entity and Query Entities. If your application is using the .NET storage client library, you can leverage the following code that will help you make use of all current capabilities of the TableServiceContext class while being able to invoke the new Upsert and projection APIs.

The reason the below code is necessary is because the released library sends an older storage version “2009-09-19” that does not support these newly released features. Once a new version is released, the below code would not be needed.

public class TableServiceContextV2 : TableServiceContext
{
    private const string StorageVersionHeader = "x-ms-version";
    private const string August2011Version = "2011-08-18";

    public TableServiceContextV2(string baseAddress, StorageCredentials credentials):
        base (baseAddress, credentials)
    {
        this.SendingRequest += SendingRequestWithNewVersion;
    }

    private void SendingRequestWithNewVersion(object sender, SendingRequestEventArgs e)
    {
        HttpWebRequest request = e.Request as HttpWebRequest;

        // Apply the new storage version as a header value
        request.Headers[StorageVersionHeader] = August2011Version;
    }
}

We will be using TableServiceContextV2 throughout the subsequent sections in order to make use of the new feature.

Sample Setup Code

In this section we will define the schema of the “Customers” table that will be used throughout the subsequent sections and the sample code needed to fill this sample table with some initial entities.

Assume that the “Customers” table entity is defined as follows:

[DataServiceKey("PartitionKey", "RowKey")]
public class CustomerEntity
{
    public CustomerEntity(string partitionKey, string rowKey)
    {
        this.PartitionKey = partitionKey;
        this.RowKey = rowKey;
    }

    public CustomerEntity() {}

    /// <summary>
    /// Customer First Name
    /// </summary>
    public string PartitionKey { get; set; }
    
    /// <summary>
    /// Customer Last Name
    /// </summary>
    public string RowKey { get; set; }

    public DateTime Timestamp { get; set; }

    public string Address { get; set; }

    public string Email { get; set; }

    public string PhoneNumber { get; set; }

    // The below 2 properties are declared as nullable since
    // they are considered optional fields in this example
    public DateTime? CustomerSince { get; set; }

    public int? Rating { get; set; }
}

We will initialize the “Customers” table using the following code:

string accountName = "someaccountname";
string accountKey = "SOMEKEY";
string customersTableName = "Customers";

CloudStorageAccount account = CloudStorageAccount.Parse(string.Format("TableEndpoint=http://{0}.table.core.windows.net;AccountName={0};AccountKey={1}", accountName, accountKey));

CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist(customersTableName);

// Bootstrap the Customers table with a set of sample customer entries
TableServiceContext bootstrapContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

BootstrapTable(customersTableName, bootstrapContext);

And the BootstrapTable method is defined as follows:

static void BootstrapTable(string tableName, TableServiceContext serviceContext)
{
    CustomerEntity customer1 = new CustomerEntity("Walter", "Harp");
    customer1.Address = "1345 Fictitious St, St Buffalo, NY 98052";
    customer1.CustomerSince = DateTime.Parse("01/05/2010");
    customer1.Email = "Walter@contoso.com";
    customer1.PhoneNumber = "425-555-0101";
    customer1.Rating = 4;

    serviceContext.AddObject(tableName, customer1);

    CustomerEntity customer2 = new CustomerEntity("Jonathan", "Foster");
    customer2.Address = "1234 SomeStreet St, Bellevue, WA 75001";
    customer2.CustomerSince = DateTime.Parse("01/05/2005");
    customer2.Email = "Jonathan@fourthcoffee.com";
    customer2.Rating = 3;

    serviceContext.AddObject(tableName, customer2);

    CustomerEntity customer3 = new CustomerEntity("Lisa", "Miller");
    customer3.Address = "4567 NiceStreet St, Seattle, WA 54332";
    customer3.CustomerSince = DateTime.Parse("01/05/2003");
    customer3.Email = "Lisa@northwindtraders.com";
    customer3.Rating = 2;

    serviceContext.AddObject(tableName, customer3);

    serviceContext.SaveChanges();
}

 

InsertOrMerge Entity API Sample Code

As per the sample code scenario, the mobile app would want to InsertOrMerge an entity for John Smith with his phone number, similarly the website engine would want to do the same but for different and distinct properties. Assume that the mobileServiceContext defined below will represent the DataServiceContext that is running as part of the mobile app and the websiteServiceContext will represent the DataServiceContext that is running as part of the website engine.

// The mobileServiceContext will represent the app running on a mobile phone that is responsible in updating/inserting the customer phone number
TableServiceContext mobileServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

// The websiteServiceContext will represent the instance of the system that is able to insert and update other information about the customer 
TableServiceContext websiteServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

The WCF DataServiceContext class does not natively support an InsertOrMerge or InsertOrReplace API. As mentioned in the MSDN documents for InsertOrMerge Entity and InsertOrReplace Entity, the OData requests as they appear on the wire are very similar in nature to a Merge Entity and Update Entity respectively with the main difference being that the If-Match header, represented by the Etag parameter in code, is omitted. In WCF Data Services API terms, this would be analogous to first attaching (AttachTo) an entity object to the DataServiceContext without any Etag specified and then invoking the UpdateObject method. This means the DataServiceContext is not initially tracking that object and therefore will omit sending any If-Match header when the SaveChanges method is called. The fact that an entity is being updated without an If-Match header signals to Windows Azure Tables that this is an Upsert request.

Note: Prior to version “20011-08-18”, Windows Azure Table will reject any MERGE or PUT requests made against it where the If-Match header was not specified, as those earlier versions do not support Upsert commands.

The code as part of the mobile application which Upserts John Smith’s phone number would therefore look as follows:

// The mobile app collects the customer's phone number
CustomerEntity mobileCustomer = new CustomerEntity("John", "Smith");
mobileCustomer.PhoneNumber = "505-555-0122";

// Notice how the AttachTo method is called with a null Etag which indicates that this is an Upsert Command
mobileServiceContext.AttachTo(customersTableName, mobileCustomer, null);

mobileServiceContext.UpdateObject(mobileCustomer);

// No SaveChangeOptions is used, which indicates that a MERGE verb will be used. This set of steps will result in an InsertOrMerge command to be sent to Windows Azure Table
mobileServiceContext.SaveChanges();

Similarly, the website engine which may not be aware if any data already exists for John Smith, and would want to InsertOrMerge its collected data so that it does not overwrite any existing data.

CustomerEntity websiteCustomer = new CustomerEntity("John", "Smith");
websiteCustomer.Address = "6789 Main St, Albuquerque, VA 98004";
websiteCustomer.Email = "John@cohowinery.com";

// Since the website system might not know if the customer entry already exists, it will also issue an InsertOrMerge command as follows
websiteServiceContext.AttachTo(customersTableName, websiteCustomer);
websiteServiceContext.UpdateObject(websiteCustomer);
websiteServiceContext.SaveChanges();

At this point, if both components have Upserted John Smith’s information using the InsertOrMerge API as described above, the “Customers” table will now contain a John Smith entry with his phone number, address and email recorded.

InsertOrReplace Entity API Sample Code

Assume that there is an option on the website to sync all of the customer’s data from their mail service, and the semantics the website wants is to replace all of customer’s entity with this new data if it already exists, otherwise insert the entity. In this case, any information that was already in the system needs to be replaced; otherwise, we will insert the information as provided. Therefore, the best option for the website would be to use the InsertOrReplace Entity API.

Since we wish to send an InsertOrReplace Entity request, we would first need to AttachTo the websiteServiceContext without providing any Etag value before calling the UpdateObject method. In addition, the SaveChanges method would need to be invoked with the SaveChangesOptions.ReplaceOnUpdate parameter to indicate that the Upsert operation should replace the existing entity in case it exists. The code for this example for a customer called David would look like this:

CustomerEntity mailServiceCustomer = new CustomerEntity("David", "Alexander");
mailServiceCustomer.PhoneNumber = "333-555-0155";
mailServiceCustomer.Address = "234 Main St, Anaheim, TX, 65000";
mailServiceCustomer.Email = "David@wideworldimporters.com";

// Note how SaveChanges is called with ReplaceOnUpdate which is the differentiation
// factor between an InsertOrMerge Entity API and InsertOrReplace Entity Api
websiteServiceContext.AttachTo(customersTableName, mailServiceCustomer);
websiteServiceContext.UpdateObject(mailServiceCustomer);
websiteServiceContext.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);

Note: If in a rare case, the application wants to Upsert an entity that is already tracked by the DataServiceContext, you should first detach from the context and then attach to it in order to clear out any Etag tracking before performing the rest of the above steps.

Query Projection Sample Code

To demonstrate the query projection feature, we will expand on the sample code provided in the previous sections. Assume that there is an offline job that needs to update the rating for all the customers. It needs to increment the rating by one, for everyone who has been a customer since 2006. To accomplish this, it would not be efficient to read all entities’ properties; rather, it would be more efficient to only retrieve the Rating property for all entities that match the CustomerSince criteria using query projection and then update just that property as needed. Since projection will provide a partial view of all the CustomerEntity properties, the best practice is to create a new data service entity type that would only hold the properties that we are interested in.

[DataServiceEntity]
    public class CustomerRating
    {
        public DateTime? CustomerSince { get; set; }
        
        public int? Rating { get; set; }
    }

Although Rating is the only needed property to perform the job, we will also retrieve CustomerSince that the job could use for debugging and logging purposes.

The scan job code that uses projection and update is as follows:

TableServiceContext ratingServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

var query = from entity in ratingServiceContext.CreateQuery<CustomerRating>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select new CustomerRating
            {
                CustomerSince =  entity.CustomerSince,
                Rating = entity.Rating
            };

// Iterate over all the entities that match the query criteria and increment the rating by 1
foreach (CustomerRating customerRating in query)
{
    if (customerRating.Rating.HasValue)
    {
        ++customerRating.Rating;
    }
    else
    {
        // in case no rating was already set
        customerRating.Rating = 1;
    }
    ratingServiceContext.UpdateObject(customerRating);
}

ratingServiceContext.SaveChanges();

Even though you might not have explicitly projected PartitionKey and RowKey, the server will be returning them as part of the OData entity resource path, known as link in WCF Data Services terms. Etag (or entity timestamp) is also returned as part of the response. These 3 properties are tracked by the DataServiceContext which uses them whenever an entity update is subsequently performed. Therefore, optimistic concurrency is also guaranteed on an entity with partial view as is the case with the CustomerRating.

The above code could be written differently if you wanted to re-use the CustomerEntity class instead of the CustomerRating class to project on. However, we highly recommend the use of a partial view class as it is the case with the CustomerRating. This would avoid resetting any value type properties defined as int, double, datetime, etc. by mistake in case they were not defined as nullable types. Furthermore, if you use the original CustomerEntity class, updates will generate a request with all the full entity properties serialized which would consume unnecessary bandwidth since only the Rating property was intended to be updated.

Note: When projecting on an entity type that defines PartitionKey and RowKey such as CustomerEntity you must project on those keys if these entities are updated in later on.

Note: You can project on up to 250 properties including the partition key, row key and timestamp.

Entity Counting Using Projection

Until we provide Count(),you can use projection as follows. You can select any column you wish, though a common approach is to project the PartitionKey since it is anyway returned as part of the OData query projection response.

/// <summary>
/// Counts the number of entities in an Azure table
/// </summary>
/// <param name="dataServiceContext">The TableServiceContextV2 to use in order to issue the requests.  TableServiceContextV2 makes sure that Aug-2011 version is transmitted on the wire</param>
/// <param name="tableName">The table name that we wish to count its entities</param>
/// <returns></returns>
public long GetEntityCount(TableServiceContext dataServiceContext, string tableName)
{
    long count = 0;
            
    var query = from entity in dataServiceContext.CreateQuery<TableServiceEntity>(tableName)
                select new
                 {
                     entity.PartitionKey
                 };

    foreach (var row in query.AsTableServiceQuery())
    {
        ++count;
    }

    return count;
}
Other Projection Usage and Consideration

If you want to project on a single property, the following WCF code is not supported and the query will be rejected by the server since it will result in an unsupported OData request format:

// The below single-entity projection is not supported and will be rejected by the server
var query = from entity in ratingServiceContext.CreateQuery<CustomerEntity>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select entity.Rating;

Instead, the following code needs to be used. Note that the below code also demonstrates how you can project using an anonymous non-entity type object.

var query = from entity in ratingServiceContext.CreateQuery<CustomerEntity>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select new
            {
                entity.Rating
            };

// The below code demonstrates how the projected Rating property could be accessed
foreach (var partialData in query)
{
    Console.WriteLine("Rating: {0}", partialData.Rating);
}

Jean Ghanem

Comments (13)

  1. I wanted to reiterate that Storage Client library that shipped with SDK 1.5 does not use the 2011-08-18 version for x-ms-version header. These features could not make it in that because of timelines but we are working on it. However, Jean has provided extensions in this blog that you can start using today with the Storage Client library. The extension involves:

    Extend TableServiceContext (TableServiceContextV2 in the post above) to register for SendingRequest event, in which you will change the version. From here on, use this new context instance and you will get the new version.

  2. Erick T says:

    Can you post an example of the InsertOrReplace and InsertOrMerge functionality using the HTTP REST API (i.e., the XML version).

    Thanks.

  3. Hi Erick, we have a sample at msdn.microsoft.com/…/hh452241.aspx and msdn.microsoft.com/…/hh452242.aspx.

    Basically InsertOrMerge/InsertOrReplace are similar to Update – Merge/Replace respectively except that the InsertOrXXX version of request does not have the If-Match header. Does that help?

    Thanks,

    jai

  4. ChrisL says:

    I'm still holding my breath for the ability to skip rows without incurring the cost of a full table scan.  This is useful if I have 10,000 items in a table, and I simply want a range of zero to 100 rows.

  5. when I use the code above to test InsertOrMerge operation,a ResourceNotFound error occurred.When I use the table projection code above instead,another error Inputvalid occurred.My Windows Azure SDK version is v1.6.Can anybody reply me?

  6. @terrytao, can you see if there is anything odd via a fiddler trace? Feel free to share the fiddler request/response with jharidas at microsoft? Also with 1.6, since it sends the required version, you can just use TableServiceContext rather than the above TableServiceContextV2.

    Jai

  7. Ricky says:

    Jai, I am having the same problem as terrytao trying projections on development storage

    I have tracked it down to

    23/05/2012 1:12:29 PM [Error] TableDataServiceHost.ProcessException. Rethrowing exception: System.Data.Services.DataServiceException: The response requires that version 2 of the protocol be used, but the MaxProtocolVersion of the data service is set to DataServiceProtocolVersion.V1.

      at System.Data.Services.DataServiceConfiguration.ValidateMaxProtocolVersion(RequestDescription requestDescription)

    My request Headers all seem correct to specify the correct version of data services and storage client

    DataServiceVersion: 2.0;NetFx

    MaxDataServiceVersion: 2.0;NetFx

    x-ms-version: 2011-08-18

  8. Ricky says:

    After a bit more digging the development storage doesn't support these as of yet.

  9. Ricky says:

    I have submitted a feature request to enhance the development storage to support these newer operations

    http://www.mygreatwindowsazureidea.com/…/2872460-storage-emulator-maxprotocolversion-of-the-data-se

  10. @Ricky, Dev storage does not support projections and upsert as it still uses the old WCF data Service interface on server side . We are working on improving this but we do not have an ETA that we can share as of yet.

  11. Does anyone know when insert-or-replace will be supported on the local storage emulator?

    cheers!

  12. @Joel, we are working on providing this and we plan on having this available sometime during fall of this year.

  13. Luke Latham says:

    Jai: Nevermind. I just figured it out. One ? though: I tested my batch-upsert code by only specifying serviceContext.SaveChangesWithRetries(Data.Services.Client.SaveChangesOptions.Batch). I read elsewhere that we are supposed to use serviceContext.SaveChangesWithRetries(Data.Services.Client.SaveChangesOptions.ReplaceOnUpdate Or Data.Services.Client.SaveChangesOptions.Batch). Is it ok to submit the batch-upsert to the service context without providing the option for ReplaceOnUpdate? Thanks, Luke