Page 58 - MSDN Magazine, August 2017

Figure 8 Using FeedOptions to Set the Cross-Partition Query Flag
records and then add the new, and potentially updated, documents back in. To do this I need to pass the partition key to the DeleteDocumentAsync method. An optimization would be to pull the documents back, do a local comparison, update any changed documents and add net-new documents. It's a little taxing, because all of the elements in each document must be compared. Because there's no primary key defined for the billing documents, you can likely find the matched document using SubscriptionId, MeterId, InstanceId and Date, and compare the rest of the elements from there. This would offload some of the work from Azure Cosmos DB and reduce the overall traffic.
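That local-comparison optimization can be sketched in plain C#. The composite key built from SubscriptionId, MeterId, InstanceId and Date follows the fields named above; everything else about EnrollmentUsageDetail here is an assumption for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: match incoming documents to existing ones by a composite key,
// since no primary key is defined for the billing documents.
static string CompositeKey(EnrollmentUsageDetail d) =>
  $"{d.SubscriptionId}|{d.MeterId}|{d.InstanceId}|{d.Date}";

static (List<EnrollmentUsageDetail> toUpdate, List<EnrollmentUsageDetail> netNew)
  Diff(IEnumerable<EnrollmentUsageDetail> existing,
       IEnumerable<EnrollmentUsageDetail> incoming)
{
  var existingByKey = existing.ToDictionary(CompositeKey);
  var toUpdate = new List<EnrollmentUsageDetail>();
  var netNew = new List<EnrollmentUsageDetail>();

  foreach (var doc in incoming)
  {
    if (existingByKey.TryGetValue(CompositeKey(doc), out var match))
    {
      // Compare the remaining elements; update only when something changed.
      // Assumes EnrollmentUsageDetail implements a value-based Equals.
      if (!doc.Equals(match))
        toUpdate.Add(doc);
    }
    else
    {
      netNew.Add(doc);
    }
  }
  return (toUpdate, netNew);
}
```

Only the toUpdate and netNew documents would then need to be written, which is where the reduction in traffic comes from.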
With the way cleared to add the documents back into the collection, I simply loop through the docs and call AddAsync on the documentCollector I defined as the output binding for the Azure Function:
// Update the enrollment field in the incoming collection
incomingDailyUsage.ForEach(usage => usage.Enrollment = enrollment);

int processedRecordCount = 0;
foreach (EnrollmentUsageDetail usageDoc in incomingDailyUsage)
{
  await documentCollector.AddAsync(usageDoc);
  processedRecordCount++;
}
While it’s not much of a change, I’ve also done a little bit of enrichment by adding the Enrollment number to each document in the collection. Running one daily split file produces the log information shown in Figure 9.
Final Note
The only thing left to do is to run a good many iterations with varying inputs and then measure so I can properly size the services I'm using. This includes testing out the geographic replication capabilities and some further prototyping of the security that I'll want to implement around subscription data access; these were two of the major reasons for choosing Azure Cosmos DB. The net lessons to be gleaned are some of the ones that we seem to keep learning in the world of IT:
1. There are no magic bullets, not even with a serverless architecture.
2. Nothing replaces thorough testing.
3. Size your dependent services and treat this as seriously as you did when sizing your hardware in the past.
4. Pay close attention to cost, especially under high throughput conditions.
The upside of using serverless compute like Azure Functions is that you pay only for what's consumed. For regular but infrequent processing such as this, that can be a big benefit in cost savings. Finally, configuring capabilities is a better experience and allows faster time to product than configuring host servers.
Joseph Fultz is a cloud solution architect at Microsoft. He works with Microsoft customers developing architectures for solving business problems leveraging Microsoft Azure. Formerly, Fultz was responsible for the development and architecture of GM's car-sharing program (mavendrive.com). Contact him on Twitter: @JosephRFultz or via e-mail at jofultz@microsoft.com.
Thanks to the following Microsoft technical expert who reviewed this article: Fabio Calvacante
string docsToDeleteQuery = String.Format(
  @"SELECT * FROM c where c.Enrollment = ""{0}"" AND c.Date = ""{1}""",
  enrollment, incomingDataDate);

FeedOptions queryOptions = new FeedOptions
{
  MaxItemCount = -1,
  EnableCrossPartitionQuery = true
};

IQueryable<Document> deleteQuery = docDBClient.CreateDocumentQuery<Document>(
  UriFactory.CreateDocumentCollectionUri(dbName, collectionName),
  new SqlQuerySpec(docsToDeleteQuery), queryOptions);

log.Info("Delete documents");

int deletedDocumentCount = 0;
foreach (Document doc in deleteQuery)
{
  await docDBClient.DeleteDocumentAsync(((dynamic)doc)._self,
    new RequestOptions
    {
      PartitionKey = new PartitionKey(((dynamic)doc).SubscriptionId)
    });
  deletedDocumentCount++;
}
constraints of the platforms on which you’re building, as well as the cost of each decision.
When provisioning more than 2,500 RUs, the system requires that a partition key be specified. This works for me, because I want to partition that data in any case to help with both scale and security in the future.
As you can see in Figure 7, I’ve specified 8,000 RUs, which is a little more than the calculation indicated, and I’ve specified SubscriptionId as the partition key.
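For reference, the same collection could also be provisioned from the SDK rather than the portal. This is a sketch against the DocumentDB client library of the time; dbName, collectionName and docDBClient are placeholders matching the names used elsewhere in the listings:

```csharp
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Sketch: create the collection with 8,000 RUs and SubscriptionId
// as the partition key, mirroring the portal settings in Figure 7.
var collection = new DocumentCollection { Id = collectionName };
collection.PartitionKey.Paths.Add("/SubscriptionId");

await docDBClient.CreateDocumentCollectionIfNotExistsAsync(
  UriFactory.CreateDatabaseUri(dbName),
  collection,
  new RequestOptions { OfferThroughput = 8000 });
```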
Additionally, I set up the ProcessDailyUsage with a blob trigger on the newdailysplit container and with an input and output binding for Azure Cosmos DB. The input binding is used to find the records that exist for the given day and enrollment and to handle duplicates. I'll ensure that my FeedOptions sets the cross-partition query flag, as shown in Figure 8.
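A function.json for that trigger and those bindings could look roughly like the following sketch; the connection setting names, database name and input binding name are assumptions, while newdailysplit, partitionedusage and documentCollector match the names used in this article:

```json
{
  "bindings": [
    {
      "type": "blobTrigger",
      "name": "myBlob",
      "path": "newdailysplit/{name}",
      "connection": "usagefilesstorage",
      "direction": "in"
    },
    {
      "type": "documentDB",
      "name": "incomingDocuments",
      "databaseName": "usagedb",
      "collectionName": "partitionedusage",
      "connection": "CosmosDBConnection",
      "direction": "in"
    },
    {
      "type": "documentDB",
      "name": "documentCollector",
      "databaseName": "usagedb",
      "collectionName": "partitionedusage",
      "connection": "CosmosDBConnection",
      "direction": "out"
    }
  ],
  "disabled": false
}
```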
I create a query to grab all the records for the enrollment on that date and then loop through and delete them. This is one instance where SQL Azure could’ve made things easier by issuing a DELETE query or by using an upsert with a known primary key. However, in Azure Cosmos DB, to do the upsert I need the row ID, which means I must make the round trip and do the comparison on fields I know to uniquely identify the document and then use that row’s id or selflink. For this example, I simply delete all the
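For completeness, the upsert path just described could be sketched as follows. UpsertDocumentAsync is the real SDK call; FindMatch is a hypothetical helper that locates the existing document by SubscriptionId, MeterId, InstanceId and Date, and the Id property on EnrollmentUsageDetail is assumed:

```csharp
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Sketch: upsert instead of delete-and-re-add. FindMatch is a
// hypothetical helper that finds the matching existing document.
foreach (EnrollmentUsageDetail usageDoc in incomingDailyUsage)
{
  Document match = FindMatch(deleteQuery, usageDoc); // hypothetical helper
  if (match != null)
    usageDoc.Id = match.Id; // reuse the existing row id so the write updates

  await docDBClient.UpsertDocumentAsync(
    UriFactory.CreateDocumentCollectionUri(dbName, collectionName),
    usageDoc,
    new RequestOptions
    {
      PartitionKey = new PartitionKey(usageDoc.SubscriptionId)
    });
}
```

The cost of FindMatch is the round trip and field comparison noted above, which is why the example in this article simply deletes and re-adds instead.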
Figure 9 The Log Information from a Daily Split File
2017-06-10T01:16:55.291 Function started (Id=bfb220aa-97ab-4d36-9c1e-602763b93ff0)
2017-06-10T01:16:56.041 First 15 chars: [{"AccountOwner
2017-06-10T01:16:56.181 get date
2017-06-10T01:16:56.181 getting enrollment
2017-06-10T01:16:56.181 Incoming date: 11/01/2016 for Enrollment: 4944727
2017-06-10T01:16:56.181 Collection: partitionedusage
2017-06-10T01:16:56.181 query: SELECT * FROM c where c.Enrollment = "4944727" AND c.Date = "11/01/2016"
2017-06-10T01:16:56.181 Create delete query
2017-06-10T01:16:56.197 Delete documents
2017-06-10T01:17:23.189 2142 docs deleted while processing 20161101-4944727-dailysplit.json
2017-06-10T01:17:23.189 Import documents
2017-06-10T01:17:44.628 2142 records imported from file 20161101-4944727-dailysplit.json
2017-06-10T01:17:44.628 Moving file 20161101-4944727-dailysplit.json to /processedusage container
2017-06-10T01:17:44.674 Deleting 20161101-4944727-dailysplit.json
2017-06-10T01:17:44.690 Completed!
2017-06-10T01:17:44.690 Function completed (Success, Id=bfb220aa-97ab-4d36-9c1e-602763b93ff0, Duration=49397ms)




















































