
usage container. To keep the diagram in Figure 2 easy to parse, I’ve omitted some containers—in particular, the error files container is missing from the diagram. This is the container that holds any file that causes an exception during processing, whether that file is the entire usage file or just one of the daily splits. I don’t spend time or effort correcting the data for missing or errored days because, once an issue is identified, the process can be triggered for a given month and enrollment or for a single daily split to correct the problem. Also clearly missing from the prototype are alerting and compensating mechanisms for when errors occur, but that’s something I want to bubble up through Application Insights integration with the Operations Management Suite.
Persisting the Data to Azure Cosmos DB
With the files split and ready to be picked up by the ProcessDailyUsage Function, it’s time to consider some issues that need to be addressed, namely throughput to the target and how to handle updates. Often when working through some solution architecture in an enterprise, you run into older systems that are less capable, or where real-time loads and high-throughput scenarios need to be managed. I don’t naturally have any hard throughput constraints in my cloud native setup for this architecture, but I could create problems for myself if I don’t take the time to think through the volume and speed of the data I’m feeding into the cloud services I’m consuming.
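To make that concrete, a blob-triggered function along the following lines is the general shape of ProcessDailyUsage. This is only an illustrative sketch, not the actual listing; the parameter names, the TraceWriter logging and the use of the default storage connection are assumptions.

using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

public static class ProcessDailyUsage
{
    // Fires once for each daily split blob that lands in the newdailysplit
    // container (assumes the default AzureWebJobsStorage connection).
    [FunctionName("ProcessDailyUsage")]
    public static void Run(
        [BlobTrigger("newdailysplit/{name}")] Stream dailySplit,
        string name,
        TraceWriter log)
    {
        log.Info($"Processing daily split '{name}' ({dailySplit.Length} bytes)");
        // Parse the ~1,200 usage documents in the split and write them to Azure Cosmos DB.
    }
}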
For my data set, each of the daily splits is about 2.4MB and contains about 1,200 individual documents. Keep in mind that each document represents one meter reading for one resource provisioned in Azure. Thus, for each EA the number of documents in a daily split could vary greatly depending on resource usage across the enterprise. The ProcessDailyUsage Function is configured to trigger based on receiving new blobs in the newdailysplit container. This means I’ll have as many as 31 concurrent Function executions manipulating the data.

To help me estimate what I need to provision for Azure Cosmos DB, I used the calculator at documentdb.com/capacityplanner. Without some empirical testing I had to make a few guesses for the initial provisioning. I know there will be 31 concurrent executions, but it’s a little harder to nail down how many concurrent requests per second that will create without doing repetitive runs. The end result of this prototype will help to inform the final architecture and requirements for provisioning, but because I’m working forward on this timeline, I’m going to take a stab at it using the following as my rules for estimating:
• 1,200 records per daily split
• 31 concurrent executions (for a single EA)
• 0.124 seconds per request (empirical evidence from measuring a few individual requests)
I’ll round down to 0.1 seconds per request; overestimating the load in this way gives me a more conservative provisioning estimate. This nets 310 requests per second per EA, which in turn comes out to about 7,800 request units (RUs) based on the calculator results, as can be seen in Figure 6.

Figure 6 Azure Cosmos DB Pricing Calculator
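The arithmetic behind those numbers is simple enough to sketch in a few lines; the roughly 2KB average document size is just 2.4MB divided by 1,200 documents.

public static class RuEstimate
{
    // Back-of-the-envelope version of the estimate above (not production code).
    public const int ConcurrentExecutions = 31;    // one Function execution per daily split
    public const double SecondsPerRequest = 0.1;   // 0.124 rounded down, which overstates the load
    public const double RequestsPerSecondPerEA = ConcurrentExecutions / SecondsPerRequest;  // 310
    // ~310 writes/sec of roughly 2KB documents is the input that yields the
    // ~7,800 RU figure from the capacity planner shown in Figure 6.
}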
Because the maximum number of RUs that can be provisioned without calling support is 10,000, this might seem kind of high. However, I’m running an unthrottled parallel process and that drives up the throughput significantly, which in turn will drive up the cost. This is a major consideration when designing the structure, because it’s fine for me to run this for some testing, but for the real solution I’ll need a throttling mechanism to slow down the processing so I can provision fewer RUs and save myself a little money. I don’t need the data to be captured as fast as possible, just within a reasonable enough time that someone could review and consume it on a daily basis.

The good news is that the Azure Functions team has a concurrency control mechanism in the backlog of issues that will eventually get resolved (bit.ly/2tcpAbI), and it will provide a good means of control once implemented. Some other options are to introduce artificial arbitrary delays (let’s all agree this is bad) or to rework the processing and handle the parallel execution explicitly in the C# code. Also, as technical expert Fabio Cavalcante pointed out in a conversation, another good option would be to modify the architecture a bit by adding Azure Storage Queues and using features such as visibility timeouts and scheduled delivery to act as a throttling mechanism. That would add a few moving parts to the system, and I’d have to work out the interaction of using a queue for activation while keeping the data in storage, or slice up the data into 64KB blocks for the queue. Once throttling is available in Azure Functions, I’ll be able to keep it in this simpler form with which I’m working. The salient point here is that when working with a serverless architecture you must be familiar with the
Figure 7 Provisioning a New Collection
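As a code-side companion to Figure 7, provisioning a collection with a specific throughput can also be done through the DocumentDB .NET SDK, along these lines. This is a hedged sketch: the database, collection and partition key names are placeholders for illustration, not the values from my solution.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class UsageCollectionSetup
{
    // Creates (if needed) a partitioned collection with a fixed amount of provisioned throughput.
    public static async Task CreateUsageCollectionAsync(string endpoint, string authKey)
    {
        using (var client = new DocumentClient(new Uri(endpoint), authKey))
        {
            var collection = new DocumentCollection { Id = "dailyusage" };  // placeholder collection name
            collection.PartitionKey.Paths.Add("/enrollmentNumber");         // placeholder partition key

            // OfferThroughput is the provisioned RU level; 7,800 matches the estimate above.
            // Assumes the "usagedb" database already exists.
            await client.CreateDocumentCollectionIfNotExistsAsync(
                UriFactory.CreateDatabaseUri("usagedb"),
                collection,
                new RequestOptions { OfferThroughput = 7800 });
        }
    }
}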
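For completeness, the Storage Queue throttling option mentioned earlier might look roughly like this. The queue name and the five-second stagger are illustrative assumptions, and each message carries only a pointer to a daily split blob rather than the data itself, sidestepping the 64KB message limit.

using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static class SplitScheduler
{
    // Enqueues one work item per daily split, staggering visibility so downstream
    // processing drains at a controlled rate instead of all at once.
    public static async Task ScheduleSplitsAsync(string storageConnectionString, string[] splitBlobPaths)
    {
        CloudQueue queue = CloudStorageAccount.Parse(storageConnectionString)
            .CreateCloudQueueClient()
            .GetQueueReference("dailysplit-work");  // hypothetical queue name
        await queue.CreateIfNotExistsAsync();

        for (int i = 0; i < splitBlobPaths.Length; i++)
        {
            // The third argument is the initial visibility delay: the message stays
            // hidden until its time slot arrives, which is what provides the throttling.
            await queue.AddMessageAsync(
                new CloudQueueMessage(splitBlobPaths[i]),
                null,
                TimeSpan.FromSeconds(i * 5),
                null,
                null);
        }
    }
}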