through the supersystem, and must take into account the particular constraints of the discrete subsystems. The principal change in architecting such a supersystem lies in the depth and scope of defining the system: you size a queue for throughput, for example, but not the hardware that hosts it. You must still consider latency, connectivity, volume, availability, cost, and any number of other factors, but the work of sizing and defining the particulars of the service ends once you've defined the capacity and the cost of the capability needed to meet the identified requirements. There's no additional work of defining the host environment and all its needed artifacts as you might have done in the past.
Before I get into designing what the overall flow of information into the system will look like, let’s note a few facts about the source systems and some requirements for the end-state system:
• All of the data for every subscription under the EA will be returned for all resources for every day it's available in the designated month. This can result in a lot of data, growing roughly linearly as the month progresses.
• Any and all records may be updated throughout the month. The stated settlement timing is 72 hours. To be safe, I'll consider all records for a given month in flux until 72 hours past the beginning of the subsequent month (see the sketch following this list).
• The usage data isn’t returned with an ID for the enrollment, so I’ll have to add it.
• Determining cost is a separate activity and requires retrieving the rate card and further processing.
• No information will be received for subscriptions that aren’t in the specified EA.
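To make that 72-hour settlement window concrete, here's a minimal sketch in plain Python. The constant and the function name are mine, not part of the source system; it simply decides whether a given month's usage should still be treated as in flux:

from datetime import datetime, timedelta

SETTLEMENT_WINDOW = timedelta(hours=72)  # the stated settlement timing

def month_is_in_flux(year, month, now):
    # A month's records stay mutable until 72 hours past the start
    # of the subsequent month.
    next_month_start = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return now < next_month_start + SETTLEMENT_WINDOW

# July 2017 usage is treated as settled only from August 4, 2017, onward.
print(month_is_in_flux(2017, 7, datetime(2017, 8, 3)))   # True, still in flux
print(month_is_in_flux(2017, 7, datetime(2017, 8, 5)))   # False, settled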
Additionally, there are a few technical business requirements that the prototype must include:
• The ability to create read-only and geographically distributed datasets must be included.
• Processing should be adjustable to trade off cost versus performance.
• The ability to secure access at the subscription level should be designed in.
The overall flow itself is straightforward: retrieve the data, add a small amount of information, and persist it into the target storage.
As depicted in Figure 1, the path for getting the data to its target is fairly simple because there's no integration with any external systems other than the EA Billing API. I know that when I work through the data, I'll have to do some amount of initial processing and enrichment (for example, add the enrollment ID), and on the persistence side I'll have to deal with existing records from the previous day's fetches. I'll probably want to look at separating those two processes.
Thus, you see three major blocks that represent retrieval, enrichment and persistence, which are all separated by some queuing mechanism. The complications start after I make some technology picks and start looking at the details of implementing with those components and making the processing pipeline run in parallel.
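As a rough illustration of the middle block, here's a minimal enrichment-and-split sketch in Python. The field names (subscriptionId, enrollmentId) and the read_blob/write_blob helpers are placeholders I'm assuming for illustration, not the actual implementation; the point is simply stamping each record with the enrollment ID and grouping the batch by subscription for the persistence stage:

import json
from collections import defaultdict

def enrich_and_split(enrollment_id, raw_usage_json):
    # Stamp every usage record with the enrollment ID (the API doesn't
    # return it) and group the records by subscription.
    by_subscription = defaultdict(list)
    for record in json.loads(raw_usage_json):
        record["enrollmentId"] = enrollment_id
        by_subscription[record["subscriptionId"]].append(record)
    return by_subscription

# Hypothetical hand-off from the intake container to the persistence container:
# for sub_id, items in enrich_and_split("12345", read_blob("intake/2017-08.json")).items():
#     write_blob("persistence/" + sub_id + ".json", json.dumps(items))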
Technology Mapping
At this point in the process, two factors beyond the requirements of the overall system may come into play: enterprise standards and personal preference. If these are in conflict, the result can be almost endless debate. Fortunately, in this instance I don’t have to worry about this. I do have my own mix of constraints, along with those I noted from the initial requirements. In this case, I’d like to make sure to hit these marks:
• Simplest compute provisioning and edits/updates for quick cycles of testing
• Easy automatic scaling
• Easy provisioning for geographic distribution of data
• Easy mechanisms for scheduling and triggering work
Here, I want to focus on the work and not on the system setup. I’ll leave things like cost analysis for various implementations and adherence to corporate standards until after I have a working prototype. I did consider some alternatives, such as Azure SQL Database versus Azure Cosmos DB, but I’m going to focus on my choices and the primary motivations for each of those choices.
• Compute: Azure Functions will serve me well here. It meets my need for scale and simplicity while also providing easy configuration of scheduled and triggered jobs and easy integrations with bindings.
• Queuing: Keeping things simple, I'll use Azure Storage Blobs and separate the files by containers. The unknown, but likely large, size of each initial input file makes storage queues a non-option for the initial retrieval, and likely takes them out of the running for the individual subscription data splits as well. Beyond that, I'd like to keep the mechanism uniform, and I really don't need any advanced capabilities, such as priority messages, routing, message-specific security or poison-message handling.
• Storage: Azure Cosmos DB is indeed my friend here. Using the subscription ID as the partition key lets me limit access by subscription, if necessary (see the persistence sketch following this list). Additionally, the ease of adding and removing read-only and read-write geographically distributed replicas, along with native support in Power BI, makes this a no-brainer for my system. Last, I have to admit a little personal bias: I want a proper document storage mechanism that supports the SQL syntax I've used for too many years to abandon.
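To show how that partition-key choice might play out at persistence time, here's an illustrative sketch using the azure-cosmos Python package; the account details, the database and container names, and the deterministic document-ID scheme are all assumptions on my part, not the article's actual implementation. Giving each record a repeatable ID means re-fetching a day that's still in flux simply overwrites the earlier copy:

from azure.cosmos import CosmosClient

# Endpoint, key and names below are placeholders, not real values.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("usagedb").get_container_client("detailedusage")

def persist_usage(records):
    # The container is assumed to be partitioned on /subscriptionId, matching the design above.
    for record in records:
        # Repeatable ID so a later fetch of the same record replaces the prior version.
        record["id"] = "{0}_{1}_{2}".format(
            record["subscriptionId"], record["date"], record["meterId"])
        container.upsert_item(record)

With upserts and deterministic IDs, dealing with existing records from the previous day's fetches reduces to overwriting documents in place.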
Figure 2 represents the application of technology to the logical architecture and adds some processing flow to it.
I’ve taken the liberty of including the names I used in this diagram, but you might not have names at this stage of the design. The shapes used indicate the technology in play; the numbers on the line are the sequence in which the process is executed, and the arrows indicate which component initiates the outbound call. Note that I’ve identified four Azure Functions, four Azure Storage
[Figure 1 Logical Flow: Fetch Data for Enrollment -> Intake Queue -> Enrich and Split for Processing (parallel) -> Persistence Queue]