Page 52 - MSDN Magazine, August 2017

Blob Containers and three Azure Cosmos DB collections that I’ll employ as the working pieces of my implementation.
Separating the data into three collections is useful for explaining, but serves a grander purpose. I won't need the same security for each of the types of documents and the separation makes that easy to understand and manage. More important, I define the performance characteristics by collection, and the separation allows me to more easily optimize that by having a large, high-throughput collection specifically for the DetailedUsageData, while the other two remain minimal.
Retrieving Data
Starting with the first two legs of the data journey, I want to run something similar to what I do with a Cron job. While the WebJobs SDK itself would support this type of implementation, it would leave a lot of the work of configuring the runtime environment to me and increase my overall development effort. Because Azure Functions is built on top of the WebJobs SDK and naturally supports a Timer Trigger, it's an easy choice. I could've used Azure Data Factory because it's a tool made specifically for moving data around, and it supports retrieving Web data and working with Blobs. However, that would mean I'd need to work out certain things with regard to reference data and updating duplicate records in Azure Cosmos DB when I don't have the row ID. Familiarity with development and debugging using Azure Functions, and the information I can get from Azure Functions' integration with Application Insights, makes Azure Functions my preferred choice in this instance.
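The article doesn't show the trigger binding itself, so as a hedged illustration, here's roughly what a once-a-day Timer Trigger binding could look like in an Azure Functions function.json file. The 6:00 AM UTC schedule and the binding name are assumptions for illustration, not the article's actual settings:

```
{
  "bindings": [
    {
      "name": "dailyTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 6 * * *"
    }
  ],
  "disabled": false
}
```

The schedule uses the six-field NCRONTAB format (seconds, minutes, hours, day, month, day-of-week), so "0 0 6 * * *" fires once daily at 06:00:00 UTC.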
The Timer Trigger has an obvious function, but in order for DailyEABatchControl to know what to process, it retrieves configuration information from the Enrollments collection, which has the following schema:
{
  "enrollmentNumber": "<enrollment number>",
  "description": "",
  "accessKey": "<access key>",
  "detailedEnabled": "true",
  "summaryEnabled": "false"
}
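For reference, a document with this shape maps naturally onto a simple C# class. The class below is a hypothetical sketch (the property names mirror the JSON fields, and the two flags stay strings to match the document rather than being converted to bool):

```csharp
// Hypothetical C# shape for an Enrollments document; property names
// mirror the JSON fields, and the flags stay strings to match the document.
public class Enrollment
{
    public string enrollmentNumber { get; set; }
    public string description { get; set; }
    public string accessKey { get; set; }
    public string detailedEnabled { get; set; }
    public string summaryEnabled { get; set; }
}
```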
For now, having the enrollment number, access key and a flag to turn on processing (“detailedEnabled”) is sufficient for me to do work. However, should I start adding capabilities and need additional run configuration information, Azure Cosmos DB will allow me to easily add elements to the document schema without having to do a bunch of reworking and data migration. Once the DailyEABatchControl is triggered, it will loop through all of the documents and call RetrieveUsage for each enrollment that has “detailedEnabled” set to true, separating the logic to start a job from the logic to retrieve the source data. I use the JobLog
[Figure 2 Technology Map and Data Flow — components: the DailyEABatchControl, RetrieveUsage, SplitDailyUsage and ProcessDailyUsage Functions; the EA Portal (Billing API); the newdailyusage, newdailysplit, processedusage and processeddailysplit Blob containers; document stores labeled Enrollments, DetailedUsageData, EAUsage and JobLog; and Power BI Reporting.]
collection to determine if a job has already been run for the day, as shown in Figure 3.
The last lambda results in a filtered list of enrollments for which data hasn't been retrieved for the day in question. Next, I'll call RetrieveUsage (step 3 in Figure 2) from within DailyEABatchControl by calling it with HttpClient, with sufficient data in the post body for it to know the enrollment for which it's fetching data and the month for which it's fetching it, as shown in Figure 4.
It’s worth pointing out that this isn’t intended to be an open system. I’m creating a closed processing loop, so I don’t want just any caller executing the RetrieveUsage Function. Thus, I’ve secured it by requiring a code that’s not shown in Figure 4, but is part of the URI returned from GetEnvironmentVariable(“retrieveUsageUri”). In an enterprise implementation, a service principal and Azure Active Directory integration would be a more realistic choice to achieve a higher degree of security.
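To make that key-based scheme concrete, here's a minimal, hypothetical sketch of composing such a protected URI. The host, route and key values are invented for illustration; in the setup described above, the complete URI, key included, lives whole in the retrieveUsageUri app setting:

```csharp
// Hypothetical sketch: the function key rides along as the "code"
// query-string parameter, so only callers that know the key can
// invoke the Function. Host, route and key are invented values.
string baseUri = "https://contoso-ea.azurewebsites.net/api/RetrieveUsage";
string functionKey = "abc123";
string retrieveUsageUri = baseUri + "?code=" + System.Uri.EscapeDataString(functionKey);
// In the article's setup, the complete URI (key included) is read back
// with Environment.GetEnvironmentVariable("retrieveUsageUri").
```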
Figure 3 Job Control Logic
// Get list of enrollments for daily processing
List<Enrollment> enrollments =
  inputDocument.CreateDocumentQuery<Enrollment>(
    UriFactory.CreateDocumentCollectionUri(dbName, enrollmentCollection),
    new SqlQuerySpec("SELECT * FROM c WHERE c.detailedEnabled = 'true'"),
    queryOptions).ToList<Enrollment>();

// Get yesterday's date to make sure there are logs for today
int comparisonEpoch =
  (int)(DateTime.UtcNow.AddDays(-1) - new DateTime(1970, 1, 1)).TotalSeconds;

string logQuery =
  "SELECT * FROM c WHERE c.epoch > '" + comparisonEpoch.ToString() + "'";

List<JobLog> logs = inputDocument.CreateDocumentQuery<JobLog>(
  UriFactory.CreateDocumentCollectionUri(dbName, jobLogCollection),
  new SqlQuerySpec(logQuery),
  queryOptions).ToList<JobLog>();

// Get list of enrollments for which there is no match
var jobList = enrollments.Where(x =>
  !logs.Any(l => l.enrollmentNumber == x.enrollmentNumber));
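The enrollment-versus-log filter at the heart of Figure 3 can be exercised with plain in-memory lists. This self-contained sketch trims hypothetical Enrollment and JobLog types down to the single field the comparison uses:

```csharp
using System.Collections.Generic;
using System.Linq;

var enrollments = new List<Enrollment> {
    new Enrollment { enrollmentNumber = "100" },
    new Enrollment { enrollmentNumber = "200" }
};
// A JobLog entry means enrollment "100" was already processed today.
var logs = new List<JobLog> { new JobLog { enrollmentNumber = "100" } };

// Same shape as the filter in Figure 3: keep only enrollments
// with no matching log entry.
var jobList = enrollments.Where(x =>
    !logs.Any(l => l.enrollmentNumber == x.enrollmentNumber)).ToList();
// jobList now holds only enrollment "200".

// Trimmed, hypothetical types carrying just the compared field.
class Enrollment { public string enrollmentNumber; }
class JobLog { public string enrollmentNumber; }
```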















































