Figure 4 Retrieving Usage Data
foreach(var doc in jobList) {
  // Post one request per enrollment to the RetrieveUsage function
  HttpClient httpClient = new HttpClient();
  string retrieveUsageUri = @"https://" +
    System.Environment.GetEnvironmentVariable("retrieveUsageUri");
  string postBody = "{\"enrollment\":\"" + doc.enrollmentNumber + "\"," +
    "\"month\":\"" + DateTime.Now.ToString("yyyy-MM") + "\"}";
  httpClient.DefaultRequestHeaders.Accept.Add(
    new MediaTypeWithQualityHeaderValue("application/json"));
  var content = new StringContent(postBody, Encoding.UTF8, "application/json");
  var response = await httpClient.PostAsync(retrieveUsageUri, content);
  response.EnsureSuccessStatusCode();
  string fetchResult = await response.Content.ReadAsStringAsync();
}
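The RetrieveUsage function on the receiving end of those POSTs is an HTTP-triggered Azure Function. Its full declaration isn't reproduced here, but a minimal C# script (run.csx) signature along these lines is consistent with the snippets that follow, where binder and the posted enrollment value are used. Treat the parameter names and body as an illustrative sketch, not the prototype's actual code:

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

// Illustrative shape of an HTTP-triggered run.csx function:
// the loop in Figure 4 posts { "enrollment": "...", "month": "yyyy-MM" } here.
public static async Task<HttpResponseMessage> Run(
  HttpRequestMessage req, Binder binder, TraceWriter log)
{
  // Read the posted enrollment and month
  dynamic data = await req.Content.ReadAsAsync<object>();
  log.Info($"Retrieving usage for enrollment {data.enrollment}");

  // ... call the EA billing API and persist the response to blob storage,
  // as shown in the next two code blocks ...

  return req.CreateResponse(HttpStatusCode.OK);
}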
The last step of the first leg of my data's journey is within the RetrieveUsage function, where it's persisted to the newdailyusage container in Azure Blob Storage. However, in order to get that data I have to construct the call and include the accessKey as a bearer token in the header:
HttpClient httpClient = new HttpClient();
string retrieveUsageUri = usageQB.FullEAReportUrl();
httpClient.DefaultRequestHeaders.Add("authorization", bearerTokenHeader);
httpClient.DefaultRequestHeaders.Add("api-version", "1.0");
var response = await httpClient.GetAsync(retrieveUsageUri);
response.EnsureSuccessStatusCode();
string responseText = await response.Content.ReadAsStringAsync();
For the sake of brevity, I've cut some date manipulations out of this code block and haven't included a helper class for generating the bearerTokenHeader or the UsageReportQueryBuilder. However, this should be sufficient to illustrate how they're used and ordered. The accessKey is passed into the static method FromJwt, which will return the BearerToken type, from which I simply grab the header and add it to the request that's created from the URL constructed by the call to usageQB.FullEAReportUrl. Last, I update the output binding to the path and filename I want for the Blob target:
path = "newdailyusage/" + workingDate.ToString("yyyyMMdd") + "-" + data.enrollment + "-usage.json";
var attributes = new Attribute[] {
new BlobAttribute(path),
new StorageAccountAttribute("eabillingstorage_STORAGE") };
using (var writer = await binder.BindAsync<TextWriter>(attributes)) {
writer.Write(responseText); }
This will result in a structure in Azure Storage that looks like this:
newdailyusage/
  20170508-1234-usage.json
  20170508-456-usage.json
  20170507-123-usage.json
This allows me to store data for multiple enrollments and multiple files for each enrollment in case processing doesn't happen for some reason. Additionally, because data can change for previous days as the month progresses, it's important to have the files available for research and reconciliation in case anomalies show up in the report data.

Splitting Data for Parallel Processing

With so much data coming in and the work of somehow updating records for a given month of processing each day, it's important to process this data in a parallel fashion. Usually, at least nowadays, this is when I break out the parallel libraries for C#, write a few lines of code and pat myself on the back for being a genius at parallel processing. However, in this instance, I'd really like to just rely on the capabilities of the platform to do that for me and allow me to focus on each discrete task.

The next Azure Function in the sequence has been configured with a blob trigger so it will pick up files that land in the inbound processing storage container. The job at this step is to split the inbound file into a file-per-day per enrollment. All in all, this is a pretty simple step, but it does require deserializing the JSON file into RAM. It's important to note this, because the method I've chosen to use for the prototype simply calls the deserialize method:

JsonConvert.DeserializeObject<List<EAUsageDetail>>(myBlob);

I know this to be sufficient for my purposes, but the present RAM allocation for the Azure Function host is 1.5GB. It's possible that, for a large enrollment with substantial resources provisioned, a file would become too big at some point in the month to load into RAM, in which case an alternate method for parsing and splitting the file will have to be used. Moreover, if you create an Azure Function that takes more than five minutes to run, it will have to be modified because the current default is five minutes, though this can be adjusted to a max of 10 minutes via the host configuration JSON. As I mentioned early on, knowing the volume of data will be key at each point and for integration in the overall system.
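Should a file ever outgrow that 1.5GB allocation, one workable alternate method is to stream the JSON with Json.NET rather than materializing the entire List<EAUsageDetail> in one call. The following is only a sketch of that approach, assuming the blob trigger is rebound to a Stream (the myBlobStream name is illustrative, not part of the prototype):

using System.IO;
using Newtonsoft.Json;

// Sketch: read the usage detail array one record at a time instead of
// deserializing the whole blob into RAM with JsonConvert.DeserializeObject.
using (var reader = new JsonTextReader(new StreamReader(myBlobStream)))
{
  var serializer = new JsonSerializer();
  while (reader.Read())
  {
    if (reader.TokenType == JsonToken.StartObject)
    {
      // Deserialize a single EAUsageDetail and route it to that day's output
      var detail = serializer.Deserialize<EAUsageDetail>(reader);
      // ... append detail to the split file for detail.Day ...
    }
  }
}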
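As for the function timeout, the adjustment called out above is a single entry in the host configuration; a sketch of that host.json setting, raised to the 10-minute maximum mentioned earlier:

{
  "functionTimeout": "00:10:00"
}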
Once the data has been deserialized, I'll grab the max day out of it and set up a loop from day one to day max to start selecting out the data for each of those days, as shown in Figure 5.

Figure 5 Selecting Each Day's Data

// Loop through collection filtering by day
for(int dayToProcess = 1; dayToProcess <= maxDayOfMonth; dayToProcess++) {
  // Get documents for current processing day
  var docsForCurrentDay = results.Where(d => d.Day == dayToProcess);

  // Serialize to string
  string jsonForCurrentDay = JsonConvert.SerializeObject(docsForCurrentDay);
  log.Info($"***** Docs for day {dayToProcess} *****");

  // Get date for one of the records for today
  string processDateString =
    (from docs in results where docs.Day == dayToProcess select docs.Date).First();
  path = "newdailysplit/" + DateTime.Parse(processDateString).ToString("yyyyMMdd") +
    "-" + enrollment + "-dailysplit.json";

  // Write out each day's data to a file in the "newdailysplit" container
  var attributes = new Attribute[] {
    new BlobAttribute(path),
    new StorageAccountAttribute("eabillingstorage_STORAGE")
  };
  using (var writer = await binder.BindAsync<TextWriter>(attributes)) {
    writer.Write(jsonForCurrentDay);
  }
}

Once all the days have been split into separate files and written out (see step 7 in Figure 2), I simply move the file to the processed-