Architecture
Overview
Datasource
Service Logging Library (SLL) is an API that allows services to generate telemetry events based on the common schema using server-side code. The Part B Scenario of the Correlation Context will be used to associate events with scenarios and partners/customers/constituents.
The events generated by SLL can be consumed by different event processors such as Xpert, Xlens, and Asimov Cooker. Both Xpert and Xlens require their agent to be installed on the machine. They process the events locally and send the aggregated data to their respective data stores. Asimov Cooker is an offline cooker (data is processed outside the machine) that contains a set of data pipelines that take the raw events as input and produce cooked hourly streams. In the Asimov pipeline, the events are sent to Cosmos through the Cosmos Data Loader (CDL). Asimov Cooker removes duplicates, applies the common schema, clusters the events by event name, and produces the hourly streams.
Because Asimov Cooker allows us to handle bulk data (without any service restrictions), our Availability pipeline will be based on the SLL events cooked by Asimov Cooker. We will also look into integrating data from other sources such as Xlens, Application Insights, etc.
Data Processing
Data processing consists of the following components:
- Aggregate
- Translate
- Validate
- Publish
Aggregate
Aggregation of the events is done in Cosmos. It involves two steps:
- Filter and Extract
- Minute and Daily Aggregation
The required fields are extracted from the Asimov Cooker view. Scenario ID and Partner ID are extracted from the Correlation Context. If the Correlation Context does not exist (SLL versions earlier than 5), we will use ScScenario and ScPartner, or Operation Name and Caller Name from Part C, to calculate the Scenario ID and Partner ID, respectively. The extracted fields are stored as a structured stream (RawEvents) under the respective hour folder in Cosmos. The minute-level aggregation is done on this structured stream, and the daily aggregation is done on the minute-level aggregated stream. Both aggregated streams are copied to the datastore.
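The two aggregation steps can be sketched as follows. This is a minimal Python illustration, not the Cosmos script itself. The record layout, the function names, and the hypothetical `Succeeded` flag (assumed to be derived from RequestStatus beforehand) are assumptions, as is the choice to weight AverageLatency by request count when rolling minutes up to days. The STID field is omitted because the source does not define it.

```python
from collections import defaultdict
from datetime import datetime

# Grouping dimensions taken from the Aggregated Stream field list.
GROUP_FIELDS = ("ScenarioID", "PartnerID", "RequestType", "Environment")

def _rollup(rows, truncate):
    """Sum request counts and latency per (group fields, truncated time)."""
    buckets = defaultdict(lambda: [0, 0, 0.0])  # total, successful, latency sum
    for row in rows:
        key = tuple(row[f] for f in GROUP_FIELDS) + (truncate(row["Time"]),)
        bucket = buckets[key]
        bucket[0] += row["TotalRequests"]
        bucket[1] += row["SuccessfulRequests"]
        # Assumption: weight each average by its request count so that the
        # rolled-up average equals the average over the underlying events.
        bucket[2] += row["AverageLatency"] * row["TotalRequests"]
    result = []
    for key, (total, ok, latency_sum) in buckets.items():
        out = dict(zip(GROUP_FIELDS + ("Time",), key))
        out.update(
            TotalRequests=total,
            SuccessfulRequests=ok,
            FailedRequests=total - ok,
            AverageLatency=latency_sum / total,
        )
        result.append(out)
    return result

def aggregate_by_minute(raw_events):
    """Minute-level aggregation over raw events (one event = one request)."""
    rows = [
        dict(e,
             TotalRequests=1,
             SuccessfulRequests=1 if e["Succeeded"] else 0,  # hypothetical flag
             AverageLatency=e["Latency"])
        for e in raw_events
    ]
    return _rollup(rows, lambda t: t.replace(second=0, microsecond=0))

def aggregate_by_day(minute_rows):
    """Daily aggregation computed from the minute-level rows."""
    return _rollup(
        minute_rows,
        lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    )
```

Computing the daily rows from the minute rows, rather than from the raw events again, mirrors the order described above and keeps the second pass small.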
Translate
In the datastore, the aggregated data is joined with the Scenario and Customers tables based on the Scenario ID and Partner ID.
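A minimal sketch of this join, written in Python for illustration. The lookup tables are modeled as dictionaries keyed by ID, and the `Name` column and the `ScenarioName`/`CustomerName` output fields are hypothetical, since the source does not list the table schemas. The sketch also assumes a left join, so that unmatched rows survive and can be caught during validation.

```python
def translate(aggregated_rows, scenarios, customers):
    """Left-join aggregated rows with the scenario and customer lookups.

    scenarios: dict mapping Scenario ID -> scenario record (hypothetical shape)
    customers: dict mapping Partner ID  -> customer record (hypothetical shape)
    """
    translated = []
    for row in aggregated_rows:
        scenario = scenarios.get(row["ScenarioID"])
        customer = customers.get(row["PartnerID"])
        translated.append({
            **row,
            # None marks a row whose ID has no match in the lookup table.
            "ScenarioName": scenario["Name"] if scenario else None,
            "CustomerName": customer["Name"] if customer else None,
        })
    return translated
```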
Validate
The data is validated in the datastore (duplicates, invalid scenario IDs, etc.).
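The two checks named above could look roughly like this. It is a Python sketch under assumptions: the duplicate key (all grouping fields plus Time) and the `known_scenario_ids` set are not specified in the source, and other validation rules covered by "etc." are not modeled.

```python
def validate(rows, known_scenario_ids):
    """Split rows into (valid, rejected) using two illustrative checks."""
    seen = set()
    valid, rejected = [], []
    for row in rows:
        # Assumed uniqueness key: one row per group and time bucket.
        key = (row["ScenarioID"], row["PartnerID"], row["Time"],
               row["RequestType"], row["Environment"])
        if row["ScenarioID"] not in known_scenario_ids:
            rejected.append((row, "invalid scenario ID"))
        elif key in seen:
            rejected.append((row, "duplicate"))
        else:
            seen.add(key)
            valid.append(row)
    return valid, rejected
```

Returning the rejected rows with a reason, instead of silently dropping them, makes it easier to trace gaps in the published data back to their cause.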
Publish
The validated data will be pivoted to Scenarios and published for the UI to consume.
Schema
Filtered fields
- ScenarioID (from cC: ms.b.tel.scenario)
- PartnerID (from cC: ms.b.tel.partner)
- ScScenario
- ScPartner
- OperationName
- CallerName
- Time
- RequestType
- Environment
- RoleInstance
- SLLVersion
- Latency
- RequestStatus
- testHeader
- quoteIsTest
- isTest
Aggregated Stream
See Calculations for how these fields are computed.
- ScenarioID
- PartnerID
- Time
- RequestType
- Environment
- TotalRequests
- SuccessfulRequests
- FailedRequests
- AverageLatency
- STID
Filters
These are the filters applied while collecting the SLL data from the Asimov pipeline.
- data_baseType is Ms.Qos.IncomingServiceRequest or IncomingServiceRequest
- CloudEnvironment: excludes dev, int, ppe, non-prod, nonprod, sandbox, perf, and test, either as an exact match or when the value contains the name prefixed with "-" (for example, "-dev"). Empty values are kept, and values that contain "prod" but not "non-prod" or "nonprod" are always kept.
- If cC is empty, ScScenario is used.
- If both cC and ScScenario are empty, OperationName is used, matched against the list provided by the OMS team.
Test traffic is excluded using the testHeader, quoteIsTest, and isTest fields. Only events that satisfy the following condition are kept:
string.IsNullOrEmpty(testHeader) && (string.IsNullOrEmpty(quoteIsTest) || quoteIsTest.ToLowerInvariant() != "true") && (string.IsNullOrEmpty(isTest) || isTest.ToLowerInvariant() != "true")
WHERE
    (data_baseType == "Ms.Qos.IncomingServiceRequest" OR data_baseType == "IncomingServiceRequest")
    AND (
        string.IsNullOrEmpty(cloudEnvironment)
        OR (
            cloudEnvironment.ToLowerInvariant() != "dev"
            AND cloudEnvironment.ToLowerInvariant() != "int"
            AND cloudEnvironment.ToLowerInvariant() != "ppe"
            AND cloudEnvironment.ToLowerInvariant() != "non-prod"
            AND cloudEnvironment.ToLowerInvariant() != "nonprod"
            AND cloudEnvironment.ToLowerInvariant() != "sandbox"
            AND cloudEnvironment.ToLowerInvariant() != "perf"
            AND cloudEnvironment.ToLowerInvariant() != "test"
            AND !cloudEnvironment.ToLowerInvariant().Contains("-dev")
            AND !cloudEnvironment.ToLowerInvariant().Contains("-int")
            AND !cloudEnvironment.ToLowerInvariant().Contains("-ppe")
            AND !cloudEnvironment.ToLowerInvariant().Contains("-sandbox")
            AND !cloudEnvironment.ToLowerInvariant().Contains("-perf")
            AND !cloudEnvironment.ToLowerInvariant().Contains("-test")
        )
        OR (
            cloudEnvironment.ToLowerInvariant().Contains("prod")
            AND !cloudEnvironment.ToLowerInvariant().Contains("non-prod")
            AND !cloudEnvironment.ToLowerInvariant().Contains("nonprod")
        )
    )
    AND (
        string.IsNullOrEmpty(testHeader)
        && (string.IsNullOrEmpty(quoteIsTest) || quoteIsTest.ToLowerInvariant() != "true")
        && (string.IsNullOrEmpty(isTest) || isTest.ToLowerInvariant() != "true")
    )
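To make the WHERE clause above easier to reason about and to unit test, here is a Python port of the same predicate. It is a sketch, not the production script: the function names and the dictionary-based event shape are assumptions, and `str.lower()` stands in for `ToLowerInvariant()`, which is equivalent for the ASCII values involved.

```python
ACCEPTED_BASE_TYPES = {"Ms.Qos.IncomingServiceRequest", "IncomingServiceRequest"}
EXCLUDED_EXACT = {"dev", "int", "ppe", "non-prod", "nonprod",
                  "sandbox", "perf", "test"}
EXCLUDED_CONTAINS = ("-dev", "-int", "-ppe", "-sandbox", "-perf", "-test")

def is_production_environment(cloud_environment):
    """Mirror the cloudEnvironment branch of the WHERE clause."""
    if not cloud_environment:  # null or empty values are kept
        return True
    env = cloud_environment.lower()
    not_excluded = (env not in EXCLUDED_EXACT
                    and not any(token in env for token in EXCLUDED_CONTAINS))
    # Third OR branch: an explicit "prod" marker keeps the event even when an
    # excluded token such as "-test" is also present.
    explicit_prod = ("prod" in env
                     and "non-prod" not in env
                     and "nonprod" not in env)
    return not_excluded or explicit_prod

def is_test_traffic(test_header, quote_is_test, is_test):
    """True when any of the three test markers flags the event."""
    def is_true(value):
        return bool(value) and value.lower() == "true"
    return bool(test_header) or is_true(quote_is_test) or is_true(is_test)

def keep_event(event):
    """Return True when the event passes all three filter groups."""
    return (event.get("data_baseType") in ACCEPTED_BASE_TYPES
            and is_production_environment(event.get("cloudEnvironment"))
            and not is_test_traffic(event.get("testHeader"),
                                    event.get("quoteIsTest"),
                                    event.get("isTest")))
```

One consequence of the third branch worth noting: a value such as "prod-test" passes the filter, because it contains "prod" and neither "non-prod" nor "nonprod".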
Calculations
Calculating ScenarioId: the Scenario ID is taken from the first non-empty value in the following priority order: Ms.b.tel.ScenarioId -> ScScenario -> OperationName
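The priority order can be expressed as a first-non-empty lookup. The Python sketch below assumes the event is a dictionary using the names from the Filtered fields list, where ScenarioID and PartnerID hold the values already extracted from the Correlation Context. The partner variant follows the fallback described in the Aggregate section (ScPartner, then CallerName); the helper names are hypothetical.

```python
def first_non_empty(*candidates):
    """Return the first candidate that is neither None nor empty."""
    for value in candidates:
        if value:
            return value
    return None

def resolve_scenario_id(event):
    # Correlation Context value first, then ScScenario, then OperationName.
    return first_non_empty(event.get("ScenarioID"),
                           event.get("ScScenario"),
                           event.get("OperationName"))

def resolve_partner_id(event):
    # Same pattern for the partner: cC value, then ScPartner, then CallerName.
    return first_non_empty(event.get("PartnerID"),
                           event.get("ScPartner"),
                           event.get("CallerName"))
```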
Determining RequestStatus:
if (RequestStatus < 5)
    status = success
else  // RequestStatus >= 5 or empty
    status = failure
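The same rule as a small, testable function. This Python sketch assumes RequestStatus arrives as an integer or an integer string; the function name and the returned labels are illustrative, and a non-numeric value raises `ValueError` rather than being silently classified.

```python
def classify_request_status(request_status):
    """Map a RequestStatus value to "success" or "failure"."""
    # An empty or missing status counts as a failure, as in the rule above.
    if request_status is None or request_status == "":
        return "failure"
    return "success" if int(request_status) < 5 else "failure"
```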
Calculating Availability:
Availability = 100*SuccessfulRequests/TotalRequests
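A short Python sketch of the formula. Two details are assumptions rather than part of the source: the result is computed in floating point (with integer operands, C#-style division would truncate the percentage), and a bucket with no requests returns `None` instead of dividing by zero.

```python
def availability(successful_requests, total_requests):
    """Availability as a percentage, or None when there were no requests."""
    if total_requests == 0:
        return None  # assumption: undefined rather than 0% or 100%
    return 100.0 * successful_requests / total_requests
```

For example, 995 successful requests out of 1000 give an availability of 99.5.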