Events - Polled Event Hash Verification

Problem Statement

When a service provider does not support push notifications via webhooks or some other mechanism, element instances can be configured to poll for "change data" events. Polling can be configured using the POST, PATCH, or PUT /instances APIs, as well as via the Cloud Elements UI.

There are scenarios, however, in which a large quantity of data changes at the service provider at once, e.g., when a large number of contacts in HubSpot are updated via a batch load. The element instance's event poller picks up all of these changes, one page at a time, and dispatches each event for downstream processing, which can be a formula execution or delivery to a customer's endpoint.

In most use cases, a customer is only interested in changes to a subset of fields for a given record. For example, if a HubSpot contact record has 200 fields, the Cloud Elements customer may care about changes to only 10 of them. When a large data update occurs at the service provider (HubSpot, in this example), the poller retrieves every record that has changed, even though only a small fraction of those records may include changes to the 10 fields the customer cares about. This can result in unnecessary event dispatches and, possibly, higher latency for downstream event processing.

Solution

To help reduce this latency in downstream event processing, Cloud Elements has implemented a new, optional feature called "Event Hash Verification" (referred to in the rest of this article as event hash validation). It is enabled through the event poller configuration and filters unnecessary events out of the dispatched events, i.e., those events in which the changed fields have no bearing on downstream event processing.

The following snippet shows an example polling configuration for creating a new instance via the POST /instances API.

POST /instances request payload
{
  ...
  "configuration": {
    "event.helper.key": "sfdcPolling",
    "event.poller.refresh_interval": "15",
    "event.notification.callback.url": "https://myevents.mycompany.com",
    "event.vendor.type": "polling",
    "event.notification.enabled": true,
    "event.poller.configuration": "{\"contacts\":{\"url\":\"/hubs/crm/contacts?where=LastModifiedDate>'${gmtDate:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}'\",\"idField\":\"Id\",\"datesConfiguration\":{\"updatedDateField\":\"LastModifiedDate\",\"updatedDateFormat\":\"yyyy-MM-dd'T'HH:mm:ss.SSSZ\",\"updatedDateTimezone\":\"GMT\",\"createdDateField\":\"CreatedDate\",\"createdDateFormat\":\"yyyy-MM-dd'T'HH:mm:ss.SSSZ\",\"createdDateTimezone\":\"GMT\"}}}"
  },
  ...
}

The above sample payload creates an instance of the Salesforce.com element with the polling interval set to 15 minutes, i.e., the poller runs every 15 minutes and determines whether any change data is available to retrieve. Additionally, the poller is configured to poll for changes to the contacts object/resource, using the provided query URL. Lastly, the date configuration is set so that the poller can distinguish created data from updated data and provide an accurate event type to downstream event processing.

Imagine that the contacts resource in the Salesforce.com element instance is 150 fields wide, which is not out of the ordinary for a contact record. Say that the user is interested in receiving change data events only when any of the FirstName, LastName, or Email fields changes for a given record. With event hash validation, a user can configure the poller to create a hash from these fields. The hash is then used, in conjunction with the configured last update date field, to determine whether an event includes changes to these fields, so that only events in which one or more of these fields have changed are dispatched. An example of the poller configuration with event hash validation enabled is shown below.

POST /instances with event hash validation
{
  ...
  "configuration": {
    "event.helper.key": "sfdcPolling",
    "event.poller.refresh_interval": "1",
    "event.notification.callback.url": "http://localhost:2222/events",
    "event.vendor.type": "polling",
    "event.notification.enabled": true,
    "event.poller.configuration": "{\"contacts\":{\"url\":\"/hubs/crm/contacts?where=LastModifiedDate>'${gmtDate:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}'\",\"idField\":\"Id\",\"datesConfiguration\":{\"updatedDateField\":\"LastModifiedDate\",\"updatedDateFormat\":\"yyyy-MM-dd'T'HH:mm:ss.SSSZ\",\"updatedDateTimezone\":\"GMT\",\"createdDateField\":\"CreatedDate\",\"createdDateFormat\":\"yyyy-MM-dd'T'HH:mm:ss.SSSZ\",\"createdDateTimezone\":\"GMT\"},\"validationData\":{\"fields\":[\"FirstName\", \"LastName\",\"Email\"]}}}"
  },
  ...
}

The \"validationData\":{\"fields\":[\"FirstName\", \"LastName\",\"Email\"]} section in the above configuration is used to create a hash for each record, which in turn will facilitate the event filtering described above.

The payload in the above example can also be used with the PATCH /instances/{id} and PUT /instances/{id} APIs.
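
Because the value of "event.poller.configuration" is itself a JSON document encoded as a string, hand-escaping it is error-prone. The following is a minimal Python sketch of one way to build that string programmatically. The helper name build_poller_configuration and the variables CONTACTS_URL and DATES are hypothetical and not part of any Cloud Elements SDK; the keys and values are taken from the example payloads above.

```python
import json

# Values copied from the example payloads above.
CONTACTS_URL = (
    "/hubs/crm/contacts?where=LastModifiedDate>"
    "'${gmtDate:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}'"
)
DATES = {
    "updatedDateField": "LastModifiedDate",
    "updatedDateFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "updatedDateTimezone": "GMT",
    "createdDateField": "CreatedDate",
    "createdDateFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "createdDateTimezone": "GMT",
}


def build_poller_configuration(resource, url, id_field, dates, validation_fields=None):
    """Hypothetical helper: return the JSON string for "event.poller.configuration"."""
    resource_config = {
        "url": url,
        "idField": id_field,
        "datesConfiguration": dates,
    }
    # "validationData" is optional; adding it enables event hash validation.
    if validation_fields:
        resource_config["validationData"] = {"fields": list(validation_fields)}
    return json.dumps({resource: resource_config})


poller_config = build_poller_configuration(
    "contacts", CONTACTS_URL, "Id", DATES, ["FirstName", "LastName", "Email"]
)

# The "configuration" portion of the request payload; other parts are omitted here.
configuration = {
    "event.helper.key": "sfdcPolling",
    "event.poller.refresh_interval": "15",
    "event.notification.callback.url": "https://myevents.mycompany.com",
    "event.vendor.type": "polling",
    "event.notification.enabled": True,
    "event.poller.configuration": poller_config,
}

# Serializing the outer object escapes the inner quotes automatically.
print(json.dumps({"configuration": configuration}, indent=2))
```

Serializing twice, once for the poller configuration and once for the enclosing payload, produces the escaped quotes seen in the examples without manual editing.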

The first time a given record is received via a change data event, it is dispatched regardless of whether the configured validation fields have changed, because no hash value yet exists against which to compare the record. Each time the same record, identified by the configured "idField" value, is received again, it is hashed and the result is compared against the persisted hash value to determine whether the record should be filtered out of the dispatched events.

In addition to the "validationData" section, the "idField" is required for event hash validation to be enabled.

The persisted hashed record is unique to each element instance. In other words, if multiple element instances are configured to point to the same vendor service endpoint, polled data will be partitioned and filtered separately for each element instance. 
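
The filtering behavior described above can be illustrated with a short Python sketch. This is not the Cloud Elements implementation: the hash algorithm (SHA-256), the in-memory dictionary used as the store, and the function names are all assumptions made for illustration, and the sketch leaves out the last updated date comparison entirely.

```python
import hashlib
import json


def hash_validation_fields(record, fields):
    """Hash only the configured validation fields (SHA-256 is an assumption)."""
    values = [record.get(field) for field in fields]
    payload = json.dumps(values, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_events(instance_id, records, id_field, fields, store):
    """Return the records to dispatch; `store` maps (instance, record ID) to a hash."""
    dispatched = []
    for record in records:
        # Keying by instance keeps each element instance's hashes separate.
        key = (instance_id, record[id_field])
        new_hash = hash_validation_fields(record, fields)
        # No stored hash (first sighting) or a different hash: dispatch the record.
        if store.get(key) != new_hash:
            store[key] = new_hash
            dispatched.append(record)
    return dispatched


FIELDS = ["FirstName", "LastName", "Email"]
store = {}

contact = {"Id": "003A", "FirstName": "Ada", "LastName": "Lovelace",
           "Email": "ada@example.com", "Phone": "555-0100"}

# First sighting of the record: always dispatched.
first = filter_events(1, [contact], "Id", FIELDS, store)

# Only Phone changed, which is not a validation field: filtered out.
second = filter_events(1, [dict(contact, Phone="555-0199")], "Id", FIELDS, store)

# Email changed: dispatched again.
third = filter_events(1, [dict(contact, Email="ada@new.example.com")], "Id", FIELDS, store)

# A different element instance has no stored hash yet: dispatched.
other = filter_events(2, [contact], "Id", FIELDS, store)
```

In this sketch, first, third, and other each contain one record, while second is empty because the only changed field is outside the configured validation fields.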

For event hash validation to function, the following data must be stored by Cloud Elements:

  • the record ID (identified by the "idField")
  • a hash of the configured validation fields
  • the last updated date, if one is configured and available

Please note that the above fields are encrypted, and nothing other than the record ID itself is searchable.

Additionally, to keep the latency added by event hash validation to a minimum, a time to live (TTL) will be implemented for persisted records in the future. The TTL value has not yet been determined.

Summary

Event hash validation reduces event dispatch traffic by filtering out events in which none of the configured validation fields have changed, which in turn can reduce lag or latency in downstream event processing.