Architecture
A Java Web application runs on the backend to collect and store all tracking events.
- Web events from the JavaScript snippet are tracked by loading a tracking GIF image. The backend application receives the tracking event from the JavaScript snippet, serves the GIF image and enqueues the event.
- Additionally, a JSON HTTP API is provided so that third-party API clients can push events.
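The tracking-GIF flow described above can be sketched as follows. This is a minimal illustration, not WT1's actual code: the class name `TrackingPixelSketch`, the `handle` method and the `page` and `visitor` parameters are all hypothetical, and the HTTP layer (a servlet in the real application) is left out so that the example only depends on the JDK. The queue is left unbounded here for brevity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: class, method and parameter names are assumptions.
class TrackingPixelSketch {
    // A commonly used 1x1 transparent GIF, stored as Base64.
    static final byte[] PIXEL = Base64.getDecoder().decode(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");

    // Events waiting for the asynchronous processing stage.
    static final Queue<Map<String, String>> QUEUE = new ConcurrentLinkedQueue<>();

    // Parses the query string of the GIF request, enqueues the event,
    // and returns the image bytes so the response can be sent right away.
    static byte[] handle(String queryString) {
        Map<String, String> event = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            event.put(decode(key), decode(value));
        }
        QUEUE.add(event);
        return PIXEL;
    }

    // URL-decodes one key or value.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

Calling `TrackingPixelSketch.handle("page=%2Fhome&visitor=abc")` returns the GIF bytes and leaves one decoded event in `QUEUE` for a background thread to pick up.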
Tracking events are processed in an asynchronous queue, which mitigates the effects of slowdowns in the processing and storage threads. Such slowdowns remain invisible to end clients, which keeps request latency very low.
Storage comes with no delivery guarantee: if the storage backend cannot keep up with the incoming flow (for example, if it is slow or down), the queue fills up. Once the queue reaches a configurable threshold, events start being dropped to avoid a memory overrun.
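A minimal sketch of this drop-on-overflow behaviour, assuming a fixed-capacity queue from `java.util.concurrent`. The class `LossyEventQueue` and its method names are hypothetical and do not reflect WT1's real configuration keys or classes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a bounded, lossy event queue; names are illustrative.
class LossyEventQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    // maxQueuedEvents plays the role of the configurable threshold.
    LossyEventQueue(int maxQueuedEvents) {
        this.queue = new ArrayBlockingQueue<>(maxQueuedEvents);
    }

    // Called on the request thread. offer() never blocks: when the queue
    // is full it returns false and the event is counted as dropped.
    boolean enqueue(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called by the storage thread. Returns null when the queue is empty.
    String poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }

    int size() {
        return queue.size();
    }
}
```

Using the non-blocking `offer` rather than `put` is what keeps the request thread from stalling when storage falls behind, at the cost of losing events beyond the capacity.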
Several implementations of the application itself are provided, as well as several implementations of the backend storage processor.
The standalone implementation is a Java Web application, which can run on any Java 6-compatible Servlet container (such as Tomcat).
Multiple instances of the backend server can run in parallel. There is no built-in affinity between backend servers and visitors. Therefore, tracking all interactions of a given visitor requires examining all files produced by all backend servers.
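Because a visitor's events can land on any server, reconstructing one visitor's activity means merging the output of every server. The sketch below illustrates the idea on in-memory lines. The record layout (`timestampMillis<TAB>visitorId<TAB>payload`) and the class `VisitorTimeline` are assumptions made for the example, not WT1's actual file format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the tab-separated record layout is an assumption.
class VisitorTimeline {
    // Groups the lines of all files from all servers by visitor ID,
    // then orders each visitor's lines by timestamp.
    static Map<String, List<String>> byVisitor(List<List<String>> filesFromAllServers) {
        Map<String, List<String>> result = new TreeMap<>();
        for (List<String> file : filesFromAllServers) {
            for (String line : file) {
                String[] cols = line.split("\t", 3);
                if (cols.length < 2) {
                    continue; // skip lines that do not match the assumed layout
                }
                result.computeIfAbsent(cols[1], k -> new ArrayList<>()).add(line);
            }
        }
        for (List<String> lines : result.values()) {
            lines.sort(Comparator.comparingLong(
                (String l) -> Long.parseLong(l.substring(0, l.indexOf('\t')))));
        }
        return result;
    }
}
```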
The infrastructure that provides load balancing and failover is outside the scope of the WT1 web tracker. For instance, the following infrastructures could be deployed:
- redundant hardware load balancers
- a corosync/pacemaker/HA-Proxy cluster
- a Cloud provider's load balancing solution (like AWS Elastic LoadBalancer or Rackspace Cloud Load Balancer)
The backend servers do not provide transactional guarantees for individual logged events. On a normal shutdown of the servlet container, the queued events are written to disk. On an abnormal shutdown of the servlet container, the last file can be lost. The maximum number of lost events depends on the size and time interval that trigger the creation of a new file.
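The relationship between the rotation triggers and the worst-case loss can be sketched as follows. The class `RotationPolicy`, its parameters and the estimate in `maxEventsAtRisk` are hypothetical: they only illustrate that a file is closed when either a size or an age limit is reached, and that the events at risk are roughly bounded by whichever limit is hit first. Events still waiting in the queue are not counted.

```java
// Hypothetical sketch: thresholds and names are illustrative, not WT1 settings.
class RotationPolicy {
    private final long maxBytes;
    private final long maxAgeMillis;

    RotationPolicy(long maxBytes, long maxAgeMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    // True once the current file has reached either the size or the age limit.
    boolean shouldRotate(long bytesWritten, long openedAtMillis, long nowMillis) {
        return bytesWritten >= maxBytes
            || nowMillis - openedAtMillis >= maxAgeMillis;
    }

    // Rough upper bound on the events held in the file being written,
    // given an assumed average event size and a steady arrival rate.
    long maxEventsAtRisk(long avgEventBytes, long eventsPerSecond) {
        long bySize = maxBytes / avgEventBytes;
        long byTime = eventsPerSecond * maxAgeMillis / 1000;
        return Math.min(bySize, byTime);
    }
}
```

For example, with a 1,000,000-byte limit, a 60-second limit, 200-byte events and 100 events per second, the size limit is reached first (5,000 events versus 6,000), so about 5,000 events would be at risk. Smaller thresholds reduce that exposure at the cost of producing more files.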
The GAE implementation runs a Java Web application on the Google App Engine PaaS platform. For more information, please see https://developers.google.com/appengine/docs/whatisgoogleappengine
As WT1 uses background threads and asynchronous processing, "Backend" types of GAE servers are used. For more information, please see https://developers.google.com/appengine/docs/java/backends/
Multiple instances of the backend server can run in parallel. There is no affinity between backend servers and visitors. Therefore, tracking all interactions of a given visitor requires examining all files produced by all backend servers.
Load balancing of tracking events between backend servers and failover of dead backend servers are handled by the GAE infrastructure.
The backend servers do not provide transactional guarantees for individual logged events. On a normal shutdown of a GAE backend server, the queued events are written to disk. On an abnormal shutdown of a GAE backend server, the last file can be lost. The maximum number of lost events depends on the size and time interval that trigger the creation of a new file.
The GAE implementation is simple to deploy and handles the whole high-availability setup. It is a good fit if your application already uses Google App Engine.
However, for handling high loads, we strongly recommend the standalone Java application implementation.