How to track people on the internet

A quick tutorial showing how you can use cookies to track people across websites

The most common way to track a user across different websites is to use tracking cookies. In this short tutorial I will show you how you can build a tracker of your own. Our application will be written in Python using the Flask framework.

1. Quick overview
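Our tracker will serve a 1x1 transparent pixel and set a cookie for every newly arrived user. Browsers send cross-origin HTTP GET requests for <img> tags without any CORS check and, where third-party cookies are permitted, they attach the tracker's cookies to those requests and accept new cookies from the responses. Other request types, such as fetch() or XMLHttpRequest, need extra CORS configuration before they may send credentials across origins, which is why we use this trick with an <img> tag. Keep in mind that Safari and Firefox block or partition third-party cookies by default, and Chrome only sends a cross-site cookie back if it is marked SameSite=None; Secure.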
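To make the exchange concrete, here is a minimal sketch of the two HTTP headers involved, built with Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8+). It assumes the pixel is served over HTTPS, since a `SameSite=None` cookie must also be `Secure`; the user ID value is a hypothetical placeholder.

```python
from http.cookies import SimpleCookie

# Visit 1: the browser requests the pixel without a cookie, so the server
# answers with a Set-Cookie header. SameSite=None + Secure lets the browser
# send the cookie again on cross-site <img> requests.
outgoing = SimpleCookie()
outgoing['secret_cookie'] = 'example-user-id'  # hypothetical placeholder ID
outgoing['secret_cookie']['samesite'] = 'None'
outgoing['secret_cookie']['secure'] = True
outgoing['secret_cookie']['max-age'] = 365 * 24 * 3600  # keep it for a year
set_cookie_header = outgoing['secret_cookie'].OutputString()
print('Set-Cookie:', set_cookie_header)

# Visit 2 (possibly from a different website): the browser attaches the cookie
# to the pixel request and the server parses the Cookie header to recognize the user.
incoming = SimpleCookie()
incoming.load('secret_cookie=example-user-id')
print('Returning user:', incoming['secret_cookie'].value)
```

The same attributes are what the Flask handler has to pass to `set_cookie`; without them Chrome treats the cookie as `SameSite=Lax` and never sends it on a cross-site image request.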

2. Implementation
Our code will be quite simple. We will split it into three parts:

  • Generating random cookie for a user
  • Processing the tracking pixel on the server
  • Issuing the pixel request from the browser

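For random cookie generation we can simply hash a few bytes from os.urandom with Python's hashlib library:

```python
import hashlib
import os

RANDOM_BYTES_LEN = 64  # bytes of OS randomness fed into the hash
ID_LENGTH = 32  # hex characters kept for the user ID (128 bits)
COOKIE_VERSION = '1'  # bump this when the cookie format or client logic changes


def generate_random_hash():
    """Return a random 32-character hex string used as the user ID."""
    return hashlib.sha256(os.urandom(RANDOM_BYTES_LEN)).hexdigest()[:ID_LENGTH]
```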
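If all you need is an unguessable identifier, Python's standard `secrets` module (3.6+) produces one in a single call. This is a swapped-in alternative, not what the code above does: hashing `os.urandom` output and truncating it adds no extra randomness, so reading the bytes directly is equivalent. `secrets.token_hex(16)` returns 32 hex characters, the same length as `ID_LENGTH` above; the function name here is hypothetical.

```python
import secrets

ID_LENGTH = 32  # hex characters, i.e. 16 random bytes


def generate_random_id():
    """Return a random hex user ID taken straight from the OS CSPRNG."""
    return secrets.token_hex(ID_LENGTH // 2)  # each byte becomes two hex chars


print(generate_random_id())
```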

Next, we need a function to handle the tracking pixel requests:

```python
import base64
import io
from urllib.parse import urlparse

from flask import Blueprint, make_response, request, send_file

# generate_random_hash() and COOKIE_VERSION come from the snippet above
tracking_service_bp = Blueprint('tracking_service', __name__)

# 1x1 transparent GIF, decoded from base64 once so we send real image bytes
PIXEL_GIF = base64.b64decode('R0lGODlhAQABAJAAAP8AAAAAACH5BAUQAAAALAAAAAABAAEAAAICBAEAOw==')

# Cross-site cookies must be SameSite=None and Secure (so the pixel needs HTTPS);
# max_age keeps them after the browser is closed
COOKIE_OPTS = {'samesite': 'None', 'secure': True, 'max_age': 365 * 24 * 3600}


@tracking_service_bp.route('/pixel/<timestamp>', methods=['GET'])
def tracking_pixel(timestamp):  # timestamp is used only to avoid caching of the pixel in the browser
    response = make_response(send_file(io.BytesIO(PIXEL_GIF), mimetype='image/gif'))
    referer = request.headers.get('Referer', '')  # the page the user came from
    domain = urlparse(referer).netloc  # do whatever you want with the domain
    user_id = request.cookies.get('secret_cookie')
    if user_id is None:
        # first visit: assign a new ID and stamp the cookie version
        user_id = generate_random_hash()
        response.set_cookie('secret_cookie', user_id, **COOKIE_OPTS)
        response.set_cookie('cookie_version', COOKIE_VERSION, **COOKIE_OPTS)
    elif request.cookies.get('cookie_version') != COOKIE_VERSION:
        response.set_cookie('cookie_version', COOKIE_VERSION, **COOKIE_OPTS)
    # here we can store (user_id, domain), e.g. in a dict or a database
    return response
```

As you can see, the code to serve the tracking pixel is quite simple. We additionally store a cookie version in case we want to roll out upgrades later or see which websites have not updated our client logic yet.
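The handler only leaves a comment where the visit should be stored. A minimal in-memory sketch of that step follows; `VisitStore` and its methods are hypothetical names, and a real deployment would persist the data in a database instead. It keeps, for every user ID, the set of domains on which the pixel was loaded, which is exactly the cross-site profile the cookie makes possible.

```python
from collections import defaultdict


class VisitStore:
    """In-memory record of which domains each tracked user ID has visited."""

    def __init__(self):
        self._domains_by_user = defaultdict(set)

    def record(self, user_id, domain):
        # Skip requests without a usable Referer (empty domain)
        if domain:
            self._domains_by_user[user_id].add(domain)

    def domains_for(self, user_id):
        # .get() avoids creating an empty entry for unknown users
        return sorted(self._domains_by_user.get(user_id, set()))


store = VisitStore()
store.record('u1', 'news.example')
store.record('u1', 'shop.example')
store.record('u1', 'news.example')  # repeat visits are de-duplicated
store.record('u2', '')  # no Referer, nothing stored
print(store.domains_for('u1'))  # ['news.example', 'shop.example']
print(store.domains_for('u2'))  # []
```

Inside the pixel handler this would be a single `store.record(user_id, domain)` call just before returning the response.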

And now to the hardest part. Issuing the request from a browser with a good ad blocker has proven to be quite difficult. There are ways around that, but I will not cover them in this short tutorial; if you are interested, read more about ad-block "unblocking" projects. The few lines of code below issue a request on every visit:

```javascript
var tracking_img = new Image();
tracking_img.onload = function () { document.body.appendChild(tracking_img); };  // attach the 1x1 pixel once loaded
// Setting src triggers the GET; Date.now() keeps the URL unique so it is never served from cache
tracking_img.src = "https://PIXEL_HOST_AND_PORT/pixel/" + Date.now();
```

In the code above we create a new image, append it to the page once it has loaded, and set its source URL. Setting the source is what makes the browser send a GET request for the image (even before the image is added to the DOM), which lets us process the user's visit on our server.
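To watch the round trip without a browser, the sketch below runs a throwaway pixel server on localhost with Python's standard `http.server` and plays the browser by hand: the first request carries no cookie, and the second sends back whatever the first response set. It is a simplified stand-in for the Flask app, not the tutorial's server; because it uses plain HTTP it deliberately omits the `Secure` and `SameSite` attributes, and the `/pixel/<timestamp>` path only mirrors the route above.

```python
import http.server
import secrets
import threading
import time
import urllib.request
from http.cookies import SimpleCookie


class PixelHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        cookies = SimpleCookie(self.headers.get('Cookie', ''))
        self.send_response(200)
        self.send_header('Content-Type', 'image/gif')
        self.send_header('Cache-Control', 'no-store')
        if 'secret_cookie' not in cookies:
            # First visit: assign a new ID (no Secure flag, since this demo is plain HTTP)
            self.send_header('Set-Cookie', 'secret_cookie=' + secrets.token_hex(16))
        self.end_headers()
        self.wfile.write(b'GIF89a')  # the body is irrelevant for this demo

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(('127.0.0.1', 0), PixelHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d/pixel/' % server.server_address[1]

# Visit 1: no cookie yet, so the server answers with Set-Cookie
with urllib.request.urlopen(base + str(time.time_ns())) as resp:
    set_cookie = resp.headers.get('Set-Cookie')
print('first visit got:', set_cookie)

# Visit 2: send the cookie back as a browser would; no new cookie is issued
req = urllib.request.Request(base + str(time.time_ns()), headers={'Cookie': set_cookie})
with urllib.request.urlopen(req) as resp:
    second = resp.headers.get('Set-Cookie')
print('second visit got:', second)  # None: the user was recognized

server.shutdown()
server.server_close()
```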
As you can see, it all looks quite simple, but the difficulty starts when we want to make sure the request gets past ad blockers and that the server can handle many requests at the same time.
For that I would recommend writing the pixel server in a language like Go and sending the users' visits to a queue such as Kafka for later processing with Hadoop or Spark.
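The point of that recommendation is decoupling: the pixel endpoint should only record the visit and return immediately, while a separate consumer does the heavy processing. The sketch below shows that shape inside a single Python process, with the standard library's `queue.Queue` standing in for a Kafka topic and a worker thread standing in for the Hadoop or Spark job. The function names are hypothetical, and a real setup would use a Kafka client library rather than an in-process queue.

```python
import queue
import threading
from collections import Counter

visits = queue.Queue()  # stand-in for a Kafka topic
domain_counts = Counter()  # stand-in for the downstream analytics


def on_pixel_request(user_id, domain):
    """What the pixel handler would call: enqueue the visit and return at once."""
    visits.put((user_id, domain))


def consumer():
    """Stand-in for the batch or stream job that processes visits later."""
    while True:
        item = visits.get()
        if item is None:  # sentinel: stop the worker
            visits.task_done()
            break
        _user_id, domain = item
        domain_counts[domain] += 1
        visits.task_done()


worker = threading.Thread(target=consumer)
worker.start()
for uid, dom in [('u1', 'news.example'), ('u2', 'news.example'), ('u1', 'shop.example')]:
    on_pixel_request(uid, dom)
visits.put(None)
worker.join()
print(domain_counts)  # Counter({'news.example': 2, 'shop.example': 1})
```

Because the handler only pays for a `put`, request latency stays flat even when the consumer falls behind; the queue absorbs the backlog, which is the same property Kafka provides across machines.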