How to get object change notifications on Google Cloud Storage using Pub/Sub topic messages
We previously met our business requirements with a front-end application that uploaded files and processed them through microservices. However, the microservices had several limitations: a file size constraint, a cap on the number of files that could be uploaded, latency, weak monitoring and tracking, and a major risk of losing track of which requested files had been processed and which had not.
We addressed these challenges by using Google Cloud Storage (GCS) integrated with Pub/Sub notifications:
File size is not a practical concern in GCS for our business requirements.
Every file upload sends the object's complete metadata as a notification message to a Pub/Sub topic.
Successfully processed files are moved to a different GCS bucket and deleted from the source bucket, while failed ones are picked up for reprocessing. This reduces the tracking and maintenance effort and the risk of skipping or missing files.
Google Cloud Storage
GCS stores and serves very large amounts of data. It suits many use cases, including backup and recovery, disaster recovery, and repositories for analytics and machine learning. A piece of data stored in GCS is called an object, and objects are stored in containers called buckets.
Pub/Sub is an asynchronous publish-subscribe messaging service. It decouples the services that produce events from the services that process them. Pub/Sub’s major benefits include:
Consistency in performance.
Message storage durability.
High availability in real-time delivery of messages.
Common Pub/Sub Use-Cases
1. Implementing Asynchronous Workflows
Example: An order-processing application can publish each order to a Pub/Sub topic, from which multiple workers pick the orders up for processing.
2. Distributing Event Notifications
Example: You can create an application that has a module that handles new user registrations. This module will fetch the data from user signups and send notifications whenever a new registration occurs. The other module of the application will subscribe to receive the notifications sent by the first module.
3. Refreshing a Distributed Cache
Example: An application can publish invalidation events that identify the objects that have changed, so that every cache holding those objects can refresh them.
4. Reliability Improvement
Example: You can create a service in one region/zone and operate it from multiple regions by subscribing to a topic common to all regions. This can be used for failure recoveries in specific regions.
Storage Events — Pub/Sub Notifications
If you want to track GCS events and activity, Pub/Sub notifications are your best bet.
Pub/Sub notifications send information about changes to objects in a GCS bucket to a Pub/Sub topic.
This information is published to the Pub/Sub topic as messages, which any subscribed application can consume and acknowledge for further use.
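To make the message format concrete, here is a minimal Python sketch of reading one notification. It assumes the JSON payload format, where the message attributes carry fields such as eventType, bucketId, objectId, and objectGeneration, and the message data holds the object's metadata as JSON. The GcsEvent class, the parse_notification helper, and all sample values are hypothetical names chosen for illustration. Only the standard library is used, so the sample message is hand-built rather than pulled from a real subscription.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GcsEvent:
    """The fields of a notification that a file processor typically needs."""
    event_type: str
    bucket: str
    name: str
    generation: str
    size_bytes: Optional[int]
    content_type: Optional[str]


def parse_notification(attributes: dict, data: bytes) -> GcsEvent:
    """Build a GcsEvent from a Pub/Sub message's attributes and data."""
    # With the JSON payload format, data is the object's metadata as UTF-8
    # JSON. If no payload was requested, data is empty.
    payload = json.loads(data.decode("utf-8")) if data else {}
    size = payload.get("size")  # the JSON API encodes size as a string
    return GcsEvent(
        event_type=attributes["eventType"],
        bucket=attributes["bucketId"],
        name=attributes["objectId"],
        generation=attributes["objectGeneration"],
        size_bytes=int(size) if size is not None else None,
        content_type=payload.get("contentType"),
    )


# Hand-built sample message (assumed values, not real output).
sample_attributes = {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "example-bucket",
    "objectId": "reports/2024.csv",
    "objectGeneration": "1700000000000000",
    "payloadFormat": "JSON_API_V1",
}
sample_data = json.dumps(
    {"bucket": "example-bucket", "name": "reports/2024.csv",
     "size": "2048", "contentType": "text/csv"}
).encode("utf-8")

event = parse_notification(sample_attributes, sample_data)
print(event.event_type, event.bucket, event.name, event.size_bytes)
```

With the google-cloud-pubsub client library, the same helper could be called from a subscriber callback as parse_notification(message.attributes, message.data).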
Prerequisites to set up Pub/Sub Notifications
1. Enable Pub/Sub API for the project
2. Have sufficient permissions on the bucket
Owner role required
If not the owner, you should have the storage.buckets.update permission
3. Have sufficient permissions on Pub/Sub topics
Owner role required
If not the owner, you should have the pubsub.topics.create or pubsub.topics.setIamPolicy permissions
Steps to Enable Notifications
1. Create a bucket using the console or the command given below
gsutil mb gs://BUCKET_NAME
2. Create a topic using the console or the command given below
gcloud pubsub topics create TOPIC_NAME
3. Create a subscription to the topic, through which the notification messages will be consumed
gcloud pubsub subscriptions create SUBSCRIPTION_NAME --topic=TOPIC_NAME
4. Finally, create the notification configuration on the bucket
gsutil notification create -f json -t TOPIC_NAME gs://BUCKET_NAME
-f specifies the payload format; the value json sends the object's metadata in the payload, matching the JSON API format
-t specifies the topic to which notifications will be sent
The above command sends notifications for all event types: object creation, deletion, metadata update, and so on.
To enable notifications only for specific events, add -e EVENT_NAME, for example -e OBJECT_FINALIZE. The flag can be repeated to select several event types.
The available event types are:
1. OBJECT_FINALIZE: Sent when a new object is successfully created in the bucket, including when an existing object is overwritten
2. OBJECT_METADATA_UPDATE: Sent when the metadata of an existing object changes
3. OBJECT_DELETE: Sent when an object is permanently deleted, including when it is overwritten in a bucket without object versioning
4. OBJECT_ARCHIVE: Sent when the live version of an object becomes a noncurrent version in a bucket with object versioning enabled
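As an illustration of how a subscriber might tell these event types apart, the following Python sketch maps a message's attributes to a short label. It relies on the eventType attribute, plus the overwroteGeneration and overwrittenByGeneration attributes that Cloud Storage adds when one object generation replaces another. The describe_event function and its return labels are hypothetical names invented for this example.

```python
def describe_event(attributes: dict) -> str:
    """Return a short label for a GCS notification, based on its attributes."""
    event_type = attributes.get("eventType")

    if event_type == "OBJECT_FINALIZE":
        # overwroteGeneration is present only when the new object
        # replaced an existing one with the same name.
        if "overwroteGeneration" in attributes:
            return "replaced"
        return "created"

    if event_type == "OBJECT_METADATA_UPDATE":
        return "metadata_updated"

    if event_type == "OBJECT_DELETE":
        # overwrittenByGeneration is present when the deletion was caused
        # by an overwrite rather than an explicit delete.
        if "overwrittenByGeneration" in attributes:
            return "overwritten"
        return "deleted"

    if event_type == "OBJECT_ARCHIVE":
        return "archived"

    return "unknown"


# Hand-built attribute dictionaries (assumed values for illustration).
print(describe_event({"eventType": "OBJECT_FINALIZE"}))
print(describe_event({"eventType": "OBJECT_FINALIZE",
                      "overwroteGeneration": "1700000000000000"}))
print(describe_event({"eventType": "OBJECT_DELETE"}))
```

A processor that only cares about new uploads could ignore every label except "created" and "replaced", or avoid the other events entirely by filtering with -e when the notification is created.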
How can these notifications be used?
1. Send out emails when changes occur in storage buckets
2. Use the metadata from Pub/Sub messages in the application to process the uploaded files, folders, etc.
Let’s look at the following use case to understand Pub/Sub notifications in detail:
Suppose your application needs a scheduler that fetches every file uploaded to GCS, extracts the required information, runs some calculations, and returns the results.
The flow is explained below:
1. User uploads a file
2. The OBJECT_FINALIZE event is triggered, and a notification message is published to the topic
3. Our application, hosted on App Engine, pulls and acknowledges this message through the scheduler
4. After the file is processed, delete it from the bucket
5. Store the calculated results in the Google Datastore
Pub/Sub notification for GCS events
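The flow above can be sketched in Python as a message handler. This is an illustration under stated assumptions, not the application's actual code: make_handler and its four injected callables (download, compute, save_result, delete_object) are hypothetical, and the demo replaces GCS and Datastore with in-memory dictionaries so that the example runs with the standard library alone.

```python
from typing import Callable


def make_handler(
    download: Callable[[str, str], bytes],
    compute: Callable[[bytes], object],
    save_result: Callable[[str, object], None],
    delete_object: Callable[[str, str], object],
) -> Callable[[dict, bytes], bool]:
    """Build a handler that returns True when the message should be acknowledged."""

    def handle(attributes: dict, data: bytes) -> bool:
        if attributes.get("eventType") != "OBJECT_FINALIZE":
            return True  # not an upload: nothing to do, so acknowledge it
        bucket, name = attributes["bucketId"], attributes["objectId"]
        try:
            result = compute(download(bucket, name))
            save_result(name, result)    # store the calculated result
            delete_object(bucket, name)  # then remove the processed file
        except Exception:
            return False  # leave unacknowledged so it can be reprocessed
        return True

    return handle


# In-memory stand-ins for the bucket and the result store (assumptions).
files = {("uploads", "a.txt"): b"1 2 3"}
results = {}

handle = make_handler(
    download=lambda bucket, name: files[(bucket, name)],
    compute=lambda content: sum(int(x) for x in content.decode().split()),
    save_result=results.__setitem__,
    delete_object=lambda bucket, name: files.pop((bucket, name)),
)

upload = {"eventType": "OBJECT_FINALIZE", "bucketId": "uploads", "objectId": "a.txt"}
acked = handle(upload, b"")
print(acked, results, files)
```

The sketch stores the result before deleting the file, so that a failed write does not lose the input; that ordering is a choice of this example. With the google-cloud-pubsub client library, a subscriber callback could call handle(message.attributes, message.data) and then message.ack() or message.nack() depending on the return value.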
The use cases above give a brief idea of where Pub/Sub notifications can be applied in your application. The setup is minimal and simple to implement. You just need to know how to read and extract the required information from the notification message, and you are good to go. Now that you have your ingredients and recipe ready, get cooking!
For all your cloud-based services requirements, reach out to us at Clairvoyant.