GCP Data Engineer : Cloud Storage

Storage Services in GCP

This post focuses on the Cloud Storage service in GCP. The main storage services in GCP are:

Object Storage (Buckets): stores unstructured data and supports archival use cases.

Instance Storage (Persistent Disks): block storage for VMs and Kubernetes clusters.

SQL (Cloud SQL, Cloud Spanner): relational databases with transaction support.

NoSQL (Bigtable, Datastore): storage for non-relational data.

Analytics (BigQuery): data warehousing.

Cloud Storage

Cloud Storage (GCS) stores unstructured data, such as files and images, as objects inside buckets.

Log in to the Cloud Console at https://console.cloud.google.com/storage to open the Storage section.

When creating a bucket, you choose one of the following storage classes: Standard, Nearline, Coldline, or Archive.

Once the bucket has been created, you can upload files or folders into it from the console.

Loading Data into Storage

For small volumes of data, we can use one of the following:

gsutil: a command-line utility to create buckets and to copy or move files into them. It is available in Cloud Shell, which you activate from the Cloud Console (the ">" icon in the right corner).

Cloud Console

API

For bulk and large volumes of data, we can use:

Cloud Storage Transfer (Storage Transfer Service): moves data from on-premises systems and other cloud services into Cloud Storage, and between Cloud Storage buckets. It suits transfers of more than 1 TB from on-premises and supports one-time and recurring transfers.

Transfer Appliance: a high-capacity storage server used to move large volumes of data (100 TB).

Security

Data stored in GCP is encrypted by default. While creating a bucket, you can choose the encryption option under the advanced settings.

Data uploaded into GCP is divided into multiple chunks, and each chunk is encrypted with its own key (Data Encryption Key, DEK). The data encryption keys are in turn encrypted with a Key Encryption Key (KEK), which is stored in KMS (Key Management Service).
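To make the DEK / KEK relationship concrete, here is a minimal sketch of envelope encryption. It is an illustration only: the XOR keystream below is a toy cipher chosen to keep the example dependency-free (Cloud Storage itself uses AES), and the function names and the 16-byte chunk size are assumptions made for this example.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 based keystream (NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def envelope_encrypt(plaintext: bytes, kek: bytes, chunk_size: int = 16):
    """Encrypt each chunk with its own DEK, then wrap that DEK with the KEK."""
    records = []
    for start in range(0, len(plaintext), chunk_size):
        dek = secrets.token_bytes(32)  # fresh data encryption key per chunk
        records.append({
            "ciphertext": keystream_xor(dek, plaintext[start:start + chunk_size]),
            "wrapped_dek": keystream_xor(kek, dek),  # DEK encrypted under the KEK
        })
    return records


def envelope_decrypt(records, kek: bytes) -> bytes:
    """Unwrap each DEK with the KEK, then decrypt the matching chunk."""
    parts = []
    for record in records:
        dek = keystream_xor(kek, record["wrapped_dek"])
        parts.append(keystream_xor(dek, record["ciphertext"]))
    return b"".join(parts)


kek = secrets.token_bytes(32)  # in GCP this key would live in KMS
records = envelope_encrypt(b"hello cloud storage, chunked data", kek)
restored = envelope_decrypt(records, kek)
```

Only the wrapped DEKs are stored next to the data, so reading a chunk requires access to the KEK.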

ACLs

Uniform bucket-level access: recommended; uses IAM (Identity and Access Management), with permissions applied at the bucket level.

Fine-grained access: uses ACLs; permissions apply at both the bucket and the object level.

Signed URLs

Signed URLs give time-limited read or write access to an object (short-term access).

They allow access for users who have no IAM authorization. Generating a signed URL consists of two steps:

Create a service account key with the relevant permissions.

Run the gsutil signurl command, passing the service account key (a JSON file) and, with the "-d" parameter, the duration for which the URL stays valid.
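The second step can be sketched as a small helper that assembles the gsutil signurl call. The helper name, the key file name, and the bucket and object names are placeholders for this example, and the command is only built here, not executed.

```python
import shlex


def build_signurl_command(key_file, object_url, duration="10m"):
    """Return the argument list for: gsutil signurl -d <duration> <key> <gs://...>."""
    if not object_url.startswith("gs://"):
        raise ValueError("object_url must look like gs://bucket/object")
    return ["gsutil", "signurl", "-d", duration, key_file, object_url]


# Placeholder key file and object; replace with your own values.
command = build_signurl_command("sa-key.json", "gs://example-bucket/report.csv", "1h")
print(shlex.join(command))
```

Passing the resulting list to subprocess.run would execute it on a machine where gsutil is installed and the key file exists.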

Signed Policy Documents

Signed policy documents specify what can be uploaded to a bucket with a form POST.

They allow greater control over size, content type, and other upload characteristics than signed URLs.
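As a rough sketch, a policy document is a JSON object with an expiration time and a list of conditions that an upload must satisfy. The bucket name, key prefix, and size limit below are placeholders, and the example stops before the signing step (the signature-related form fields are left out).

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_policy_document(bucket, max_bytes, content_type_prefix, valid_for_minutes=15):
    """Build a policy restricting bucket, object name prefix, content type and size."""
    expiration = datetime.now(timezone.utc) + timedelta(minutes=valid_for_minutes)
    return {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", "uploads/"],  # hypothetical object prefix
            ["starts-with", "$Content-Type", content_type_prefix],
            ["content-length-range", 0, max_bytes],
        ],
    }


def encode_policy(policy):
    """Base64-encode the JSON policy, the form in which it is placed in the upload form."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


policy = build_policy_document("example-bucket", 1048576, "image/")
encoded = encode_policy(policy)
```

An upload that breaks any condition, for example a file larger than 1 MiB here, is rejected by Cloud Storage.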

Lifecycle Policy Management

A lifecycle policy consists of actions and conditions: it defines which action is performed on the objects in a bucket when a condition is met.

An action can be, for example, moving an object from one storage class to another.

Actions are executed when their condition applies, e.g. change the storage class based on age, delete objects based on date, purge old versions, etc.
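The action / condition pairs above can be written as a lifecycle configuration in JSON. Here is a sketch that builds one in Python; the function name, the day counts, and the choice of NEARLINE as the target class are example values, not recommendations.

```python
import json


def lifecycle_config(nearline_after_days, delete_after_days, keep_newer_versions):
    """Return a lifecycle configuration with three action/condition rules."""
    return {
        "rule": [
            {   # change the storage class based on age
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {   # delete objects once they are old enough
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
            {   # purge noncurrent versions beyond the newest N
                "action": {"type": "Delete"},
                "condition": {"isLive": False, "numNewerVersions": keep_newer_versions},
            },
        ]
    }


config = lifecycle_config(30, 365, 3)
print(json.dumps(config, indent=2))
```

Saved to a file such as lifecycle.json, this can be applied with gsutil lifecycle set lifecycle.json gs://BUCKET_NAME.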

Happy Learning !

Bharathy Poovalingam

GCP Learning Series_ Cloud Functions

Introduction

Cloud Functions is the FaaS (Function as a Service), serverless compute service provided by GCP to handle events generated by GCP resources. It is fully managed by GCP.

Developers don't have to worry about administering infrastructure:

  • runtime environments
  • Security Updates
  • OS Patches
  • Scaling
  • Capacity

Developers can focus on writing software.

Events, Triggers and Functions

  • Events: a particular action that happens in GCP (file upload/archive, message published to Pub/Sub)
  • Triggers: the way of responding to an event
  • Functions: the code associated with a trigger to handle the event

Supported Events

Cloud Functions supports an event-driven architecture for events emitted in GCP. We can wire a Cloud Function (CF) to an event so that it calls an API or performs an action. Not all GCP events are supported; events from the following sources are:

HTTP Events

Cloud Storage

Cloud Pub/Sub

Firebase

Cloud Firestore

The above event sources fall into two categories:

  • Background Functions ( Cloud Storage / Cloud Pub/Sub )
  • HTTP Functions ( Web hook / HTTP Requests )
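As a minimal sketch of the HTTP category: in the Python runtime an HTTP function receives a Flask request object and returns the response body. The function name and the "name" query parameter are made up for this example, and the fake request only stands in for Flask so the function can be tried locally.

```python
from types import SimpleNamespace


def hello_http(request):
    """HTTP function: read an optional ?name=... query parameter and greet."""
    name = request.args.get("name", "World")
    return f"Hello, {name}!"


# Local stand-in for a Flask request; only the .args mapping is used here.
fake_request = SimpleNamespace(args={"name": "GCP"})
print(hello_http(fake_request))
```

Background functions use a different signature, taking an event and a context instead of a request.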

Language Support

Cloud Functions supports the following languages:

Node.js

Python

Go

Java 11

It does not support containers yet, i.e. containerized Docker images cannot be deployed to Cloud Functions.

Deployment of Cloud Functions

Log in to the Google Cloud Console, click the hamburger menu, and choose Cloud Functions to create and deploy a function. Provide the following:

  • Provide Function name
  • Trigger
  • Event Type
  • Resource
  • Runtime

A CF can also be deployed using the gcloud command.

To deploy a Cloud Function that handles the event raised when a user uploads a file to Google Cloud Storage, provide the resource name (the bucket name) and the resource event (file upload completion):

gcloud functions deploy cloud_function_test_gcs --runtime python37 --trigger-resource gcs-bucket-test --trigger-event google.storage.object.finalize
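The deploy command expects the source to contain a function with the deployed name. A minimal sketch of what that function could look like in the Python runtime is below; the message text is invented for this example, and the fake context stands in for the object the platform passes in.

```python
from types import SimpleNamespace


def cloud_function_test_gcs(event, context):
    """Background function: runs when an object upload to the bucket completes."""
    bucket = event["bucket"]  # bucket that received the object
    name = event["name"]      # object name within the bucket
    message = f"File gs://{bucket}/{name} finished uploading (event {context.event_id})"
    print(message)
    return message


# Local stand-in for the event and context supplied by Cloud Functions.
fake_event = {"bucket": "gcs-bucket-test", "name": "data/sample.csv"}
fake_context = SimpleNamespace(event_id="12345")
result = cloud_function_test_gcs(fake_event, fake_context)
```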

Similarly, a Cloud Function that handles a Pub/Sub event can be deployed as follows:

gcloud functions deploy cloud_function_test_pubsub --runtime python37 --trigger-topic test-topic

For Cloud Pub/Sub there is only one event on a topic (message published), so only the topic has to be named; here it is test-topic.
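A matching function for the Pub/Sub case could look like the sketch below. The message payload arrives base64-encoded in the "data" field of the event; the sample payload and the returned value are choices made for this example.

```python
import base64


def cloud_function_test_pubsub(event, context):
    """Background function: runs for each message published to the topic."""
    if "data" in event:
        payload = base64.b64decode(event["data"]).decode("utf-8")
    else:
        payload = ""  # a message may carry only attributes
    print(f"Received message: {payload}")
    return payload


# Local stand-in for a published message.
fake_event = {"data": base64.b64encode(b"hello from test-topic").decode("ascii")}
result = cloud_function_test_pubsub(fake_event, None)
```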

Limitations

By default, a Cloud Function times out after one minute, although the timeout can be configured up to 9 minutes.

Hence Cloud Functions are suitable for handling events and are not suitable for long-running processes.
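The timeout is set at deploy time with the --timeout flag of gcloud functions deploy. The helper below is a sketch that checks a requested value against the limits described above before building the command; the helper name and constants are assumptions for this example.

```python
DEFAULT_TIMEOUT_SECONDS = 60   # one minute
MAX_TIMEOUT_SECONDS = 540      # nine minutes


def deploy_command(name, runtime, topic, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """Build a gcloud deploy command, rejecting timeouts outside the allowed range."""
    if not 1 <= timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds"
        )
    return [
        "gcloud", "functions", "deploy", name,
        "--runtime", runtime,
        "--trigger-topic", topic,
        "--timeout", f"{timeout_seconds}s",
    ]


command = deploy_command("cloud_function_test_pubsub", "python37", "test-topic", 540)
print(" ".join(command))
```

Work that cannot finish within nine minutes is better moved to another compute option.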

#GCP #LearningContinues #Serverless

Thank You

Bharathy Poovalingam

GCP Learning Series _ App Engine Part 3

Overview

This post shows how to deploy an image from Container Registry to Google App Engine.

Configuration

To deploy an app to App Engine, an app.yaml file must be configured for the project.
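For a container image deployed to the Flexible environment, a minimal app.yaml declares a custom runtime. The sketch below renders such a file from Python; it is an assumption about the smallest workable configuration, not necessarily the exact file used in this series, and the helper name is made up.

```python
def render_app_yaml(settings):
    """Render flat key/value settings as simple YAML lines."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


# Minimal settings for a custom (container based) runtime on App Engine Flexible.
FLEX_CUSTOM_RUNTIME = {"runtime": "custom", "env": "flex"}

app_yaml = render_app_yaml(FLEX_CUSTOM_RUNTIME)
print(app_yaml, end="")
```

Writing this text to a file named app.yaml in the project folder gives gcloud app deploy the configuration it looks for.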

Then run the command:

gcloud app deploy --image-url=gcr.io/gcpnikki/appengine/demo:latest

After a successful deployment, navigate to the App Engine dashboard (Instances / Services) to review the changes.

gcloud app browse

The gcloud app browse command opens the deployed application in the browser, where you can hit the exposed endpoints.

Happy Learning !

Bharathy Poovalingam

#GCP #Learning #Serverless #AppEngine

GCP Learning Series _ App Engine Part 2

Overview

This post shows how to set up the Cloud SDK on your local machine, configure Docker, create a Docker image, and push it to Container Registry.

Cloud SDK Installation

Python 3 is a prerequisite for installing the Cloud SDK.

Install Python using brew:

brew install python3

Then download the tar file: google-cloud-sdk-350.0.0-darwin-x86_64.tar

Extract it and then run the install script from the extracted folder to install the Cloud SDK:

./google-cloud-sdk/install.sh

Cloud Authentication

Run the following command to authenticate:

gcloud auth login

The above command opens a Google sign-in page in the browser, where you need to give your consent for the Cloud SDK to access your account.

Building a sample application and push it to Container Registry

https://github.com/bhanikki28/GCPSamples/tree/master/AppEngineDemo

To build and dockerize the application, please refer to : https://medium.com/@bharathy.poovalingam/spring-boot-with-docker-d4129a353f87

Note: the image must be tagged with a name that matches the gcr registry, as shown below.

Syntax: gcr.io/PROJ_ID/folder/image_name
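The naming syntax above can be sketched as a small helper that composes the image reference and the docker tag and docker push commands for it. The helper names and the local image name "demo" are assumptions for this example; the project and path are taken from the deploy command used in Part 3.

```python
def gcr_image(project_id, path, tag="latest", host="gcr.io"):
    """Compose an image reference in the form gcr.io/PROJ_ID/folder/image_name:tag."""
    return f"{host}/{project_id}/{path}:{tag}"


def tag_and_push_commands(local_image, remote_image):
    """Return the docker commands that tag a local image and push it to the registry."""
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


remote = gcr_image("gcpnikki", "appengine/demo")
for command in tag_and_push_commands("demo", remote):
    print(" ".join(command))
```

The push only succeeds after Docker has been authorized against the registry.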

Authorizing Docker

Run the following command to authorize Docker to push images to Google Cloud Container Registry:

gcloud auth configure-docker

Pushing the image to Container Registry

Push the tagged image with docker push, then log in to the Cloud Console and navigate to the Container Registry section to view the pushed image.

to be continued… ( Deploying the image to Google App Engine )

Happy Learning !

Bharathy Poovalingam

#GCP #Learning #Serverless #AppEngine

GCP Learning Series _App Engine Part 1

Overview

This App Engine series comprises three parts:

  1. Overview of App Engine (Part 1)
  2. Cloud SDK setup, and configuring and authorizing Docker to push images to Google Container Registry (gcr) (Part 2)
  3. Deploying the image from gcr into the App Engine Flexible environment (Part 3)

App Engine is a Platform-as-a-Service. It means that you simply deploy your code, and the platform does everything else for you. For example, if your app becomes very successful, App Engine will automatically create more instances to handle the increased volume. ( Scaling support )

Note : It’s a PaaS (service) like Elastic Beanstalk in AWS Cloud

Compute Choices

Compute Engine (IaaS): create VMs and have more control (lift and shift).

GKE (CaaS): deploy container images and keep control over managing the cluster (container clusters).

App Engine (PaaS): a managed service from GCP.

Cloud Functions (FaaS): serverless, event-driven architecture.

Key features of App Engine:

  • Managed service (serverless)
  • Build and deploy apps quickly, with built-in support for load balancing and autoscaling
  • Pay as you go
  • Focus on code: App Engine is a managed service, so GCP takes care of provisioning the infrastructure
  • Choose between the Standard and Flexible environments
  • Traffic splitting and versioning

App Engine Variants

App Engine comes in two flavours: the Standard environment and the Flexible environment.

App Engine Standard Environment

Google App Engine (Standard) is like a read-only sandboxed folder: you upload code to execute from it and don't worry about the rest. It is read-only in the sense that a fixed set of libraries is installed for you and you cannot deploy third-party libraries at will. Mapping DNS names and sub-domains is also much easier.

Pros:

Instances can start up in seconds.

It provides manual, basic, and automatic scaling.

Cons:

Lacks support for third-party libraries

Minimal language support

Supported Runtimes

App Engine Flexible Environment

Google App Engine (Flexible) is more like a whole file system than a locked-down folder. You have more power than in the Standard environment, e.g. read/write permissions, but less than on Compute Engine. In GAE Standard a fixed set of libraries is installed for you and you cannot deploy third-party libraries at will. In the Flexible environment you can install whatever libraries your app depends on, including custom build environments (such as Python 3).

Pros:

Customizable stack

Support for third-party libraries

Cons:

Instances can take minutes to start

At least one instance must be running at all times; it cannot scale down to zero, so it may cost more than the Standard environment

Supported Runtimes

Note: Both the Standard and Flexible environments run on containers. The Standard environment uses a Google-proprietary container, while the Flexible environment relies on Docker containers.

GCP Console

Log in to the GCP Console, click the hamburger menu, navigate to the Serverless section, and click App Engine. You will be redirected to the App Engine dashboard.

to be continued … ( Cloud SDK Installation , GCloud Authentication, Container Registry) . Stay Tuned !

Happy Learning !

Bharathy Poovalingam

#GCP #Serverless #AppEngine #Learning

Related Course:

Architecting Scalable Web Applications using Google App Engine ( Janani Ravi / Pluralsight )