GCP Data Engineer : Cloud Storage

igreendataadmin

Storage Services in GCP

This blog will focus on Cloud Storage service in GCP.

Object Storage ( Buckets ) : To store unstructured data and for archival use-cases.

Instance Storage ( Persistent Disks ) : To work with VMs, Kubernetes Clusters

SQL ( Cloud SQL, Cloud Spanner ) : For Relational DB use-case and for transaction support

NoSQL ( BigTable, DataStore ) : For storing non-relational data

Analytic ( Cloud BigQuery ) : For Data warehousing

Cloud Storage

Cloud Storage helps in storing unstructured data , by having files or images been stored into buckets in GCS.

Login to cloud console : https://console.cloud.google.com/storage and move to Storage service section.

While creating a bucket, below are the storage class type options provided

Once bucket got created, user can login to the console to upload files or folders to move into the bucket.

Loading Data into Storage

For less Volume of data, we can use one of the following

gsutil : command line utility to copy files into bucket, to create bucket and to move files. Activate Cloud Shell from Cloud Console ( appear on the right corner with image “>”

Cloud Console

API

For Bulk and large volume of data, we can use

Cloud Storage Transfer : Moving data from On Premises and other cloud services, to move data between cloud storage buckets, transfer more than 1Tb from on-premises. Supports one-time and recurring transfers.

Transfer Appliance : High Capacity storage server, to move large volumes of data (100 Tb)

Security

Data stored in GCP is encrypted by default. While creating bucket, under advanced settings, you can chose the option to encryption as below.

Data uploaded into GCP is divided into multiple data chunks, each been encrypted with its own key ( Data Encryption Key : DEK ) and data encryption keys are been encrypted using Key Encryption Key ( KEK ) which gets stored in KMS ( Key Management Service )

ACLs

Uniform Bucket Level Access : Recommended, uses IAM (Identity and Access Management ), permission at Bucket Level

Fine-grained Access : Uses ACLs, permission apply at both Bucket and Object Level

Signed URLs

Time Limit read or write access to an object ( short-term access )

Allow access to those without IAM authorizations. Generate signed url comprise of two steps

Creating Service Account Key with relevant permission

Using gsutil command to generate the url passing the service account key as json and the time for url to be accessed using param : “-d”

Signed Policy Documents

Specify what can be loaded to a bucket with a form post

Allow greater control over size,content type and other upload characteristics than signed urls.

Lifecycle Policy Management

It consist of Actions and Conditions. What is the action to be performed on the bucket when the condition met

Action can be moving object from one storage class to another one

Actions executed when condition applies eg : Change Storage class based on age, Delete object based on date, Purge versions etc

Happy Learning !

Bharathy Poovalingam