Category Archives: GCP Storage

Build GCP Cloud Storage Client

The post demonstrates how to build a Google Cloud Storage client application in Java. This command line client app shows the basic logic to interact with Google Cloud Storage service and eliminates unnecessary clutter.

The application interacts with Google Cloud Storage via its JSON API using the related Google Java client library.

For more information, go to Google API Client Libraries, then click the Java link. In the menu bar click APIS, then press Ctrl-F and search for storage. You will get this:

Click on the version link (v1, in the example). This takes you to the Cloud Storage JSON API Client Library for Java. Note that at the bottom of the page, in the section “Add Library to Your Project”, there are several tabs. If you click the Maven tab, you get the dependency in XML format to add to the pom.xml file in your project. This is an example:

  <dependency>
      <groupId>com.google.apis</groupId>
      <artifactId>google-api-services-storage</artifactId>
      <version>v1-rev111-1.22.0</version>
  </dependency>

See also Putting to REST the Cloud Service APIs.

The app uses a simple UI which allows the user to perform tasks such as: list the buckets in a project, list objects in a bucket, create a bucket, create an object and so on.

You can download the code at: gcp-storage-client. See also Import a Maven Project. Refer to the README file for the latest information about the example code.

For background information, see GCP Cloud Storage Background.

Application Architecture

This section describes the application components and leaves the details to the actual code implementation. The following is the app architecture:

  1. Main.  This class is the application entry point. It performs the following tasks:
    • Gets the authenticated client object authorized to access the Google Cloud Storage service.
    • Reads the default settings.
    • Instantiates the operations classes.
    • Delegates to the SimpleUI class the display of the selection menu and the processing of the user’s input.
  2. User Interface
    • UserInterface. Abstract class that defines the variables and methods required to implement the SimpleUI class.
    • SimpleUI. It extends the UserInterface class and performs the following tasks:
      • Displays a selection menu for the user.
      • Processes the user’s input and calls the proper method based on the user’s selection.
      • Each method calls the related Google Cloud Storage JSON API.
  3. Core Classes
    • ProjectOperations. Contains methods to perform Google Cloud Storage project operations.
    • BucketOperations.  Contains methods to perform Google Cloud Storage bucket operations.
    • ObjectsOperations. Contains methods to perform Google Cloud Storage object operations.
      • ObjectLoaderUtility. Performs object upload. This class is just a container. The actual work is done by the contained classes:
        • RandomDataBlockinputStream. Generates a random data block and repeats it to provide the stream for resumable object upload
        • CustomUploadProgressListener. Implements a progress listener to be invoked during object upload.
  4. Authentication.
    • GoogleServiceClientAuthentication. This is an abstract class which obtains the credentials for the client application to allow the use of the requested Google service REST API.
    • IGoogleServiceClientAuthentication.  Defines variables and methods to authenticate clients so they can use the selected Google service REST APIs.
    • AuthenticateServiceClient. Creates an authenticated client object that is authorized to access the selected Google service API.

For more information, see Create Google Service Authentication App.

  5. Utilities.
    • IUtility.  Defines fields and methods to implement the Utility class.
    • Utility.  Defines utility methods and variables to support the application operations such as menu creation, regions list initialization and so on.
    • ServiceDefaultSettings.  Reads the service client default settings from the related JSON file. The file contains information such as project ID, default e-mail and so on.

Application Workflow

The following figure shows the application time sequence (or workflow):


The first time the user starts the application, the Main class performs the following tasks:

  • Reads the default settings.
  • Creates authenticated storage service client.
  • Initializes the operation classes.
  • Initializes the SimpleUI class.
  • Starts the loop to process user inputs.

The SimpleUI class loops to process the user’s commands until she terminates the loop. At that point, the application terminates.

Application Implementation

Enable Google Cloud Storage API

To build the application, you will use Eclipse. Before you can do that, ensure that you have enabled the service API as described next.

  1. Follow the steps described in Enable Google Service API.
  2. Download the client credentials information in a file (for example, client_secrets.json). Follow the steps described in Create OAuth Client Credentials.

Create the Application Project

  1. In Eclipse, create a Maven project.  For more information, see Create a Maven Project.
  2. Add a reference to the authentication app JAR file created in Build GCP Service Client Authentication. Alternatively, and for the quickest results, import the downloaded project. For more information, see Import a Maven Project.

Modify the pom.xml File

A key step in creating the application project is to configure the pom.xml file correctly to define the dependencies required to implement the client application. For more information see Define Dependencies in pom.xml.
That’s it. Happy googling with Google Cloud Storage.

GCP Cloud Storage Background

Google Cloud Storage (GCS) is an Infrastructure as a Service (IaaS) offering for storing and accessing
customer data. The service combines the performance and scalability of Google’s cloud
with advanced security and sharing capabilities.
GCS provides a simple programming interface through standard HTTP methods PUT, GET,
POST, HEAD, and DELETE to store, share and manage data in the cloud. In this way, you
don’t have to rely on complicated SOAP toolkits or RPC programming.
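
As a rough illustration of this plain-HTTP interface, the following sketch builds (but does not send) two requests using only the Python standard library. It assumes the JSON API v1 bucket collection URL; the function names, the project ID, the bucket name and the access token placeholder are hypothetical and chosen only for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the JSON API (v1) bucket collection.
BUCKETS_URL = "https://storage.googleapis.com/storage/v1/b"


def list_buckets_request(project_id, access_token):
    """Build a GET request that lists the buckets of a project."""
    query = urllib.parse.urlencode({"project": project_id})
    return urllib.request.Request(
        BUCKETS_URL + "?" + query,
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


def create_bucket_request(project_id, bucket_name, access_token):
    """Build a POST request that creates a bucket in a project."""
    query = urllib.parse.urlencode({"project": project_id})
    body = json.dumps({"name": bucket_name}).encode("utf-8")
    return urllib.request.Request(
        BUCKETS_URL + "?" + query,
        data=body,
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing either object to urllib.request.urlopen() would send it, provided the token is a valid OAuth 2.0 access token.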

Google Cloud Storage Architecture

Let’s analyze the GCS main architectural components, as shown in the next figure, to gain
an understanding of GCS inner working and capabilities.

To use Google Cloud Storage effectively you need to understand some of the concepts on
which it is built. These concepts define how your data is stored in Google Cloud Storage.

Projects

All data in Google Cloud Storage belongs inside a project. A project consists of a set
of users, a set of APIs, and billing, authentication, and monitoring settings for those
APIs. You can have one project or multiple projects.

Buckets

Buckets are the basic containers that hold your data. Everything that you store in
Google Cloud Storage must be contained in a bucket. You can use buckets to
organize your data and control access to your data, but unlike directories and
folders, you cannot nest buckets.

  • Bucket names. Bucket names must be unique across all of Google Cloud Storage and have more restrictions than object names because every bucket resides in a single Google Cloud Storage namespace. Also, bucket names can be used with a CNAME redirect, which means they need to conform to DNS naming conventions. For more information, see Bucket and Object Naming Guidelines.
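
To make the DNS-style restriction more concrete, here is a minimal sketch of a local pre-check. It covers only a simplified subset of the rules, and the length bounds and character set are assumptions made for illustration; the Bucket and Object Naming Guidelines remain the authoritative source. The function name is hypothetical.

```python
import re

# Simplified pattern (an assumption, not the full rule set): lowercase
# letters, digits, dashes, underscores and dots, starting and ending with
# a letter or a digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if name passes a simplified subset of the naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    return _BUCKET_NAME_RE.fullmatch(name) is not None
```

A check like this only filters out obviously malformed names before a request is made. The service still decides whether a name is acceptable and whether it is already taken.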

Objects

Objects are the individual pieces of data that you store in Google Cloud Storage. Objects have two components: object data and object metadata. The object data component is usually a file that you want to store in Google Cloud Storage. The object metadata component is a collection of name-value pairs that describe various object qualities.

  • Object names. An object name is just metadata to Google Cloud Storage. The following are the main properties:
    • An object name can contain any combination of Unicode characters (UTF-8 encoded) less than 1024 bytes in length.
    • An object name must be unique within a given bucket.
    • A common character to include in file names is a slash (/). By using slashes in an object name, you can make objects appear as though they’re stored in a hierarchical structure. For example, you could name an object /europe/france/paris.jpg and another object /europe/france/cannes.jpg. When you list these objects they appear to be in a hierarchical directory structure based on location; however, Google Cloud Storage sees the objects as independent objects with no hierarchical relationship whatsoever.
  • Object Immutability. Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime. An object’s storage lifetime is the time between successful object creation (upload) and successful object deletion. In practice, this means that you cannot make incremental changes to objects, such as append operations or truncate operations. However, it is possible to overwrite objects that are stored in Google Cloud Storage because an overwrite operation is in effect a delete object operation followed immediately by an upload object operation. So a single overwrite operation simply marks the end of one immutable object’s lifetime and the beginning of a new immutable object’s lifetime.
  • Data opacity.  An object’s data component is completely opaque to Google Cloud Storage. It is just a chunk of data to Google Cloud Storage.
  • Hierarchy. Google Cloud Storage uses a flat hierarchical structure to store buckets and objects. All buckets reside in a single flat hierarchy (you can’t put buckets inside buckets),
    and all objects reside in a single flat hierarchy within a given bucket.
  • Namespace.  There is only one Google Cloud Storage namespace, which means:
    • Every bucket must have a unique name across the entire Google Cloud
      Storage namespace.
    • Object names must be unique only within a given bucket.
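
The apparent hierarchy described above can be sketched in a few lines. The function below is a hypothetical, local illustration (it does not call the service) of how a flat list of object names can be presented as "folders" by grouping on a prefix and a delimiter. The object names are the ones used in the example above, plus two invented ones.

```python
def list_with_delimiter(object_names, prefix="", delimiter="/"):
    """Split a flat list of object names into direct items and sub-prefixes."""
    items = []
    prefixes = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the next delimiter acts as a "folder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            items.append(name)
    return sorted(items), sorted(prefixes)


names = [
    "/europe/france/paris.jpg",
    "/europe/france/cannes.jpg",
    "/europe/italy/rome.jpg",
    "/readme.txt",
]

# "Folders" directly under /europe/
items, prefixes = list_with_delimiter(names, prefix="/europe/")
```

Here prefixes contains /europe/france/ and /europe/italy/, while items is empty: the "folders" exist only as a view over independent object names.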

Google Cloud Storage Characteristics

When you store your data on Google Cloud Storage, the service does all the background work to make data operations fast so you can focus on your application. The following are the main reasons:

  • GCS is built on Google’s proprietary network and datacenter technology.  Google spent several years building proprietary infrastructure and technology to power Google’s sites (after all, fast is better than slow). When you use GCS, the same network goes to work for your data.

  • GCS replicates data to multiple data centers and serves end-user’s requests from the nearest data center that holds a copy of the data. You have a choice of regions (currently U.S. and Europe) to allow you to keep your data close to where it is most needed. Data is also replicated to different disaster zones to ensure high availability.

  • GCS takes the replication one step further. When you upload an object and mark it as cacheable (by setting the standard HTTP Cache-Control header), GCS automatically figures out how best to serve it using Google’s broad network infrastructure, including caching it closer to the end-user if possible.

  • Last but not least, you don’t have to worry about optimizing your storage layout (like you would on a physical disk), or the lookups (i.e. directory and naming structure) like you would on most file systems and some other storage services. GCS takes care of all the “file system” optimizations behind the scenes.

Performance Considerations

When you select a service, one of the most important things to consider is its performance. The performance of a cloud storage service (or any cloud service for that matter) depends on two main factors:

  • The network that moves the data between the service and the end user.

  • The performance of the storage service itself.

Network

A key performance factor is the network path between the user’s location and the cloud service provider’s data centers. This path is critical because if the network is slow or unreliable, it doesn’t really matter how fast the service is. There are two main ways to make the network faster:

  • Serve the request from a data center as close as possible to the user’s location.

  • Optimize the network routing between the user’s location and the data center.

Storage

The other performance factor is how quickly the data center processes a user’s request. This mainly implies the following:

  • Data must be managed optimally.

  • The request must be processed as fast as possible.

In a way, a cloud storage service is similar to a big distributed file system that performs the following tasks as efficiently as possible:

  • Checks authorization.

  • Locks the object (data) to be accessed.

  • Reads the requested data from the physical storage medium.

  • Transfers data to the user.

For an example of an application using GCS, see Build GCP Cloud Storage Client.

Build GCP Storage Client (Python)

This post demonstrates how to build a Google Cloud Storage JSON API Python application. The application interacts with the storage service via the JSON API (and the Python client library). It uses a simple interface which allows you to perform tasks such as: list the buckets in a project, list objects in a bucket, create a bucket, create an object and so on. The intent of the application is educational. It should help you to understand the API syntax (and semantics) when interacting with Google Cloud Storage.

JSON API Python Client Library


To build the application, you will use Eclipse as shown next. Before you can perform the steps shown, ensure that you have satisfied the requirements described here:

  1. Google Cloud Storage Python Applications Prerequisites.
  2. Getting Started with JSON API Client Library in Python

Create the Application Project

  1. Activate Eclipse.
  2. From the File menu, select New->PyDevProject. The PyDev Project window is displayed.
  3. In the Project name: enter PythonJsonApi.
  4. Accept the user default workspace; for example: /Users/[your username]/[your work directory]/PythonJsonApi. This is the directory where the program will be stored.
  5. If you have more than one Python version installed perform the following steps:
    • In the Grammar Version selection list select 2.7.
    • In the Interpreter selection list select the desired Python interpreter. For more information, see Configure PyDev.
    PythonJsonApi Project


  6. Click Finish. The PythonJsonApi project is created and displayed in the Package Explorer.
  7. If the PyDev perspective is not selected yet, click on the Open Perspective icon in the upper right corner of Eclipse. From the pop-up menu select PyDev and click OK.
  8. In the Package Explorer, expand the PythonJsonApi project node.
  9. Right-click the src folder.
  10. Select New->PyDev Module.
  11. In the pop-up dialog window enter the following information:
    • In the Package box enter gcs_examples.
    • In the Name box enter PythonJsonApiMain.
  12. Click Finish.
  13. Select Module Main and  click OK.
    An empty PythonJsonApiMain.py is created. Strictly speaking, the module already contains the standard Python check for main because of the template chosen. You’ll enter the code needed to interact with Google Cloud Storage in the next sections.

Download the Application Code

You can download the archived project from this location: storage-xmlapi-python-client. Then import it into Eclipse. The project contains the following modules:

  1. main.py. This is the main entry point of the application which allows you to
    interact with Google Cloud Storage (GCS) using XML API. It contains the main function which is called when the application is executed.
  2. simple_ui.py.  Contains the GCS_SimpleUI class which provides a simple UI to interact with Google Cloud Storage.
  3. commands.py.  Contains the GCS_Command class which provides the entry point for
    processing user’s selection. Based on the user’s selection, the related Google Cloud Storage
    operation is executed.
  4. authentication.py.  It contains the class that generates an authenticated HTTP client object.
  5. bucket_commands.py. Contains the GCS_Bucket class which handles Google Cloud Storage bucket operations.
  6. object_commands.py. Contains the GCS_Object class which handles Google Cloud Storage object operations.
  7. config.py.   Contains data shared by all the modules.

PythonXmlApi Implementation

The PythonXmlApi main function performs the preliminary initialization and starts the application. A set of classes then perform the actual tasks such as: creating a simple UI, processing user’s input, interacting with the storage service and so on. The following picture shows the classes and their hierarchy.

PythonXmlApi Workflow

The first time you start the application, the main function instantiates GCS_SimpleUI and GCS_Command classes then initializes a simple user interface which accepts the user’s input. You will be asked to authenticate the application which uses OAuth2.0 and stores the credentials in a local file called stored_credentials.json. Also, you will be asked to enter the project ID which will be stored in a local file called project.dat.

Every time you make a selection a command call is issued which in turn calls a GCS_Bucket or GCS_Object method. This method executes the actual Google Cloud Storage bucket or object request. The storage service’s response is then displayed for your information. If the request fails, an error is displayed. Finally, depending on the debugging level, the application displays what goes on the “wire”. This is to help you understand the actual HTTP request and response content as explained by the Google Cloud Storage documentation Reference Methods. The following picture depicts the workflow just described.

 


Build a Google Cloud Storage XML API – Python Application

This post demonstrates how to build a Google Cloud Storage XML API Python application. The application interacts with the storage service via the XML API (and the httplib2 library).  It uses a simple interface which allows you to perform tasks such as: list the buckets in a project, list objects in a bucket, create a bucket, create an object and so on. The intent of the application is educational. It should help you to understand the API syntax (and semantics) when interacting with Google Cloud Storage.

Application Contextual Environment


Prerequisites

To build the application, you will use Eclipse as shown next. Before you can perform the next steps, ensure that you have satisfied the requirements described here: Google Cloud Storage Python Applications Prerequisites.

Also make sure to satisfy the following:

  1. Install the required software as listed next:
  2. Update the information contained in the client_secrets.json file. Use your client id and secret available in the Google API Console.

Create the Application Project

  1. Activate Eclipse.
  2. From the File menu, select New->PyDevProject. The PyDev Project window is displayed.
  3. In the Project name: enter PythonXmlApi.
  4. Accept the user default workspace; for example: /Users/[your username]/[your work directory]/PythonXmlApi. This is the directory where the program will be stored.
  5. If you have more than one Python version installed perform the following steps:
    • In the Grammar Version selection list select 2.7.
    • In the Interpreter selection list select the desired Python interpreter. For more information, see Configure PyDev.
  6. Click Finish. The PythonXmlApi project is created and displayed in the Package Explorer.
  7. If the PyDev perspective is not selected yet, click on the Open Perspective icon in the upper right corner of Eclipse. From the pop-up menu select PyDev and click OK.
  8. In the Package Explorer, expand the PythonXmlApi project node.
  9. Right-click the src folder.
  10. Select New->PyDev Module.
  11. In the pop-up dialog window enter the following information:
    • In the Package box enter gcs_examples.
    • In the Name box enter main.
  12. Click Finish.
  13. Select Module Main and  click OK.
    An empty main.py is created. Strictly speaking, the module already contains the standard Python check for main because of the template chosen. You’ll enter the code needed to interact with Google Cloud Storage in the next section.

Download the Application Code

You can download the archived project from this location: storage-xmlapi-python-client. Then import it into Eclipse. The project contains the following modules:

  1. main.py. This is the main entry point of the application which allows you to
    interact with Google Cloud Storage (GCS) using XML API. It contains the main function which is called when the application is executed.
  2. simple_ui.py.  Contains the GCS_SimpleUI class which provides a simple UI to interact with Google Cloud Storage.
  3. commands.py.  Contains the GCS_Command class which provides the entry point for
    processing user’s selection. Based on the user’s selection, the related Google Cloud Storage
    operation is executed.
  4. authentication.py.  It contains the class that generates an authenticated HTTP client object.
  5. bucket_commands.py. Contains the GCS_Bucket class which handles Google Cloud Storage bucket operations.
  6. object_commands.py. Contains the GCS_Object class which handles Google Cloud Storage object operations.
  7. config.py.   Contains data shared by all the modules.

PythonXmlApi Implementation

The PythonXmlApi main function performs the preliminary initialization and starts the application. A set of classes then perform the actual tasks such as: creating a simple UI, processing user’s input, interacting with the storage service and so on. The following picture shows the classes and their hierarchy.

PythonXmlApi Class Hierarchy


PythonXmlApi Workflow

The first time you start the application, the main function instantiates GCS_SimpleUI and GCS_Command classes then initializes a simple user interface which accepts the user’s input. You will be asked to authenticate the application which uses OAuth2.0 and stores the credentials in a local file called stored_credentials.json. Also, you will be asked to enter the project ID which will be stored in a local file called project.dat.

Every time you make a selection a command call is issued which in turn calls a GCS_Bucket or GCS_Object method. This method executes the actual Google Cloud Storage bucket or object request. The storage service’s response is then displayed for your information. If the request fails, an error is displayed. Finally, depending on the debugging level, the application displays what goes on the “wire”. This is to help you understand the actual HTTP request and response content as explained by the Google Cloud Storage documentation Reference Methods. The following picture depicts the workflow just described.

PythonXmlApi Work Flow


Usage

In a terminal window activate the program as follows:

python main.py  --logging_level [DEBUG | INFO | WARNING | ERROR | CRITICAL] 
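
For readers curious how such a flag might be handled, the following is a minimal sketch using only the standard library. It is not the application's actual code: the function names and the default level (ERROR) are assumptions made for illustration.

```python
import argparse
import logging

# The level names accepted on the command line, as listed in the usage line.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def parse_args(argv):
    """Parse the command-line arguments (argv excludes the program name)."""
    parser = argparse.ArgumentParser(description="GCS sample client")
    parser.add_argument(
        "--logging_level",
        choices=LEVELS,
        default="ERROR",  # assumed default, for illustration only
        help="Set the logging level of detail.",
    )
    return parser.parse_args(argv)


def configure_logging(level_name):
    """Apply the selected level to the root logger and return its numeric value."""
    level = getattr(logging, level_name)
    logging.getLogger().setLevel(level)
    return level


args = parse_args(["--logging_level", "DEBUG"])
numeric_level = configure_logging(args.logging_level)
```

In a real program you would pass sys.argv[1:] to parse_args() instead of a hard-coded list.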

Have Fun!!

Getting Started with JSON API Client Library in Python

The following are the preliminary steps to access Google Cloud Storage by using the JSON API client library in Python.  Basically, you must do the following:

  • Download and install the Python JSON API client library. In a terminal window execute the following command:
    [sudo] pip install --upgrade google-api-python-client
    
  • Enable the use of the JSON API for your Google Cloud Storage project.
  • Set the client authorization information.

See the Quickstart steps described below.

Background

Before an application can use the JSON API, the user must allow access to her Google Cloud Storage private data. Therefore, the following steps must be performed:

  • The application must be authenticated.
  • The user must grant access to the application.
  • The user must be authenticated in order to grant that access.

All of this is accomplished with OAuth 2.0 and libraries written for it.

Important Concepts

  • Scope. JSON API defines one or more scopes that declare a set of operations permitted. When an application requests access to user data, the request must include one or more scopes. The user needs to approve the scope of access the application is requesting.
  • Refresh and Access Tokens.  When a user grants an application access, the OAuth 2.0 authorization server provides the application with refresh and access tokens. These tokens are only valid for the scope requested. The application uses access tokens to authorize API calls. Access tokens expire, but refresh tokens do not. Your application can use a refresh token to acquire a new access token.
    Warning: Keep refresh and access tokens private. If someone obtains your tokens, they could use them to access private user data.
  • Client ID and Client Secret. These strings uniquely identify an application and are used to acquire tokens. They are created for your Google APIs Console project on the API Access pane of the Google APIs Console. There are three types of client IDs, so be sure to get the correct type for your application:
    • Web application client IDs
    • Installed application client IDs
    • Service Account client IDs
    Keep your client secret private. If someone obtains your client secret, they could use it to consume your quota, incur charges against your Google APIs Console project, and request access to user data.
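
The relationship between refresh and access tokens can be sketched as a small cache. This is an illustration of the idea only, not how the OAuth 2.0 libraries are implemented. The class name, the refresh_func callable (which stands in for the exchange of a refresh token for a new access token) and the 60-second safety margin are all hypothetical.

```python
import time


class AccessTokenCache:
    """Hold an access token and renew it shortly before it expires."""

    def __init__(self, refresh_func, clock=time.time, margin_seconds=60):
        # refresh_func is a hypothetical callable that returns a tuple
        # (access_token, lifetime_in_seconds).
        self._refresh_func = refresh_func
        self._clock = clock
        self._margin = margin_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a usable access token, refreshing it when needed."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, lifetime = self._refresh_func()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token


# Demonstration with a fake clock and a fake refresh function.
now = [1000.0]
issued = []


def fake_refresh():
    issued.append("token-%d" % (len(issued) + 1))
    return issued[-1], 3600


cache = AccessTokenCache(fake_refresh, clock=lambda: now[0])
first = cache.get()    # triggers the first refresh
second = cache.get()   # served from the cache
now[0] += 3600         # move the fake clock past the expiry time
third = cache.get()    # triggers a second refresh
```

The access token changes over time, while the long-lived refresh token (hidden behind refresh_func here) is what must be stored and protected.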

Building and Calling the Service

The following steps describe how to build an API-specific service object, make calls to the service, and process the response.

  1. Build the Service. You use the build() function to create a service object. It takes an API name and API version as arguments. You can see the list of all API versions on the Supported APIs page. The service object is constructed with methods specific to the given API. The following is an example:
    from apiclient.discovery import build as discovery_build
    service = discovery_build('storage', 'v1beta2', ....)
    
  2. Create the Request. Methods are defined by the API. After calling a method, it returns an HttpRequest object. The following is an example:
    request = service.buckets().insert(
                    project=project_id, 
                    body={'name': bucket_name})
    
  3. Get the Response. Creating a request does not actually call the API. To execute the request and get a response, call the execute() function as follows:
    response = request.execute()
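
    The two-phase pattern in steps 2 and 3 (create the request, then execute it) can be wrapped in a small helper. The sketch below is an assumption-laden illustration: create_bucket is a hypothetical name, and a unittest.mock stand-in replaces the real service object so that the example runs without credentials or the client library installed.

```python
from unittest import mock


def create_bucket(service, project_id, bucket_name):
    """Create a bucket and return the parsed API response."""
    # Step 2: build the request; nothing is sent over the network yet.
    request = service.buckets().insert(
        project=project_id,
        body={"name": bucket_name},
    )
    # Step 3: execute() performs the HTTP call and returns the response.
    return request.execute()


# Stand-in for the object returned by build(); for demonstration only.
fake_service = mock.MagicMock()
fake_insert = fake_service.buckets.return_value.insert
fake_insert.return_value.execute.return_value = {"name": "my-bucket"}

response = create_bucket(fake_service, "my-project", "my-bucket")
```

    With a real service object, the same helper would issue the bucket insert call described in step 2.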
    

Quickstart

The best way to get started is to read the Google Cloud Storage JSON API Client Library for Python documentation. Follow the steps in the Quickstart section to create a starter application to get you up and running. In particular, perform these steps:

  1. Select the platform you want to use. For simplicity, select Command Line.
    Command Line Application


  2. Click the Configure Project button. A dialog window is displayed.
  3. In the dropdown list select the name of the project for which you want to enable the JSON API.
  4. Click the Continue button. An instruction list is displayed. For convenience the instructions are repeated below.
    • Install Python 2 (if not installed yet).
    • Download the starter application and unzip it. Notice, you must download the application from the link shown in the instruction list.
    • Download the client secrets file. Use it to replace the file included in the starter application. Notice, you must download the client secrets file from the link shown in the instruction list.
  5. In a terminal window, from within the application directory, run the application as follows:
    python sample.py
    
    The first time you run the application, a Web page is displayed asking you to allow access to your Google Cloud Storage. Click the Accept button.
    Allow Project Access


    The first time you will get the following output:

    Authentication successful.
    Success! Now add code here.

You are up and running! At this point you will want to add Cloud Storage API calls to the sample.py file as shown below.

Analyzing Sample.py

In this section we analyze the sample.py code to highlight the key points. In essence sample.py shows how to set up the OAuth 2.0 credentials to access a project. Notice the code shown is slightly different from the downloaded one. This is to make it more readable.

The following line obtains the path to the client_secrets.json. This file contains the credentials (OAuth 2.0 information) the sample.py needs to access your project. You can download this file from the Cloud Console at this location: <https://cloud.google.com/console#/project/[your project ID]/apiui>

import os

CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')

Next you set the Flow object to be used for authentication. The example below adds two scopes, but you should add only the scope you need. For more information on using scopes, see Google+ API Best Practices.

from oauth2client import client, tools

# OAuth 2.0 scopes defined for Google Cloud Storage.
RW_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_write'
RO_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'
FC_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'

# Build the Flow object from the client secrets file.
FLOW = client.flow_from_clientsecrets(
  CLIENT_SECRETS,
  scope=[RW_SCOPE, RO_SCOPE],
  message=tools.message_if_missing(CLIENT_SECRETS))
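
The advice to request only the scope you need can be sketched as a small selection helper. This is one possible approach rather than part of sample.py: the function name and its parameters are hypothetical, and it assumes that the read_write scope also covers reads and that full_control is needed only for access control changes.

```python
RW_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_write'
RO_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'
FC_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'


def scopes_for(needs_write=False, needs_acl_changes=False):
    """Return the narrowest single scope that covers the requested operations."""
    if needs_acl_changes:
        return [FC_SCOPE]
    if needs_write:
        return [RW_SCOPE]
    return [RO_SCOPE]
```

The returned list could be passed as the scope argument when the Flow object is created.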

The following lines are critical. If the credentials (storedcredentials.json) don’t exist or are invalid, the native client flow runs. If the flow succeeds, the Storage object ensures that the good credentials are written back to the file.

from oauth2client import file

# flags holds the parsed command-line options (see tools.argparser).
storage = file.Storage('storedcredentials.json')
credentials = storage.get()
if credentials is None or credentials.invalid:
  credentials = tools.run_flow(FLOW, storage, flags)

Customization

Add the following function to list the objects contained in a bucket.

from json import dumps as json_dumps

def listObjects(bucketName, service):
    # List the objects in the bucket, returning only the selected fields.
    print('List objects contained by the bucket "%s".' % bucketName)
    fields_to_return = 'nextPageToken,items(bucket,name,metadata(my-key))'
    request = service.objects().list(bucket=bucketName,
                                     fields=fields_to_return)
    response = request.execute()
    print(json_dumps(response, indent=2))

Call the function from main as follows:

listObjects('myBucket', service)
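
The function above prints a single page of results. When a bucket holds many objects, the response carries a nextPageToken, and the client library offers a list_next() method to request the following page. The sketch below shows one way to collect the names across all pages. The function name is hypothetical, and a unittest.mock stand-in replaces the real service so that the example runs without credentials.

```python
from unittest import mock


def list_all_object_names(service, bucket_name):
    """Collect object names across all result pages of a bucket listing."""
    names = []
    request = service.objects().list(
        bucket=bucket_name,
        fields="nextPageToken,items(name)",
    )
    while request is not None:
        response = request.execute()
        names.extend(item["name"] for item in response.get("items", []))
        # list_next() returns None when there are no more pages.
        request = service.objects().list_next(request, response)
    return names


# Stand-in service that serves two pages; for demonstration only.
page_one, page_two = mock.MagicMock(), mock.MagicMock()
page_one.execute.return_value = {"items": [{"name": "a.txt"}], "nextPageToken": "t1"}
page_two.execute.return_value = {"items": [{"name": "b.txt"}]}
fake_service = mock.MagicMock()
fake_objects = fake_service.objects.return_value
fake_objects.list.return_value = page_one
fake_objects.list_next.side_effect = [page_two, None]

all_names = list_all_object_names(fake_service, "myBucket")
```

With the real service object, the loop ends as soon as a response no longer includes a nextPageToken.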


Build a Google Cloud Storage Boto Application

This post demonstrates how to build a Google Cloud Storage Boto application which interacts with the storage service through the XML API (using the boto library). This is a simple application to help you understand the API syntax (and semantics) when interacting with Google Cloud Storage.  It uses a simple interface which allows you to perform tasks such as: list the buckets in a project, list objects in a bucket, create a bucket, create an object and so on.

BotoXmlApi Application


The boto library is written in Python, therefore the application is built using the same programming language.

To build the application, you will use Eclipse as shown next. Before you can perform the steps shown, ensure that you have satisfied the requirements described here: Google Cloud Storage Python Applications Prerequisites.

Create the Application Project

  1. Activate Eclipse.
  2. From the File menu, select New->PyDevProject. The PyDev Project window is displayed.
  3. In the Project name: enter BotoXmlApi.
  4. Accept the user default workspace; for example: /Users/[your username]/[your work directory]/BotoXmlApi. This is the directory where the program will be stored.
  5. If you have more than one Python version installed perform the following steps:
    • In the Grammar Version selection list select 2.7.
    • In the Interpreter selection list select the desired Python interpreter. For more information, see Configure PyDev.
      PyDev Project

  6. Click Finish. The BotoXmlApi project is created and displayed in the Package Explorer.
  7. If the PyDev perspective is not selected yet, click on the Open Perspective icon in the upper right corner of Eclipse. From the pop-up menu select PyDev and click OK.
  8. In the Package Explorer, expand the BotoXmlApi project node.
  9. Right-click the src folder.
  10. Select New->PyDev Module.
  11. In the pop-up dialog window enter the following information:
    • In the Package box enter gcs_examples.
    • In the Name box enter BotoXmlApiMain.
    PyDev Module

  12. Click Finish.
  13. Select Module Main and click OK.
    An empty BotoXmlApiMain.py is created. Actually, the module contains the standard Python check for main because of the template you chose. You’ll enter the code needed to interact with Google Cloud Storage in the next sections. But before you do that, you must configure the project to access some required external libraries.

Configure External Libraries

To build the BotoXmlApi application you need to configure the required libraries. These libraries are contained in the gsutil folder that you obtained when installing the gsutil tool. For more information, see Install gsutil Tool.

    1. For convenience, create a string substitution variable that points to the gsutil directory, as follows:
      • In the Package Explorer, right-click on BotoXmlApi project name.
      • In the displayed selection window, click Properties. The Properties for BotoXmlApi window is displayed.
      • In the left pane, click PyDev - PYTHONPATH.
      • In the right pane, click String Substitution Variables button.
      • Click the Add Variable tab. A dialog window is displayed. Perform the following steps:
          • In the Name box enter gsutildir.
          • In the Value box enter the path to the gsutil folder. Click the Browse button and select the VERSION file in the gsutil directory. Then delete VERSION in the path and keep the rest.
        {gsutildir} variable

        • Click OK.
    2. Click the External Libraries tab.
    3. Click Add based on variable button.
    4. Add a reference to the boto library by entering ${gsutildir}third_party/boto.
    5. Repeat steps 3 and 4 to add the rest of the libraries as shown in the following picture.
External Libraries

BotoXmlApi Implementation

Finally, let’s get our hands dirty and dig into the actual code.
The best way to do this is to:

  1. Download the code here: storage-xmlapi-python.
  2. Copy and paste the various modules into your project.

The application contains the following modules:

  1. BotoXmlApiMain.py. It contains the main function which is called when the application is executed. It provides a simple user interface to exercise the Google Cloud Storage XML API.
  2. Project.py. It contains the functions to perform project wide operations.
  3. Buckets.py. It contains the functions to perform bucket operations in the specified project.
  4. Objects.py. It contains the functions to perform object operations in the specified project.

Before running the application, study the code and read the comments to get familiar with the application functionality and the way it is built.

Run the Application in Eclipse

  1. In the Package Explorer right-click BotoXmlApiMain.py.
  2. In the popup window select Run As-> Run Configurations. The Run Configurations window is displayed.
  3. In the Run Configurations window click the Arguments tab.
  4. In the Program arguments box enter the following arguments:
    • --project_id [your Google Cloud Storage project ID].
    • --bucket_name [name of your bucket].
    • --object_name [name of your object].
    • --debug_level [0|1|2] message debug level. If this argument is
      missing, the default value of zero (no messages) is used.
  5. Click the Apply button.
  6. In the Package Explorer, right-click BotoXmlApiMain.py.
  7. In the popup window select Run As->Python Run.
  8. In the Eclipse console window enter p1. This displays the
    buckets in the specified project.
  9. Clear the output by pressing any key.
  10. Select any entry from the menu such as b1 to list the objects
    in the specified bucket.
  11. Clear the output by pressing any key.
  12. Make other selections from the menu to exercise the XML API.
  13. Enter x to exit the application.
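
The program arguments listed in step 4 could be handled along the following lines. This is only a sketch that uses the standard argparse module; the downloaded code may parse its arguments differently (for example with gflags), and the function name parse_args and the help texts are assumptions. Only the zero default for --debug_level comes from the description above.

```python
import argparse

def parse_args(argv=None):
    """Parse the four command line arguments described in step 4."""
    parser = argparse.ArgumentParser(
        description='Exercise the Google Cloud Storage XML API.')
    parser.add_argument('--project_id',
                        help='Google Cloud Storage project ID')
    parser.add_argument('--bucket_name', help='name of your bucket')
    parser.add_argument('--object_name', help='name of your object')
    # A missing --debug_level falls back to zero (no messages).
    parser.add_argument('--debug_level', type=int, choices=[0, 1, 2],
                        default=0, help='message debug level')
    return parser.parse_args(argv)

# Example with an explicit argument list instead of sys.argv.
args = parse_args(['--project_id', 'my-project', '--bucket_name', 'myBucket'])
print('%s %s %d' % (args.project_id, args.bucket_name, args.debug_level))
```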

Run the Application from the Command Line

To run the application from the command line you must configure your environment properly. Remember that you are going to leverage the libraries that came with the gsutil tool.  For more information, see Install gsutil Tool.
Perform these steps:

  1. Include gsutil, boto, and third_party libraries in your PYTHONPATH by adding this line to the .bash_profile file:
    export PYTHONPATH=${PYTHONPATH}:${TOOLS}/gsutil:${TOOLS}/gsutil/third_party/boto:${TOOLS}/gsutil/gslib

    Replace ${TOOLS} with the directory where you installed gsutil (by default it is ${HOME}). For example, add this in your .bash_profile: TOOLS="/Users/[username]/Tools". Replace username with an applicable value.

  2. If you do not have virtualenv installed, from a Terminal window, install it as follows: pip install virtualenv.
  3. Optionally, create a directory in which to keep your virtual environments, for example:
    mkdir -p Work/Programming/Python/BotoEnv
  4. Create the virtual environment, for example:
    virtualenv Work/Programming/Python/BotoEnv/BotoXmlApi
  5. Switch to the virtual environment directory, for example:
    cd Work/Programming/Python/BotoEnv/BotoXmlApi
  6. Activate the virtual environment as follows: source bin/activate.
  7. Perform the following installs which allow the application to reference the needed libraries:
    pip install  ${TOOLS}/gsutil/third_party/python-gflags
    pip install  ${TOOLS}/gsutil/third_party/socksipy-branch
    pip install  ${TOOLS}/gsutil/third_party/httplib2
    pip install  ${TOOLS}/gsutil/third_party/retry-decorator
    pip install  ${TOOLS}/gsutil/third_party/google-api-python-client
  8. Copy all the Python modules (BotoXmlApiMain.py, Project.py, Buckets.py, Objects.py, Playpen.py) into your virtual environment directory.
  9. At the Terminal command prompt enter the following:
    python BotoXmlApiMain.py --project_id [project ID] --bucket_name [bucket name] --object_name [object name] --debug_level [0|1|2]

    Replace the arguments with your values.
    This displays the application menu which allows you to exercise the desired API.
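
If the application fails with an import error, it can help to check which libraries the active interpreter can actually find. The following is a small sketch for that purpose; the helper name check_modules is hypothetical, and the module names in REQUIRED are assumptions derived from the packages installed in step 7 (for example, socksipy-branch is assumed to provide the socks module).

```python
import importlib

def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

# Assumed import names for the libraries installed above.
REQUIRED = ['boto', 'gflags', 'httplib2', 'socks', 'retry_decorator',
            'apiclient']

for name, ok in sorted(check_modules(REQUIRED).items()):
    print('%-16s %s' % (name, 'ok' if ok else 'MISSING'))
```

Run it inside the activated virtual environment, so that it reports on the same interpreter and PYTHONPATH that the application uses.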

Have Fun!!

Google Cloud Storage Python Applications Prerequisites

This post describes the prerequisites for building Google Cloud Storage Python applications. We’ll use the Python programming language, the Mac OS X 10.x platform, and the Eclipse IDE. Later we’ll focus on other programming languages and platforms.

Install Python

Mac OS X 10.x comes with Python already installed, but you should install your own copy of Python version 2.7.x, for example from here: http://www.python.org/download/releases/2.7.6/.

Apple uses its own version of Python and proprietary extensions to implement some of the software distributed as a part of Mac OS X. Unless you know what you are doing, do not mess with it.

After the installation, make sure that the environment variable PATH is properly set in the ~/.bash_profile as follows:

PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
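
To confirm that the PATH change took effect, you could open a new Terminal window, start python, and inspect the interpreter. The following is a minimal sketch; the function name interpreter_summary is hypothetical.

```python
import sys

def interpreter_summary():
    """Describe the interpreter that is currently running."""
    return {
        'executable': sys.executable,
        'version': '%d.%d.%d' % sys.version_info[:3],
        'is_2_7': sys.version_info[:2] == (2, 7),
    }

info = interpreter_summary()
print('%s (Python %s)' % (info['executable'], info['version']))
if not info['is_2_7']:
    print('Warning: this is not a Python 2.7 interpreter.')
```

If the reported executable is not under /Library/Frameworks/Python.framework/Versions/2.7, the PATH entry is probably not being picked up.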

Install PyDev in Eclipse

PyDev is a Python IDE for Eclipse. Its latest version requires Java SDK version 7 or above. If you do not have this Java version, install it from here: Java SE Development Kit 7. Then follow these steps:

  1. Activate Eclipse.
  2. In the menu bar click Help->Install New Software.
  3. In the Available Software window click the Add button.
  4. In the Add Repository window click the Add button. Then perform these operations:
    • In the Name box enter PyDev.
    • In the Location box enter http://pydev.org/updates/.
    PyDev Repository

    PyDev Repository

  5. Click OK.
  6. In the combo box that is displayed, check the box by PyDev.
  7. Accept the default selection and click Next.
  8. Accept the terms and conditions, then click Finish.

This installs the latest PyDev version.

Configure PyDev

If you have more than one Python version installed, you must configure PyDev to use Python 2.7.x interpreter as follows:

  1. In Eclipse menu bar, click Eclipse->Preferences.
  2. In the left pane of the Preferences window, expand the PyDev node.
  3. Expand the Interpreter node.
  4. Click Python Interpreter.
  5. In the right pane, click the Quick Auto-Config button. This lets you configure the default Python interpreter (i.e., Python 2.7). A selection window is displayed.
  6. Select python. In the lower section you will see the list of Python 2.7 libraries.
  7. Click OK.

Create a Google Cloud Storage Project

Select or create a Cloud Storage Project as described here: How to activate Google Cloud Storage. For your convenience, the steps are also described next.

  1. If you do not have a Google account, create one. For more information, see  Create a Google Account.
  2. Activate the Google Cloud Console. Then perform the following steps:
    • If you already have a project, select it.
    • If you do not have a project, create one. Notice the project ID. You will use it often to perform the Google Cloud Storage operations.
  3. Enable Google Cloud Storage for the project as follows:
    • In the console left pane, expand the APIs & auth node and select APIs.
    • In the console right pane, turn the button by Google Cloud Storage from Off to On.
  4. Enable billing.
  5. In the left pane click Settings.
  6. In the right pane click Billing Account for [your project name] and perform these steps:
    • Set your billing profile.
    • Select your payment method.
    • Submit and activate the account.
    • Make sure that your main e-mail is verified so you can receive billing information.

That’s it. Now you are ready to use the service.

Enabling billing does not necessarily mean that you will be charged. For more information, see Pricing and Terms.

Let’s verify that you can use the service with the gsutil tool. Make sure that you have installed the tool first. For more information, see Install gsutil Tool.

  1. Open a Terminal window.
  2. At the command prompt enter: gsutil mb gs://<unique bucket name>. This creates a  bucket in your project using the default region. Notice that the bucket name must be unique in the entire Google Cloud Storage name space.
  3. To verify that the bucket has been created, at the command prompt enter: gsutil ls. The bucket name you just created will be listed.

For a complete list of gsutil commands and related syntax, see gsutil Tool.

Install gsutil Tool

To install the tool follow the steps described next. For more information, see Install gsutil.

  1. Download gsutil.tar.gz.
  2. Extract the archive in the directory of your choice as follows: tar xfz gsutil.tar.gz -C ~/myDir. If you omit the -C option, tar extracts the archive into the current directory.
  3. Add gsutil to your PATH environment variable. On a Mac, add the following to the ~/.bash_profile: PATH=${PATH}:~/myDir/gsutil.
  4. Restart the Terminal.
  5. At the command prompt enter gsutil. You should get the tool help.
  6. At the command prompt enter: gsutil config.  A link is displayed. This is to configure the tool with security information so it can access your project.
  7. Open a new browser session and go to the link obtained in the previous step.
  8. Click the Accept button. An access code is displayed.
  9. Copy the access code and enter it in the Terminal window.
  10. Enter your project ID. gsutil creates the .boto configuration file, which contains information such as the security data needed when performing Google Cloud Storage operations.
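
The .boto file is an INI-style text file, so you can check what gsutil wrote without opening it in an editor. The following sketch lists the section and option names only, never the values, to avoid printing credentials. The function name boto_option_names is hypothetical, and the file is assumed to be in your home directory.

```python
import os

try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2.7

def boto_option_names(path):
    """Return a dict mapping each section name to its sorted option names."""
    parser = configparser.RawConfigParser()
    parser.read(path)   # a missing file is silently ignored
    return dict((section, sorted(parser.options(section)))
                for section in parser.sections())

# Assumed location of the configuration file.
path = os.path.expanduser('~/.boto')
for section, options in sorted(boto_option_names(path).items()):
    print('[%s] %s' % (section, ', '.join(options)))
```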

Build Google Cloud Storage Client Applications

You can build Google Cloud Storage client applications by selecting one of the supported RESTful APIs. Google Cloud Storage (GCS) supports two kinds of APIs, as described next.

XML API

The XML API is the first API created by the GCS development team. It uses the HTTP protocol with the payload in XML format.

XML API Context

This API is used by current and earlier applications mainly written in Python using the boto library and in Java using other libraries such as JetS3t.

The XML API v1.0 is interoperable with some cloud storage tools and libraries that work with services such as Amazon Simple Storage Service (Amazon S3) and Eucalyptus Systems, Inc.

The following Python code snippet shows how to list the buckets contained in a project using the boto library. In future posts, we’ll show you how to exercise other parts of the XML API using the same library.

import logging

import boto
import boto.exception

def list_buckets(project_id, debug_level):
    '''
    Perform a GET Service operation to list the buckets 
    contained in the specified project.
    @param project_id: The id of the project that contains 
    the buckets to list.
    @param debug_level: The level of debug messages to be printed.
    '''
    try:
        # URI scheme for Google Cloud Storage.
        GOOGLE_STORAGE = "gs"

        # Define the project URI
        uri = boto.storage_uri("", GOOGLE_STORAGE, debug_level)
        
        # Define the header values.
        header_values = {"x-goog-api-version": "2",
                         "x-goog-project-id": str(project_id)}

        # List the buckets in the projects.
        for bucket in uri.get_all_buckets(headers=header_values):
            print bucket.name

    # boto.exception is a module; catch the server error class it defines.
    except boto.exception.BotoServerError as e:
        logging.error("list_buckets, error occurred: %s", e)

For testing purposes, you can use XML API directly with the curl tool.
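
To see what such a direct call involves, the following sketch builds (but does not send) the same GET Service request with the standard library. It reuses the header values from the boto snippet above. The function name build_list_buckets_request is hypothetical, and the access_token parameter is assumed to be an OAuth 2.0 access token that you obtained separately.

```python
try:
    from urllib.request import Request    # Python 3
except ImportError:
    from urllib2 import Request           # Python 2.7

# Root endpoint of the XML API; a GET on it lists the buckets in a project.
XML_API_ENDPOINT = 'https://storage.googleapis.com/'

def build_list_buckets_request(project_id, access_token):
    """Build, without sending, a GET Service request for the XML API."""
    request = Request(XML_API_ENDPOINT)
    request.add_header('Authorization', 'Bearer %s' % access_token)
    request.add_header('x-goog-api-version', '2')
    request.add_header('x-goog-project-id', str(project_id))
    return request

# Passing the request to urlopen() would send it and return the XML payload.
```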

JSON API

The JSON API is the second API created by the GCS development team. It uses the HTTP protocol with the payload in JSON format. At the moment, this API is still in the experimental stage.

JSON API Context

JSON format is poised to become the standard way to communicate with any Google cloud service. Even though the details may differ from one service to another, once you know how to use a certain API, you should be able to apply this knowledge anywhere else.

Examples of using the JSON API can be shown from the browser. For example, if you already have a project, you can list the buckets from this location: Bucket:List.

The libraries support several programming languages and this allows for a wider range of applications, compared to XML API for example. For information about the supported languages, see Libraries.

Both the XML and JSON APIs use the HTTP protocol as defined by the HTTP/1.1 specification and provide a RESTful interface for accessing Google Cloud Storage to perform Create, Read, Update, Delete (CRUD) operations. While the first API uses the XML format, the second uses the JSON format for the payload encoding.
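
To make the difference in payload encoding concrete, the following sketch extracts the bucket names from a bucket listing in each format. The two sample payloads are hand-written, trimmed illustrations of the general response shape, not captured service output, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative payloads only; real responses carry more fields.
SAMPLE_XML = """
<ListAllMyBucketsResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
  <Buckets>
    <Bucket><Name>bucket-one</Name></Bucket>
    <Bucket><Name>bucket-two</Name></Bucket>
  </Buckets>
</ListAllMyBucketsResult>
"""

SAMPLE_JSON = """
{"kind": "storage#buckets",
 "items": [{"name": "bucket-one"}, {"name": "bucket-two"}]}
"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit('}', 1)[-1]

def bucket_names_from_xml(payload):
    names = []
    for element in ET.fromstring(payload.strip()).iter():
        if local_name(element.tag) == 'Bucket':
            for child in element:
                if local_name(child.tag) == 'Name':
                    names.append(child.text)
    return names

def bucket_names_from_json(payload):
    return [item['name'] for item in json.loads(payload).get('items', [])]

print(bucket_names_from_xml(SAMPLE_XML))
print(bucket_names_from_json(SAMPLE_JSON))
```

Both functions return the same list of names; only the decoding step differs.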

Conclusions

No matter what format you use, you are not going to build your HTTP method calls from scratch. In theory you could get down to the metal and use the protocol directly. However, instead of creating HTTP requests and parsing responses manually, you may want to use the Google APIs client libraries.

You could use a general-purpose HTTP client library such as httplib2, but it is advisable to stay with the supported Google libraries. They provide better language integration, improved security, and support for making calls that require user authorization.