Monday, February 29, 2016

Generate Google Application User Credentials with Python/Flask

    Recently I was asked to build a Google application for our company. After some trial and error I understood how things work, and I want to document it in the hope that it helps people with a similar need.

    First of all, you need to create a project for your application in the Google Developers Console. Then enable the service you are going to use, either by clicking the link 'Enable and manage APIs' or by entering the 'API Manager'. You should now see several lists of Google APIs; choose the one you need and click the enable button.

    The next step is to generate application credentials. Go to 'API Manager > Credentials' and click Create Credentials > API key. You then select one of the types 'Server key', 'Browser key', 'iOS key' or 'Android key', depending on the platform your application runs on. (Here I chose the server key, because my application calls the APIs from a server rather than from a browser or a mobile device.) The only information you need to provide for this key is the server name. (Providing your server's IP address as well is better, of course.) You can now see your API key on the Credentials page.

    After you have generated an API key, there is another credential to generate: the OAuth 2.0 client ID. Before you click the 'create credentials' button, switch to the 'OAuth consent screen' tab, which is just above that button, and fill in the field 'Product name shown to users'. Now you can create the 'OAuth client ID' by clicking 'create credentials'. You need to choose your application type (I guess this doesn't really matter; I chose 'Other') and fill in your application name. After you have created the client ID, you can download your client key from the Credentials page, on the right side of your client ID. Your client secret can be seen by clicking the name of your client ID.

** Remember, API keys, client IDs and client secrets are all supposed to be kept secret. DO NOT share them with anyone who is not on your project. **
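
    One way to follow this advice in code is to read the values from environment variables instead of writing them into the source file. The following is only a minimal sketch: the variable names GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET and the helper load_oauth_settings are hypothetical names chosen for illustration.

```python
import os


def load_oauth_settings(environ=None):
    """Return (client_id, client_secret) read from environment variables.

    GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET are hypothetical names chosen
    for this sketch; use whatever names suit your deployment.
    """
    env = os.environ if environ is None else environ
    required = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET")
    # Fail early with a clear message instead of sending empty values to Google.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["GOOGLE_CLIENT_ID"], env["GOOGLE_CLIENT_SECRET"]
```

    The optional environ argument exists only so the function can be tried with a plain dict; in the application you would call it without arguments.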

    Now that the paperwork is done, let's get our hands dirty. You need to obtain user credentials in order to perform actions on behalf of a user. I deployed a web application with Python/Flask to acquire them. You need to build two pages: one for requesting authorization and one for receiving it. The application may look like this:
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage

from flask import Flask, url_for, redirect, request
app = Flask(__name__)

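# Client ID, passed as client_id to OAuth2WebServerFlow below.
CLIENT_ID_FILE = "your_client_id_file_name"
CLIENT_SECRET = "your_client_secret"
# The redirect URI must point at the /auth_return route defined below.
SCHEME = "http://"
DOMAIN = "localhost:5000"
AUTH_RETURN_PATH = "/auth_return"
REDIRECT_URI = SCHEME + DOMAIN + AUTH_RETURN_PATH
# File in which the user's credential is stored.
CREDENTIAL_NAME = "credential_name"
# Permissions to request from the user.
SCOPES = [
  "https://www.googleapis.com/auth/drive.file"
]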

@app.route("/auth")
def auth():
  # Entry point: reuse a stored credential, or start the OAuth flow.
  storage = Storage(CREDENTIAL_NAME)
  credentials = storage.get()
  if not credentials or credentials.invalid:
    print("!!! Cannot find this credential !!!")
    return request_credential(storage)
  else:
    return "You have already authorized your credential."

def request_credential(storage):
  # Step 1: redirect the user to Google's consent page.
  flow = OAuth2WebServerFlow(client_id=CLIENT_ID_FILE,
    client_secret=CLIENT_SECRET,
    scope=SCOPES,
    redirect_uri=REDIRECT_URI)
  auth_uri = flow.step1_get_authorize_url()
  return redirect(auth_uri)

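@app.route("/auth_return")
def auth_return():
  # Step 2: exchange the authorization code for a credential and store it.
  flow = OAuth2WebServerFlow(client_id=CLIENT_ID_FILE,
    client_secret=CLIENT_SECRET,
    scope=SCOPES,
    redirect_uri=REDIRECT_URI)
  credentials = flow.step2_exchange(request.args.get("code", ""))
  storage = Storage(CREDENTIAL_NAME)
  storage.put(credentials)
  # Read the credential back to confirm that it was stored and is valid.
  credentials = storage.get()
  if not credentials or credentials.invalid:
    return "Authorization failed."
  else:
    return "Authorization succeeded."

if __name__ == "__main__":
  # Flask's development server listens on localhost:5000 by default.
  app.run()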


    In the code above, the entry point is 'http://localhost:5000/auth', which leads the user to auth(). It checks whether the credential exists and calls request_credential() if it does not. The authorization follows the two-step OAuth 2.0 flow: the user is sent to Google to grant access and is then redirected to the return path that you have given in redirect_uri.
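
    To make the two steps less abstract, here is a rough sketch of what happens on the wire, written with the standard library only. It is not what oauth2client does internally: the endpoint URL and the helper names build_authorize_url and extract_code are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth 2.0 authorization endpoint. This is an assumption of the
# sketch; the library may use a different version of the endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorize_url(client_id, redirect_uri, scopes):
    """Step one: build the URL the user is redirected to at Google."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # several scopes are space-delimited
        "response_type": "code",    # ask for an authorization code
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


def extract_code(callback_url):
    """Step two: read the authorization code from the redirect back to the app."""
    query = parse_qs(urlparse(callback_url).query)
    # Google sends ?error=... instead of ?code=... when the user refuses.
    return query.get("code", [""])[0]
```

    In the Flask application, flow.step1_get_authorize_url() plays the role of the first function, and request.args.get("code", "") plays the role of the second.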

    After you have obtained the credential and called storage.put(credentials), the credential is stored at the location CREDENTIAL_NAME. (Whether this is an absolute or a relative path depends on the value you give it.)
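
    If you are unsure where a relative name ends up, remember that it is resolved against the working directory of the process. A small sketch, assuming a hypothetical helper resolve_credential_path, shows how you could print the final location:

```python
import os


def resolve_credential_path(name, base_dir=None):
    """Return the absolute path that a credential file name refers to."""
    if os.path.isabs(name):
        return os.path.normpath(name)
    # A relative name is resolved against the current working directory,
    # unless another base directory is given explicitly.
    base = os.getcwd() if base_dir is None else base_dir
    return os.path.normpath(os.path.join(base, name))
```

    For example, print(resolve_credential_path("credential_name")) shows the file the application will read and write when started from the current directory.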

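    There is a constant named CLIENT_ID_FILE, which is passed to OAuth2WebServerFlow as client_id. Despite its name, the value should be the client ID string itself, as shown on the Credentials page, rather than the path of the downloaded JSON file. And CLIENT_SECRET is, of course, the client secret mentioned earlier in this article.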
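
    If you would rather read both values from the JSON file downloaded from the Credentials page than copy them by hand, something like the following sketch may work. It assumes the file has a single top-level key ("installed" or "web") that holds client_id and client_secret; load_client_config is a hypothetical helper and not part of oauth2client.

```python
import json


def load_client_config(path):
    """Read client_id and client_secret from a downloaded client JSON file."""
    with open(path) as handle:
        data = json.load(handle)
    # Assumed layout: the top-level key names the application type.
    for app_type in ("installed", "web"):
        if app_type in data:
            section = data[app_type]
            return section["client_id"], section["client_secret"]
    raise ValueError("Unrecognized client file layout: " + path)
```

    The returned pair can then be passed to OAuth2WebServerFlow as client_id and client_secret.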

    The SCOPES in the code can be found in the documentation of each API. In this example I requested the https://www.googleapis.com/auth/drive.file scope, which is listed in the Drive documentation.



    After reading this article, you should be able to obtain a Google service credential from a user. If any part is unclear, please ask without hesitation; I am glad to answer. I may write another post on how to operate some of the Google services that I have interfaced with.

Saturday, February 20, 2016

Spark: Read/Write Sequence files

    Recently I had some unusual requirements for a Spark RDD. I needed an RDD that supports multi-key value pairs and that can be written to and read from disk. As usual, I started with Python, and I found that saveAsSequenceFile does not always work. After some searching, I assume the cause is the type written to the sequence file: it is not required when writing, but it is required when reading (sequenceFile). After all, if I did not specify the output type, how should I know what type to read?

    After that, I felt I had to give up my obsession with using Spark from Python. I switched to Scala, which Spark supports fully. Nonetheless, the story did not end there as happily as the fairy tale of Prince Charming and Snow White. I tried to write the keys with classes that implement Writable, which triggers an implicit conversion of the RDD to SequenceFileRDDFunctions. Unfortunately, I noticed that both key and value also have to implement Serializable, which is reasonable: if the key and value are not serializable, how do we pass them between workers?

    In the end, to simplify matters, I decided to turn my keys into a single String by concatenation. After all, every one of my keys was a String from the beginning.
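
    The concatenation itself is simple in any language. Here is a minimal sketch in Python of joining several keys into one String and splitting them again; the tab separator and the helper names join_keys and split_key are assumptions for illustration, so pick a separator that cannot occur in your own keys.

```python
SEPARATOR = "\t"  # hypothetical delimiter; it must never occur inside a key


def join_keys(parts, sep=SEPARATOR):
    """Concatenate several string keys into one composite key."""
    for part in parts:
        if sep in part:
            # Otherwise the composite key could not be split unambiguously.
            raise ValueError("Key contains the separator: %r" % (part,))
    return sep.join(parts)


def split_key(key, sep=SEPARATOR):
    """Recover the original keys from a composite key."""
    return tuple(key.split(sep))
```

    The composite String then serves as the single key of the pair, and split_key restores the parts after the RDD has been read back.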

TL;DR - The key and value of an RDD have to implement Serializable. If you want to save the RDD, the key and value also have to implement Writable (or be implicitly convertible to a Writable, as String and Int are).

    Well, I guess you are here to learn how to read/write a sequence file. If you meet the requirements above, you can just do:
import org.apache.hadoop.io.{IntWritable, Text}

// sc is the SparkContext, e.g. the one provided by spark-shell.
val path: String = "your/path/for/rdd/here"
val rdd = sc.parallelize(
  List(("Raccoon", 1), ("Squirrel", 2), ("Ferret", 3))
)
// String and Int are converted to Text and IntWritable implicitly.
rdd.saveAsSequenceFile(path)

val readRdd = sc.sequenceFile(path, classOf[Text], classOf[IntWritable])
// Here, the type you have read is Tuple2[Text, IntWritable].
// Do not forget to transform the pairs into Tuple2[String, Int].
val usableRdd = readRdd.map { case (key, value) =>
  (key.toString, value.get)
}