#hash Farm

Thursday, July 6, 2017

Retrieve Access(Bearer) Token via oauth2client

I am working on a project that talks to Google Cloud Platform. This time I do not want to access the service with a personal account: users should not have to go through an authentication window, and their personal profiles do not matter. Then I found a kind of credential called a service account, which lets all users access the service through one shared account.

Besides that, I don't want to create a credential file on the users' devices, so I need a lightweight way to authenticate. Fortunately, there is a thing called an access token, which is generated from the service account key file (which is supposed to be kept secret). With the access token, you can access the service via a simple request that carries it as a bearer token.

After this survey, I looked for a way to generate the access token automatically. I knew that gcloud could generate a token via gcloud auth activate-service-account followed by gcloud auth print-access-token, but I don't want the application to make system calls if it is not necessary. It took me quite some time and effort to learn how to do it otherwise. I found that my knowledge of OAuth2 was poor, so I downloaded the gcloud source code from google-cloud-sdk. (I just read the gcloud installation bash script and found this link.)

It turned out that OAuth2 service account authentication is pretty intuitive. All you need to do is read the key file that you obtained from the Google Developer Console and refresh the service account credential. The following is an illustration:

from oauth2client.service_account import ServiceAccountCredentials
import httplib2

fileName = "/path/to/your/service-account-secret.json"
creds = ServiceAccountCredentials.from_json_keyfile_name(
  fileName,
  scopes=['https://www.googleapis.com/auth/cloud-platform'])
creds.user_agent = creds._user_agent = "google-cloud-sdk"
# User agent may not be necessary.
# This suffered from oauth2client bug, so we have to assign it manually.
# See https://github.com/google/oauth2client/issues/445
creds.refresh(httplib2.Http())
print(creds.access_token)

Since the snippet is from gcloud, it inherits some knowledge from Google. The credential object suffers from a bug concerning _user_agent, so we need to assign the user agent manually.
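As a rough sketch of what "a simple request with a bearer token" looks like, the following builds (but does not send) such a request with the Python standard library only. The helper name build_authorized_request, the URL and the token value are all made up for illustration; substitute the endpoint of the service you actually call and the value of creds.access_token.

```python
import urllib.request


def build_authorized_request(url, access_token):
    """Return a Request that carries the token in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + access_token},
    )


# Hypothetical endpoint and placeholder token, for illustration only.
req = build_authorized_request("https://example.com/v1/things", "placeholder-token")
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out here so the sketch runs without network access.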

With the token-generating script, I could deploy a micro-service for authentication. When a user needs an access token, the service refreshes the credential and responds with the token. And don't forget that the token is valid for one hour, so you may apply some micro-cache mechanism (like the one nginx provides) to your application to reduce the server load.
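If you would rather cache inside the application than in front of it, the idea can be sketched as below. This is only an illustration under assumptions: TokenCache, fetch_token and the 60-second safety margin are names and values I made up, and fake_fetch stands in for "refresh the credential and read creds.access_token".

```python
import time


class TokenCache:
    """Keep a token and fetch a new one only when it is close to expiry."""

    def __init__(self, fetch_token, ttl_seconds=3600, margin_seconds=60,
                 clock=time.time):
        self._fetch_token = fetch_token  # callable returning a fresh token
        self._ttl = ttl_seconds          # assumed token lifetime (one hour)
        self._margin = margin_seconds    # renew a little before expiry
        self._clock = clock              # injectable so it can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token


# Stand-in for the real refresh call; counts how often it is invoked.
calls = []


def fake_fetch():
    calls.append(1)
    return "token-%d" % len(calls)


fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetched
second = cache.get()   # served from the cache
fake_now[0] = 3600.0   # pretend an hour has passed
third = cache.get()    # fetched again
```

With a cache like this, the key file is touched at most about once per hour no matter how many users ask for a token.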

Sunday, March 12, 2017

Enable nmred/kafka-php and Zookeeper on php and CentOS

I was working on a project that needs to send messages to Kafka. After some survey, I chose the library nmred/kafka-php. Since the setup might be difficult for people who are new to ZooKeeper and Kafka, I would like to leave some notes.

The first step is to require nmred/kafka-php in composer.json.

  "require": {
    ...,
    "nmred/kafka-php": "0.1.*"
  },

Next, you need to install the ZooKeeper environment on your machine by executing the following:

yum --nogpgcheck localinstall -y https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm
yum install -y zookeeper

This gives you access to the cdh5 rpm repository and installs the ZooKeeper client on the machine.

But your php still does not have the zookeeper.so that lets it communicate with ZooKeeper, so now we need to build zookeeper.so as follows:

yum install -y php-devel # this enables phpize
cd /tmp
wget http://apache.stu.edu.tw/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz # the zookeeper source
tar zxvf zookeeper-3.4.9.tar.gz
cd /tmp/zookeeper-3.4.9/src/c # source of c lang
./configure --prefix=/usr/local/zookeeperlib # configure the environment of libzookeeper
make && make install
cd /tmp
wget https://pecl.php.net/get/zookeeper-0.3.1.tgz # download the php-zookeeper source
tar zxvf zookeeper-0.3.1.tgz
cd zookeeper-0.3.1
phpize
./configure --with-php-config=/usr/bin/php-config --with-libzookeeper-dir=/usr/local/zookeeperlib/ # setup the environment
make && make install # install zookeeper.so
echo '; zookeeper' >> /etc/php.ini
echo 'extension=zookeeper.so' >> /etc/php.ini # attach the zookeeper.so information
cd /tmp
rm -rf zookeeper-3.4.9
rm -rf zookeeper-0.3.1

Note that you should choose the php-zookeeper version wisely. Make sure the version of the library matches your php version. For example, I use 0.3.1 because the project is a php70 project.

Saturday, August 13, 2016

Binding with Infamous 0000-00-00 in MySQL via JDBC

  Sometimes you are dealing with a poorly maintained MySQL server. It may contain some dirty data, including the infamous datetime 0000-00-00 00:00:00 or date 0000-00-00. First of all, if you are developing a new service, just don't use the value 0000-00-00 as a date. If you are running an existing service and have the privileges, write a script that maps these values to NULL or 1970-01-01. Zero dates cause a lot of problems, and some of them cannot be solved by this post.

  Back to the topic: when you read a 0000-00-00 from MySQL as a java.sql.Date, it raises the error '0000-00-00' can not be represented as java.sql.Date.
  The MySQL JDBC driver provides a configuration property to solve this problem, which is

zeroDateTimeBehavior=convertToNull

  When you connect to the MySQL server, you may specify this property in the URL, like mysql://user:pass@host:port/db?zeroDateTimeBehavior=convertToNull. With this configuration, the driver returns NULL for zero dates/datetimes instead of raising the error. If you would rather have 1970-01-01 (00:00:00), you have to map NULL to that value yourself.

  There is some space left, so let me talk about the problems you are going to face if you continue to use zero dates.
  The first is the cause of this post: reading them with a program. Zero dates are obviously invalid in any date format. Most dataframe frameworks raise an error like the one above when they meet one.
  Second, migrating the database. If you want to migrate to PostgreSQL (or any other popular database), this is an inevitable problem, since PostgreSQL does not allow zero dates (and most databases do not). My solution is to create a view for each table that maps the zero dates to NULL or 1970-01-01, then export the views to the new database.
  Third, connecting data visualization or analysis software. Of course most commercial software copes with zero dates. The real problem is that a date of 0000-00-00 doesn't make any sense to the marketing or decision team. You would have to filter such rows out or transform them into something explainable in the report, so why not do it when designing the table schema?
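For the "write a script" route mentioned at the beginning, the core of the mapping can be sketched in a few lines of Python. This is only a sketch under the assumption that the values arrive as plain strings in MySQL's default DATE/DATETIME text format; the function name parse_mysql_date is made up.

```python
from datetime import datetime

ZERO_DATE = "0000-00-00"
ZERO_DATETIME = "0000-00-00 00:00:00"


def parse_mysql_date(text):
    """Parse a MySQL DATE/DATETIME string, mapping zero dates to None."""
    if text is None or text in (ZERO_DATE, ZERO_DATETIME):
        return None
    if len(text) == len(ZERO_DATE):
        # DATE column: return a datetime.date
        return datetime.strptime(text, "%Y-%m-%d").date()
    # DATETIME column: return a datetime.datetime
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


print(parse_mysql_date("0000-00-00"))
print(parse_mysql_date("2016-08-13"))
```

Returning None mirrors what convertToNull does on the JDBC side; replace it with a 1970-01-01 value if your downstream tools cannot handle missing dates.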

Reference:
  zeroDateTimeBehavior

Saturday, August 6, 2016

MySQLdb/SQLAlchemy Python Streaming

When you are dealing with a huge query result from a SQL server (let's say, about 30k rows), your program may consume a lot of memory to collect the result. In such a case, you probably want to retrieve the result by streaming. This article shows how to enable the streaming feature with Python.

If you are using MySQLdb, the drill is simple. You just instantiate your cursor as an SSCursor, and boom, this cursor serves you as a server-side streaming cursor. The following snippet is a demonstration.

import MySQLdb.cursors
import MySQLdb

conn = MySQLdb.connect(host=host, user=user, passwd=password, db=db)
cursor = MySQLdb.cursors.SSCursor(conn)  # server-side (streaming) cursor
query = "SELECT * FROM big_table;"
cursor.execute(query)
# rowcount wouldn't work here

For those projects using SQLAlchemy, you may have tried conn.execution_options(stream_results=True) to no avail: it still consumes a lot of memory, because the flag had no effect with MySQL when I tried it.
Cheers, Love! The Cavalry's Here! There is another solution: since SQLAlchemy sits on top of MySQLdb here, it actually provides a way to inject the SSCursor. Here is an example.

import sqlalchemy as sqla
import MySQLdb.cursors
CHUNK_SIZE = 10000

url = "mysql://%s:%s@%s:%d/%s" % (user, password, host, port, db)
query = "SELECT * FROM big_table;"
conn = sqla.create_engine(url, encoding="utf-8",
  connect_args={"cursorclass": MySQLdb.cursors.SSCursor})
cursor = conn.execute(query)

rows = cursor.fetchmany(CHUNK_SIZE)
while(len(rows) > 0):
  # do whatever it is you do to the data
  rows = cursor.fetchmany(CHUNK_SIZE)

Of course, you will want to read the streaming result in chunks, so that neither a huge single fetch nor row-by-row round trips become a bottleneck.

Another thing to mention: while the streaming feature is enabled, rowcount doesn't work. Probably because the result set stays on the server side and is not fetched up front, the client cannot know how many rows there actually are.
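The chunked loop can be wrapped in a small generator so that the calling code just iterates over rows, and if you need the number of rows you can count them while iterating. The sketch below only relies on the cursor having a fetchmany method; iter_rows is a name I made up, and FakeCursor is a stand-in so that the example runs without a database.

```python
def iter_rows(cursor, chunk_size=10000):
    """Yield rows one by one while fetching them from the cursor in chunks."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row


class FakeCursor:
    """Stand-in for a DB-API cursor; only fetchmany is implemented."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size):
        chunk, self._rows = self._rows[:size], self._rows[size:]
        return chunk


# Count the rows while streaming, since rowcount is not available.
count = sum(1 for _ in iter_rows(FakeCursor(range(25)), chunk_size=10))
print(count)
```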

Thursday, July 7, 2016

Bind Same Event with Different Identifier in JavaScript

Sometimes you want to bind multiple functions to one event in JavaScript. Accomplishing that is simple: just bind your functions to the event one after another, like the following.

$("#hatch-chick").bind("click", function(){ "remark the hatch date on the sheet.";});
$("#hatch-chick").bind("click", function(){ "turn on the heat lamp.";});

You can remove the event functions by calling .unbind(eventName), like the following.

$("#hatch-chick").unbind("click");

However, sometimes you do not want to remove all of them at once. You may want to remove the "remark" one specifically. To achieve that, you need to name your handler by adding a suffix (a namespace) to the original event name, for example, click becomes click.remark. Adding the suffix does not affect the functionality: the function is still triggered by the click event. But now you can identify the function by the name click.remark, so you can unbind the "remark" function only.
The following is a simple example.

$("#hatch-chick").bind("click.remark", function(){ "remark the hatch date";});
$("#hatch-chick").bind("click.heat", function(){ "turn on the heat lamp";});

$("#hatch-chick").unbind("click.remark");

Sunday, July 3, 2016

Add a Local Dependency with Maven

  Sometimes you are building a private project and have developed a personal library for your very own use. You do not want to publish your library to a public Maven repository, and you need not. This article shows you how to import a local Maven dependency.

  Before we start, assume that you already have a jar file, coop-0.0.1.jar. Its group id, artifact id and version are net.sunshire, coop and 0.0.1, respectively. Your current project is net.sunshire:farm:0.0.1, which is at FARM_HOME=~/WorkSpace/farm.

  First, I recommend that you make a directory for your local library, named LIB_SOURCE=$FARM_HOME/mvn-lib, and copy your dependency jar into it. Then make another directory for the local Maven repository, named LOCAL_MVN=$FARM_HOME/local-maven-repo. Now you are prepared: by executing the following, you deploy the jar to your local Maven repository.

mvn deploy:deploy-file \
  -Durl=file:///$LOCAL_MVN \
  -Dfile=$LIB_SOURCE/coop-0.0.1.jar \
  -Dpackaging=jar \
  -DgroupId=net.sunshire \
  -DartifactId=coop \
  -Dversion=0.0.1

To verify that the deploy was successful, check the files under $LOCAL_MVN/. There should be directories named after your group id and artifact id, with the jar inside them.
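If you are unsure where to look, the expected location can be worked out from the coordinates. The sketch below assumes the standard Maven repository layout (group id split on dots, then artifact id, then version); repo_path is a helper name I made up for illustration.

```python
import posixpath


def repo_path(group_id, artifact_id, version, packaging="jar"):
    """Return the artifact path relative to the repository root."""
    group_dir = group_id.replace(".", "/")
    file_name = "%s-%s.%s" % (artifact_id, version, packaging)
    return posixpath.join(group_dir, artifact_id, version, file_name)


print(repo_path("net.sunshire", "coop", "0.0.1"))
```

The printed path is relative to $LOCAL_MVN, so the jar from this article should show up under that directory at the printed location.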

After you have successfully deployed your local dependency, add the following to the pom.xml of your current project.

<project ...>
  ...
  <dependencies>
    <dependency>
      <groupId>net.sunshire</groupId>
      <artifactId>coop</artifactId>
      <version>0.0.1</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>local-maven-repo</id>
      <url>file:${project.basedir}/local-maven-repo</url>
    </repository>
  </repositories>
</project>

When you are finished, you can compile your project by executing mvn scala:compile test.

Another thing worth mentioning: when you deploy to the local repository with mvn deploy:deploy-file ..., a copy of the dependency also ends up in ~/.m2/. So I recommend changing the version number of the dependency every time you change it; otherwise Maven prefers the cached copy in ~/.m2/.

Saturday, June 25, 2016

Including and Reading Resources with Maven/Java/Scala

  In the last few days, I tried to make a Maven dependency read resources from its own jar, and I suddenly realized that I had never done that before. After some googling and experiments, I managed to read the resources, and here are the notes.

  Let's say you want to publish a jar library that comes with a few default configuration files. Of course, you never want to write the configuration into your code. (Don't do that, even if you have thought about it before.) Then how do you keep the files in the jar? Experienced Java developers know that a jar is a compressed file, or more precisely a zip file, so being able to keep a file inside it is reasonable, isn't it? If you are using Maven as your publishing tool, things should be easy.

  Set up your pom.xml and assign the resource setup like a boss, uh, I mean, like the following.

<project>
  ...
  <name>My Resources Plugin Practice Project</name>
  ...
  <build>
    ...
    <resources>
      <resource>
        <directory>src/resources</directory>
        <includes>
          <include>**/*</include>
        </includes>
      </resource>
      ...
    </resources>
    ...
  </build>
  ...
</project>

Of course you can assign whatever directory you want, but in practice I recommend putting the resources directory under src/, or under src/main if you want to separate the resources of the main program from those of the tests.

  Another tip: for files that belong to a specific package or class, you may want to put them into a directory that matches the package name. For example, I have a picture chick.jpg for net.sunshire.farm.Chick, so I put the picture at the path src/main/resources/net/sunshire/farm/chick.jpg. By doing so, Maven packages chick.jpg in the package net/sunshire/farm, which keeps each jar independent. (You won't mix up resources from different jars when you import many different dependencies.)

  Now you are done with including resources. If you package the jar with Maven, you can see the resource that you just included inside your jar file. (To inspect a jar file, simply change the extension to .zip and decompress it.)
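Since a jar is a zip file, you can also list its contents with any zip tool without renaming it. Below is a small sketch using Python's standard zipfile module; list_jar_entries is a made-up helper name, and the tiny demo jar is built on the fly only so that the example runs without Maven.

```python
import os
import tempfile
import zipfile


def list_jar_entries(jar_path):
    """Return the names of all entries stored in a jar (zip) file."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.namelist()


# Build a tiny stand-in jar; a real one would come from `mvn package`.
tmp_dir = tempfile.mkdtemp()
demo_jar = os.path.join(tmp_dir, "farm-0.0.1.jar")
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr("net/sunshire/farm/chick.jpg", b"not really a picture")

entries = list_jar_entries(demo_jar)
print(entries)
```

Point list_jar_entries at the jar under your target/ directory to check that the resources ended up at the paths you expect.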

  And how do we read the included resource with Scala? It is actually pretty simple: Class has a method called getResourceAsStream. Here is the usage in Scala:

package net.sunshire.farm;

import java.io.InputStream;

class Chick {
  val picPath: String = "/net/sunshire/farm/chick.jpg";
  // relative one: "chick.jpg", do NOT recommend
  val chickPic: InputStream = getClass.getResourceAsStream(picPath);
  // do whatever you want with the stream.
}

The above demonstration is pretty straightforward; I guess the main problem is simply not knowing that the method exists. The part worth mentioning is that getResourceAsStream takes either a relative path or an absolute path. As on UNIX/POSIX systems, a leading / marks an absolute path. To clarify, the absolute path does not start from the root of the file system but from the root of the jar. The relative path starts from the package declared at the top (/net/sunshire/farm/ in the example).
Another thing: if you specify a relative path, it is resolved against the package of the run-time class. (If net.sunshire.house.Chicken extends Chick, and Chick specifies the path as a relative one, then the path is going to be /net/sunshire/house/chick.jpg.)

If you do not understand Scala and need a Java example, the pom.xml part is the same; for the code part, please refer to the 2nd reference. This is all about how to include and read resources. Hope it is helpful. :D

References:
[1] Resources Packaging - Maven Official
[2] Read a Resource in Java - Stack Overflow