Thursday, July 13, 2017

Download Sentry Events via RESTful API

For those of you who use Sentry as an error recording service: when you get an error issue and want to know all of its events in detail, the Sentry web interface only lets you read one event at a time. And, of course, you wouldn't like the idea of copying each event by hand until the end. Fortunately, Sentry provides a web API for users who want to analyze the data on their Sentry server. You may refer to the Sentry API Docs for more detail.

After some time exploring, I learned that you need to create an access token on your Sentry server at http://your-host/api/. Once that is done, you are ready to access the APIs listed in the document above. Here is a simple example I used to retrieve all events of an issue and make some analysis.

import re, requests, sys

def main(host, issueId, token):
  endpoint = "http://%s/api/0/issues/%s/events/" % (host, issueId)
  headers = {"Authorization": "Bearer " + token}
  # Assumes the data/ directory already exists.
  with open("data/%s.json" % issueId, "w") as f:
    f.write("[")
    nextLink = endpoint
    while nextLink is not None:
      if nextLink != endpoint:
        f.write(",")
      response = requests.get(nextLink, headers=headers)
      # Each page is a JSON array; strip the brackets so the pages
      # can be concatenated into one big array.
      f.write(response.text[1:-1])
      headerLink = response.headers.get("link", "")
      regSearch = re.search('<([^<]*)>; rel="next"; results="(true|false)";', headerLink)
      # re.search returns None when there is no match, so guard against it.
      if regSearch and regSearch.group(2) == "true":
        nextLink = regSearch.group(1)
      else:
        nextLink = None
    f.write("]")

if __name__ == "__main__":
  main(*sys.argv[1:])

In the code above, I use Python requests to send the requests to the Sentry server, and apply a regular expression to extract the next-page link from the Link response header. This should be a simple one, and now you are ready to get your hands on the massive error logs.
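As a quick sanity check, here is a minimal sketch that loads a dumped file back and counts the events. The issue id 12345 is a hypothetical placeholder; substitute the one you downloaded.

import json

with open("data/12345.json") as f:
  events = json.load(f)

print("fetched %d events" % len(events))
# Each event is a dict; peek at one to see which fields are useful.
print(events[0].keys())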

Thursday, July 6, 2017

Retrieve Access (Bearer) Token via oauth2client

I am working on a project that interfaces with Google Cloud Platform through a service account. This time I am not going to access the service with a personal account: I don't want users to experience the authentication window, and the personal profile doesn't matter. Then I found there is a kind of credential called a service account, with which I can have all users access the service under a single identity.

Besides that, I don't want to create a credential file on the user's device, so I need a lightweight authentication. Fortunately, there is a thing called an access token, which is generated from the service account key file (which is supposed to be a secret). With the access token, you can access the service via a simple request carrying a bearer token.

After all this survey, I went looking for a way to generate the access token automatically. Initially, I knew that gcloud could generate a token via gcloud auth activate-service-account followed by gcloud auth print-access-token, but I don't want the application to make system calls if not necessary. It took me quite some time and effort to learn how to do it directly: I found my knowledge of OAuth2 was poor, so I downloaded the gcloud source code from google-cloud-sdk. (I just read the gcloud installation bash script and found this link.)

It turns out the concept of OAuth2 service account authentication is pretty intuitive. All you need to do is read the key file you obtained from the Google Developer Console and refresh the service account credential. The following is an illustration:

from oauth2client.service_account import ServiceAccountCredentials
import httplib2

fileName = "/path/to/your/service-account-secret.json"
creds = ServiceAccountCredentials.from_json_keyfile_name(
  fileName,
  scopes=['https://www.googleapis.com/auth/cloud-platform'])
creds.user_agent = creds._user_agent = "google-cloud-sdk"
# The user agent may not be necessary; this works around an oauth2client
# bug, so we have to assign it manually.
# See https://github.com/google/oauth2client/issues/445
creds.refresh(httplib2.Http())
print(creds.access_token)

Since the snippet is adapted from gcloud, it inherits some knowledge from Google. The Credentials object suffers from a bug around _user_agent, so we need to assign the user_agent manually.

And with this token-generating script, I can deploy a micro-service for authentication: if users need an access token, I just refresh the credential and respond with the token. And don't forget, the token is valid for one hour, so you may apply some micro-cache mechanism (like the one nginx provides) to your application and reduce the server load.
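For completeness, here is a minimal sketch of what a client does with the token. The Cloud Storage bucket-listing endpoint is just one example and the project id is a placeholder, but the Authorization header format is the standard bearer scheme.

import requests

token = "..."  # the access_token printed by the snippet above

# Example: list the Cloud Storage buckets of a project via the JSON API.
response = requests.get(
  "https://www.googleapis.com/storage/v1/b",
  params={"project": "your-project-id"},
  headers={"Authorization": "Bearer " + token})
print(response.json())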

Sunday, March 12, 2017

Enable nmred/kafka-php and Zookeeper on php and CentOS

  I was working on a project that needs to send messages to Kafka. After some survey, I chose the library nmred/kafka-php. Since the setup might be difficult for people who are new to Zookeeper and Kafka, I would like to leave some notes.

  The first step is to import nmred/kafka-php in composer.json.
  "require": {
    ...,
    "nmred/kafka-php": "0.1.*"
  },


  Next, you need to install the Zookeeper environment on your machine by executing the following:
yum --nogpgcheck localinstall -y https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm
yum install -y zookeeper
This gives you access to the cdh5 rpm repository and installs the Zookeeper client on the machine.

But your PHP still doesn't have the zookeeper.so extension that lets it communicate with Zookeeper, so now we need to build zookeeper.so with the following:
yum install -y php-devel # this enables phpize
cd /tmp
wget http://apache.stu.edu.tw/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz # the zookeeper source
tar zxvf zookeeper-3.4.9.tar.gz
cd /tmp/zookeeper-3.4.9/src/c # source of c lang
./configure --prefix=/usr/local/zookeeperlib # configure the environment of libzookeeper
make && make install
cd /tmp
wget https://pecl.php.net/get/zookeeper-0.3.1.tgz # download the php-zookeeper source
tar zxvf zookeeper-0.3.1.tgz
cd zookeeper-0.3.1
phpize
./configure --with-php-config=/usr/bin/php-config --with-libzookeeper-dir=/usr/local/zookeeperlib/ # setup the environment
make && make install # install zookeeper.so
echo '; zookeeper' >> /etc/php.ini
echo 'extension=zookeeper.so' >> /etc/php.ini # attach the zookeeper.so information
cd /tmp
rm -rf zookeeper-3.4.9
rm -rf zookeeper-0.3.1
Notice: choose the php-zookeeper version wisely. Make sure the version of the library you use matches your PHP version. For example, I use 0.3.1 because the project runs on PHP 7.0.

Saturday, August 13, 2016

Dealing with the Infamous 0000-00-00 in MySQL via JDBC

  Sometimes you have to deal with a poorly maintained MySQL server. It may contain dirty data, including the infamous datetime 0000-00-00 00:00:00 or date 0000-00-00. First of all, if you are developing a new service, just don't use the value 0000-00-00 as a date. If you are running an existing service and have the privileges, write a script that maps them to NULL or 1970-01-01. Zero dates cause a lot of problems, and some of them cannot be solved by this post.
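  For instance, a one-off cleanup could look like the sketch below, in Python with MySQLdb. The table orders and its date column shipped_at are hypothetical stand-ins for your own schema, and depending on your server's sql_mode you may need to permit zero dates in comparisons first.

import MySQLdb

conn = MySQLdb.connect(host="localhost", user=user, passwd=password, db=db)
cursor = conn.cursor()
# Map the zero dates to NULL; the column must be nullable for this to work.
cursor.execute("UPDATE orders SET shipped_at = NULL WHERE shipped_at = '0000-00-00'")
conn.commit()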

  Back to the topic: when you read a 0000-00-00 from MySQL as a java.sql.Date, it raises the error '0000-00-00' can not be represented as java.sql.Date.
  JDBC provides a configuration to solve this problem:
zeroDateTimeBehavior=convertToNull

  When connecting to the MySQL server, you may specify this configuration in the URL, like mysql://user:pass@host:port/db?zeroDateTimeBehavior=convertToNull. With this configuration, JDBC returns NULL for your zero dates/datetimes instead of raising the error. (There is also a round option if you prefer a valid date over NULL.)

  There is some space left, so let me talk about the problems you are going to face if you keep using zero dates.
  The first is the cause of this post: reading with a program. Zero dates are invalid in any date format, so most dataframe frameworks would raise an error like the one above.
  Second, migrating databases. If you want to migrate to PostgreSQL (or any other popular database), this becomes an inevitable problem, since PostgreSQL does not allow zero dates (and most databases don't). My solution is to create a view for each table that maps the zero dates to NULL or 1970-01-01, then export the views to the new database.
  Third, binding data visualization or analysis software. Of course most commercial software can deal with the zero-date problem. The real problem is that a date of 0000-00-00 doesn't make any sense to a marketing or decision team. You would have to filter or transform them into an explainable report anyway, so why not do it when designing the table schema?

Reference:
  zeroDateTimeBehavior

Saturday, August 6, 2016

MySQLdb/SQLAlchemy Python Streaming

  When you are dealing with a huge query result from an SQL server (let's say about 30k rows), your program may consume a lot of memory collecting the result. In such a case, you probably want to retrieve the result by streaming. This article shows how to enable the streaming feature in Python.

  If you are using MySQLdb, the drill is simple. You just instantiate your cursor as an SSCursor, and boom, this cursor serves you as a server-side streaming cursor. The following snippet is a demonstration.
import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host=host, user=user, passwd=password, db=db)
cursor = conn.cursor(MySQLdb.cursors.SSCursor)  # server-side streaming cursor
query = "SELECT * FROM big_table;"
cursor.execute(query)
# rowcount wouldn't work here


  For those projects using SQLAlchemy, you may have tried conn.execution_options(stream_results=True) to no avail; it still consumes a lot of memory, because the flag doesn't work with the MySQLdb driver.
  Cheers, Love! The Cavalry's Here! There is another solution: since SQLAlchemy rides on MySQLdb for mysql:// URLs, it actually provides a way to inject the SSCursor. Here is an example.
import sqlalchemy as sqla
import MySQLdb.cursors
CHUNK_SIZE = 10000

url = "mysql://%s:%s@%s:%d/%s" % (user, password, host, port, db)
query = "SELECT * FROM big_table;"
# Inject the server-side cursor class into the underlying MySQLdb connection.
engine = sqla.create_engine(url, encoding="utf-8",
  connect_args={"cursorclass": MySQLdb.cursors.SSCursor})
cursor = engine.execute(query)

rows = cursor.fetchmany(CHUNK_SIZE)
while len(rows) > 0:
  # do whatever it is you do to the data
  rows = cursor.fetchmany(CHUNK_SIZE)
  Of course, you would like to read the streaming result in chunks, so an overwhelming number of rows won't become a network-transfer bottleneck.

  Another thing to mention: while the streaming feature is enabled, rowcount doesn't work. Since the results stay on the server side, there is no way to know how many rows there actually are until you have read them all.
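  If you use this pattern often, it reads better wrapped in a generator. Here is a minimal sketch; the helper name and default chunk size are my own choices, not part of either library.

def iter_chunks(cursor, chunk_size=10000):
  # Yield lists of rows from a (server-side) cursor until it is exhausted.
  while True:
    rows = cursor.fetchmany(chunk_size)
    if not rows:
      break
    yield rows

# Usage with the cursor from the SQLAlchemy example above:
# for rows in iter_chunks(cursor):
#   process(rows)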

Thursday, July 7, 2016

Bind the Same Event with Different Identifiers in JavaScript

  Sometimes you want to bind multiple functions to an event in js. Accomplishing that is simple: just repeatedly bind your functions to the event, like the following,
$("#hatch-chick").bind("click", function(){ "remark the hatch date on the sheet.";});
$("#hatch-chick").bind("click", function(){ "turn on the heat lamp.";});
And you are able to remove the event functions by calling .unbind(eventName), like the following,
$("#hatch-chick").unbind("click");


  However, sometimes you do not want to remove all of them at once. You may want to remove the "remark" one specifically. To achieve that, you need to namespace your event by adding a suffix to the original event name, for example, click to click.remark. Adding the suffix does not affect the functionality; the function is still triggered by the click event. But now you can identify the handler by the name click.remark, so you can unbind the "remark" function only.
  The following is a simple example.
$("#hatch-chick").bind("click.remark", function(){ "remark the hatch date";});
$("#hatch-chick").bind("click.heat", function(){ "turn on the heat lamp";});

$("#hatch-chick").unbind("click.remark");

Sunday, July 3, 2016

Add a Local Dependency with Maven

  Sometimes you are building a private project and have developed a personal library for your very own usage. You do not want to publish the library to the public Maven repository, and you need not. This article shows you how to import a local Maven dependency.

  Before we start, assume that you already have a jar file, coop-0.0.1.jar. Its group id, artifact id and version are net.sunshire, coop and 0.0.1, respectively. And your current project is net.sunshire:farm:0.0.1, located at FARM_HOME=~/WorkSpace/farm.

  First, I recommend you make a directory for your local library, say LIB_SOURCE=$FARM_HOME/mvn-lib, and copy your dependency jar into it. Then make another directory for the local Maven repository, say LOCAL_MVN=$FARM_HOME/local-maven-repo. Now you are prepared: by executing the following, you can deploy the jar to your local Maven repository.
mvn deploy:deploy-file \
  -Durl=file:///$LOCAL_MVN \
  -Dfile=$LIB_SOURCE/coop-0.0.1.jar \
  -Dpackaging=jar \
  -DgroupId=net.sunshire \
  -DartifactId=coop \
  -Dversion=0.0.1
To verify that your deploy succeeded, you may check the files under $LOCAL_MVN/. There should be directories and a jar named after your group id and artifact id (e.g. $LOCAL_MVN/net/sunshire/coop/0.0.1/coop-0.0.1.jar).

  After you have successfully deployed your local dependency, you may add the following to your current project's pom.xml.
<project ...>
  ...
  <dependencies>
    <dependency>
      <groupId>net.sunshire</groupId>
      <artifactId>coop</artifactId>
      <version>0.0.1</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>local-maven-repo</id>
      <url>file:${project.basedir}/local-maven-repo</url>
    </repository>
  </repositories>
</project>
When you are finished, you are able to compile your project by executing mvn scala:compile test.

Another thing worth mentioning: when you deploy to the local repository with mvn deploy:deploy-file ..., it also deploys a copy into ~/.m2/. So I recommend you change the version number of the dependency every time you modify it; otherwise Maven will prefer the cached copy in ~/.m2/.