ETHAN'S BLOG

"This cow gets stuck in a chair"

Open Search Network - Envisioning the Next Generation of Internet Searching

“Many do what they need, some do what they want” – Rajsarkapally S. (Salesforce)

Ever since Google started indexing the internet in 1998, the way people search hasn't changed much. (Read this paper to see how PageRank works today.)

http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Over the years, the scale of data we are able to process has grown, and together with advances in AI this has brought incremental improvements in the accuracy of search results. However, two fundamental issues still remain:

Two Fundamental Issues

First and foremost, users' trust in the quality of information.

Take Google: because of its revenue model, it is already difficult to justify the quality of its search results, such as why one result should be ranked higher than another. Then there is the famous "don't be evil" promise. However, after 2016, when the blockchain concept became more and more popular (read Satoshi Nakamoto's famous paper),

https://bitcoin.org/bitcoin.pdf

people were no longer satisfied with ethical promises. They learned that to reach the highest level of consensus, we have to rely on some form of mathematical proof. Being ethically correct is not enough; people want mathematically correct.

Secondly, privacy.

Every user asks Google, therefore Google knows everyone. Once the centralized part of the system is compromised (by either a hacker or a government), users inevitably have to start worrying about the safety of their search records, because the system is centralized like this:

The Instinct to Solve

These two issues are hard to solve, because asking a single authority (Google) is not the natural way people search for answers.

Look at bees: each bee communicates with its peers, and the network as a whole optimizes the transportation and processing of information. So how do people in the natural world search for answers? Say you want to know where to find Ethan: you ask the friends you think are reliable. If your friends don't have the answer right off the bat, they ask their friends for help, and so on, like a recursion. When the answers come back, they are collected and aggregated; that is when you rank the trust of each answer based on the credibility of the peers, not the credibility of the sources. Hence the idea of an "Open Search Network". Ultimately, the sources (like stackoverflow.com) will become peers as well, but they are not the whole of it.

In the real world, people communicate with peers for information. Technological advancements let us scan through tons of information in sub-seconds, but the bandwidth of information a human being can interface with hasn't improved much. Over thousands of years of evolution, people learned to evaluate a source first and then apply that trust going forward. How wide is this interface? Robin Dunbar (see this wiki) has a number that quantifies exactly this. Let's start from there.

https://en.wikipedia.org/wiki/Dunbar%27s_number

Open Search Network

In the next generation of searching, I envision people broadcasting their question only to their direct neighbors, who broadcast it to their neighbors in turn. All the answers collected from the outside world are evaluated and sent back to the user along the same path they fanned out, weighted by each person's "perspective" (see below) to form a ranked answer.

So, for every particular topic, if we assign a "weight" to every friend we have, we now have a system to rank the credibility of that information source. For all future incoming answers, we use this weight to adjust how much we trust the answer. Of course we adjust these weights as we go: each time an answer gets picked or not picked, the corresponding source becomes a little more or less trustable. At the end of the day, every user, for every particular topic, has a set of weights. That is called the "perspective" matrix, representing the perspective of how a person views the world nearby.
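As a toy illustration of the fan-out and weighted aggregation described above, here is a minimal Python sketch. The network shape, the weights table, and the ask callback are hypothetical placeholders of my own, not code from any existing OSN implementation.

# Toy sketch: broadcast a question to direct neighbors, let them fan out one
# level deeper, then aggregate the returned answers weighted by trust.
from collections import defaultdict

def search(question, me, neighbors, weights, ask, depth=2):
    # "Ask your friends, who ask their friends" - the recursion from the post.
    scores = defaultdict(float)
    for peer in neighbors[me]:
        if depth > 1:
            answers = search(question, peer, neighbors, weights, ask, depth - 1)
        else:
            answers = {ask(peer, question): 1.0}
        # Fold the peer's answers back along the same path, scaled by trust.
        for answer, score in answers.items():
            scores[answer] += weights[me][peer] * score
    return dict(scores)

# Example usage with a tiny hard-coded network.
neighbors = {'me': ['alice', 'bob'], 'alice': ['carol'], 'bob': ['carol']}
weights = {'me': {'alice': 0.7, 'bob': 0.3},
           'alice': {'carol': 1.0}, 'bob': {'carol': 1.0}}
ask = lambda peer, question: peer + "'s answer"
print(search("where is Ethan?", 'me', neighbors, weights, ask))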

In action it looks like this:

Page ranking with collaborative filtering vs. OSN

Google's search relies on a set of sources, with collaborative filtering accounting for peers that may be similar, but that is still only half of what OSN has. In this process, page ranking is done as follows:
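For reference, the PageRank definition in the cited Stanford paper has roughly this shape, where $B_u$ is the set of pages linking to $u$, $N_v$ is the number of outgoing links of $v$, and $c$ is a normalization factor:

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$$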

in which R(u) is the PageRank assigned to page u, and E is a vector over the web pages that corresponds to a source of rank.

If we let S be almost any vector over the web pages (e.g., E), then PageRank can be computed with the following loop (adapted from that paper):
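A minimal Python sketch of that loop, written in the common damping-factor form rather than the paper's exact normalization; the link structure, the damping factor c, and the tolerance eps below are assumptions for illustration only.

# Power-iteration sketch of PageRank: repeatedly push rank along links,
# mixing in the source-of-rank vector E, until the ranks stop changing.
def pagerank(links, c=0.85, eps=1e-8):
    # links: dict page -> list of pages it links to.
    pages = list(links)
    n = len(pages)
    E = {p: 1.0 / n for p in pages}       # S = E as the starting vector
    R = dict(E)
    while True:
        new_R = {}
        for u in pages:
            incoming = sum(R[v] / len(links[v]) for v in pages if u in links[v])
            new_R[u] = c * incoming + (1 - c) * E[u]
        delta = sum(abs(new_R[p] - R[p]) for p in pages)
        R = new_R
        if delta < eps:
            return R

print(pagerank({'a': ['b'], 'b': ['a', 'c'], 'c': ['a']}))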

There are two observations here: first, no information about the user's peers is involved; second, the rank is tied to each source. But a source can be more favored by one user than by others. To address that, let's see how OSN's matrix gets updated.

Essentially, evaluating how one bad answer changes one person's perspective toward a source is a partial-derivative problem. So picture something like this: in an ANN, a piece of feedback adjusts the weight matrix of the network, essentially updating it with:
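For the standard gradient-descent form this refers to, with learning rate $\eta$ and error $E$ being fed back through the network:

$$w \leftarrow w - \eta\,\frac{\partial E}{\partial w}$$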

The change in the result applies feedback to the weights of the network; straightforward.

Fast forward to the result: updating the weight matrix in OSN is very similar to updating the weight matrix in an ANN, as described here.

Read this

http://www.cs.cornell.edu/courses/cs5740/2016sp/resources/backprop.pdf

Refer to https://github.com/aertoria/OSN; each user will have a matrix that looks like the one below.

| Topic | neighborId | Weight |
| ----- | :--------: | -----: |
| t1    |     n1     |   0.65 |
| t1    |     n2     |   0.35 |
| t2    |     n1     |   1.00 |

Sum of weights over all neighborIds (per topic) = 1
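A minimal sketch of how such a perspective matrix could be updated; the multiplicative bump and the learning rate lr are my own assumptions for illustration, not taken from the OSN repo.

# Sketch: nudge a neighbor's weight up when its answer gets picked, down when
# it does not, then renormalize so the weights for that topic still sum to 1.
def update_perspective(perspective, topic, neighbor, picked, lr=0.05):
    # perspective: dict topic -> dict neighborId -> weight
    weights = perspective[topic]
    weights[neighbor] *= (1 + lr) if picked else (1 - lr)
    total = sum(weights.values())
    for n in weights:
        weights[n] /= total
    return perspective

perspective = {'t1': {'n1': 0.65, 'n2': 0.35}, 't2': {'n1': 1.00}}
update_perspective(perspective, 't1', 'n1', picked=True)
print(perspective['t1'])   # n1 moves slightly above 0.65, n2 slightly below 0.35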

Pulling It Off

This search model looks more like Facebook, since it is peer-to-peer and searcher-based, i.e., a social network. Let's create something called an Open Search Engine Agent: a daemon thread that broadcasts queries and routes answers on behalf of each user. It can run in the cloud, but it doesn't have to, since ultimately it is owned by that user.

I'm posting this idea because I want it to start as another open source project... search on.

Proposing a New Bloom Filter Design Optimized for Apache HBase

I got this idea last week while discussing Bloom filters with Rahul Gidwani from Salesforce. Here is a draft; feedback goes to the Apache HBase dev list.

“The Idea”

The regular Bloom filter is widely used in HBase and other LSM-tree stores; however, in these cases certain important knowledge of the key space is known ahead of time but not taken advantage of, such as: 1) the MIN/MAX of the keys is known in advance; 2) the keys are usually densely and continuously sorted within a range.

Issues with the regular Bloom filter

The current Bloom filter makes no assumptions about the distribution of the incoming objects. After hashing, the hashes are assumed to be evenly distributed over the whole integer range. Take the most popular implementation as an example (google/guava BloomFilter): there are two steps involved, hashing and modulo, as illustrated below. (code link)

Fig 1. The two steps in most Bloom filter implementations: hashing (FNV or murmur being the most popular choices) and modulo. The modulo operation is usually necessary to make the hash range fit the length of the Bloom filter.

The issue is that no matter which hashing algorithm is chosen (FNV or murmur being the most popular), a modulo step is usually necessary to fit the size of the Bloom filter (such as line 61 of com.google.common.hash.BloomFilterStrategies.java, code link).

As illustrated below, the Bloom filter buckets do not all get an even collision chance, and therefore yield a sub-optimal collision/space ratio.

Fig 2. In this example, the range of the hashes is [4, 29]. When we mod this down to [1, 10], the range [4, 9] gets "visited" 3 times while the other buckets are visited only 2 times, so items are denser between [4, 9] in the Bloom filter. The reason is that the regular Bloom filter does not know the min/max of the incoming objects ahead of time, and therefore cannot prepare a hash/modulo process that compensates for it.
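The effect in Fig 2 is easy to reproduce with a few lines of Python; the numbers below just mirror the figure's example range (buckets are labeled 0-9 here instead of 1-10) and are not taken from the HBase code.

# Reproduce Fig 2: hashes spread over [4, 29], folded into 10 buckets by modulo.
from collections import Counter

hashes = range(4, 30)                     # 26 uniformly spread hash values
buckets = Counter(h % 10 for h in hashes)
print(sorted(buckets.items()))
# Buckets 4..9 receive 3 values each while buckets 0..3 receive only 2,
# so the load is uneven even though the hashes themselves are uniform.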

Improvement from HbaseBloomFilter (proposed)

At memstore flush time we know the Min/Max of the key space ahead of time, so we can use this information to prepare a better hash-and-modulo process that avoids this issue, as illustrated below.

So how much does it improve?

A naive implementation of this idea: once hashing is done, re-align the hash values onto the Bloom filter length based on the min and max hash values. (Implemented here for experimental use: Code Link)

https://github.com/aertoria/HbaseBloomFilter
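In Python terms, the re-alignment step amounts to a linear rescale of each hash from [min_hash, max_hash] onto [0, filter_size); this is a sketch of the idea only, not the code from the repo above.

# Min/max-aware bucket mapping: instead of hash % size, linearly rescale the
# known hash range onto the filter so every bucket gets an equal share.
from collections import Counter

def bucket(h, min_hash, max_hash, size):
    span = max_hash - min_hash + 1
    return (h - min_hash) * size // span   # result falls in [0, size)

hashes = range(4, 30)                       # same example as Fig 2
counts = Counter(bucket(h, 4, 29, 10) for h in hashes)
print(sorted(counts.items()))
# Every bucket now receives either 2 or 3 values; the systematic bias toward
# one contiguous sub-range is gone.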

Experiment results (each result is the average of multiple runs):

Bloom filter size 100, 100 values ranging from 1-150, identity-based hash. OldBloomFilter: occupied rate 65%, collision rate 35%. NewBloomFilter: occupied rate 67%, collision rate 33% (2% better).

Bloom filter size 10K, 100 values ranging from 1-10K, FNV hash. OldBloomFilter: occupied rate 63.22%, collision rate 36.78%. NewBloomFilter: occupied rate 62.72%, collision rate 37.28% (0.5% worse).

How to Design a Time Machine to Predict the Future

This post is a draft of something I want to build: a time machine, a system that allows information to traverse through time.

“The Problem”

Here is how the idea got started: in my day-to-day job we deal with lots of application servers and database servers. At the center of it, here is the gist: the connection between app server and db server is critical because it directly impacts customers. However, a request coming down from an app server, due to all sorts of unpredictable reasons, often ends up never reaching the db end. Sometimes requests just get lost, sometimes they get abandoned.

So we have a whole team studying how to minimize these cases. The traditional approach is to gather knowledge from past outages and build a model so we can predict the future... straightforward. If in the past ten incidents, every time an incident started we first observed high CPU usage, then the next time someone sees a CPU usage peak, it's intuitive to predict the system is likely going to break soon.

But how about solving this problem a different way: can we design a system that predicts the future in general, like a time machine?

Wait, a time machine?

With this in mind, please read on. So this is the plan: use quantum physics to help us predict the connectivity between the app server and the db server.

First, here is a link to the Double Slit Experiment. I strongly recommend this video if you haven't watched it.

https://www.youtube.com/watch?v=DfPeprQ7oGc

The takeaway from the first video is the following: "the particle changes its behavior simply because it is 'aware' of being watched."

Next, this one:

https://www.youtube.com/watch?v=H6HLjpj4Nt4

The takeaway from the second video is: "the particle can predict what would have happened in the future, and travel back in time to decide how to behave accordingly."

To sum up, below is a prototype of the system, which, according to this third video (https://www.youtube.com/watch?v=DsxA7OU7fR0), has been shown to work.

“The Design”

First we have a particle gun shooting electrons (or photons) through a double-slit barrier and onto a screen. Both the screen and each slit have a detector connected to a computer. The detectors attached to the slits stay on at all times, and their data is constantly collected by the computer. In this initial state, the screen shows two bands, because observer knowledge is constantly being collected.

Now, if we leave the detectors on but disconnect the cable between the detectors and the computer, the screen shows an interference pattern, because no observation knowledge is successfully collected.

Think about this for a second: the time it takes a particle to travel from the slits to the screen is almost instantaneous, while the time it takes the data to travel from the slit detectors to the computer is usually much longer. This creates a future-prediction mechanism I can take advantage of: I read the result from the screen to predict whether the in-flight detector data will or will not end up at the computer, even though the detector data bits may still be in the middle of being transmitted!

“An Application”

The detector readings are what our application carries. Those bits are packaged into small units and attached to each AppServer-to-DbServer request. If a customer's request can successfully reach its destination, so can the predictor bits, and therefore we should have been reading a double-band pattern on the screen long before. And vice versa: if the screen starts showing an interference pattern, we will know for sure that the connection will soon break somewhere, guaranteed.

GUICE All in One

“Not enough mana.” –Diablo3

“You are right….Every project you touched turn into shit” – Nathan Drake, Uncharted 3

Yes, that project is Argus, from Salesforce.

Recently I needed to work on some core Java projects such as Argus (https://github.com/SalesforceEng/Argus). It uses Guice, a classic DI framework. After searching around, I found there is no good tutorial for a beginner to jump in quickly; all the materials seem to be incomplete in some way, including the one provided by google/guice. So I decided to make one myself. Thanks to #rajsarkapally-sfdc for helping me.

So this is the basic idea:

Now let's use an example from Otaku's blog:

The more I use dependency injection (DI) in my code, the more it alters the way I see both my design and implementation. Injection is so convenient and powerful that you end up wanting to make sure you use it as often as you can. And as it turns out, you can use it in many, many places. Let’s cover briefly the most obvious scenarios where DI, and more specifically, Guice, are a good fit: objects created either at class loading time or very early in your application. These two aspects are covered by either direct injection or by providers, which allow you to start building some of your object graph before you can inject more objects. I won’t go too much in details about these two use cases since they are explained in pretty much any Guice tutorial you can find on the net.

Once the injector has created your graph of objects, you are pretty much back to normal and instantiating your “runtime objects” (the objects you create during the life time of your application) the normal way, most likely with “new” or factories. However, you will quickly start noticing that you need some runtime information to create these objects, other parts of them could be injected.

Let's take the following example: we have a GeoService that provides various geolocation functions, such as telling you a location given a person's name:

package com.salesforce.service;

public class GeoService implements GeoServiceInterface {
  public boolean judge(String name) {
      System.out.println("judging...");
      return !name.isEmpty();
  }
}

Then you have a Person class which uses this service and also needs a name to be instantiated:

package com.salesforce.kepler;
import com.google.inject.Inject;
import com.google.inject.assistedinject.Assisted;
import com.salesforce.service.GeoServiceInterface;

public class Person implements PersonInterface {
  private String Name;
  private GeoServiceInterface gs;

  // Assisted injection: Guice supplies the GeoService, while the caller
  // (through a generated factory) supplies the name at runtime.
  @Inject
  private Person(@Assisted String Name, GeoServiceInterface gs){
      this.Name=Name;
      this.gs=gs;
  }
  public boolean getLocationByName(){
      return gs.judge(Name);
  }
}

Something odd should jump at you right away with this class: while name and address are part of the identity of a Person object, the presence of the GeoService instance in it feels wrong. The service is a singleton that is created on start up, so a perfect candidate to be injected, but how can I achieve the creation of a Person object when some of its information is supplied by Guice and the other part by myself? Guice gives you a very elegant and flexible way to implement this scenario with “assisted injection”. (Quote ended here)

The above is the basic setup. Next, you need a provider (or an assisted-inject factory) to supply the runtime part.

So in the end, this is what you need.

Links:
http://stackoverflow.com/questions/9237996/pass-parameter-to-constructor-with-guice
https://github.com/google/guice/wiki/ProviderBindings
http://beust.com/weblog/2012/08/21/advanced-dependency-injection-with-guice/

To Build an AI Like Never Before

Just a draft... dedicated to the memory of Shailesh.

Ideas about AI.

1. During evolution, humans developed the skill of turning information into knowledge, but it runs on a slow, crappy computer (the brain) with low memory and horrible interconnection. If it could be simulated on a modern computer, with a computer's high interconnection and speed, it would be super.

2. All the AI development I have learned about so far is based on mathematics. From English-language sentiment classification to picture pattern matching, those ANNs or SVMs are all based on mathematical logic, reasoning, and calculation. That is not how I learn things and gain intelligence.

Therefore,

Yesterday, I consciously tried hard to record the process of how I learn a new thing, so that I might be able to see how a normal person turns information into knowledge when learning something new. I think this is a key piece that is missing to make AI really work. So I learned a new term called Selenium, and this is what I recorded: 1) I remember hearing about it years ago. 2) I saw a diagram on the web with three layers, with Selenium in the middle, so I know it is a key thing connecting the layer above and the layer below. Therefore I decided to learn this Selenium. So far, what I did was try to draw a structure diagram/picture in my head. And now, as I keep reading things about Selenium,

I try to PLACE Selenium into that structure. During that, for every new piece of info I get, I also try to fit it in, so that I am also validating the correctness of the structure. Based on this process, I think I should build an AI that is not purely mathematically based (like traditional deep learning / ANNs), but works this way:

Step 1: Crawl the web and dig out a new terminology.
Step 2: Read about this terminology and try to place it into a network diagram, while also validating/revising the diagram itself.
Step 3: Repeat from Step 1.
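As a very rough Python skeleton of that loop (discover_term, read_about, and the KnowledgeGraph structure are hypothetical placeholders of mine, not an existing library):

# Rough skeleton of the crawl / place / validate loop described above.
class KnowledgeGraph:
    def __init__(self):
        self.edges = {}                        # term -> set of related terms

    def place(self, term, related):
        # "Draw" the new term into the structure diagram.
        self.edges.setdefault(term, set()).update(related)
        for r in related:
            self.edges.setdefault(r, set()).add(term)

    def fits(self, term, related):
        # How well the new info agrees with what is already in the diagram.
        known = self.edges.get(term, set())
        return len(known & set(related)) / max(len(related), 1)

def learn(discover_term, read_about, graph, rounds=10):
    for _ in range(rounds):
        term = discover_term()                 # Step 1: dig out a new terminology
        for related in read_about(term):       # Step 2: read about it and place it
            if graph.fits(term, related) < 0.1:
                pass                           # the diagram may need revising here
            graph.place(term, related)
        # Step 3: repeat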

AngularJS Template

This post wraps up a hello-world demo for an AngularJS web app, and should serve well as a template.

AngularJS is also MVC. However, unlike true server-side MVC, everything runs on the client side inside the JavaScript runtime, which means lower performance and weaker security.


index.html

<html lang="en" ng-app="mvcApp">
<head>
  <script src="bower_components/angular/angular.js"></script>
  <script src="bower_components/angular-route/angular-route.js"></script>
  <script src="main/main.js"></script>
  <script src="about/about.js"></script>
</head>
<body>

  <p>Hello world index page</p>
  <div ng-view>this is used for routing</div>

</body>
</html>

main.js

'use strict';


/* App Module */
var mvcApp = angular.module('mvcApp', [
      'ngRoute',
      'module_about'
      ]);


mvcApp.config(['$routeProvider',
  function($routeProvider) {
      //console.log($routeProvider)
      $routeProvider.when('/about', {
              templateUrl: 'about/about.html',
              controller: 'aboutCrtl'
          }).otherwise({
              redirectTo: '/about'
          });
  }]);

about/about.html

<h2>About page</h2>
Author is: 
<helloworld/>

about/about.js


//Declaration of this module
var module_about = angular.module('module_about',[]);

/* Controllers */
module_about.controller('aboutCrtl',['$scope','aboutFac',aboutCrtl]);
function aboutCrtl($scope,aboutFac){
  $scope.about_author='ethan wang';
  //Now I'm going to use this factory
  aboutFac.doGreet();
}

/* Model */
module_about.factory('aboutFac',aboutFac);
function aboutFac(){
  var factory={};
  factory.doGreet = function(){
      alert('doGreet method called!');
  }
  return factory;
}


//To showcase a directive (registered on the mvcApp module defined in main.js)
mvcApp.directive('helloworld',helloworldDirective);
function helloworldDirective(){
  return {
      restrict: 'AEC',
      replace: 'true',
      template: '<h3>Helloworld! Welcome to About page!!</h3>',
      link: function(scope, element, attrs){
          element.bind('click',function(){
              alert('you clicked!');
              //scope.$apply(function(){scope.color = "black";});
          });
      }
  }
}

To access the code and see hello world in 5 seconds, just go to this link:

https://github.com/Template-EthanFavoriate/TEMPLATE_ANGULARJS.git

Why Are There So Many Songs About Rainbows? --- My Journey as an Audiophile

So why become an audiophile?

As a person who watches a computer screen all day, researching audio equipment (earphones and headphones) is one of the few hobbies I have. Investing in them is easily proven to be cost-efficient: I listen to them while I use the computer, which is pretty much all day. Anyway, it's been 5 years since I stepped onto this once-you-step-in-there-is-no-stepping-back hi-fi journey, so I decided to write something to wrap it up and give beginners some advice.

So why earphones?

As a way of enjoying sound, you have a few options up front. You may go with one of these three: speakers (those loud speaker boxes), headphones, or earphones. Speakers are a no-brainer non-option for me: they are heavy to move and I can't use them in the office. Headphones are also out, because some open-back ones leak sound pretty badly, which I also cannot use in the office. Ideally, I'm looking for something super low-profile that doesn't compromise fidelity at the same time.

Journey of Sennheiser CX300

My journey started with the Sennheiser CX300, an earphone designed for entry-level users. Even though my first CX300 turned out not to be an authentic Sennheiser, it sounded just amazing, as I had nothing else to compare it with, and it opened the door to this aspiring hi-fidelity world. It cost me about 40 bucks, which was quite a lot for me at that moment as a junior undergraduate. My biggest impressions were the great isolation and the surrounding feeling. Never before had an earphone felt so comfortable and so sound-isolating that, after wearing it for a while, you magically feel the volume somehow gets turned up. Great isolation also leads to a great surrounding feeling, enriching any song with so much flavor and imaging.

-> For beginners: in general, at this price range (less than $50), you can expect a major improvement compared to Apple's earbuds. The ecstasy of trance begins to appear, and you will find yourself quite enjoying the repetition of those trance passages in the music.

It is still generally hopeless, for pretty much all earphones at this range, to handle classical or jazz. But for popular music and some vocals, this range is a great start. Recommendation: Xiaomi Piston 2 (a great, great earphone at this range that kind of overkills all the others).

Journey of Sennheiser IE6

When I was still at Iowa doing my college program, one Cyber Monday I saw the Sennheiser IE6 listed as a lightning deal, priced at $99. That soon became my second milestone in earphones. I was married to the IE 6 for about 2 years straight. Hands down great earphones. I would have continued using them if their cable hadn't finally worn out and I hadn't failed to fix it after a couple of tries. IE is a class designed for audiophiles, and the IE6 is the entry level of that class.

-> For beginners: at this price range ($50 - $200), there are a bunch of great earphones and headphones available. You will feel they are 1 to 2 levels better than the previous price range, especially when it comes down to fidelity and balance. They are considered 'sub-flagships' and are, in my view, the most cost-efficient price range to stay in. However, compared to the flagships (IE 8, SE535, EX1000, etc.), they more or less have one or two flaws regarding bass, mid-range, or treble.

At this point they usually handle popular music very well. Some may be better at one particular genre than others, such as vocals or ACG. Still, most struggle with classical and jazz. Recommendations: SE215, ATH M50x (it's a headphone).

Journey of Shure SE535

Then came the SE535. In a word, I regret having bought it. Unlike my other upgrades, after which I usually felt satisfaction and a sense of accomplishment, the purchase of the SE535 was a disaster. She impressed me a lot at first, then left me craving for more. For starters, the SE535 has great fidelity, which means lots of detail. It let me notice many details I wasn't aware of, even in a song I had been looping all day long. The best of the SE535 is her mid-range: that 'pure liquid gold' is just incredible. This earphone made me think that the ultimate goal of an audiophile, rather than pursuing hard specs, is to 'find yourself being moved'. When Tori Kelly was singing Colors of the Wind, or when V.K was in the middle of performing his piano solo Reflection, the SE535 made me realize that, at its core, music is an art of expressing feelings, and to accomplish this connection between listener and performer you also have to involve your imagination. The SE535 gives you all that. She not only precisely captures the musical structure and every other detail, she blends them all into a poem or a story. This magic earphone forced me to ignore the individual instruments and focus on the performer's enthusiasm deep inside. As a result, I felt Tori Kelly's voice was so real that her lips were literally an inch away from my ear. It's addictive. It's almost a guilty pleasure.

Now, here is the deal breaker. Besides the mid-range and vocals, everything else about the SE535 is depressing, especially when it comes down to the soundstage. The bass and treble are just OMG bad. Until one day I decided not to tolerate it anymore and moved on.

-> For beginners: at this price range ($200 - $600), you will have to do a serious study of the genres you usually listen to. There are the SE535, IE 8, TF 10, Sony EX1000, ATH CK100, etc. If you are looking for something versatile, slightly analytical, and well balanced, my recommendation is the Westone W4R.

My current stuff: HD600, HD800, LCD 2, AKG K701, JH13Pro; DAC/amp: Audinst HUD MX1, Pico Slim, Fiio E8.

Some pictures. And I will keep updating.

Django Micro-services Template

This post wraps up a hello-world demo for a Django microservices web app, and should serve well as a template.

First, what is the Microservices Architecture pattern?

The following is quoted from http://microservices.io/patterns/microservices.html:

Context You are developing a server-side enterprise application. It must support a variety of different clients including desktop browsers, mobile browsers and native mobile applications. The application might also expose an API for 3rd parties to consume. It might also integrate with other applications via either web services or a message broker. The application handles requests (HTTP requests and messages) by executing business logic; accessing a database; exchanging messages with other systems; and returning a HTML/JSON/XML response.

The application has either a layered or hexagonal architecture and consists of different types of components:

Presentation components - responsible for handling HTTP requests and responding with either HTML or JSON/XML (for web services APIs)
Business logic - the application's business logic
Database access logic - data access objects responsible for accessing the database
Application integration logic - messaging layer, e.g. based on Spring integration

There are logical components corresponding to different functional areas of the application.

The motivations are :

There is a team of developers working on the application
New team members must quickly become productive
The application must be easy to understand and modify
You want to practice continuous deployment of the application
You must run multiple copies of the application on multiple machines in order to satisfy scalability and availability requirements
You want to take advantage of emerging technologies (frameworks, programming languages, etc.)

So what is Microservice Architecture:

The reference ends here. Now for my demo code. Below is how I plan to do it:

The tier-one app. It's an empty app; only settings.py and url.py matter. You start the whole application by bootstrapping this app with python manage.py runserver from here.

settings.py

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'keplerapp_tbcheck',
    'keplerapp_tbmodel',
)

url.py

urlpatterns = patterns('',
    # Examples:
    # url(r'^$', 'keplerweb.views.home', name='home'),
    # url(r'^blog/', include('blog.urls')),
    url(r'^demo/', include('keplerapp_tbcheck.urls')),
    url(r'^$', include('keplerapp_tbcheck.urls')),
    url(r'^admin/', include(admin.site.urls)),
    #url(r'^$', include(admin.site.urls))
)

Now go to another tier-one app, tbcheck. This app serves as the view app, with all the templates, template tags, and CSS there.

url.py

urlpatterns = patterns('',
    # The more specific 'template' pattern must come before the catch-all.
    url(r'^template', 'keplerapp_tbcheck.views.service_default'),
    url(r'', 'keplerapp_tbcheck.views.service_check')
)

views.py

from django.shortcuts import render_to_response
from keplerapp_tbmodel.models import Employee

def service_check(request):
    emp = Employee()
    emp.truncate()
    emp.prepare_data()

    sec_orglist = map(lambda x:x.EmpName,emp.get_Name_by_OrgID(2))

    context={}
    context['sec_orglist']=sec_orglist
    context['dictionary_key_A']='value of dictionary key A: 10230'
    return render_to_response('keplerapp_tbcheck/intro_index.html', context)

def service_default(request):
    #return HttpResponse("defalt page 404")
    return render_to_response('keplerapp_tbcheck/intro_index.html', {})

Last is the tier-two app, tbmodel.

It only has models.py in it; it is just another package.

from django.db import models
from datetime import datetime

class Employee(models.Model):
  EmpID = models.AutoField(primary_key=True,blank=True)
  OrgId = models.CharField(max_length=50, db_index=True)#PNC
  EmpLevelCode = models.IntegerField(db_index=True,blank=True,null=True)#6
  EmpName = models.CharField(max_length=100)#Gender
  EmpGender = models.CharField(max_length=50,blank=True,null=True)#F
  EmpHire_dttm = models.DateTimeField(blank=True,null=True)
  upd_dttm = models.DateTimeField(auto_now=True)

  def __unicode__(self):
      return u'System: %s'%self.EmpName

  class Meta:
      app_label = 'keplerapp_tbmodel'
      db_table = u'employee'

  #Return a Generic object
  @classmethod
  def get_Name_by_EmpID(cls,EmpID):
      return cls.objects.filter(EmpID=EmpID)

  @classmethod
  def get_Name_by_OrgID(cls,OrgId):
      return cls.objects.filter(OrgId=OrgId)

  @classmethod
  def truncate(cls):
      cls.objects.all().delete()

  @classmethod
  def prepare_data(cls):
      Employee(OrgId='2',EmpLevelCode=1,EmpName='Ethan W').save()
      Employee(OrgId='2',EmpLevelCode=1,EmpName='Tom H').save()
      Employee(OrgId='2',EmpLevelCode=2,EmpName='Eric G').save()
      Employee(OrgId='3',EmpLevelCode=1,EmpName='Larry E').save()
      Employee(OrgId='3',EmpLevelCode=2,EmpName='Larry P').save()

Now, to run it, we first pip install the view and model apps as packages into site-packages, then bootstrap the first app to start.

To access the code and see hello world in 5 seconds, just go to this link:

https://github.com/Template-EthanFavoriate/TEMPLATE_MICROSERVICE
git@github.com:Template-EthanFavoriate/TEMPLATE_MICROSERVICE.git

Where Did the Performance Go? A Dive Into Multi-processing/Multi-threading in CPython

I ran into this kind of problem recently: one day I found that a solution written with Python multi-threading took more running time than the single-threaded version. This was really confusing. I decided to dive into it... and some interesting things turned up.

Note that my computer has a multi-core processor: an i7 2.3 GHz quad-core CPU, 16 GB DDR3 memory, and an SSD.

Essentially, everything starts from this little piece of code, which recursively solves a permutation-counting problem:

#!/usr/bin/python
import datetime


L = [1,2,3,4,5,7,8,9,10,11]

def gen(index,value):
  if index==len(L):
      return 1
  count=0
  for i in range(len(value)+1):
      count+=gen(index+1,value[:i]+[L[index]]+value[i:])
  return count

Below is how it was written as single-thread, multi-thread, and multi-process versions, along with the running times.


#Single thread approach
#time0=datetime.datetime.now()
#print gen(1,[1]) #runtime 480ms-10   4500ms-11  50sec-12 11>>>>>0:00:04.823974
#time1=datetime.datetime.now()
#print time1-time0



pointer1=0
pointer2=0
def t1_start():
  global pointer1
  pointer1=gen(2,[2,1])
  print pointer1

def t2_start():
  global pointer2
  pointer2=gen(2,[1,2])
  print pointer2


#from threading import Thread
#t1=Thread(target=t1_start,args=())
#t2=Thread(target=t2_start,args=())
#time0=datetime.datetime.now()
#t1.start()
#t2.start()
#t1.join()
#t2.join()
#time1=datetime.datetime.now()
#print pointer1+pointer2,time1-time0 #runtime 650ms-10  6392ms-11 72sec -12 11>>>>>>>0:00:05.082194



import multiprocessing
p1=multiprocessing.Process(target=t1_start,args=())
p2=multiprocessing.Process(target=t2_start,args=())
time0=datetime.datetime.now()
p1.start()
p2.start()
p1.join()
p2.join()
time1=datetime.datetime.now()
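# Note: pointer1 and pointer2 are module-level globals and each child process
# gets its own copy, so the parent still prints 0 for the sum below; the real
# counts are printed inside t1_start/t2_start by the child processes.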
print pointer1+pointer2,time1-time0 #runtime 480ms-10 4400ms-11 48sec-12   >>>>>>>02.225561

#
#

Also, I increased the task from a 10-element array to a 12-element array, which takes roughly 1 minute for the machine to solve. So the new results are: single thread, 50 sec; multi-thread, 72 sec. I also tried implementing it with multi-processing, and surprisingly multi-process cut the time in half. So basically the results are:

| Approach      | Runtime |
| ------------- | ------- |
| single thread | T       |
| multi-thread  | T       |
| multi-process | T/2     |

This does not explain why there is no improvement from single thread to multi-thread, so I took a snapshot of the CPU workload.

Note that even the single-threaded run actually involves all my CPU cores. I first suspected that the CPython interpreter/optimizer or the Intel CPU instruction set might be smart enough to spread even a single-threaded solution across multiple CPUs. The more likely explanation for the threading result, though, is CPython's Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time, so CPU-bound threads cannot run in parallel and only add scheduling overhead, while multiprocessing sidesteps the GIL by giving each process its own interpreter.

Standard DFS and BFS

Given the graph below:

#!/usr/bin/python

graph={}

graph['A']=['B','D']
graph['B']=['A','E','C']
graph['C']=['B']
graph['D']=['A','E']
graph['E']=['B','D']

To traverse this simple graph, people usually go on and on about DFS and BFS, and many Algorithms 101 books talk about ways of implementing them. Today I was asked about it again by a colleague, and after searching around online I was surprised to find there is no good example anywhere that sets a standard for beginners to start from. So I coded one for him. It was a bit tricky, since there are some caveats that need to be taken into consideration. I decided to post it here so it can benefit other people who might need it.

Start with this DFS, which I call classic DFS. This gives the good DFS: it ends when ALL NODES ARE VISITED.

def DFS(node):
  print node,visited
  for next in graph[node]:
      if next not in visited:
          visited.append(next)     # mark as visited before descending
          DFS(next)
          print 'returning'
visited=['A']
DFS('A')

This one is different from the one above: it enumerates all possible paths. It is a DFS that ends when ALL PATHS ARE SEARCHED, and it is also very good for cycle detection. I call it standard DFS, or S-DFS.

def SDFS(node):
  print node,stack
  for next in graph[node]:
      if next not in stack:
          stack.append(next)
          SDFS(next)
          stack.pop()              # backtrack so other paths can revisit this node
          print 'returning'
#     elif next in stack and next != stack[-2]:
#         print 'circle!'

stack=['A']
SDFS('A')

Now this one is BFS. It is very straightforward.

def BFS(node):
  global Q
  print node,Q
  for next in graph[node]:
      if next not in visited:
          Q.append(next)
          visited.append(next)
  Q=Q[1:]                          # dequeue the node we just processed
  while len(Q)>0:
      next=Q[0]
      BFS(next)

Q=['A']
visited=['A']
BFS('A')