Thursday, August 29, 2013

Distributed Systems and CAP Theorem: For my sophomore

Yash: Papa, do you have a few minutes?

Me: Yes. What is the issue?

Yash: Nothing. I was wondering, what is the CAP theorem? I was trying to understand it, but it is confusing. It talks about Consistency, Availability and Partition Tolerance in distributed systems. I have a little idea about distributed systems, but the CAP theorem ….

Me: It should not be a challenge. First, tell me what you know about distributed systems.

Yash: In distributed systems, the various components of a software system run on different machines. So, if one component needs to be scaled up, that can be done individually without affecting the remaining components.

Me: OK. Give me an example.

Yash: Let us assume I have a Java program which also has a database. So, as you explained to me in the MVC pattern (link to MVC here), there will be a few distinct components:

  • JSPs representing the view
  • Some servlets which will hold the controller logic
  • JavaBeans which will talk to the database
  • A database to hold the data


Me: Hold on. Add one more component to your list.

  • Images which will be displayed in the browser along with the JSPs.

Yash: OK. So I have 5 components, and potentially I can host my application on five different computers, one machine for each component. And suppose that with time, due to increased data, I add one more machine for the database, for a total of 6 machines. Here I am taking advantage of distributed computing.

Me: Fantastic!!

Me: Should we talk about the CAP theorem now?

Yash: Yes. I want to know it.

Me: OK. So we will start from non-computer life. One day you wake up with a great idea to start your own company. Its name will be AskMe. It has a very simple service to offer. People call AskMe and give their reviews of restaurants. They can also ask AskMe for the reviews of a particular restaurant given by other reviewers.

Since you are very enthusiastic about this idea, you talk with your friends in school. A few of your friends become members of AskMe. To keep track of each member and review, you start to write the details in a notebook.

The AskMe service is unique, people like the idea of restaurant reviews over the phone, and word of mouth helps. AskMe starts to gain new customers. With the increased customer base, you start to miss calls, which is not good for business.

You rope your brother into the AskMe business. You want customers to have only one telephone number for AskMe, so you get another line with the same telephone number and internally route each call to whoever is free (you or your brother). You also give a separate notebook to your brother where he can jot down reviews and read them out when asked. You are happy.

But within a few days, you realize that some restaurants are in your notebook and some others are in your brother’s notebook. It means AskMe is not giving correct reviews to its customers. You and your brother think through the problem at hand and decide that every morning you will both tally each other’s notebooks and ensure that both notebooks are in sync.

This arrangement works, but after a few days you and your brother have a small fight and your brother decides not to sync his notebook with yours. AskMe is in trouble. Also, even on non-fight days you notice that AskMe cannot start working till both of you have synced your notebooks, and invariably the notebooks go out of sync again during the day.

Now let us list the challenges AskMe is facing:

  • The two notebooks may not be in sync all the time, because one of you may decide not to sync for any reason.
  • If you both decide to sync your notebooks during the day, the AskMe service will not be available during that time.
  • One of you may not show up at work, which will result in missed calls.

Yash: It seems to me that out of these three challenges, only two can be solved at any given point of time.

  • If AskMe decides to keep all the notebooks in sync all the time, a few calls will be missed.
  • If at least one of us decides not to show up for work, again a few calls will be missed.
  • If AskMe adds more people to take calls, more time will be consumed syncing up the notebooks, so again a few calls will be missed.

It seems to me that there is no perfect solution.

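Me: This is the CAP theorem. C is for Consistency (both notebooks say the same thing), A is for Availability (every call gets answered), and P is for Partition Tolerance (AskMe keeps working even when you and your brother are not talking to each other). In any distributed computer system, all three cannot be guaranteed at the same time.

Yash: Does it mean that no distributed computer system is perfect in terms of Consistency, Availability and Partition Tolerance?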

Me: Yes, but there are techniques to reduce the impact.
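
The notebook story can be sketched in a few lines of Java. This is only a minimal illustration under assumptions: the names AskMeDemo, Notebook, AskMe, the preferConsistency switch and the restaurant "Dosa Hut" are all made up for this example, and a real distributed database is far more involved. While the two notebooks cannot be synced (a partition), AskMe either refuses the call and stays consistent, or takes the call and lets the notebooks drift apart.

```java
import java.util.HashMap;
import java.util.Map;

public class AskMeDemo {
    /** One person's notebook: a replica of the review data. */
    static class Notebook {
        final Map<String, String> reviews = new HashMap<>();
    }

    /** Hypothetical two-person AskMe business. */
    static class AskMe {
        final Notebook mine = new Notebook();
        final Notebook brothers = new Notebook();
        boolean partitioned = false;     // true while the brothers are not talking
        final boolean preferConsistency; // true: refuse calls; false: accept and diverge

        AskMe(boolean preferConsistency) {
            this.preferConsistency = preferConsistency;
        }

        /** Takes a review call. Returns false if the call had to be refused. */
        boolean writeReview(String restaurant, String review) {
            if (partitioned) {
                if (preferConsistency) {
                    return false; // stay consistent, give up availability
                }
                mine.reviews.put(restaurant, review); // stay available, notebooks diverge
                return true;
            }
            // No partition: write the review into both notebooks.
            mine.reviews.put(restaurant, review);
            brothers.reviews.put(restaurant, review);
            return true;
        }

        boolean notebooksInSync() {
            return mine.reviews.equals(brothers.reviews);
        }
    }

    public static void main(String[] args) {
        AskMe consistent = new AskMe(true);
        consistent.partitioned = true;
        System.out.println("call accepted: " + consistent.writeReview("Dosa Hut", "great"));
        System.out.println("in sync: " + consistent.notebooksInSync());

        AskMe available = new AskMe(false);
        available.partitioned = true;
        System.out.println("call accepted: " + available.writeReview("Dosa Hut", "great"));
        System.out.println("in sync: " + available.notebooksInSync());
    }
}
```

With the first AskMe the call is refused but the notebooks still agree; with the second the call is taken but the notebooks no longer agree. Neither one gets both during the fight.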

Friday, August 23, 2013

Book Review: Data Science for Business

Book Review: Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost and Tom Fawcett. Publisher: O'Reilly. ISBN-13: 978-1449361327

As the name suggests, Data Science for Business is for business people. It is an educational book for business people, not for the technical crowd.

The book explains data science and data mining concepts in a very crisp manner. Those who do not have the patience to read the whole book should read chapter two. If you are still interested after that, go ahead and read the rest. I guess you will.

The book is around 500 pages long, which makes it a demanding read, but it still has good content for business folks and avoids arcane jargon.

Disclaimer: I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the authors. I got an electronic copy of the book from the publisher for review.

One can get more information about the book and related topics from:

  1. The book’s web presence
  2. Amazon
  3. The publisher, O'Reilly

Tuesday, August 20, 2013

Cloud Computing: for my sophomore

Me: Do you know what cloud computing is?

Yash: Are we doing some computing with clouds! :-)

Me: Nope. Do you want to know what cloud computing is?

Yash: Yes.

Me: Let us make an analogy for cloud computing. OK, I have one.
I have a few questions for you. In which type of house do your friends John, Ram, Rita and Tom live?

Yash: Meaning?

Me: Let me rephrase. Do your friends John, Ram, Rita and Tom live in independent single-family houses or in an apartment complex?

Yash: Each of these friends lives in an independent single-family house.

Me: OK. Does it also mean that each of their families is responsible for its own water bill, electric bill, gas bill, and the maintenance of the house and front yard?

Yash: Yes. It is their house. It is their responsibility to maintain their houses.

Me: Understood. But it also means that they (or their parents) have to set aside some minimum number of hours per week for the upkeep of the house.

Yash: Yes, they do.

Me: But in return for this sweat, they get the freedom to remodel their houses internally as well as externally. One important aspect we were missing: the huge amount of money needed to purchase a house. Your friends' families have spent a lot of money to purchase their houses.

Yash: Yes. Provided they get permit from city.

Me: Sure. 

Me: Now assume all of your friends are living in a multi-story apartment complex on lease. In this case your friends (actually their parents) can offload some of the maintenance responsibilities to the apartment complex management, like the upkeep of green areas, shared lights, common walkways, parking areas, the building façade, etc. They need to worry only about consumables like the electric bill, water bill, etc.

Yash: Yes.

Me: Also, in a limited sense your friends can customize their homes (decoration, etc.) but cannot change the basic structure of the apartment.

Yash: Yes.

Me: Now, let us draw the analogy. Independent houses are equivalent to a computing environment where each enterprise (company) purchases computers, maintains them, and gets the freedom to run any type of software program on those computers. This is traditional computing.

Yash: Then, what is cloud computing?

Me: In the case of the multi-story apartment complex, each tenant does not own an apartment but leases it. Similarly, in a cloud computing environment, each tenant (company/enterprise) leases computers (computing power) and pays money on a regular cycle. The tenant also pays for consumables, like bandwidth used, number of users on the computers, etc. So the tenant does not pay any money upfront to purchase computers but pays regularly for the resources consumed. This is cloud computing.

Yash: So cloud computing is essentially about leasing computing power and paying money regularly instead of upfront.

Me: Yes, but do not forget the offloading of day-to-day work to the cloud provider (maintaining the front yard, etc.), which comes with limited flexibility: you work within the limits set by the cloud computing provider.

Yash: This is cool.

Me: The cloud computing provider also ensures that one tenant does not encroach into another tenant’s house.

Yash: Yeah, it is a must. No one likes someone peeking into their house.

Me: Depending upon the level of flexibility to modify apartments, there are various types of cloud providers.

Yash: So what are the various types of cloud computing providers?

Me: Let us assume the multi-story apartment complex’s management rents out not apartments but an area in the building with defined boundaries. In this case tenants can design their own apartments. These types of cloud computing providers offer Infrastructure as a Service (IaaS).

If the apartment complex provides fully built apartments but allows sufficient flexibility to shift a few walls here and there, these types of cloud providers offer Platform as a Service (PaaS).

In the last case, a fully built apartment is provided and the only choice left to the tenant is the decoration. These types of cloud computing providers offer Software as a Service (SaaS).

Yash: I feel cloud computing can be summarized in one statement.

Me: Which one?

Yash: As Spiderman said - With great power comes great responsibility.

Multi-threading: For my sophomore

Yash: Today I learned about multi-threading in Java.

Me: Excellent. Explain it to me.

Yash: I can explain it as a game.

Me: Wow!!

Yash: It is very easy. Let us assume that four of my friends are at home. Each one has a marker and an eraser. There is also one small white board in the room. In this game each person is supposed to write his name on the white board. But the board is so small that only one name can fit on it at any given point of time. Now let us begin the game.

At the start of the game, each person is asked to write his name on the board. Now everyone tries to write his name on the board at once. Since the board can hold only one name, it is possible that nobody is able to write his name on the board properly. Ouch!! It is a mess.

Now let us change the rules a little bit. Seeing the mess, you pitch in and set one more rule: anyone can write his name on the board only if he holds the token provided by you. There is only one token.

Now the game starts again. This time the boys race to get the token from you, not to the white board. The one who gets the token gets a chance to write his name on the board, and he returns the token when he is done. This means that at any given point of time only one person is working on the board.

Here you can replace the boys with threads, the white board with the object or piece of code on which multiple threads want to operate, and the token with the monitor.

This is how Java coordinates multiple threads that work on a shared object.

Me: Superb…
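
The white board game can be written down as a small Java sketch. The names WhiteboardGame, Whiteboard and play, the four player names and the number of turns are assumptions made for this illustration. The synchronized keyword is what makes a thread pick up the board's monitor (the token) before it may write, and give it back when the method returns.

```java
import java.util.ArrayList;
import java.util.List;

public class WhiteboardGame {
    /** The small white board: shared state that one thread at a time may touch. */
    static class Whiteboard {
        private String name = "";
        private int timesWritten = 0;

        // A thread must hold this object's monitor (the token) to run this method.
        synchronized void write(String newName) {
            name = newName;  // erase the old name and write the new one
            timesWritten++;  // safe only because the monitor is held
        }

        synchronized int timesWritten() {
            return timesWritten;
        }
    }

    /** Every player writes his name turnsEach times; returns the total number of writes. */
    static int play(String[] players, int turnsEach) {
        Whiteboard board = new Whiteboard();
        List<Thread> threads = new ArrayList<>();
        for (String player : players) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < turnsEach; i++) {
                    board.write(player);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until this player has finished all his turns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return board.timesWritten();
    }

    public static void main(String[] args) {
        String[] players = {"John", "Ram", "Rita", "Tom"};
        System.out.println(play(players, 10_000)); // prints 40000
    }
}
```

If the synchronized keyword is removed from write, two threads can update timesWritten at the same moment and some writes may be lost, which is the mess from the first round of the game.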

Agile Manifesto

Today, I got confirmation from the Agile Manifesto team that the manifesto is now available in Hindi as well. I had been trying to get it published for a long time.

Thanks Shane Hastie, Maha Raja and other team members.

Agile Manifesto in Hindi.

Tuesday, August 13, 2013

Process and Thread: For my sophomore

Yash: What is a process?

Me: Let me think.

Yash: Also, what is a thread?

Me: Now you are making things simpler.

Yash: How?

Me: Let us start from scratch. In computers, a running program has two characteristics. It has some memory space (its own range of memory addresses) and instructions to be executed by the processor. So a process is a thing which has its own memory space and instructions to be executed. By default, no two processes share memory space. Each process has its own memory space.

Yash: So you mean to say, if I have two processes running in a computer, each has its own memory space, not shared with the other, at any given point of time.

Me: Yes.

Yash: Now, what is a thread?

Me: Now it is tricky. If you consider a process as a set (remember your eighth-grade mathematics), then threads are subsets of the process set. It means threads run within the memory space of a process, and threads of the same process can share that memory space.

Yash: I can draw this as a picture.

Me: Try it.

Yash: So this is my understanding.

Me: Excellent.
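
The same idea can be shown as a short Java sketch. The class name SharedMemoryDemo, the method sumFromTwoThreads and the numbers 10 and 32 are assumptions made for this illustration. Two threads inside one process (here, one running JVM) write into the same array, and the main thread then reads both values, because all three threads live in the same memory space.

```java
public class SharedMemoryDemo {
    /** Two threads of the same process write into one shared array. */
    static int sumFromTwoThreads() {
        int[] shared = new int[2]; // lives on the heap of this one process

        Thread first = new Thread(() -> shared[0] = 10);
        Thread second = new Thread(() -> shared[1] = 32);
        first.start();
        second.start();

        try {
            // join() waits for each thread and makes its writes visible here.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared[0] + shared[1];
    }

    public static void main(String[] args) {
        System.out.println(sumFromTwoThreads()); // prints 42
    }
}
```

A second process started on the same machine would not see this array at all. It would have to be sent the values through a file, a socket, or some other channel provided by the operating system.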