Thursday, October 25, 2012

Book Review: Ethics of Big Data

Book Review: Ethics of Big Data by Kord Davis and Doug Patterson. Publisher: O'Reilly. ISBN-13: 978-1449311797

Ethics of Big Data is one of the rare books that bring the question of ethics in business down to the engineering level.

The book's title is confusing, though; something like “Ethics of Personal Data” would characterize its content more accurately and ethically. It seems the authors wanted to ride the Big Data wave.

The book covers the ethical question from four dimensions:

  1. Privacy,
  2. Identity,
  3. Ownership, and
  4. Reputation

in a very comprehensive manner. The authors miss one vital dimension: public/social good. Though I am trained in philosophy, I find any ethical discussion that is devoid of social/public good very hollow.

Kord tracks the ethical question in a very smart way: he avoids being prescriptive and advises only on the periphery, so that business-specific solutions can be devised and implemented.

Disclaimer: I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I got an electronic copy of the book from the publisher for review.

One can get more information about the book and related topics from:

  1. Amazon:
  2. Publisher: O'Reilly
Another Review:

Wednesday, October 24, 2012

Version Control System: For my Ninth Grader - Part 3

Version Control System: For my Ninth Grader - Part 1

Version Control System: For my Ninth Grader - Part 2

Day 3

Teacher: During our previous discussions we concluded that no single approach is suitable for large-scale collaboration. If we combine multiple approaches, then automation is required to derive substantial benefits. So let us list what we need from a system that can facilitate large-scale collaboration:

  1. A single person should be able to work on a document at a given point of time, OR/AND multiple persons should be able to work on the same document simultaneously.
  2. Each individual should be able to work on his copy of the document on his own computer, without needing a connection to a common computer.
  3. Each individual’s work should be traceable.
  4. The history of a document should be reconstructable.
  5. Documents should be protected against vandalism.
  6. An individual contributor should be able to comment on his work, so that others can understand it without reading the whole document.

Let us build our understanding of the requirements of the proposed system.

The first requirement is full of contradiction and is made up of two parts.

Part One: A single person should be able to work on a document at a given point of time. This statement is straightforward. It simply says that only one person should be able to work on a given document at a given point of time, even if that document is available to multiple persons. Now, how to achieve this? Very simple: make a user interface (a web page or some other software) that allows a user to mark a document as checked out (remember, in a library you check out a book) by Mr. X, and do not allow anybody else to change the document until Mr. X checks it in (again, think of the library). In a library, once a book is checked out it is no longer available on the shelf; in our proposed system the document is still available, but read only. If someone who has not checked out the document tries to overwrite it, he should not be allowed to do so.

Part Two: Multiple persons should be able to work on the same document simultaneously. This requirement is in direct contrast to Part One. So, what to do? What about a switch that can be flipped? If the switch is in position A, the proposed system should behave so as to fulfill the Part One requirement, and if the switch is in position B, it should allow users to work as stated in Part Two. Who has control of this magic switch? Certainly not all users, only a privileged few. We call them Administrators. The Part Two requirement poses its own challenges. Think about the following scenario:

There is a document (doc_under_test.doc) under the control of the proposed system, and the Administrator is keeping the magic switch at position B. At 8.00 AM on Oct 1, 2012, Mr. X checks out the document and starts working on it. While Mr. X is making changes to the document, Mr. Y also comes in (at 9.00 AM on Oct 1, 2012), checks out the document, and starts working on it. So two persons are working independently on the same document, and neither is aware of what the other is doing with it. At 1.00 PM on Oct 1, 2012, Mr. Y checks in his work. Everything is fine up to this point. At 3.00 PM on Oct 1, 2012, Mr. X checks in his document. After the check-in by Mr. X, the work of Mr. Y is lost, because Mr. X's copy never contained it. How to resolve this challenge?

When Mr. X checks in his document, flash a warning stating that the document has changed since he checked it out. To avoid a potential loss of work, he should find the differences, accept or reject the changes, and then check in. Essentially, human intervention is required.
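For those who know a little programming, the check-out and check-in rules described so far can be sketched in Python. This is only a minimal sketch under assumptions of my own, not how any particular product works: the names Document, CheckInConflict, exclusive, and base_version are made up for illustration, and a simple version counter stands in for whatever a real system uses to notice that a document has changed.

```python
# Hypothetical names throughout; this is an illustration, not a real product's API.
class CheckInConflict(Exception):
    """Raised when a check-in would overwrite someone else's work."""


class Document:
    def __init__(self, content, exclusive=True):
        self.content = content
        self.version = 1            # goes up by one with every check-in
        self.exclusive = exclusive  # the magic switch: True = position A, False = position B
        self.locked_by = None       # who holds the document (used only in position A)

    def check_out(self, user):
        if self.exclusive:
            if self.locked_by not in (None, user):
                raise PermissionError(f"checked out by {self.locked_by}; read only for {user}")
            self.locked_by = user
        # Hand back the content together with the version it is based on.
        return self.content, self.version

    def check_in(self, user, new_content, base_version):
        if self.exclusive and self.locked_by != user:
            raise PermissionError(f"{user} has not checked out this document")
        if base_version != self.version:
            # Someone else checked in after this user checked out.
            raise CheckInConflict(f"document changed since {user} checked it out")
        self.content = new_content
        self.version += 1
        self.locked_by = None


# The scenario with the switch at position B.
doc = Document("original text", exclusive=False)
_, x_base = doc.check_out("Mr. X")                          # 8.00 AM
_, y_base = doc.check_out("Mr. Y")                          # 9.00 AM
doc.check_in("Mr. Y", "text with Mr. Y's changes", y_base)  # 1.00 PM, accepted
try:
    doc.check_in("Mr. X", "text with Mr. X's changes", x_base)  # 3.00 PM
except CheckInConflict as warning:
    print("Warning:", warning)  # Mr. X must now compare and merge by hand
```

The sketch refuses Mr. X's check-in instead of silently overwriting, so the work of Mr. Y stays in the document until a human decides how to combine the two sets of changes.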

The second requirement, “each individual should be able to work on his copy of the document on his own computer, without needing a connection to a common computer”, is pretty straightforward. Keep all documents on a common computer and allow users to download the documents to their local computers. The common computer can be called the server (one who serves), and a computer to which a document is downloaded is called a client (one who asks for service). This technique ensures that too many persons are not connected to the server at once, and also that each person can work on a document without needing a continuous (persistent) connection to the server.

The third requirement, traceability, is also easy to achieve. Not everyone is allowed to access the proposed system. Those who are allowed should have some kind of user name and password to access documents. So be in touch with your administrator to get a user name and password. Whenever someone checks out or checks in a document, log his user name. This allows tracing who has interacted with the system and at what time.

The third requirement simply helps to find out who has interacted with the system and at what time, but it does not keep a record of the changes made to a document. So the proposed system should keep a copy of the document with each check-in and correlate those copies back to users. To make things easier for users, show the latest copy by default. So the fourth requirement is fulfilled.

Since any interaction with a document is traced to a single user, the chances of vandalism are greatly reduced. Who wants to be caught!

The sixth requirement states a desire that is also a good practice. How to achieve it? A computer cannot figure out a description of the work a user has carried out. But the proposed system can provide a placeholder where a user can write a brief description of his work. Those who are interested will read it. Think of Mr. X in Part Two of requirement one.
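Requirements three, four, and six fit together naturally, and a small Python sketch may make that clearer. Again, this is only an illustration under my own assumptions: CheckIn, History, record, latest, and log are hypothetical names, and keeping a full copy of the document per check-in is simply the approach described above, not a claim about how real systems store their data.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical names throughout; an illustration, not a real product's API.
@dataclass
class CheckIn:
    user: str       # who checked in (requirement three)
    when: datetime  # at what time
    content: str    # full copy of the document (requirement four)
    comment: str    # the user's own description of his work (requirement six)


class History:
    def __init__(self):
        self.check_ins = []  # oldest first

    def record(self, user, content, comment, when=None):
        """Store one check-in; use the current time unless one is given."""
        self.check_ins.append(CheckIn(user, when or datetime.now(), content, comment))

    def latest(self):
        """The copy shown to users by default."""
        return self.check_ins[-1].content

    def log(self):
        """One line per check-in, so nobody has to read the whole document."""
        return [f"{c.when:%Y-%m-%d %H:%M} {c.user}: {c.comment}" for c in self.check_ins]


history = History()
history.record("Mr. Y", "draft with intro", "Added introduction",
               datetime(2012, 10, 1, 13, 0))
history.record("Mr. X", "draft with intro and summary", "Added summary",
               datetime(2012, 10, 1, 15, 0))
for line in history.log():
    print(line)
```

Because every entry carries a user name, a time, a copy, and a comment, the same list answers three questions at once: who did it, what the document looked like at that point, and what the change was about.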

Students:  Wow. Almost all requirements are fulfilled.

Yash: Does the system you explained exist?

Teacher: Yes. It is called a version control system. It also has fancier names, such as revision control system, source control system, and a few more.

So, using version control systems, vast numbers of people who are located in various countries and do not know each other can work together on projects. That’s the beauty of version control systems.