Tuesday, August 27, 2013

Introduction to NoSQL and Neo4j graph database


What is NoSQL?

The word NoSQL is derived from SQL, which stands for Structured Query Language. SQL is a special-purpose language (strictly speaking, a query language) used to manage data (retrieve data, insert data, update existing data) in a relational database management system.

So what's a relational database? If I wrote a complete description of relational databases, this post would be too long to read. :) In short, a relational database uses the relational schema as its data model. In a relational database, data is stored as a collection of relations (tables).

So let's come back to our question: what is NoSQL?

NoSQL doesn't mean Not SQL or goodbye SQL. It simply means NOT ONLY SQL. NoSQL databases share several common characteristics.

In a relational database system, the relational schema is the data model of the database. So what is the data model of a NoSQL database? There are actually several data storage models for NoSQL databases. Based on those data models, NoSQL databases can be categorized as follows:
  • Column
  • Document
  • Key-Value
  • Graph
The next part of this post describes the graph data model in detail and shows how to implement a simple graph database using Neo4j.

Graph data Model?

A graph is the most generic data structure we can think of for representing and storing data. What is a graph? It is a collection of vertices and edges.


In this example graph we have three vertices. Let's denote the set of vertices as V = {a, b, c}. If we denote the set of edges as E, we can write E = {ab, ac, bc}. The graph in the image is not a directed graph, which means the direction of the edges doesn't matter. In our graph data model, however, we use directed graphs.
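To make the difference between undirected and directed edges concrete, here is a minimal plain-Python sketch of the sets V and E from the example. The function names and the edge directions chosen for the directed version are my own assumptions for illustration; they are not taken from the image or from Neo4j.

```python
# Vertex set from the example: V = {a, b, c}.
V = {"a", "b", "c"}

# Undirected edges: each edge is an unordered pair, so ab and ba are the same edge.
E_undirected = {frozenset(pair) for pair in [("a", "b"), ("a", "c"), ("b", "c")]}

# Directed edges: each edge is an ordered (start, end) pair.
# The directions below are assumed purely for illustration.
E_directed = {("a", "b"), ("a", "c"), ("b", "c")}


def connected_undirected(x, y):
    """True if an undirected edge joins x and y, in either order."""
    return frozenset((x, y)) in E_undirected


def connected_directed(x, y):
    """True only if there is an edge going from x to y."""
    return (x, y) in E_directed


print(connected_undirected("b", "a"))  # edge ab exists, order is ignored
print(connected_directed("b", "a"))    # no edge goes from b to a
```

In the undirected set, asking about (b, a) finds the edge ab; in the directed set the same question fails because only the edge from a to b was stored.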





Let's consider the following graph:

This graph shows several people and the relationships between them. In a graph database, data is stored in two kinds of places:

  • Vertices / Nodes
  • Edges / Relationships

In this graph, "Peter" is a node. A node may contain several properties; in this case it has only one (name). The edge between node Peter and node Ray is called a relationship. A relationship must have a type and may contain other properties. Here the type of the relationship between Peter and Ray is Knows, but it could also carry properties such as since when Peter has known Ray. Note: in this scenario Peter knows Ray, but Ray doesn't know Peter, because there is no directed edge going from Ray to Peter.
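The node and relationship structure described above can be sketched in plain Python as follows. This is only an illustration of the data model, not how Neo4j stores data internally; the class names, the has_relationship helper and the value of the since property are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # A node holds a set of properties, e.g. {"name": "Peter"}.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship is directed (start -> end), must have a type,
    # and may carry properties of its own.
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


peter = Node({"name": "Peter"})
ray = Node({"name": "Ray"})

# Peter knows Ray; the "since" value is a made-up placeholder.
relationships = [Relationship(peter, ray, "Knows", {"since": 2010})]


def has_relationship(a, b, rel_type, rels):
    """True if a directed relationship of rel_type goes from a to b."""
    return any(r.start is a and r.end is b and r.type == rel_type for r in rels)


print(has_relationship(peter, ray, "Knows", relationships))
print(has_relationship(ray, peter, "Knows", relationships))
```

The second check fails for the same reason given in the note: no directed edge goes from Ray to Peter.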

Now let's see how to build a simple graph database using Neo4j.

What is Neo4j?

Neo4j is a NoSQL database management system that uses the graph as its data model. For more details you can visit the Neo4j home page: http://www.neo4j.org/

How to Install Neo4j?

Installing Neo4j is extremely simple. First download Neo4j from the Neo4j home page. Neo4j is available for both Windows and Linux environments.
To download Neo4j, use the following link.

After you have downloaded the zip file, uncompress it to any location you want. In a Linux environment, follow the steps below to run the Neo4j database server.
  • First go to your neo4j folder using the cd command.
  • Then go inside the bin folder.
  • Type ./neo4j start to start the Neo4j database. You can type ./neo4j-shell to start the Neo4j database and neo4j-shell at the same time.
The terminal will then show the localhost port number on which the Neo4j server runs.










Terminal showing the startup of the Neo4j server














Neo4j web admin interface.

Creating Our first graph database

So let's create a simple graph database. I will create a graph database for the graph shown in the figure above.

The first statement below creates a node for Peter. In the same way we can create nodes for all the other people.


 CREATE n={name:'peter', age:'21' ,sex:'male'};  
 CREATE n={name:'slimer', age:'22' ,sex:'male'};  
 CREATE n={name:'winston', age:'20' ,sex:'male'};  
 CREATE n={name:'egon', age:'20' ,sex:'male'};  
 CREATE n={name:'ray', age:'20' ,sex:'male'};  


Now we have to represent the relationships between these friends.

To do that, we need to know the node id of each node.

To find all the nodes in your database:


 START n=NODE(*) RETURN n;  

To create relationships between Peter and Ray and the other friends:

 START a=NODE(20) ,b=NODE(21) CREATE a-[r:knows]->b RETURN r;  
 START a=NODE(24) ,b=NODE(23) CREATE a-[r:knows]->b RETURN r;  
 START a=NODE(24) ,b=NODE(22) CREATE a-[r:knows]->b RETURN r;  
 START a=NODE(22) ,b=NODE(21) CREATE a-[r:knows]->b RETURN r;  
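Why do we need the node ids before we can create a relationship? The toy in-memory store below is a rough plain-Python analogy of the three steps above (create nodes, list them, connect two of them by id). TinyGraph and its methods are hypothetical names invented for this sketch; this is not the Neo4j API, and the ids it assigns start from 0 rather than matching the ids in my database.

```python
class TinyGraph:
    """Toy in-memory store; a hypothetical stand-in, not the Neo4j API."""

    def __init__(self):
        self.nodes = {}            # node id -> property dict
        self.relationships = []    # (start id, type, end id) triples
        self._next_id = 0

    def create_node(self, **properties):
        """Store a node and return the id assigned to it (like CREATE)."""
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = properties
        return node_id

    def all_nodes(self):
        """Return (id, properties) pairs for every node (like START n=NODE(*))."""
        return sorted(self.nodes.items())

    def create_relationship(self, start_id, rel_type, end_id):
        """Connect two existing nodes, addressed by id, with a directed edge."""
        for node_id in (start_id, end_id):
            if node_id not in self.nodes:
                raise KeyError(f"no node with id {node_id}")
        relationship = (start_id, rel_type, end_id)
        self.relationships.append(relationship)
        return relationship


graph = TinyGraph()
peter_id = graph.create_node(name="peter", age="21", sex="male")
ray_id = graph.create_node(name="ray", age="20", sex="male")

# The ids returned at creation time are what we use to address the nodes.
knows = graph.create_relationship(peter_id, "knows", ray_id)
print(graph.all_nodes())
print(knows)
```

The store hands out the id when a node is created, so a relationship can only be expressed once those ids are known, which is the reason for listing all nodes first.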




That's it. It is that simple. Now we have created our simple graph.





Now let's run some queries. To find all the friends of 'ray':

 START n=NODE(24) MATCH n-[:knows]-friend RETURN friend;  



To find the friends of ray's friends:

 START n=NODE(24) MATCH n-[:knows]-()-[:knows]->friend RETURN friend;  
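To see what these two patterns do, here is a plain-Python sketch that walks the same four knows edges created earlier, using only the node ids from those statements. It is an approximation under my own assumptions: the helper names are hypothetical, results are collected into sorted lists without duplicates, and the only matching rule it imitates is that one relationship is not used twice within a single pattern.

```python
# Directed "knows" edges copied from the CREATE statements, as (start id, end id).
KNOWS = [(20, 21), (24, 23), (24, 22), (22, 21)]


def neighbours(node, edges):
    """(edge index, other node) for every edge touching node, in either
    direction. This mirrors the undirected pattern n-[:knows]-x."""
    result = []
    for index, (start, end) in enumerate(edges):
        if start == node:
            result.append((index, end))
        elif end == node:
            result.append((index, start))
    return result


def friends(node, edges=KNOWS):
    """Sketch of: MATCH n-[:knows]-friend."""
    return sorted({other for _, other in neighbours(node, edges)})


def friends_of_friends(node, edges=KNOWS):
    """Sketch of: MATCH n-[:knows]-()-[:knows]->friend.
    The first hop ignores direction, the second hop must point outwards."""
    found = set()
    for first_index, middle in neighbours(node, edges):
        for second_index, (start, end) in enumerate(edges):
            if second_index != first_index and start == middle:
                found.add(end)
    return sorted(found)


print(friends(24))             # [22, 23]
print(friends_of_friends(24))  # [21]
```

Node 24 touches two edges, leading to 23 and 22. Of those, only 22 has an outgoing knows edge, and it points to 21.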






This is a very short introduction to Neo4j. It is actually a very powerful database management system that lets you process large amounts of data very efficiently. If you want to learn more about Neo4j, please download and read the Neo4j manual. I hope this tutorial helped you understand the basics of graph databases and how to use Neo4j to create a simple graph database. If you have any problems regarding graph databases or Neo4j, please send me an email or leave your question as a comment on this article. I'll try my best to answer. :)

7 comments:

  1. This is a really great introductory blog post. Please continue the good work.

    Perhaps a next one on the visualization options? What is your primary programming language that you will be accessing Neo4j from?

    We would like to send you a thank-you t-shirt. So please send your t-shirt size and postal address to michael at neo4j dot org.

    Replies
    1. Thank you Michael Hunger... :) You are the first person who commented on my blog spot post. :D

      Java is my main programming language to access Neo4j.

      I would love to have a thank-you t-shirt.... :)

    2. I tried to send mail to the given address but it failed... Do you have another email address?
