Monday, May 25, 2015

How to upgrade R and RStudio


Type the commands below at the command line:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install r-base r-base-dev

Now restart RStudio. All of your packages should work again.

Note:
If a package still fails to load, reinstall it from the R console:

install.packages("package_name_here", dependencies=TRUE)
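
To confirm that the upgrade took effect, it can help to check which R the shell now finds. The following is a minimal sketch and not part of the original steps. The function name r_version is made up for illustration, and it only assumes that R, if installed, is on the PATH.

```shell
# Print the version banner of the R found on PATH (hypothetical helper).
r_version() {
  if command -v R >/dev/null 2>&1; then
    # The first line of `R --version` names the installed release.
    R --version | head -n 1
  else
    echo "R is not on PATH"
    return 1
  fi
}
```

Calling r_version before and after the upgrade shows whether the release changed.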

Friday, May 22, 2015

RHive Setup and Sample Code

Configuration:

install.packages("Rserve")
install.packages("rJava")
install.packages("RHive")
library(Rserve)
library(rJava)
library(RHive)
Sys.setenv(HADOOP_HOME="/home/ravi/apache/hadoop-1.0.4")
Sys.setenv(HIVE_HOME="/home/ravi/apache/hive-0.12.0/")
Sys.getenv(c("HADOOP_HOME", "HIVE_HOME"))  # confirm both variables are set
rhive.env()
rhive.init()
rhive.connect(hiveServer2=FALSE)


If rJava does not install properly, try the commands below:

sudo apt-get install r-cran-rjava
 
sudo updatedb && locate libjvm.so
 
and then try again:

install.packages("rJava")
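
rJava needs to find the JVM shared library, and locate only sees files that were indexed by the last updatedb run. As an alternative, here is a minimal sketch that searches one directory tree directly with find. The helper name find_libjvm and the default root /usr/lib/jvm are assumptions (that is where Debian and Ubuntu usually place JDKs), so pass your own root if Java lives elsewhere.

```shell
# List every libjvm.so under a directory (hypothetical helper).
# Usage: find_libjvm [root]   (root defaults to /usr/lib/jvm, an assumed location)
find_libjvm() {
  root=${1:-/usr/lib/jvm}
  find "$root" -name libjvm.so 2>/dev/null
}
```

If the library is listed but rJava still fails to build, re-running R's Java configuration with sudo R CMD javareconf is a commonly suggested next step.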

If you then see this error:

Error: java.io.IOException: Mkdirs failed to create file:/home/rhive/lib/2.0-0.2

set RHIVE_FS_HOME and run the configuration again:

Sys.setenv("RHIVE_FS_HOME"="your RHive installation directory here e.g. /home/rhive")
This must be a local directory on the node where Hive is installed; create it if it does not exist. The user that runs R (rstudio in my case) must own this directory, for example via chown -R.
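
Creating the directory and handing it to the right user can be done in one step. The following is a sketch under stated assumptions: make_rhive_home is a hypothetical helper, and the directory and user you pass in are your own choices (for example /home/rhive and rstudio).

```shell
# Create a local directory for RHive and hand it to the given user.
# Usage: make_rhive_home <directory> <user>   (hypothetical helper)
make_rhive_home() {
  dir=$1
  owner=$2
  mkdir -p "$dir" || return 1
  # Changing ownership to a different user needs root, so run the
  # script as root in that case.
  chown -R "$owner" "$dir"
}
```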

For other errors, see the link below:
http://dailyitsolutions.blogspot.in/2014_12_01_archive.html

Problem 2:
RHive works well with simple queries but fails on complex queries or queries that use SQL aggregate functions.
The R console (RStudio) shows the following error:
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
OR
the HiveServer2 console shows the following exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/anonymous/.staging":trendwise:supergroup:drwx------

Solution:
To resolve this, change the permission mode of the /tmp/hadoop-yarn directory in HDFS. To do so, type:
$ hadoop fs -chmod -R 777 /tmp/hadoop-yarn
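
The mode string at the end of the exception, drwx------, explains the failure: only the owner (trendwise) may enter the staging directory, so user anonymous is refused. As a small illustration, the sketch below converts such a mode string to the octal form that chmod takes. mode_to_octal is a hypothetical helper, and it ignores special bits such as sticky or setuid.

```shell
# Convert a 10-character mode string such as drwx------ to octal digits.
# (hypothetical helper; special bits like s and t are not handled)
mode_to_octal() {
  mode=${1#?}            # drop the leading file-type character
  out=""
  while [ -n "$mode" ]; do
    triad=$(printf '%s' "$mode" | cut -c1-3)   # next rwx group
    mode=$(printf '%s' "$mode" | cut -c4-)     # remainder
    digit=0
    case $triad in r??) digit=$((digit + 4)) ;; esac
    case $triad in ?w?) digit=$((digit + 2)) ;; esac
    case $triad in ??x) digit=$((digit + 1)) ;; esac
    out="$out$digit"
  done
  printf '%s\n' "$out"
}
```

Running mode_to_octal drwx------ gives 700, which is the mode that the chmod to 777 replaces.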

Sample Code:


library(Rserve)
library(RHive)
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv("RHIVE_FS_HOME"="/home/trendwise/R/x86_64-pc-linux-gnu-library/3.2/RHive")
rhive.env()
rhive.init()
rhive.connect(hiveServer2=TRUE)
rhive.query("use database_name")
rhive.query("select * from table_name")





Wednesday, May 13, 2015

Installation of Pig

Download Pig from the Apache Pig releases page.
Go to the downloaded directory.

$ cd Downloads


Extract the tarball with the command below, or right-click it and choose Extract Here.

$ tar -xvzf pig-0.14.0.tar.gz



Move the extracted Pig directory to the location where you want to install it. In my case it is /usr/local/pig


$ sudo mv pig-0.14.0 /usr/local/pig
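
The extract and move steps can also be combined, which avoids having to remember the name of the extracted folder. This is a minimal sketch, assuming GNU tar (the default on Ubuntu). extract_to is a hypothetical helper, and writing under /usr/local needs root.

```shell
# Unpack a .tar.gz so that the contents of its top-level folder land in <dest>.
# Usage: extract_to <archive.tar.gz> <dest>   (hypothetical helper)
extract_to() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # --strip-components=1 drops the leading "pig-0.14.0/" path element.
  tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

With this, the bin and conf folders end up directly inside the destination directory.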



Set PIG_HOME in your .bashrc file

$ vim.tiny ~/.bashrc



Append the lines below to the end of the file


export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
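
If you prefer not to open an editor, the lines can be appended from the shell. The sketch below is one way to do it without adding the same line twice when the step is repeated. append_once is a hypothetical helper, not a standard command.

```shell
# Append a line to a file only if that exact line is not already there.
# Usage: append_once <file> <line>   (hypothetical helper)
append_once() {
  file=$1
  line=$2
  touch "$file" || return 1
  # -x matches whole lines, -F treats the text literally (no regex).
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH and $PIG_HOME unexpanded in the file):
# append_once ~/.bashrc 'export PATH=$PATH:$PIG_HOME/bin'
```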


Restart the terminal and start Pig:

$ pig

And we are done!



By default Pig writes its logs to the current directory. To have them created in a specific directory, create that directory and set the property in $PIG_HOME/conf/pig.properties

Create a directory for the logs wherever you wish. I created mine in $PIG_HOME

$ mkdir /usr/local/pig/logs

Go to the conf directory in $PIG_HOME


$ cd /usr/local/pig/conf/

Open pig.properties:

$ sudo vim.tiny pig.properties

Uncomment the pig.logfile property and set it to the log directory:

pig.logfile=/usr/local/pig/logs
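
The same edit can be scripted. This is a minimal sketch, assuming the property appears in the file either commented out (for example "# pig.logfile=") or already set; if it is absent, the line is appended instead. set_property is a hypothetical helper, and the value must not contain the | character because it is used as the sed delimiter.

```shell
# Set key=value in a properties file, uncommenting the key if needed.
# Usage: set_property <file> <key> <value>   (hypothetical helper)
set_property() {
  file=$1
  key=$2
  value=$3
  if grep -q "^[#[:space:]]*$key=" "$file"; then
    # Rewrite the existing (possibly commented) line in a temporary copy.
    sed "s|^[#[:space:]]*$key=.*|$key=$value|" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (run as root if the file is owned by root):
# set_property /usr/local/pig/conf/pig.properties pig.logfile /usr/local/pig/logs
```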





Tuesday, May 12, 2015

Installation of Hive

Download Hive from the Apache Hive releases page.


Go to the directory where it was downloaded.

Extract the downloaded archive with the command below, or right-click it and extract.

$ tar -xzvf apache-hive-1.2.0-bin.tar.gz
Move the folder from Downloads to /usr/local or to whichever folder you prefer.
I prefer /usr/local because my Hadoop is installed at /usr/local/hadoop

$ sudo mv apache-hive-1.2.0-bin /usr/local/hive
 
Set the environment variable HIVE_HOME
 
$ vim.tiny ~/.bashrc

Add the following lines at the bottom of the file:
# Set HIVE_HOME
export HIVE_HOME="/usr/local/hive"
export PATH=$PATH:$HIVE_HOME/bin

Close and restart the terminal, then start Hive:


$ hive
We are done!
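
If the command is not found, the usual cause is a HIVE_HOME that does not point at the folder containing bin/hive. The sketch below checks for that. check_home is a hypothetical helper, and it works the same way for PIG_HOME and pig.

```shell
# Report whether <home>/bin/<tool> exists and is executable.
# Usage: check_home <home-directory> <tool>   (hypothetical helper)
check_home() {
  home_dir=$1
  tool=$2
  if [ -x "$home_dir/bin/$tool" ]; then
    echo "ok: $home_dir/bin/$tool"
  else
    echo "missing or not executable: $home_dir/bin/$tool"
    return 1
  fi
}

# Example: check_home "$HIVE_HOME" hive
```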