Apache Hive is a data warehouse infrastructure that facilitates querying and managing large data sets residing in distributed storage. It is built on top of Hadoop and has its own SQL-like query language called HiveQL (Hive Query Language).
HiveQL is similar to SQL and supports subqueries. With HiveQL, it is also possible to run MapReduce-based joins across Hive tables.
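To give a feel for the syntax, here is a small HiveQL sketch illustrating a join and a subquery; the orders and customers tables and their columns are hypothetical and used only for illustration.
-- Hypothetical tables, for illustration only
SELECT c.name, o.amount
FROM orders o
JOIN customers c ON (o.customer_id = c.id)
WHERE o.customer_id IN (SELECT customer_id FROM orders WHERE amount > 100);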
Since Hive is built on top of Hadoop, Java and Hadoop need to be installed on your system. If Hadoop is not configured on your system, you can follow the step-by-step installation instructions HERE.
Before installing Hive, make sure your Hadoop installation is fine and all the core Hadoop services are up and running. The environment used in this setup is Ubuntu 18.04, and the Hive version is 3.1.2.
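A quick way to confirm that the core Hadoop daemons are running is the jps command; the exact process list depends on your setup, but on a typical single-node installation you would expect something like the following.
jps
# Typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself)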
Note: Prefer Java 8, as newer Java versions no longer expose the system class loader as a URLClassLoader, which Hive requires to run.
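You can check which Java version is active before proceeding; the output below is only an indicative example for an OpenJDK 8 build.
java -version
# Example output: openjdk version "1.8.0_292" (any 1.8.x build is fine)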
Now, let's begin the installation process of Hive by downloading the latest stable release from http://apachemirror.wuchna.com/hive/hive-3.1.2/. For older versions, visit http://apachemirror.wuchna.com/hive/.
To download the release of your choice use the following commands. (Change the directory and download link according to your preference).
cd /usr/local
sudo wget http://apachemirror.wuchna.com/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
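Optionally, verify the integrity of the downloaded tarball before extracting it by comparing its checksum against the one published on the Apache Hive downloads page (a quick sanity check, not strictly required).
sha256sum apache-hive-3.1.2-bin.tar.gz
# Compare the printed hash with the official checksum for apache-hive-3.1.2-bin.tar.gz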
Extract the Hive archive in the same location.
sudo tar xvzf apache-hive-3.1.2-bin.tar.gz
Rename the extracted folder
sudo mv apache-hive-3.1.2-bin hive
Adding the Hive path to the environment is necessary; otherwise, you would have to move to the Hive directory every time you want to run its commands.
Open the bashrc file by running
sudo nano ~/.bashrc
Add the following lines to the end of the bashrc file
# Set HIVE_HOME
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Now, load the Hive environment variables by running the following command
source ~/.bashrc
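To confirm that the variables were picked up in the current shell, you can echo them; the expected values below reflect the paths used in this guide.
echo $HIVE_HOME   # should print /usr/local/hive
which hive        # should print /usr/local/hive/bin/hive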
Creating directories …
Now we need to create the Hive directories within HDFS. First, create a temporary directory:
hdfs dfs -mkdir -p /bigdata/tmp
Now, for Hive to save table data and other miscellaneous files, we need to create a warehouse directory.
hdfs dfs -mkdir -p /bigdata/hive/warehouse
Adding Permissions …
hdfs dfs -chmod g+w /bigdata/tmp
hdfs dfs -chmod g+w /bigdata/hive/warehouse
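A quick listing should now show both directories with group write permission; the exact owners and timestamps will differ on your cluster.
hdfs dfs -ls /bigdata
hdfs dfs -ls /bigdata/hive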
Change the working directory to the Hive configuration location
cd /usr/local/hive/conf
Open the hive-env.sh file by running the following command (if the file does not exist yet, you can copy the bundled hive-env.sh.template to hive-env.sh, or simply let nano create it)
sudo nano hive-env.sh
Add the following configurations to the end of the file (Change the paths according to your setup)
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Java Home
export JAVA_HOME=/usr
The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables (such as their schema and location) and partitions in a relational database.
Every Hive deployment needs a metastore service to store this metadata. By default, Hive uses an embedded Derby database, but you can also choose MySQL, Postgres, Oracle, or MS SQL Server as the metastore backend.
We will be using MySQL for this configuration. The metastore configuration needs to be specified in the hive-site.xml file.
First things first, let's install MySQL using apt. You can skip this step if MySQL is already installed on the system.
sudo apt-get update
sudo apt-get install mysql-server
If the secure installation utility does not launch automatically after the installation completes, enter the following command:
sudo mysql_secure_installation
This utility prompts you to define the MySQL root password and other security-related options, including removing remote access for the root user.
Start the MySQL service:
sudo systemctl start mysql
Enable the service so that the database server launches automatically after a reboot:
sudo systemctl enable mysql
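You can confirm the service is up before moving on; look for "active (running)" in the output.
sudo systemctl status mysql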
Now, after the successful installation of the MySQL server, we need to install the MySQL Java connector. Run the following command to install it.
sudo apt-get install libmysql-java
For Hive to access the MySQL connector, either create a soft link to the connector in the Hive lib folder or copy the jar file into it.
sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
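To double-check that Hive will find the driver, list the link; it should resolve to the connector jar installed by libmysql-java.
ls -l $HIVE_HOME/lib/mysql-connector-java.jar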
Create the initial database schema using the hive-schema-3.1.0.mysql.sql file (or the file corresponding to your installed version of Hive) located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.
Login to mysql shell
mysql -u root -p
Create db for metastore
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql;
For Hive to access the metastore a MySQL user account needs to be created. It is very important to prevent this user account from creating or altering tables in the metastore database schema. (Don’t forget the quotes)
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL ON *.* TO 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
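Before wiring up hive-site.xml, it is worth confirming that the new user can reach the metastore schema; this is only a sanity check, and the tables named in the comment are just a few of those created by the schema script.
mysql -u hiveuser -p -e "USE metastore; SHOW TABLES;"
# Expect metastore tables such as DBS, TBLS and COLUMNS_V2 in the output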
Now we've created the metastore database and the Hive user in MySQL. Let's define the metastore configuration in hive-site.xml.
Open the hive-site file by running the following command
cd /usr/local/hive/conf
sudo nano hive-site.xml
Add the following configurations
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to MySQL server</description>
  </property>
</configuration>
All set… Now let's dive into the Hive console. Type the following command in your terminal and hit Enter.
hive
If you see any errors related to the JDBC driver not being found, check whether you have successfully linked or copied the driver to the Hive lib folder.
If you were able to get into the hive console without any errors, you can verify your metastore configuration by following these steps.
Create a table in hive.
create table test(id int, name string);
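While still in the Hive console, you can check that the table is visible and inspect its columns before exiting.
show tables;
describe test;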
Now exit from the Hive console (type exit; and hit Enter).
Let’s see if the table was successfully added in our metastore. Login to the mysql console using hive user credentials.
mysql -u hiveuser -p
Change the database and view tables.
use metastore;
select * from TBLS;
If you see the test table listed in the TBLS table, your installation was successful. If not, check your MySQL database and user configuration for the Hive metastore.
This article was originally published by Adarsh M. S on his Medium page HERE.