Apache Hive is a data warehouse infrastructure that facilitates querying and managing large data sets residing in distributed storage. It is built on top of Hadoop and has its own SQL-like query language called HiveQL (Hive Query Language).
HiveQL is similar to SQL and supports subqueries. With HiveQL, it is also possible to run MapReduce-based joins across Hive tables.
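To give a feel for the syntax, here is a small HiveQL sketch illustrating a join and a subquery; the orders and customers tables and their columns are hypothetical and used only for illustration.
-- Hypothetical tables, for illustration only
SELECT c.name, o.amount
FROM orders o
JOIN customers c ON (o.customer_id = c.id)
WHERE o.customer_id IN (SELECT customer_id FROM orders WHERE amount > 100);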
Since Hive is built on top of Hadoop, Java and Hadoop need to be installed on your system. If Hadoop is not configured on your system, you can follow the step-by-step installation instructions HERE.
Before installing Hive, make sure your Hadoop installation is fine and all the core Hadoop services are up and running. The environment used in this setup is Ubuntu 18.04, and the Hive version is 3.1.2.
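A quick way to confirm that the core Hadoop daemons are running is the jps command; the exact process list depends on your setup, but on a typical single-node installation you would expect something like the following.
jps
# Typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself)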
Note: Prefer Java 8, as newer Java versions no longer expose the system class loader as a URLClassLoader, which Hive requires to run.
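You can check which Java version is active before proceeding; the output below is only an indicative example for an OpenJDK 8 build.
java -version
# Example output: openjdk version "1.8.0_292" (any 1.8.x build is fine)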
Now, let's begin the installation process of Hive by downloading the latest stable release from http://apachemirror.wuchna.com/hive/hive-3.1.2/. For older versions, visit http://apachemirror.wuchna.com/hive/.
To download the release of your choice use the following commands. (Change the directory and download link according to your preference).
cd /usr/local
sudo wget http://apachemirror.wuchna.com/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
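Optionally, verify the integrity of the downloaded tarball before extracting it by comparing its checksum against the one published on the Apache Hive downloads page (a quick sanity check, not strictly required).
sha256sum apache-hive-3.1.2-bin.tar.gz
# Compare the printed hash with the official checksum for apache-hive-3.1.2-bin.tar.gz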
Extract the Hive archive in the same location.
sudo tar xvzf apache-hive-3.1.2-bin.tar.gz
Rename the extracted folder
sudo mv apache-hive-3.1.2-bin hive
Adding the Hive path to the environment is necessary; otherwise, you would have to move to the Hive directory every time you want to run its commands.
Open the bashrc file by running
sudo nano ~/.bashrc
Add the following lines to the end of the bashrc file
# Set HIVE_HOME
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Now, load the Hive environment variables by running the following command
source ~/.bashrc
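To confirm that the variables were picked up in the current shell, you can echo them; the expected values below reflect the paths used in this guide.
echo $HIVE_HOME   # should print /usr/local/hive
which hive        # should print /usr/local/hive/bin/hive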
Creating directories …
Now we need to create the Hive directories within HDFS. First, create a temporary directory:
hdfs dfs -mkdir -p /bigdata/tmp
Now, for Hive to save table data and other miscellaneous files, we need to create a warehouse directory.
hdfs dfs -mkdir -p /bigdata/hive/warehouse
Adding Permissions …
hdfs dfs -chmod g+w /bigdata/tmp
hdfs dfs -chmod g+w /bigdata/hive/warehouse
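A quick listing should now show both directories with group write permission; the exact owners and timestamps will differ on your cluster.
hdfs dfs -ls /bigdata
hdfs dfs -ls /bigdata/hive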
Change the working directory to the Hive configuration location
cd /usr/local/hive/conf
Open the hive-env.sh file by running the following command (if the file does not exist yet, you can copy the bundled hive-env.sh.template to hive-env.sh, or simply let nano create it)
sudo nano hive-env.sh
Add the following configurations to the end of the file (Change the paths according to your setup)
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Java Home
export JAVA_HOME=/usr
The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables (such as their schema and location) and partitions in a relational database.
Every Hive deployment needs a metastore service to store this metadata. By default, Hive uses an embedded Derby database, but you can also choose MySQL, Postgres, Oracle, or MS SQL Server as the metastore backend.
We will be using MySQL for this configuration. The metastore configuration needs to be specified in the hive-site.xml file.
First things first, let's install MySQL using apt. You can skip this step if MySQL is already installed on the system.
sudo apt-get update
sudo apt-get install mysql-server
If the secure installation utility does not launch automatically after the installation completes, enter the following command:
sudo mysql_secure_installation
This utility prompts you to define the MySQL root password and other security-related options, including removing remote access for the root user.
Start the MySQL service:
sudo systemctl start mysql
Enable the service so that the database server launches automatically after a reboot:
sudo systemctl enable mysql
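You can confirm the service is up before moving on; look for "active (running)" in the output.
sudo systemctl status mysql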
Now, after the successful installation of the MySQL server, we need to install the MySQL Java connector. Run the following command to install it.
sudo apt-get install libmysql-java
For Hive to access the MySQL connector, either create a soft link to the connector in the Hive lib folder or copy the jar file into it.
sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
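To double-check that Hive will find the driver, list the link; it should resolve to the connector jar installed by libmysql-java.
ls -l $HIVE_HOME/lib/mysql-connector-java.jar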
Create the initial database schema using the hive-schema-3.1.0.mysql.sql file (or the file corresponding to your installed version of Hive) located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.
Login to mysql shell
mysql -u root -p
Create db for metastore
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql;
For Hive to access the metastore a MySQL user account needs to be created. It is very important to prevent this user account from creating or altering tables in the metastore database schema. (Don’t forget the quotes)
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL ON *.* TO 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
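Before wiring up hive-site.xml, it is worth confirming that the new user can reach the metastore schema; this is only a sanity check, and the tables named in the comment are just a few of those created by the schema script.
mysql -u hiveuser -p -e "USE metastore; SHOW TABLES;"
# Expect metastore tables such as DBS, TBLS and COLUMNS_V2 in the output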
Now we've created the metastore database and the Hive user in MySQL. Let's define the metastore configuration in hive-site.xml.
Open the hive-site file by running the following command
cd /usr/local/hive/conf
sudo nano hive-site.xml
Add the following configurations
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to MySQL server</description>
  </property>
</configuration>
All set… Now let's dive into the Hive console. Type the following command in your terminal and hit Enter.
hive
If you see any errors related to the JDBC driver not being found, check whether you have successfully linked or copied the driver to the Hive lib folder.
If you were able to get into the hive console without any errors, you can verify your metastore configuration by following these steps.
Create a table in hive.
create table test(id int, name string);
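While still in the Hive console, you can check that the table is visible and inspect its columns before exiting.
show tables;
describe test;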
Now exit from the Hive console (type exit; and hit Enter).
Let’s see if the table was successfully added in our metastore. Login to the mysql console using hive user credentials.
mysql -u hiveuser -p
Change the database and view tables.
use metastore;
select * from TBLS;
If you see the test table listed in the TBLS table, your installation was successful. If not, check your MySQL database and user configuration for the Hive metastore.
This article was originally published by Adarsh M. S on his Medium page HERE.