Packet loss between two specific VMs

I am experiencing high packet loss (more than 70%) between two specific VMs on a local network in a remote cluster (I have no physical access to it). One machine is used as a Hadoop master and the other as a Hadoop slave. The strange thing is that when I ping any other machine …

NameNodes fail to start on HA cluster – fatal errors in JournalNode logs

I am having a problem with my Hadoop cluster:

CentOS 7.3
Hortonworks Ambari 2.4.2
Hortonworks HDP 2.5.3

Ambari stderr:

2017-04-06 10:49:49,039 - Getting jmx metrics from NN failed. URL: http://master02.mydomain.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
    _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise …

DataNode not showing in web interface

Newbie on Hadoop clusters. I have set up my two-node configuration as described by M. G. Noll here. The datanode machine has the DataNode and TaskTracker running (the jps command shows them). However, in the web UI I only see one node for the DFS: Live Node: 1, Dead Node: 0. Same thing on the MapRed …

Hadoop streaming job on EC2 stays in “pending” state

Trying to experiment with Hadoop and Streaming using the Cloudera distribution CDH3 on Ubuntu. I have valid data in hdfs:// ready for processing and wrote a little streaming mapper in Python. When I launch a mapper-only job using:

hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming*.jar -file /usr/src/mystuff/mapper.py -mapper /usr/src/mystuff/mapper.py -input /incoming/STBFlow/* -output testOP

Hadoop duly decides it will use 66 mappers on …
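For readers unfamiliar with the kind of mapper the question mentions, here is a minimal sketch of a Hadoop Streaming mapper in Python. The excerpt does not show the asker's actual mapper.py or the record format under /incoming/STBFlow/, so everything here is an assumption for illustration: the function name map_lines is hypothetical, and the code assumes tab-separated records whose first field serves as the key.

```python
#!/usr/bin/env python
"""Hypothetical Hadoop Streaming mapper: emits "<first field><TAB>1" per record."""
import sys


def map_lines(lines):
    """Yield one "key<TAB>1" string for each non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        # Assumption: records are tab-separated and the key is the first field.
        key = line.split("\t", 1)[0]
        yield "%s\t1" % key


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and reads
    # tab-separated key/value pairs back from stdout.
    for out in map_lines(sys.stdin):
        print(out)
```

Because a streaming mapper only reads stdin and writes stdout, a script like this can be checked outside Hadoop by piping a small sample file through it before submitting the job.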

Hadoop + NAT scenario

I have a situation where I’d like to run Hadoop spread across 2 clusters. The first cluster (ClusterA) is normal and all nodes are publicly accessible. The second cluster (ClusterB) is behind a NAT. Nodes in ClusterA will be running both Mapred and HDFS, while nodes in ClusterB will be running Mapred without HDFS and …

How do I grant a user permission to use Hadoop via Kerberos?

I’ve set up Hadoop to use Kerberos (following the Cloudera security guide), but it is unclear how I connect to Hadoop with regular users (e.g. username=myuser). Currently I have authenticated myself with Kerberos as my Kerberos admin user (via kinit kerbadmin/admin), but that doesn’t seem to help. Do I need to tell Hadoop that Kerberos user …

Hadoop Configuration Files – Who Needs What?

As I am setting up Hadoop, one question keeps popping into my mind, but I can’t find the answer: which Hadoop configuration files need to be copied to which nodes? For example, I’m making changes to the following files: hadoop-env.sh, core-site.xml, mapred-site.xml, hdfs-site.xml, masters, slaves. Do I need to copy these files to ALL my …

Hadoop: Slave nodes are not starting

I am trying to set up a pseudo-distributed Hadoop cluster on my machine. Environment details: host OS: Windows; guest OS: Ubuntu VMs. I created one master and one slave. I was able to run the Hadoop wordcount successfully on a single-node cluster, but when I tried to add the slave, the datanode, jobtracker, namenode and …

Installing and configuring Hadoop on Ubuntu

I’m unable to locate the file hadoop-ec2-env.sh. I’ve downloaded and installed hadoop_1.0.4-1_x86_64.deb from http://mirror.olnevhost.net/pub/apache/hadoop/common/stable/ This is the most recent stable version. I would like to run Hadoop on EC2. I’m following a tutorial that says: Edit all relevant variables in src/contrib/ec2/bin/hadoop-ec2-env.sh. Where is hadoop-ec2-env.sh?

Answer: It would be in [hadoop source location]src/contrib/ec2/bin/hadoop-ec2-env.sh. If you …

How do I reconfigure the default port or uninstall the interactive JavaScript console in HDInsight?

The Microsoft HDInsight Preview is available for download and install via the Web Platform Installer, which bundles the Hadoop on Azure installer. As part of the process, it installs something that listens on port 8080 and serves up an interactive JavaScript console for doing MapReduce, Hive, and Pig queries. This blocks my default port for IIS …