HBase

BUILD+DEPLOY HBASE 3

dly-hbase-build-deploy

FULL BUILD

Available Hadoop profiles (as named in the ECLIPSE import line further down): hadoop-1.0, hadoop-1.1, hadoop-2.0, hadoop-3.0.

HOSTS

Update /etc/hosts so the hostname is resolved to 127.0.0.1 (not to 127.0.1.1):

    more /etc/hosts
    127.0.0.1 localhost
    127.0.0.1 eric.datalayer.io eric
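A quick way to check the mapping is to look up the hostname in the hosts file directly. The helper below is a minimal sketch; `resolve_in_hosts` is a hypothetical name, and it only reads a hosts-format file (it does not query DNS or the system resolver).

```shell
# Hypothetical helper: print the first address that a hosts-format file
# maps to the given name. Comments (full-line or trailing) are ignored.
resolve_in_hosts() {
  hosts_file="$1"
  name="$2"
  awk -v name="$name" '
    {
      sub(/#.*/, "")                      # drop comments
      for (i = 2; i <= NF; i++) {         # field 1 is the address, the rest are names
        if ($i == name) { print $1; exit }
      }
    }
  ' "$hosts_file"
}

# Example (not executed here):
#   resolve_in_hosts /etc/hosts "$(uname -n)"
```

If the command prints 127.0.1.1 for the machine's hostname, the entry still needs to be changed as described above.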

ZOOKEEPER

To point HBase at an existing ZooKeeper cluster, one that is not managed by HBase, set HBASE_MANAGES_ZK in conf/hbase-env.sh to ‘false’. Next set ensemble locations and client port, if non-standard, in hbase-site.xml, or add a suitably configured zoo.cfg to HBase’s CLASSPATH. HBase will prefer the configuration found in zoo.cfg over any settings in hbase-site.xml.
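The settings described above can be sketched as a small shell function. This is an illustration under assumptions, not a definitive setup: `configure_external_zk` and the fragment file name are hypothetical, and the quorum hosts and port are placeholders to be supplied by the caller. The property names `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` are the standard HBase keys for the ensemble and the client port.

```shell
# Hypothetical helper: record the external-ZooKeeper settings in a conf directory.
#   $1 = HBase conf directory, $2 = comma-separated quorum hosts, $3 = client port
configure_external_zk() {
  conf_dir="$1"
  quorum="$2"
  client_port="$3"

  # Tell HBase not to start or stop ZooKeeper itself.
  echo 'export HBASE_MANAGES_ZK=false' >> "$conf_dir/hbase-env.sh"

  # Write the properties to a separate fragment file; merge them by hand
  # into the <configuration> element of hbase-site.xml.
  cat > "$conf_dir/zk-properties.fragment.xml" <<EOF
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>$quorum</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>$client_port</value>
</property>
EOF
}

# Example (placeholder hosts, not executed here):
#   configure_external_zk "$HBASE_HOME/conf" "zk1.example.com,zk2.example.com" 2181
```

The fragment is written to its own file instead of editing hbase-site.xml in place, so an existing configuration is never overwritten by the sketch.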

START

    #start-hbase.sh
    $HBASE_HOME/bin/hbase-daemons.sh start master
    tail -f $HBASE_HOME/logs/hbase-eric-master-eric.log
    cat $HBASE_HOME/logs/hbase-eric-master-eric.log
    $HBASE_HOME/bin/hbase-daemons.sh start regionserver
    tail -f $HBASE_HOME/logs/hbase-eric-regionserver-eric.log
    cat $HBASE_HOME/logs/hbase-eric-regionserver-eric.log
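The log file names above are specific to user eric on host eric. A small helper can build the path for another user or machine. This is a sketch that assumes the naming pattern hbase-<user>-<role>-<host>.log, inferred from the file names in these notes; `hbase_log_file` is a hypothetical name.

```shell
# Hypothetical helper: build the log path for an HBase daemon.
# Assumed pattern (inferred from the names above):
#   $HBASE_HOME/logs/hbase-<user>-<role>-<host>.log
#   $1 = role (master or regionserver)
#   $2 = user (defaults to the current user)
#   $3 = host (defaults to this machine's node name)
hbase_log_file() {
  role="$1"
  user="${2:-$(id -un)}"
  host="${3:-$(uname -n)}"
  printf '%s/logs/hbase-%s-%s-%s.log\n' "$HBASE_HOME" "$user" "$role" "$host"
}

# Example (not executed here):
#   tail -f "$(hbase_log_file master)"
```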

CONNECTION PORT

UI

SHELL

    $HBASE_HOME/bin/hbase shell
    help
    create 'test', 'cf'
    list 'test'
    put 'test', 'row1', 'cf:a', 'value1'
    put 'test', 'row2', 'cf:b', 'value2'
    put 'test', 'row3', 'cf:c', 'value3'
    scan 'test'
    get 'test', 'row1'
    disable 'test'
    drop 'test'
    exit
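The same session can be run without typing by writing the commands to a file and passing that file to the HBase shell. The function below is a minimal sketch; `write_hbase_smoke_script` is a hypothetical name, and the generated script is a shortened version of the session above (a single put).

```shell
# Hypothetical helper: write a short HBase shell smoke test to a file.
#   $1 = output script path, $2 = table name, $3 = column family
write_hbase_smoke_script() {
  script="$1"
  table="$2"
  family="$3"
  cat > "$script" <<EOF
create '$table', '$family'
put '$table', 'row1', '${family}:a', 'value1'
scan '$table'
get '$table', 'row1'
disable '$table'
drop '$table'
exit
EOF
}

# Example (requires a running HBase, not executed here):
#   write_hbase_smoke_script /tmp/smoke.rb test cf
#   "$HBASE_HOME/bin/hbase" shell /tmp/smoke.rb
```

The trailing `exit` matters: without it the shell stays open at the interactive prompt after the script finishes.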

STOP

    #stop-hbase.sh
    $HBASE_HOME/bin/hbase-daemons.sh stop regionserver
    $HBASE_HOME/bin/hbase-daemons.sh stop master

ECLIPSE

Import with m2eclipse using profile='!hadoop-1.0, !hadoop-1.1, !hadoop-2.0, hadoop-3.0'

BUILD HBASE DOC

mvn site

mvn site -Dmaven.javadoc.skip=true

mvn docbkx-maven-plugin:generate-html

mvn docbkx:generate-html (other goals: generate-rtf, generate-pdf, generate-manpages, generate-epub, generate-javahelp, generate-xhtml, generate-webhelp, generate-eclipse)

DATA MODEL

OTHERS

Datalayer Data NoSql HBase MapReduce

HBase MapReduce usage examples.

BUILD A DISTRIBUTION

    $ git clone git@github.com:yahoo/samoa.git samoa.git
    $ cd samoa.git
    $ mvn install -DskipTests -Pstorm

Forest CoverType contains the forest cover type for 30 x 30 meter cells obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several papers on data stream classification.

    $ wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"
    $ unzip covtypeNorm.arff.zip
    $ wget "http://repo1.maven.org/maven2/org/slf4j/slf4j-simple/1.7.2/slf4j-simple-1.7.2.jar"

Run an example: classify the CoverType dataset with the VerticalHoeffdingTree in local mode.

    $ java -cp ./slf4j-simple-1.7.2.jar:./target/SAMOA-Storm-0.0.1-SNAPSHOT.jar:/opt/apache-storm-0.9.1-incubating/lib/* com.yahoo.labs.samoa.LocalStormDoTask "PrequentialEvaluation -l classifiers.trees.VerticalHoeffdingTree -s (ArffFileStream -f covtypeNorm.arff) -f 100000"

The output will be a sequence of the evaluation metrics for accuracy, taken every 100,000 instances.

To run the example on Storm, please refer to the instructions on the wiki [http://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-Storm]

Spark-Connector

Do we use a distributed memory cache? (hazelcast…)

Are there json, avro… requirements?

Is there a need for special queries? (spatial with geohash…)

lucene on hbase

hbase secondary indexes

lucene on cassandra

cassandra secondary indexes

hadoop append

solr & solrcloud

elasticsearch

senseidb

msc

helpful links