Database Administration Blog
A blog about Database Administration, Nosql and Hadoop technologies.
Monday 14 September 2015
Loading data to HBase - bulk and non-bulk loading
Loading the test.tsv file into HDFS:
hadoop fs -put test.tsv /tmp/
hadoop fs -ls /tmp/
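For illustration, a comma-separated test.tsv matching the column mapping used below might look like this (hypothetical sample data; the first field becomes the row key via HBASE_ROW_KEY and the next two go to cf1:c1 and cf1:c2 - note the bulk example sets -Dimporttsv.separator="," while the non-bulk example keeps the default tab separator):
row1,value1,value2
row2,value3,value4
ImportTsv also needs the target table to exist. If t1 is not there yet, you can create it with column family cf1 from the hbase shell:
hbase shell
create 't1', 'cf1'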
1. BULK LOADING
a) Preparing StoreFiles:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,cf1:c1,cf1:c2" -Dimporttsv.separator="," -Dimporttsv.bulk.output="/tmp/hbaseoutput" t1 /tmp/test.tsv
b) Upload the data from the HFiles located at /tmp/hbaseoutput to the HBase table t1 (this moves the HFiles straight into the table's region directories, bypassing the normal write path):
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hbaseoutput t1
2. NON-BULK LOADING
Upload the data from TSV format in HDFS into HBase via Puts:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,cf1:c1,cf1:c2" t1 /tmp/test.tsv
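In either case you can do a quick sanity check from the hbase shell (fine for a small test table, since scan prints every row):
hbase shell
scan 't1'
count 't1'
Note that bulk loading writes HFiles directly and moves them into place, while the non-bulk path issues ordinary Puts through the normal write path (WAL and memstore), so bulk loading is generally much faster for large data sets.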
Sunday 25 May 2014
Postgres tablespace creation
Create a tablespace in PostgreSQL in two simple steps:
1) Make a tablespace directory
mkdir -p /var/lib/pgsql/tablespaces/<tablespace_name>
cd /var/lib/pgsql/tablespaces/
chmod -R 700 <tablespace_name>
chown -R postgres:postgres <tablespace_name>
(PostgreSQL requires the location directory to be owned by the OS user the server runs as, assumed here to be postgres.)
2) Create tablespace
psql test
test=# create tablespace <tablespace_name> location '/var/lib/pgsql/tablespaces/<tablespace_name>';
After creating the tablespace, include this in the DDL:
SET default_tablespace = <tablespace_name>;
create table mytable(id integer);
and now the table 'mytable' will belong to our newly created tablespace!
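To confirm where the table ended up, query the pg_tables catalog view (a blank tablespace column means the table sits in the database's default tablespace):
test=# select tablename, tablespace from pg_tables where tablename = 'mytable';
Alternatively, you can place a single table explicitly without touching default_tablespace by using the TABLESPACE clause:
test=# create table mytable(id integer) tablespace <tablespace_name>;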
Thursday 8 May 2014
Slony - number of records yet to be processed in sl_log tables
Two Slony tables, sl_log_1 and sl_log_2, store the changes that need to be propagated to the subscriber nodes. Slony log-switches between these two tables and truncates each one once all of its changes have been propagated to the subscribers. Sometimes these tables grow very large because a big table or a large data set is being synced. You may also notice in the logs that SYNC events are taking a long time.
You may also see this notice in the Slony log on the master:
NOTICE: Slony-I: could not lock sl_log_1 - sl_log_1 not truncated
It is therefore useful to know how many records Slony has yet to process. The query below counts the rows in sl_log_1 whose transaction id lies beyond the snapshot of the last event confirmed by the subscriber (st_last_event):
select count(*) from sl_log_1 where log_txid>(select split_part(cast(ev_snapshot as text),':',1)::bigint from sl_event where ev_seqno=(select st_last_event from sl_status));
Similarly, you can find the number of records yet to be processed in sl_log_2 using:
select count(*) from sl_log_2 where log_txid>(select split_part(cast(ev_snapshot as text),':',1)::bigint from sl_event where ev_seqno=(select st_last_event from sl_status));
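Note that sl_log_1, sl_log_2, sl_event and sl_status live in the Slony cluster schema (typically named _<clustername>), so these queries assume that schema is on your search_path. For a quicker overall health check, the sl_status view also exposes lag directly (column names as per the Slony-I documentation):
select st_received, st_lag_num_events, st_lag_time from sl_status;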