gbutils: command line econometrics

Getting Started

This is a brief description of gbutils (since version 5.6), a set of command line utilities for the manipulation and statistical analysis of data. These utilities read data from standard input in ASCII format and print the result in ASCII format to standard output. See the overview for more details, and join the gbutils Google group for discussion and news.

Since version 6.0, the gbutils package contains an updated version of the programs originally distributed in the discontinued subbotools package. See Documentation for further information.

In many cases, the output of the utilities in the gbutils package is formatted so that it can be sent directly to other programs, like the graph program in the plotutils package, for plotting. Alternatively, these utilities can be used inside an interactive gnuplot session, or inside a gnuplot script, with the help of the special datafile identifier < (see the gnuplot documentation for details).
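As a minimal sketch of the gnuplot route, the snippet below writes a small gnuplot script that uses the < prefix to run gbker and plot its output. The file names data.txt (standing for any single-column data file) and density.gp, as well as the dumb terminal, are assumptions chosen for illustration.

```shell
# Write a tiny gnuplot script; the leading "<" inside the quoted data source
# tells gnuplot to run the command and plot its standard output.
cat > density.gp <<'EOF'
set terminal dumb
plot "< gbker < data.txt" with lines title "kernel"
EOF

# Run it only when the tools and the (assumed) input file are present.
if command -v gnuplot >/dev/null 2>&1 && command -v gbker >/dev/null 2>&1 \
   && [ -f data.txt ]; then
    gnuplot density.gp
else
    echo "gnuplot, gbker or data.txt missing: script written but not run" >&2
fi
```

The same plot line can be typed directly at the prompt of an interactive gnuplot session.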

Figure 1: example1.png

Let us see some examples. Figure 1 reports empirical estimates of the density obtained from \(200\) realizations of an exponential power random variable (here a Gaussian, which is a special case of that family), using both a binned distribution and a kernel estimate. This picture can be obtained in a terminal with the following list of commands:

gbrand -c 1 -r 200 gaussian 1 > data.txt

gbker < data.txt >kernel.txt

gbhisto -n 20 -M 2 <data.txt > histogram.txt 

gbplot -T 'png enhan crop' -o example1.png plot 'w histeps title "binned",\
"kernel.txt" w l title "kernel" ' < histogram.txt

rm data.txt kernel.txt histogram.txt

In the first line, a sample of 200 observations is drawn independently from a Gaussian distribution with unit variance and saved in the file data.txt. The second line builds a kernel estimate of the density and the third a histogram. The results are saved in kernel.txt and histogram.txt, respectively. The command on the fourth line, which continues on the fifth, generates the plot and saves it in example1.png. The last line removes all the intermediate files.
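Because every utility reads standard input and writes standard output, steps can also be chained with a pipe. The sketch below feeds the sample straight into gbhisto, reusing only the options shown in the commands above. The availability check and the pipeline_status variable are illustrative additions, not part of gbutils.

```shell
# Feed the random sample straight into gbhisto through a pipe,
# so no intermediate data.txt is needed for the histogram.
if command -v gbrand >/dev/null 2>&1 && command -v gbhisto >/dev/null 2>&1; then
    gbrand -c 1 -r 200 gaussian 1 | gbhisto -n 20 -M 2 > histogram.txt
    pipeline_status=done
else
    echo "gbutils not found on PATH: skipping" >&2
    pipeline_status=skipped
fi
```

In the full example the sample is still written to data.txt, because the kernel estimate and the histogram must be computed on the same draw.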

Figure 2: example2.png

The second example, Figure 2, is a scatter plot of pairs of points together with a non-linear kernel regression.

gbrand -c 2 -r 200 gaussian 1 | gbfun 'x1' 'x1+.5*x2' > data.txt

gbkreg < data.txt > kernel_reg.txt

gbplot -T 'png enhan crop' -o example2.png plot 'w p pt 5,\
"kernel_reg.txt" w l ' < data.txt

rm kernel_reg.txt data.txt

The data generated in the first line and saved in data.txt are independent pairs of correlated random variables. A kernel regression is performed in the second line. The third and fourth lines produce the plot.
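To make the transformation in the first line explicit, here is a rough awk equivalent. It assumes that x1 and x2 in the gbfun expressions denote the first and second input columns; the function name transform_pairs is hypothetical. If the two input columns are independent with equal variance, the resulting pair has correlation \(1/\sqrt{1.25} \approx 0.89\).

```shell
# Rough awk equivalent of: gbfun 'x1' 'x1+.5*x2'
# Assumption: x1 and x2 refer to input columns 1 and 2.
transform_pairs() {
    awk '{ print $1, $1 + 0.5 * $2 }'
}

# One input row: the first column is kept, the second becomes x1 + 0.5*x2.
printf '0.5 3\n' | transform_pairs
```

This is only meant to show what the step computes; gbfun remains the tool used in the example.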

These commands generate random samples, transform them, estimate a model and finally plot the result, with each step taking a single command.

Both pictures are generated using gbplot, which is an interface to the gnuplot graphics program. To reproduce these pictures, gnuplot must be installed on your system. Grasping all the details of the lines above is probably complicated. The basic message, however, should be clear: even a complicated analysis can be split into a sequence of very simple steps.

The programs in gbutils are completely written in C. Some of them can make use of external libraries (see below).

These programs have been tested (and repeatedly used for many years) on different Linux distributions and should compile and run, in principle, on any Unix platform. Other operating systems, most notably Windows, have been scarcely tested.

These programs were originally written for personal use. Even if today they are used by several people, please consider that I maintain and develop them in my spare time. They are distributed under the GPL license in the hope that they will be of help to other people, but without any implied warranty.

Please report bugs to gbutils@googlegroups.com

Requirements

To be installed from source, the package requires a C compiler and the standard C library. A Unix-like environment in which the GNU autotools can work is required for automatic installation.

Many programs in gbutils do not require any external library apart from the standard ones. There are, however, exceptions. Several programs use the GNU Scientific Library (GSL) (version >= 1.6) and others the GNU matheval library (version >= 1.0.1).

See the Programs summary table for the list of dependencies of the different utilities.

At install time, if a library is not found, the programs requiring it are omitted from the list of programs to be installed. If the zlib library is found on the system, all the utilities are compiled with the capability of reading gz-compressed input.
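Before building, it can help to check which of the optional libraries are visible on the system. The sketch below uses pkg-config for that. The module names gsl, libmatheval and zlib are assumptions that may differ between distributions, and this check is independent of whatever detection the gbutils configure script performs itself.

```shell
# Report which optional libraries pkg-config can see.
# Module names are assumptions; they may differ on your distribution.
check_libs() {
    for lib in gsl libmatheval zlib; do
        if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$lib"; then
            echo "$lib: found"
        else
            echo "$lib: not found"
        fi
    done
}
check_libs
```

A "not found" line only means pkg-config could not locate the library; the authoritative answer is the output of ./configure.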

For more information about GSL, including installation procedures on various platforms, see the GSL home page. The GNU matheval library and zlib each have their own home pages. Note that, in principle, programs linked against different libraries can produce slightly different outputs.

Installation instructions

On Linux, the gbutils package can be installed either from a .deb package or from source. The first method is recommended: you can download the binary executables for your system from the Debian or Ubuntu repositories. Alternatively, the latest source code packages are available in the Download section. Installation from source requires the gcc compiler, the make utility and, in order to have all utilities with all features activated, the development (dev) versions of the gsl, matheval and zlib libraries. You should be able to install all of them using the package manager of your Linux distribution. Once all the necessary tools and libraries are installed, the procedure is rather straightforward.

wget ftp://cafed.sssup.it/packages/gbutils-{version}.tar.gz
tar xvzf gbutils-{version}.tar.gz
cd gbutils-{version}
./configure
make
su
make install

Check the README file in the distributed package for further details.
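If root access is not available, scripts generated by the GNU autotools normally accept a --prefix option that selects another installation directory. The following sketch wraps that variant in a shell function. The function name install_gbutils_local and the default prefix $HOME/.local are illustrative assumptions, not something the package prescribes.

```shell
# Build and install under a user-writable prefix (no root needed).
# Must be run from inside the unpacked gbutils-{version} directory.
install_gbutils_local() {
    prefix="${1:-$HOME/.local}"
    if [ ! -x ./configure ]; then
        echo "no ./configure here: run from the source directory" >&2
        return 1
    fi
    ./configure --prefix="$prefix" && make && make install
}
```

After a successful run of install_gbutils_local, the bin directory under the chosen prefix must be on your PATH for the programs to be found.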

In Windows 10 the gbutils package can be installed directly from the Bash shell using the Windows Subsystem for Linux. To activate it, open the Settings app, go to "Update & Security -> For Developers" and activate the "Developer Mode". Then open the Control Panel, go to "Programs -> Turn Windows Features On or Off" and enable the "Windows Subsystem for Linux" option. Confirm with "OK" and reboot the system to install the new software. Open the Microsoft Store and install your preferred Linux distribution. I suggest Ubuntu or Debian. Now you have a Linux system inside Windows 10. Launch the newly installed application or start the "Bash" program to enter a console and install the package with

sudo apt-get update 
sudo apt-get install gbutils

In older versions of Windows, the gbutils package can be installed in the Cygwin environment. Follow the instructions in the Cygwin installation page.

Documentation

For a brief survey of the package features see the overview. The gbget utility is the Swiss Army knife of the package, the main tool for data manipulation. Its special role warrants a dedicated gbget tutorial.

The utility gbtest provides a number of non-parametric statistical tests. They are described in the tests summary page.

The gbutils package also includes a set of utilities specifically aimed at the estimation of symmetric and asymmetric Laplace and power exponential distributions. You can find a description of these utilities in power exponential estimation.

A brief summary of options and modes of operation can be obtained from any of the programs using the command line option -h or --help. All the programs are accompanied by automatically generated manual pages, which can be inspected with the man Unix utility.
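As a quick check after installation, the sketch below prints the first lines of the built-in help for a few of the programs used in the examples on this page. The helper name show_help and the five-line cut-off are arbitrary choices made for illustration.

```shell
# Print the first lines of the built-in help for a few utilities.
# The program names are taken from the examples on this page.
show_help() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            "$prog" -h 2>&1 | head -n 5
        else
            echo "$prog: not installed"
        fi
    done
}
show_help gbrand gbhisto gbker
```

The full text is available with, for instance, gbrand -h or man gbrand.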

Contributors

Cees Diks, Federico Tamagni, Angelo Secchi and Davide Pirino provided helpful suggestions and coding.

Download

Created: 2024-03-05 Tue 12:19