Showing posts with label other academic disciplines. Show all posts

Friday, April 12, 2013

Version Control

This is a nerdy post, relevant mainly to empirical researchers in the social sciences. It may also be relevant to anyone whose job involves creating tons of computer files to finish one project, though.

Matthew Gentzkow and Jesse M. Shapiro of Chicago Booth advocate the use of version control in empirical social science research (see chapter 3 of their "Code and Data for the Social Sciences: A Practitioner’s Guide").

If you are new to the idea of version control, watch a series of videos from Software Carpentry.

The basic idea of Gentzkow and Shapiro is that empirical researchers in the social sciences should think of writing data analysis scripts as developing software to be released to the public. We need to allow other researchers to replicate our empirical findings. For this purpose, we should make all our code and datasets public once we publish a paper. It's often the case, however, that by the time you publish the paper, your computer directories are cluttered with many files that are not needed to produce the final results. And cleaning them up often leaves you unable to replicate the final results that you obtained for the published paper. Version control can avoid such a problem.

However, it seems to me that the main benefit of version control is something else: to track the evolution of your thoughts on each empirical research project.

We empirical researchers often face a situation like this:

"Well, I need to analyze this particular thing. I think I did it a few months ago. Which files did I write for this purpose? I cannot find them on my computer."

So you have to start from scratch. A massive waste of time.

The branching function of version control (a great illustration can be found in section 3.4 of Pro Git, written by Scott Chacon) helps us avoid this problem. Every time you try out a new way of analyzing the data, create a new branch (call it the test branch). Within the test branch, keep developing your code and committing it. If it turns out to be a bad idea, you can stop working on the test branch and go back to the "master" branch. This way, all the new files you wrote for the failed idea (provided you committed them on the test branch) disappear from your working directory. All the clutter is cleaned away. However, these files are preserved behind the curtain. If you later find the failed idea to be actually a good one, you can recover all the files you created in the test branch. Then, you can merge all these files in the test branch with those in the master branch very easily.
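The branch-and-return cycle described above can be sketched in a few lines. This is only an illustration, not a recommended workflow: it assumes Git is installed and on your PATH, it drives Git from Python purely so that the example is self-contained, and the file names (`baseline.do`, `new_idea.do`), the branch name `test`, and the helper functions are all made up for the demo.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return what it printed."""
    cmd = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
           "-c", "commit.gpgsign=false", *args]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout


def working_files(repo):
    """Names of the files currently in the working directory (ignoring .git)."""
    return sorted(p.name for p in Path(repo).iterdir() if p.name != ".git")


def branch_demo(repo):
    """Try an idea on a 'test' branch, abandon it, then bring it back."""
    git(repo, "init")
    # Make sure the first branch is called "master", whatever the local default is.
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    Path(repo, "baseline.do").write_text("* baseline regression\n")
    git(repo, "add", "baseline.do")
    git(repo, "commit", "-m", "Baseline analysis")

    # New idea: develop it on its own branch and commit the new script there.
    git(repo, "checkout", "-b", "test")
    Path(repo, "new_idea.do").write_text("* alternative specification\n")
    git(repo, "add", "new_idea.do")
    git(repo, "commit", "-m", "Try an alternative specification")
    on_test = working_files(repo)

    # Bad idea after all: going back to master removes the committed file
    # from the working directory, but it stays stored in the test branch.
    git(repo, "checkout", "master")
    on_master = working_files(repo)

    # Good idea after all: merging the test branch brings the file back.
    git(repo, "merge", "test")
    after_merge = working_files(repo)
    return {"on test": on_test, "back on master": on_master, "after merge": after_merge}


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed; skipping the demo")
    else:
        with tempfile.TemporaryDirectory() as scratch:
            for stage, files in branch_demo(scratch).items():
                print(stage, files)
```

Note that only committed files disappear when you switch back to master; files you never added to the test branch simply stay in the directory.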

There are several version control systems out there. Git appears to be the best one for branching. (And this article confirms my impression.) However, Git itself is Unix-based command-line software. Its user interface is not particularly friendly unless you are a computer programmer.

Among a wide range of visualization software for Git (see the partial list provided by the Git official website), I find Gitbox the most intuitive. It's like an iPhone. Without reading a manual, you can use it. It runs on Mac OS. For Windows, I don't know which one is the best.

The only problem with Gitbox is that it does not visualize branches. Perhaps it is a good idea to use another graphical interface just for visualizing branches. But it seems to me that none of the available software is very good at visualizing branches.

There is one issue with Git per se. It's a "distributed" version control system. That is, you keep all the files on your local computer and, whenever appropriate, sync them with a remote server (a bit like Evernote). And all the previous versions of each file are stored on your local computer. This is fine if you only write ASCII (plain-text) files. It's not fine if you "version-control" binary files such as data and images. If you use Git, therefore, it's a good idea to version-control only the scripts that run your statistical software. Data can be reproduced by running those "tracked" scripts each time.

As opposed to the distributed system, there are also centralized version control systems (such as Subversion), which keep track of file histories on a remote server. (See this article for the comparison of centralized and distributed.) The drawback of centralized version control is that branching takes time (because each time you create a new branch, you need to download every file from the remote server). If the main benefit of version control is branching, then the distributed system appears to be the way to go.

Another merit of using version control is that it makes collaboration easy. It's an effective way to keep different people from editing the same parts of the project at the same time and ending up with lots of conflicts that cannot be resolved immediately. For collaborative use of version control, however, your coauthors also need to know about version control (which is totally new to almost anyone in the social sciences) and agree on when to create a new branch and when to "commit" your work. (To commit means to record all the file changes you have made so that they can be tracked in the future.) That doesn't seem to be easy.

I'm still learning about version control. One thing that I have to figure out is how to use Dropbox for version control. Freshmob and Sam Doidge suggest how to do it.


Friday, March 30, 2012

Survival analysis

If you are a Stata user with lots of experience in statistical and econometric analysis but have never learned survival analysis before, An Introduction to Survival Analysis Using Stata, written by Mario A. Cleves, William W. Gould, and Roberto G. Gutierrez, is the best way to go through a crash course in survival analysis on your own. I've never read any textbook of statistics or econometrics as easy to follow and practical as this one.

Saturday, April 24, 2010

A useful tool for those academics who use Mac OS X

These days, almost every published academic paper is assigned a DOI (digital object identifier). If you type this identifier after "http://dx.doi.org/" in your browser address bar, you can directly access the webpage from which you can download the paper.

It's annoying that you need to type "http://dx.doi.org/" every time, however. If you are an Apple user, there is an excellent solution. Check this out.
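If you are not on a Mac, the prefixing step itself is easy to script. Here is a minimal sketch, assuming only the resolver prefix quoted above; the function name `doi_to_url` and the DOI in the example are placeholders of mine, not a real paper.

```python
from urllib.parse import quote

# Resolver prefix as quoted in the post.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi):
    """Turn a DOI into the address you would otherwise type by hand."""
    cleaned = doi.strip()
    # Tolerate DOIs copied together with a leading "doi:" label.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Percent-encode unusual characters but keep the slash inside the DOI.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    # "10.1234/example" is a made-up DOI used only for illustration.
    print(doi_to_url("doi:10.1234/example"))
```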

Thursday, February 26, 2009

Day of an Assistant Professor (44)

1. Work on the climate change project with my colleagues.

2. Finish writing and submit the referee report.

3. Read Wright (2008). He empirically finds: (1) Among autocratic regimes, personal rule and monarchy are more likely to be found in countries with mineral resources and small populations. (2) The cross-sectional correlation between regime stability and the likelihood of having a national legislature in autocracies is positive for single-party and military regimes but negative for personal rule and monarchies.

Wednesday, March 15, 2006

Life Expectancy

Lent term is about to end. I learned several things during this term. But I'm sure I'm going to forget them in a month or so. So let me write them down here. The first instalment is about life expectancy. If you spot any inaccurate descriptions below, let me know by leaving a comment.

1. Life expectancy is visualized as follows. Take age on the x-axis and take the proportion of survivors to a certain age on the y-axis. From the data on death rates for each age cohort, you can plot the proportion of survivors for each age. The area surrounded by the x and y axes and the plot curve is life expectancy at birth. (Added on 16th March: Of course, one cannot estimate life expectancy for each cohort because then you need to track this cohort until the last person dies. For practicality, demographers assume that each cohort will face the same survival probability as the one currently faced by older cohorts. Special thanks to Bessho-san (his Japanese blog), who emailed me on this point.)

2. In the demography literature, various ways to decompose change in life expectancy between two points in time have been developed during the past 25 years. But the most useful is still one of the first proposals: Arriaga, E.E. (1984). "Measuring and explaining the change in life expectancies", Demography, 21, pp.83-96.

3. Crude death rates are sensitive to the age composition of the population. Even if the death rates for every age group don't change, the crude death rate of the population goes up if the proportion of old people in the population goes up. A solution to this is standardized death rates, in which the population death rate is calculated with the age structure fixed. But then the arbitrariness of the choice of a "standard" age composition is a problem.

4. A good reference for these issues is Demography: Measuring and Modeling Population Processes, by Samuel H. Preston, Patrick Heuveline, and Michel Guillot (Blackwell Publishing, 2001).
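Points 1 and 3 can be made concrete with a small sketch. Everything numerical here is invented for illustration: the death probabilities, the two-age-group populations, and the "standard" population are toy values, and the function names are mine. The life expectancy function takes the area under the survival curve with the trapezoid rule, which amounts to assuming that deaths are spread evenly within each age interval.

```python
def survival_curve(death_probs):
    """Proportion of a birth cohort still alive at the start of each age interval."""
    survivors = [1.0]
    for q in death_probs:
        survivors.append(survivors[-1] * (1.0 - q))
    return survivors


def life_expectancy_at_birth(death_probs, interval_years=1.0):
    """Area under the survival curve, approximated with the trapezoid rule."""
    curve = survival_curve(death_probs)
    return sum(interval_years * (a + b) / 2.0 for a, b in zip(curve, curve[1:]))


def death_rate(age_specific_rates, age_counts):
    """Overall death rate implied by age-specific rates and an age composition."""
    deaths = sum(rate * count for rate, count in zip(age_specific_rates, age_counts))
    return deaths / sum(age_counts)


# Point 1: half the cohort dies in the first year, the rest in the second.
toy_life_expectancy = life_expectancy_at_birth([0.5, 1.0])

# Point 3: the same age-specific death rates (young, old) in both years ...
rates = [0.001, 0.05]
younger_population = [900, 100]   # ... applied to a younger age composition
older_population = [700, 300]     # ... and to an older one.
crude_younger = death_rate(rates, younger_population)
crude_older = death_rate(rates, older_population)

# Standardization: apply the rates to one fixed, arbitrarily chosen age structure.
standard_population = [800, 200]
standardized = death_rate(rates, standard_population)

if __name__ == "__main__":
    print("life expectancy at birth:", toy_life_expectancy)
    print("crude death rates:", crude_younger, crude_older)
    print("standardized death rate (same in both years):", standardized)
```

The crude rate rises with ageing even though no age group's death rate changed, while the standardized rate stays put; pick a different `standard_population` and the standardized level changes, which is the arbitrariness mentioned in point 3.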

Wednesday, May 18, 2005

DESTIN-STICERD joint seminar part II

The second DESTIN-STICERD joint seminar. (See here for the first one held last November.)

This time, the main theme is "Aid and the End of Poverty". As it suggests, the background is Jeffrey Sachs's recent book The End of Poverty (see 9th April for Sachs's basic argument) and a recent political initiative undertaken by the British government to end poverty in Africa (see Commission for Africa website).

The first speaker is Francesco Caselli, representing development economists. His main point is that the effectiveness of aid depends on which world the poor live in, the nonconvex one or the convex one.

In the nonconvex world (or the world endowed with increasing-returns technology), the poor cannot get richer because of (1) minimum consumption to survive (therefore they can't save at all), (2) lack of education, (3) poor health, and (4) lack of infrastructure. But if your wealth exceeds a certain level, then these problems are suddenly solved, hence you start getting richer and richer. This is the world Jeffrey Sachs envisions, and the rationale for more aid, as aid money brings the poor immediately above the threshold level of wealth.

In the convex world (or the world endowed with decreasing-returns technology), on the other hand, the poorer you are, the higher the investment return. In this case, aid is wasteful because at the end of the day everyone reaches the equilibrium level of wealth (remember the Solow growth model). What matters is the curvature of the production function, which is affected by, say, governance.
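The contrast between the two worlds can be illustrated with a toy simulation. To be clear, this is my own sketch, not anything Professor Caselli presented: the functional forms (a Solow-style accumulation rule for the convex world, a simple wealth threshold for the nonconvex one) and all parameter values are assumptions chosen only to make the logic visible.

```python
def convex_step(wealth, saving=0.2, productivity=1.0, alpha=0.5, depreciation=0.1):
    """Solow-style accumulation with decreasing returns (alpha < 1)."""
    return saving * productivity * wealth ** alpha + (1.0 - depreciation) * wealth


def nonconvex_step(wealth, threshold=1.0, growth=0.05):
    """Poverty trap: no accumulation below the threshold, steady growth above it."""
    return wealth * (1.0 + growth) if wealth >= threshold else wealth


def simulate(step, initial_wealth, periods):
    """Apply the one-period rule `step` repeatedly and return final wealth."""
    wealth = initial_wealth
    for _ in range(periods):
        wealth = step(wealth)
    return wealth


poor = 0.5   # initial wealth of a poor household (toy value)
aid = 1.0    # one-off aid transfer (toy value)

# Convex world: with or without aid, wealth ends up at the same steady state.
convex_without_aid = simulate(convex_step, poor, 500)
convex_with_aid = simulate(convex_step, poor + aid, 500)

# Nonconvex world: aid lifts the household over the threshold and growth takes off.
nonconvex_without_aid = simulate(nonconvex_step, poor, 50)
nonconvex_with_aid = simulate(nonconvex_step, poor + aid, 50)

if __name__ == "__main__":
    print("convex world:   ", convex_without_aid, convex_with_aid)
    print("nonconvex world:", nonconvex_without_aid, nonconvex_with_aid)
```

With these particular parameters the convex-world steady state is (saving * productivity / depreciation) ** (1 / (1 - alpha)) = 4, so the aid transfer changes how fast the household gets there but not where it ends up; in the nonconvex world the same transfer is the difference between stagnation and take-off.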

Therefore, the question is which world the poor live in. Based on the available evidence, the convex world seems more likely, as empirical studies have shown that the return to education is higher in poorer countries and that the return to physical capital is also higher in poorer countries (see Professor Caselli's own working paper).

He also raises two questions. Are middle-class people (whose wealth level is above the threshold in the non-convex world) getting richer fast? If so, the world is non-convex. Is there any evidence that aid triggered the take-off of the economy? If so, the world is non-convex.

The second speaker is Dr. Teddy Brett from DESTIN. He points out quite a few countries where aid worked because the government was good (Uganda after the 1980s, India, Botswana, Ghana, Mozambique, Morocco, and, to a lesser degree, Egypt). So we shouldn't be so pessimistic. His main point is that aid is effective with good governance, which consists of bureaucratic capacity and the government's commitment.

Now it's PhD students' turn. The third speaker is Masa Kudamatsu from STICERD.

Yes, I was the speaker. :) I talked about the conditions under which self-interested political leaders are willing to use aid money to tackle poverty. I introduced two theories from the literature on policy-making in autocracy.

The first one is Mancur Olson's stationary bandit theory (see McGuire and Olson 1996). According to this theory, political leaders tackle poverty by using aid money if both of the following two conditions are met: (1) The government is capable of collecting taxes (Self-interested politicians are assumed to consume tax revenue for their own benefit, and therefore we need to have a positive correlation between poverty reduction and tax revenue); (2) Political leaders won't be ousted in the near future (otherwise leaders cannot reap the benefit from poverty reduction even if the first condition is met).

The second one is the agency model approach (or what is sometimes called the political accountability model), first proposed by Barro (1973) "The Control of Politicians: An Economic Model", Public Choice, 14, pp.19-42, and Ferejohn (1986) "Incumbent Performance and Electoral Control", Public Choice, 50, pp.5-25. Although these two authors had democratic politics in mind, the basic idea can be applied to autocracy as well (see Gallego and Pitchik 2004, for example). This theory tells us again that both of two conditions must be satisfied for self-interested political leaders to be willing to tackle poverty with aid money: (1) the poor can play a role in leadership selection (otherwise political leaders will stay in office without tackling poverty); (2) the poor can trust a potential new leader who will assume office after the current leader is ousted (otherwise, the poor are happy with an incumbent who takes poverty seriously only marginally because the alternative to the current leader is even worse). (The second point was brought to attention by Bueno de Mesquita et al. 2002, and Padro-i-Miquel 2004 applied it to African ethnic politics.)

I couldn't effectively connect these ideas to the points raised by the two faculty members earlier in the seminar. That was my regret. But I guess the presentation went well overall as Robert Wade, a professor from DESTIN, told me it was interesting and crisp. (I didn't know the meaning of "crisp". Later I looked it up in the Collins Cobuild English Dictionary; it says, "If you describe someone's writing or speech as crisp, you mean they write or speak very clearly, without mentioning unnecessary details.") Special thanks go to Paolo, who gave me comments on my slides, saying, "Difficult to understand." :)

The last speaker is Elliot Green from DESTIN. He is a specialist on Uganda. He points out that the headcount ratio (the percentage of the poor in the total population) in Uganda had been in decline during the 1990s but went up around 2000. He attributes these dynamics to the coffee price in the world market. If you look at the headcount ratios by region, the number of poor people fell mainly in the central region, where coffee beans are produced. Plus, the world coffee price was on the rise during the 1990s and then kept falling until 2002. He also mentions that democratic elections for regional governments in Uganda actually caused a fall in tax revenue because the governments cut taxes in order to win votes. This is detrimental to good governance in his view (and, it seems, in the view of most development studies researchers) because the link between the government and citizens is broken without tax collection. (This argument is shared by The Economist magazine as well. See this article.)

So there seemed to be a consensus between me and Elliot (and some DESTIN students) that tax collection is key to poverty reduction.

During the floor discussion that followed, a couple of interesting interactions between economics and development studies emerged. In relation to Professor Caselli's question - are middle-class people getting richer quickly? - Bettina, a DESTIN student, pointed out that middle-class people in poor countries simply invest their money overseas or emigrate to rich countries. Prof. Caselli's response was that this indicates the return to investment for middle-class people is low, supporting the idea of the convex world.

Also in response to Professor Caselli's comment that good governance comes with a good leader, which is more or less a matter of luck, Dr Brett raised an interesting point. President Yoweri Museveni of Uganda and former President Charles Taylor of Liberia were both rebel leaders who succeeded in defeating the government forces. But under Museveni's rule Uganda has seen improvements in the economy (with caveats pointed out by Elliot, though) while Charles Taylor just brought another civil war to Liberia (see this BBC article). The difference between these two cases is, in Dr Brett's view, higher education. Uganda received a lot of aid money to improve its higher education system in the 60s and 70s, from which President Museveni benefited before becoming a rebel leader. This didn't happen in Liberia. So Dr Brett welcomes one suggestion made by the Commission for Africa that aid should target higher education in order to bring about good governance. This point is revealing to me, as development economists all ignore the role of higher education in development.

The seminar ended with this argument on the quality of leadership. This is somewhat encouraging to me because that's what I'm now trying to figure out: how the quality of leadership is determined in nondemocratic countries.

Overall, I benefited a lot from this seminar though other STICERD people didn't seem to (plus, none of the STICERD professors attended the seminar)...