How to draw good looking maps in R

In a recent project I needed to draw several maps and visualize different kinds of geographical data on them. I found the combination of R, ggplot2 and the maps package extremely flexible and powerful, and it produces nice-looking map-based visualizations.

Here is a short tutorial. Monospace font indicates the code you need to run in R. You will probably need some basic understanding of R to work through it. Start by installing the two packages.

install.packages("maps")
install.packages("ggplot2")

  • Now import the libraries, load the US map data, and draw a map with all states.


library(ggplot2)
library(maps)
#load us map data
all_states <- map_data("state")
#plot all states with ggplot
p <- ggplot()
p <- p + geom_polygon( data=all_states, aes(x=long, y=lat, group = group),colour="white", fill="grey10" )
p

  • If you only want a subset of states, subset the all_states dataframe and redraw the plot.


states <- subset(all_states, region %in% c( "illinois", "indiana", "iowa", "kentucky", "michigan", "minnesota","missouri", "north dakota", "ohio", "south dakota", "wisconsin" ) )
p <- ggplot()
p <- p + geom_polygon( data=states, aes(x=long, y=lat, group = group),colour="white", fill="grey10" )
p

  • Prepare a geographical dataset containing the data you want to visualize on the map. Give the location of each data point as longitude and latitude. As an example, download this file, save it as “geo.csv” in your working directory, and load it into R.


mydata <- read.csv("geo.csv", header=TRUE, row.names=1, sep=",")
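
The original file is not reproduced here, so the following is only a sketch of the shape the later steps expect: the first column holds row names (because of row.names=1), and the columns long, lat, enrollment, label and state are the ones the plotting code refers to. The three rows and all of their values are invented for illustration.

```r
# Hypothetical stand-in for geo.csv; every value below is made up
sample_rows <- data.frame(
  school = c("School A", "School B", "School C"),
  long = c(-87.6, -83.7, -93.2),
  lat = c(41.8, 42.3, 44.9),
  enrollment = c(15000, 40000, 50000),
  label = c("A", "", "C"),   # empty string = no label drawn for that school
  state = c("IL", "MI", "MN")
)
csv_path <- file.path(tempdir(), "geo_sample.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Same call as in the tutorial, pointed at the sample file
sample_data <- read.csv(csv_path, header = TRUE, row.names = 1, sep = ",")
str(sample_data)
```

Leaving label empty for most rows is one way to put a label on only some of the schools.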

  • The dataset contains 39 universities in the Midwest. We want to show all of these schools on the map, label some of them, make the size of each dot reflect the school’s total enrollment, and add a legend named “Total enrollment”. Sounds a bit complicated, huh? But it takes just two more lines of code.


p <- ggplot()
p <- p + geom_polygon( data=states, aes(x=long, y=lat, group = group),colour="white" )
p <- p + geom_point( data=mydata, aes(x=long, y=lat, size = enrollment), color="coral1") + scale_size(name="Total enrollment")
p <- p + geom_text( data=mydata, hjust=0.5, vjust=-0.5, aes(x=long, y=lat, label=label), colour="gold2", size=4 )
p

For those of you who know R but are not familiar with ggplot, the key is the size=enrollment mapping inside aes() in the geom_point call. This mapping scales the size of each dot according to the enrollment value in the dataset.
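
If strict proportionality matters, ggplot2 also offers scale_size_area(), which maps a value of zero to zero area. A minimal sketch, assuming a small made-up data frame (called schools here, with invented coordinates and enrollments) standing in for mydata:

```r
library(ggplot2)

# Hypothetical stand-in for mydata; coordinates and enrollments are invented
schools <- data.frame(
  long = c(-87.6, -83.7, -93.2),
  lat = c(41.8, 42.3, 44.9),
  enrollment = c(10000, 20000, 40000)
)

p <- ggplot() +
  geom_point(data = schools, aes(x = long, y = lat, size = enrollment), colour = "coral1") +
  # scale_size_area() makes dot area proportional to the value (0 maps to area 0)
  scale_size_area(name = "Total enrollment", max_size = 10)
p
```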

  • Now we can see that there are so many schools in Chicago that they overlap each other. We want to jitter the dots a bit so we can see them better, so I changed geom_point to geom_jitter and used the position option to control the magnitude of the jittering.


p <- ggplot()
p <- p + geom_polygon( data=states, aes(x=long, y=lat, group = group),colour="white" )
p <- p + geom_jitter( data=mydata, position=position_jitter(width=0.5, height=0.5), aes(x=long, y=lat, size = enrollment), color="coral1") + scale_size(name="Total enrollment")
p <- p + geom_text( data=mydata, hjust=0.5, vjust=-0.5, aes(x=long, y=lat, label=label), colour="gold2", size=4 )
p

Some schools are jittered so much that they are in the lake now, but... you get my point.
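
To see what the jitter does, and to make it repeatable, the same effect can be sketched in base R: position_jitter(width=0.5, height=0.5) adds uniform random noise of up to 0.5 degrees in either direction to each coordinate. The coordinates below are invented; set.seed() fixes the random draw so the dots land in the same place on every run.

```r
# Hypothetical coordinates for three schools in the same city (values invented)
pts <- data.frame(long = c(-87.63, -87.62, -87.65), lat = c(41.88, 41.87, 41.90))

set.seed(42)  # fix the random draw so the result is reproducible
jittered <- data.frame(
  long = pts$long + runif(nrow(pts), min = -0.5, max = 0.5),
  lat  = pts$lat  + runif(nrow(pts), min = -0.5, max = 0.5)
)

# Every point moves by at most 0.5 degrees in each direction
max(abs(jittered$long - pts$long))
max(abs(jittered$lat - pts$lat))
```

A smaller width and height (say 0.1) would keep the schools closer to their true locations and out of the lake. Recent ggplot2 versions also accept a seed argument in position_jitter() for the same reproducibility.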

So what if you want the colors of the schools to indicate some other factor, such as which state they are in?

  • Use the color aesthetic to change the colors of the dots.


p <- ggplot()
p <- p + geom_polygon( data=states, aes(x=long, y=lat, group = group),colour="white" )
p <- p + geom_jitter( data=mydata, position=position_jitter(width=0.5, height=0.5), aes(x=long, y=lat, size = enrollment,color=state)) + scale_size(name="Total enrollment")
p <- p + geom_text( data=mydata, hjust=0.5, vjust=-0.5, aes(x=long, y=lat, label=label), colour="gold2", size=4 )
p


Now you can see a really colorful map... If you don’t like the colors, you can change the color scale using the scale_color_brewer function, or choose the colors manually using the scale_colour_manual function in ggplot.
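
As a rough sketch of both options, assuming a made-up data frame in place of mydata (the state codes and the colors chosen are arbitrary):

```r
library(ggplot2)

# Hypothetical stand-in for mydata; all values are invented
schools <- data.frame(
  long = c(-87.6, -83.7, -93.2),
  lat = c(41.8, 42.3, 44.9),
  state = c("IL", "MI", "MN")
)

base <- ggplot() +
  geom_point(data = schools, aes(x = long, y = lat, colour = state), size = 4)

# Option 1: a ready-made ColorBrewer palette
p_brewer <- base + scale_colour_brewer(palette = "Set1")

# Option 2: hand-picked colors, matched to the state codes by name
p_manual <- base + scale_colour_manual(values = c(IL = "coral1", MI = "gold2", MN = "steelblue"))
p_manual
```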

So this is just a very simple illustration of the incredible power and flexibility of ggplot2, using maps as an example. While the syntax of ggplot2 seems a bit mysterious the first time you see it, it is actually built on the so-called “grammar of graphics”. It basically allows you to build your own plot layer by layer, with control over each single element in the plot.

For professional plotting, I highly recommend ggplot2: http://had.co.nz/ggplot2/

Finland Economy Outlook and Welfare System

This is a summary and analysis of the McKinsey Helsinki office pro bono project. Link: Helsinki_Report

———————

Problem: In the face of an aging population, global competition and the financial crisis, how does Finland keep its current social model (a welfare state) in a sustainable manner?

Background

Finland has a well-functioning public sector and a successful welfare state, and wishes to keep them in a sustainable manner. However, Finland faces significant challenges starting from 2010:

  • The population is aging at the fastest rate in Europe, and the working population will shrink
  • Global competition is leading to industrial restructuring; manufacturing jobs lost in the financial crisis may never return
  • Elderly population will strain the public healthcare system and public financing

If no action is taken, over the next decade the above factors will push the Finnish public sector into unsustainable debt, and the government will be forced into unfavorable changes that will inevitably harm the current welfare system.

Methods

The study was based on historically sustainable support ratio levels, predicted trends in population, and the predicted number of private sector jobs. The support ratio is defined as the number of public sector jobs / number of private sector jobs. In the report, a ratio of 1.9 – 2.0 was deemed “sustainable”. If no measures are taken by the Finnish government, the support ratio would surge to ~2.5 in 2020, which is highly unsustainable.

Proposed solution

  • Increase the number of jobs in the private sector.

In 2020, an estimated 150k – 200k new private sector jobs will be needed to maintain a sustainable support ratio.

  • Increase the supply of labor to the private sector.

In 2020, Finland will need 270k – 320k more people in the workforce to maintain a sustainable support ratio.

  • Produce more public services with the same resources.

Public sector productivity has to increase by 1.2% per year for the next decade to satisfy demand. To achieve this without increasing the resources required, changes in working methods and procedures are needed.
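
To get a feel for the size of that productivity target, the yearly figure can be compounded over the decade. This is a back-of-the-envelope calculation, not a number from the report, and it assumes the 1.2% applies in each of ten years:

```r
annual_growth <- 0.012   # 1.2% per year, as quoted in the summary
years <- 10              # "the next decade"

# Compound growth: (1 + g)^n - 1
cumulative <- (1 + annual_growth)^years - 1
round(cumulative * 100, 1)   # cumulative increase in percent, about 12.7
```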

——————— end of the summary here.

Below is a brief analysis of the structure, presentation style, and other aspects of the report.

Structure

The overall structure of the report is:

  • Preface (1 page, including Acknowledgments)
  • Executive summary
  • Background and presentation of the problem (Challenges, aims of the project)
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Conclusion and Implication (What does this mean for Finland?)
  • References

Overall, the structure is clear and concise, and easy to follow logically.

The Charts

In my opinion, the highlights of the report are the well-designed and well-illustrated charts. In such a long report, few people have the patience to read all the text, so the charts are the key to effective communication. There are 56 “Exhibits”/charts in the 108-page report, which can be grouped into seven types:

  • Column/Bar charts – display numbers/percentages for a few groups. Example: Exhibit 19. A variant of the bar chart (segmented bar chart): Exhibit 23.
  • Scatter plots and variants – display trends in large amounts of data, in two dimensions. Example: Exhibit 18. Variants: Exhibit 20, 26.
  • Line charts – display trends over time. Example: Exhibit 16.
  • “Waterfall” chart – a variant of bar chart, used to display changes in amount/percentage. Example: Exhibit 4, 5, 6.
  • Flowchart – display workflow or structure. Example: Exhibit 8
  • Tables – listing categories of information. Example: Exhibit 21.
  • Pie chart – displaying proportions, illustrating components. Example: Exhibit 30.

All these charts could be made with Excel/PowerPoint 2007. The key is selecting the appropriate type of chart for the information you want to display; the above summary could be used as a guide. Highlighting the key information in the charts is also important. The format of the charts within one report should be kept uniform.

Analysis method

I think the analyses in the report are all straightforward. In summary, they can be grouped into the following categories:

  • Segmentation – dividing big problems into sub-problems and analyzing them individually

For example, when we ask how to increase employment, we segment the economy into six major sectors: company services, local services, infrastructure, capital-intensive production, industrial production and research-intensive production. Then we address each sector separately according to its features.

  • Comparison – comparing the data/problem we have with other countries/historical data to gain insights.

For example, comparing the job growth problem Finland is facing now with those of the US, Japan, UK, Germany, etc., and identifying key factors that influence job growth.

  • Association – finding associations between groups of data to transform/simplify the problem.

For example, identifying the association between the support ratio and the overall outlook of the economy, and then focusing on improving the support ratio.

  • Scenario analysis – making assumptions and performing forecasts

For example, making some assumptions about the trends and forecasting the support ratio in 2020. This illustrates the importance of reforms and strategies for the government to adopt before it’s too late.

No statistical analysis beyond simple regression between two variables is used – the client won’t be able to follow anything beyond that.


Categories: Uncategorized

Hollywood Profit Case

Source: http://www.economist.com/node/18386456

A major Hollywood film studio has seen its profits in the US market fall over the past 5 years. They want to know why, and how to tackle it strategically.

So we start with the industry analysis. Has the industry’s profit fallen as much as our client’s? The answer is yes: industry profit has been declining at the same rate as our client’s. So it’s not a company-specific problem, and we need to analyze the whole industry.

Now we apply a Profit & Loss framework. We start from cost, and we learn that costs have been steady for the past 5 years. So the profit drop has been a revenue problem.

Then we ask: why did revenue drop? We start by segmenting the industry’s revenue sources. The revenue stream of the film studios comes from 3 sources: 1, theater box office. 2, after 6 months, DVD/Blu-ray/video-on-demand sales. 3, after 1-2 years, selling the movie to TV channels. Looking at the trends, we found that theater box office revenue and TV channel sales revenue have been steady in recent years, but DVD sales have dropped drastically over the past 5 years.

We identified the source of the revenue problem to be dropping DVD sales. What caused this problem? Three possibilities: 1, piracy. 2, home DVD rentals. 3, heavy competition leading to dropping prices. A careful analysis of the market reveals that the number of films rented in the US has grown 10% each year since 2007 and, over the same period, the number of DVDs bought in the US has declined drastically. Piracy is a major problem in China and Russia, but not in the US, and the DVD market hasn’t seen a surge in competition or a drop in prices. Thus, the rise of home DVD rental companies (Netflix and Redbox) in the US market has been the dominant factor.

Now, what are the possible strategies to reverse this trend? The responses can be grouped into these categories:

1, Boost revenue in emerging markets (China, India) by:

  • fighting piracy
  • increasing marketing
  • releasing films faster in those regions
  • investing in films that tell stories rooted in the local culture

2, Boost DVD sales in US:

  • Negotiate with Netflix and Redbox for an exclusive DVD sales window (ask them to start renting later)
  • Release the DVD sooner after the movie is in theaters. (May cannibalize box office revenue)
  • Increase DVD prices (not likely to be effective)
  • Invest in more “blockbuster” movies and reduce funding in small budget movies (People are more likely to buy DVDs of Avatar than other films)

3, Boost other revenue sources in US

  • Toys and theme parks based on movies; Disney enjoys big revenues from these.
  • Increase video-on-demand sales, which is by far the highest-margin business for film studios.

Ford Growth in Southeast Asia

Source: http://online.wsj.com/article/SB10001424052748704132204576190003595864520.html

Case scenario: Ford would like to expand its market share in the fast-growing Southeast Asian auto market and is looking to establish a long-term growth strategy in the region. What would you advise them to do?

We will analyze the case from four perspectives:

Customer/Market

The Southeast Asian market is one of the fastest growing markets in the world. Major countries in the region include Thailand, the Philippines, Indonesia and Vietnam. We will focus on Thailand and Indonesia in this case.

Thailand has 65 million people and a per capita income of ~$5000. Indonesia has a population of 245 million and a per capita income of $3000, which is expected to grow to $5000 in 10 years.

Product

Most of Ford’s sales come from the Ranger truck, which is used for regional distribution in Thailand. Ford recently boosted sales of the Fiesta (a compact car) in Thailand from 600 cars/month to 2000 cars/month, and enjoys an 8% share of the small compact car market in Thailand.

Distribution Channels

Ford recently started to construct a $450 million passenger-car plant in Thailand.

Competition

Ford currently has only a 2% market share in the region, where Toyota leads the market, and 3% across Asia and the Pacific. Both are much smaller than Ford’s US market share.

Proposed solution:

1, Since Indonesia is a much bigger potential market than Thailand, we will have to invest more in Indonesia for a long-term growth strategy. We will either market models that are already successful in Thailand (Fiesta), or introduce new models for the Indonesian market. If we are successful in Indonesia, we may consider establishing a distribution center or manufacturing plant there to save on distribution costs/tax. Creating local jobs will also help our brand marketing efforts (synergy).

2, Introduce more products in both Thailand and Indonesia. Currently we only have two models selling. We will focus on compact cars, where we have enjoyed sizable success and brand recognition; compact cars will also likely be the major growth sector in a region with a growing middle class. We will introduce new models every year, update existing models, and probably bring models that are successful in North America (Focus, etc.) to the region. We will also try to offer fuel-efficient cars/hybrids.

3, Customer. We do not have much information about customer segmentation. It would be helpful to analyze customer segments, identify which segments we are most successful in selling to, and invest more marketing effort in those to boost market share.

4, Competition. We don’t have much information regarding competition. But with the Japanese currency rising and the US dollar declining, we might be able to beat Japanese competitors on cost with local factories and labor.

Potential risks:

Large capital investments in plants, currency fluctuation, political stability/terrorism, inflation, interest rate/tight credit, high oil price.


IAI case in Israel

Source: http://www.economist.com/node/18281744

Case scenario:
IAI is a state-owned Israeli company focusing on defense and aircraft manufacturing. It has been announced that IAI is going to be privatized soon. Our client, a big private fund in Israel, wants us to evaluate this deal and tell them whether it’s a good idea to buy shares of the company when it’s privatized.

Framework and Analysis:
This is a business situation case. We will look at 4 factors:
1, Customer.
2, Company.
3, Product.
4, Competition.

Start with Customer. Obviously the major customer is the Israeli army, the IDF. Israel has a large military budget every year due to frequent unrest in the Middle East and its tense relations with Palestine and Iran. The recent unrest in the Arab world adds uncertainty and puts pressure on the IDF to seek a bigger budget. The IDF is also keen to maintain a technological edge over its rivals, which should ensure that it buys the most advanced and expensive equipment the company produces. Check +1 for this factor.

Now, Company. IAI (Israel Aerospace Industries) is said to have been in a financial and organizational mess for the past few years. This is part of the reason why the company is being privatized and has fired its current management. Two factors to consider: 1, whether the new management can successfully manage the business. 2, the union in the company is powerful; will it agree to layoffs and cost cuts? Check ? for this factor.

Product. They offer three major categories of products: 1, observation and communications satellites. 2, anti-ballistic-missile defenses. 3, mid-size business jets. The company has a competitive advantage in the first two categories, so these have remained consistently profitable and even have the potential to be sold in Asian markets. The business jet sector has been barely profitable for two years due to the financial crisis, but has slightly recovered in recent months (80m/3b =~ 3%). Check +1.

Competition. There are no major domestic competitors because it was previously a state-owned business. Foreign competitors cannot compete because of security concerns (barrier to entry). Domestic competitors could arise, but that should take a long time because of technological constraints and the huge investment required to start such a business. Check +1.

Conclusion: we should buy the shares of IAI. (brief summary goes here)

Some potential risks: 1, Regulatory. There could be constraints on the money that is allowed into this sensitive national defense business. 2, Funding for the IDF could be slashed in the future (say the Israeli government suddenly runs into a huge deficit and needs to cut its budget), in which case the company would have to explore foreign markets. 3, Barriers to exiting the investment. 4, The management and union/company structure problem is still a risk. 5, Rising oil prices will threaten the business jet sector, which we may need to sell if the situation gets worse (it wasn’t very profitable to start with, and is much more vulnerable to competition).


NYSE and Deutsche Boerse merge

February 20, 2011

The central questions are:

  1. Why are they merging? What are the benefits to each party?
  2. What’s the impact on the outside market, especially their main competitors?
  3. Merger details.
  4. Potential issues.

Sources of information:

I. New York Times (N)
II. Associated Press (A)
III. Seeking Alpha (S)
IV. CNBC (C)

  1. Why are they merging? What are the benefits to each party?

The reduction of the cost.

1.1.   Background information: NYSE is the largest stock market, but its parent, the $9.9 billion NYSE Euronext, is not the largest exchange operator; it is smaller than CME (No. 1, $20 billion) and a number of other exchanges. (A)

  2. What’s the impact on the outside market, especially their main competitors?

2.1   Similar events in the same market:

2.1.1        Last week, the London and Toronto stock exchanges announced a tie-up (LSE-TMX, $2.9 billion). (N, A)

2.1.2        In Oct 2010, SGX, the operator of the Singapore stock exchange, planned to buy the Australian stock exchange. (N)

2.2   Greater access to investments: adds French or German company shares. Takes time to materialize. (A)

2.3   Lower trading cost:  expected $400 million annual cost savings. (A)

2.4   Rivals’ response: Nasdaq, ICE and CME held conversations about the possibility of a joint bid, but there has been no further progress and the talks are unlikely to advance to a bid. (C)

2.4.1        The most threatened exchange: Nasdaq. It’s difficult for it to make a credible, fully financed overbid for NYSE. (C)

2.4.2        Similar events: A joint hostile bid that breaks apart an existing deal is virtually unheard of. (C)

2.4.3        Practical concern: $340 million break-up fee. (C)

  3. Merger details.

3.1   Announcement date: Feb 15th. (N)

3.2   Parties involved: NYSE Euronext and Deutsche Börse. (N)

3.3   Total amount of transaction: $9.53 billion. (N)

3.4   Name issues: Senator Charles E. Schumer of New York wants “New York” to come first in its name. (N)

3.5   Market operation: Deutsche Börse will issue 0.47 of a share for each NYSE Euronext share, a roughly 10 percent premium to the American company’s stock price on Feb. 8. (N)

3.6   Management of the combined unit: chairman Mr. Francioni of Deutsche Börse, chief executive Mr. Niederauer of NYSE Euronext. Deutsche Börse holds 10 of the 17 board seats, and its shareholders hold 60% of the merged exchange operator’s shares.

3.7   Location: dual headquarters in New York and Frankfurt, incorporated in the Netherlands.

3.8   Benefits:  expected $400 million annual cost savings.

3.9   Further steps: requires approval by owners representing a majority of NYSE Euronext shares and 75 percent of Deutsche Börse shareholders.

  4. Potential issues.

4.1   Timing and likelihood of regulatory approvals.

4.2   Synergy realization. For example, data center: (S)

4.2.1        NYSE, two data centers located in NJ and London.

4.2.2        DB, data center outsourced.

4.3   Longer-term management and cultural issues.

