
Hey Tech!

By Angela Hey


About this blog: I write about technology companies, trends and events in and around Mountain View. Where else can you find startups nurtured by Y-Combinator and 500 Startups working alongside multi-billion corporations like Google, Symantec and I...


Analyze data yourself with R - a fast-growing language for statistics, forecasting and graphs

Uploaded: Apr 17, 2014
Data are everywhere. With today's powerful smartphones, tablets and computers, anyone, from school children to seniors, can learn how to analyze and make graphics from data. Maybe you want to take a closer look at financial data from Intuit's Quicken or Mint, organize answers to a survey, understand government statistics better, predict what your customers will buy or follow stock prices.

Spreadsheets, like Microsoft Excel or Apache OpenOffice Calc (free), have statistical and graphing functions as a start. If you want to go beyond a spreadsheet and organize data into groups, plot many graphs at once or build statistical models, then it's worth learning the rapidly growing statistical language R. Modules let you integrate R with spreadsheets.
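
As a small taste of grouping data and building a model, here is a minimal sketch in R. It uses mtcars, a sample data set about cars that ships with R; that data set and the variable names group_means and fit are chosen only for illustration and are not from any of the courses or talks mentioned here.

```r
# mtcars ships with R; it stands in here for any table of data.
# Group the cars by cylinder count and average fuel economy in each group.
group_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(group_means)

# Fit a simple linear model: fuel economy as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

Each step is a single line, which is part of the appeal once your data no longer fit comfortably in a spreadsheet.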

During a three-month break from writing this blog, I took a couple of online courses that used R:
- Johns Hopkins University's Coursera course "Computing For Data Analysis" (rebranded R Programming with a new session starting May 5)
- Stanford University's online course, "Statistical Learning"

In the mid-1970s, John Chambers at Bell Laboratories created the S language to analyze statistics. It launched on the UNIX operating system in 1978. That's before VisiCalc, the first spreadsheet, which appeared for the Apple II in 1979.

In 1996, Robert Gentleman and Ross Ihaka from the University of Auckland announced their development of the R language, inspired by S. They attracted collaborators worldwide to build the powerful R software that's available today.

R is freely available, open source software with many components. Today there are 5449 available packages on the leading CRAN software library, for everything from insightful graphics to biomedical data analysis to financial forecasting.

Palo Alto's Tibco offers a commercial version of S, S+, as well as TERR (Tibco Enterprise Runtime for R). Revolution Analytics, headquartered in Mountain View, was founded in 2007 to sell R software and services to commercial users. Recently, the company was named an Advanced Analytics "Visionary" company by analyst firm Gartner. Gartner estimates advanced analytics to be a $2 billion market that spans a broad array of industries globally. Revolution Analytics lets you run R in Amazon's cloud for data sets as large as a terabyte.


Source: Bay Area useR Group logo

Last week, I attended the monthly Bay Area useR Group Meetup, for those interested in R. The April meeting was at Intuit. There were 7 excellent speakers, but I'm only going to write about one - Ram Narasimhan. Ram teaches R for UCSC Extension. He gave an amusing talk on analyzing weather data, and his blog shows several examples of how he plotted weather data.

It started a couple of years ago, when he moved to Silicon Valley and his wife thought the weather was better in Chicago than in Sunnyvale. Ram decided he'd write some R software to test this. As averages can be misleading (see for example Stanford Consulting Professor Sam Savage's short talk on the Flaw of Averages), Ram selected the minimum temperature, as long as it was over 50°F. He ended up writing a package, weatherData, to gather weather station data for different locations and date ranges.
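
To see why an average can mislead, here is a minimal sketch in R with made-up numbers. The two cities, their temperatures and the variable names are all hypothetical and are not Ram's data or code; the 50°F threshold is the one mentioned above, and counting the days above it is just one way to read his comparison.

```r
# Made-up daily minimum temperatures (deg F) for two hypothetical cities.
city_a <- c(52, 55, 53, 54, 51, 56, 50)
city_b <- c(30, 75, 35, 78, 32, 80, 41)

# The averages are identical...
mean_a <- mean(city_a)
mean_b <- mean(city_b)

# ...but the number of days whose minimum stays above 50 deg F is not.
threshold <- 50
mild_days_a <- sum(city_a > threshold)
mild_days_b <- sum(city_b > threshold)

print(c(mean_a = mean_a, mean_b = mean_b))
print(c(mild_days_a = mild_days_a, mild_days_b = mild_days_b))
```

Both cities average the same minimum temperature, yet the first has twice as many mild days, which is the kind of difference an average hides.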


City Mean, Maximum and Minimum Temperatures, by Month for Austin, Las Vegas, San Diego, San Francisco and Tampa (Source: http://ramnarasimhan.wordpress.com/tag/weather-analysis/)

One weather data source is Weather Underground, which shows the personal weather station – Mountain Shadows KCAMOUNT15 – in Mountain View. This site shows temperature, dew point, wind speed, pressure and precipitation starting in 2008. For an organizational site, Moffett Field's station KNUQ gives weather data on Weather Underground as far back as 19:00 hrs on March 2nd, 1945. Comparing the two sites, you can see that temperatures can vary, even within a small town, given the marine effect near Moffett.

Playing with R gives insights into the challenges of using weather station data to see if Mountain View is warming or cooling. Which station should you use? Most days, at Moffett, the weather data is collected approximately once an hour, but other days there are more observations. Do you want to include night temperatures, or just day temperatures? Whereas climate models show the world in general is warming, climate is not the same as weather, which can jiggle around from day to day. You can read more about climate on NASA's website.
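
These choices can be sketched in a few lines of base R. The observations below are made up, the column names are hypothetical, and treating 06:00 to 17:59 as "day" is an assumption for illustration only.

```r
# Made-up observations; timestamps and temperatures (deg F) are hypothetical.
obs <- data.frame(
  time = as.POSIXct(c("2005-04-01 03:00", "2005-04-01 14:00",
                      "2005-04-01 14:30", "2005-04-02 02:00",
                      "2005-04-02 13:00"), tz = "UTC"),
  temp_f = c(48, 66, 67, 45, 70)
)
obs$day  <- as.Date(obs$time, tz = "UTC")
obs$hour <- as.integer(format(obs$time, "%H", tz = "UTC"))

# Count readings per day to spot uneven sampling.
obs_per_day <- table(obs$day)

# Daily minimum and maximum are not skewed by extra readings.
daily_min <- tapply(obs$temp_f, obs$day, min)
daily_max <- tapply(obs$temp_f, obs$day, max)

# Keep daytime readings only (assumed here to be 06:00 to 17:59).
daytime <- obs[obs$hour >= 6 & obs$hour < 18, ]
daytime_mean <- tapply(daytime$temp_f, daytime$day, mean)

print(obs_per_day)
print(daily_min)
print(daily_max)
print(daytime_mean)
```

A plain average over all readings would weight the first day's afternoon twice, which is why summarizing each day first is often the safer choice.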

If you want to learn about statistics, probability and survey sampling, then Stat Trek is a useful site. To get started with R, download the free RStudio development environment, which comes with training and help files. Type in a sum – "5+3" – and the answer "8" will appear. At a minimum, RStudio can be used as a calculator. To analyze the local weather data from Moffett Field from April 1st, 2005 to April 8th, 2005, open RStudio, install the package weatherData from CRAN, then type in:
mvweather <- getWeatherForDate("KNUQ","2005-4-1","2005-4-8")
followed by:
summary(mvweather)
to get a summary of temperature data, or you could type:
mvweather
to see a list of temperatures.
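
If you would like to try the same steps without a network connection, here is a minimal sketch. The install and load lines are commented out and assume the package is still available from CRAN. The data frame mvweather_demo is a stand-in with made-up temperatures, and its column names are an assumption about what a daily summary might look like, not a guaranteed match for what getWeatherForDate returns.

```r
# One-time setup (assumes weatherData is available from CRAN):
# install.packages("weatherData")
# library(weatherData)

# R as a calculator.
print(5 + 3)

# Stand-in for the downloaded data: made-up values, assumed column names.
mvweather_demo <- data.frame(
  Date = as.Date("2005-04-01") + 0:7,
  Max_TemperatureF = c(64, 66, 70, 68, 63, 61, 65, 67),
  Min_TemperatureF = c(48, 50, 52, 51, 47, 46, 49, 50)
)

# summary() reports the minimum, quartiles, mean and maximum of each column.
print(summary(mvweather_demo))

# Pick out the warmest day and the coolest reading.
warmest_day <- mvweather_demo$Date[which.max(mvweather_demo$Max_TemperatureF)]
coolest_min <- min(mvweather_demo$Min_TemperatureF)
print(warmest_day)
print(coolest_min)
```

Once the real download works, swapping mvweather in for mvweather_demo should need only the column names adjusted to whatever the package actually returns.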

Being interactive, R provides a fun way to learn statistics and quickly visualize data.

Comments

Posted by krrannti kumar, a resident of Blossom Valley,
on Apr 17, 2014 at 9:31 pm

Thanks a lot for the nice article. Thanks for introducing Mr. Ram Narasimhan here with his work. It's really great to know how the R language will help us to streamline the most complex data into an easy-to-read form. Thanks again.


Posted by John Mount, a resident of another community,
on Apr 20, 2014 at 8:35 am

Angela, thanks for the great article. That was a fun meeting, and the Bay Area useR Group Meetup is routinely one of the more positive and interesting Meetups I attend.

I also like the R training steps you shared. In that direction I would like to alert readers to a new book by Nina Zumel and myself (John Mount): Practical Data Science with R. This book is about how to do data science (defining problems, working with others, analyzing data, presenting results, and deploying decision models to production) using R. The book ends up working through about 10 substantial examples and is a good tour through a lot of the tasks required of a data scientist (though by staying in R we avoid big data and a lot of other tools; all of which are good follow-up topics).

