R Programming Language
Dr. F. Mary Harin Fernandez, Associate Professor, Sphoorthy Engineering College, Hyderabad.
Dr. M. Nithya
Associate Professor, Department of Computer Science and Engineering, Sri Sairam Engineering College, Chennai.
Dr. I. S. Hephzi Punithavathi, Associate Professor, Sphoorthy Engineering College, Hyderabad.
EDITORS' PROFILE
Dr. F. Mary Harin Fernandez completed her Bachelor of Engineering in Computer Science and Engineering at DMI College of Engineering, Anna University, Chennai, in 2005. She pursued her Master of Engineering in Computer Science and Engineering at Sathyabama University, Chennai, in 2007, and received her Doctor of Philosophy in Computer Science and Engineering from Sathyabama Institute of Science and Technology, Chennai, in 2020. She has 14 years of teaching experience at various leading institutions and has published papers in reputed international and national conferences and journals. She is currently working as an Associate Professor in the Department of Computer Science and Engineering at Sphoorthy Engineering College, Hyderabad. Her areas of interest include AI, Machine Learning, Deep Learning, Ontology, and Data Mining.
Dr. M. Nithya is currently working as an Associate Professor at Sri Sairam Engineering College. She completed her B.Tech. in Information Technology at Bharathidasan University in 2005, her M.E. in Computer Science at Sathyabama University in 2008, and her Ph.D. ("Medical Data Extraction Akin to Privacy") at Sathyabama Institute of Science and Technology in 2019.
Dr. I. S. Hephzi Punithavathi is currently working as an Associate Professor in the Department of Computer Science and Engineering at Sphoorthy Engineering College, Hyderabad. She completed her doctoral degree at Anna University, Tamilnadu, in 2019 and has 16 years of teaching experience at various institutions. She has published in several international journals, and her research areas include Machine Learning and Image Processing.
First and foremost, I dedicate this work to GOD, the ALMIGHTY, without whom nothing is possible. I dedicate this book to my parents, Mr. T. Fernandez and (Late) Mrs. M. Mary Flora, who were the initiators of all my endeavours, and to my siblings, who supported all my attempts towards this work. A special dedication goes to my husband, Kevin E, for his encouragement, motivation, and constant support in completing this book successfully.
Dr. F. Mary Harin Fernandez
Dedicated to Dr. P. Duriapandy and Reena Gracelyn.
Dr. Hephzi Punithavathi
Contents
Chapter I
1.1 Getting Started
1.2 Obtaining and Installing R from CRAN
1.3 Opening R for the First Time
1.4 Saving Work and Exiting R
1.5 Conventions
1.6 Exercise
Chapter II
2.1 R for Basic Math
2.2 Assigning Objects
2.3 Vectors
2.4 Exercise
Chapter III
3.1 Defining a Matrix
3.2 Subsetting
3.3 Matrix Operations and Algebra
3.4 Multidimensional Arrays
3.5 Exercise
Chapter IV
4.1 Logical Values
4.2 Characters
4.3 Factors
4.4 Exercise
Chapter V
5.1 Lists of Objects
5.2 Data Frames
5.3 Some Special Values
5.4 Understanding Types, Classes, and Coercion
5.5 Exercise
Chapter VI
6.1 Using plot with Coordinate Vectors
6.2 Graphical Parameters
6.3 Adding Points, Lines, and Text to an Existing Plot
6.4 The ggplot2 Package
6.5 Exercise
Chapter VII
7.1 R-Ready Data Sets
7.2 Reading in External Data Files
7.3 Writing Out Data Files and Plots
7.4 Ad Hoc Object Read/Write Operations
© All Rights Reserved by the Publisher.
NOTION PRESS
First Edition: November 2021
This book has been published with all reasonable efforts taken to make the material error-free after the consent of the author. No part of this book may be used or reproduced in any manner whatsoever without written permission from the author, except in the case of brief quotations embodied in critical articles and reviews.
The Author of this book is solely responsible and liable for its content, including but not limited to the views, representations, descriptions, statements, information, opinions, and references. The Content of this book shall not constitute or be construed or deemed to reflect the opinion or expression of the Publisher or Editor. Neither the Publisher nor the Editor endorses or approves the Content of this book or guarantees the reliability, accuracy, or completeness of the Content published herein, and neither makes any representations or warranties of any kind, express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The Publisher and Editor shall not be liable whatsoever for any errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or for claims for loss or damages of any kind, including without limitation indirect or consequential loss or damage arising out of use, inability to use, or the reliability, accuracy, or sufficiency of the information contained in this book.
MRP: ₹260
ISBN: 9781684879618
Published by: Notion Press Media Pvt Ltd, Old No. 38, New No. 6, McNichols Road, Chetpet, Chennai-600031, Tamilnadu, India.
The aim of The Book of R: A First Course in Programming and Statistics is to provide a relatively gentle yet informative exposure to the statistical software environment R, alongside some common statistical analyses, so that readers have a solid foundation from which to eventually become experts in their own right. Learning to use and program in a computing language is much the same as learning a new spoken language. At the beginning it is often difficult and may even be daunting, but total immersion in and active use of the language is the best and most effective way to become fluent. Most beginner-level texts on R fall into one of two categories: those concerned with computational aspects (that is, syntax and general programming tools) and those written with statistical modeling and analysis in mind, often of one particular type. In my experience, these texts are extremely well written and contain a wealth of useful information, but they better suit readers who want to pursue fairly specific goals from the outset. This text seeks to combine the best of both worlds: it first focuses on an appreciation and understanding of the language and its style, and then uses these skills to fully introduce, conduct, and interpret some common statistical practices.
Authors
Chapter I
1.1 Getting Started
R plays a key role in a wide variety of research and data analysis projects because it makes many modern statistical methods, simple and advanced, readily available and easy to use. It's true, however, that a beginner to R is often new to programming in general. As a beginner, you must not only learn to use R for your specific data analysis goals but also learn to think like a programmer. This is partly why R has a bit of a reputation for being “hard”, but rest assured, that really isn't the case.
A Brief History of R
R is based heavily on the S language, first developed in the 1970s by researchers at Bell Laboratories in New Jersey (for an overview, see, for example, Becker et al., 1988). With a view to embracing open source software, R's developers, Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, released it in the 1990s under the GNU General Public License. (The software was named for Ross and Robert's shared first initial.) Since then, the popularity of R has grown in leaps and bounds because of its unrivaled flexibility for data analysis and its powerful graphical tools, all available for the princely sum of nothing. Perhaps the most appealing feature of R is that any researcher can contribute code in the form of packages (or libraries), so the rest of the world has fast access to developments in statistics and data science.
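To make the idea of contributed packages concrete, here is a minimal sketch of how a package reaches your session, assuming a working R installation. Packages published on CRAN are installed once with install.packages() and then attached in each session with library(). The ggplot2 lines are commented out because installing it needs an internet connection; splines is used only as an example of a package that already ships with R, so it can be attached without installing anything.

```r
# A package from CRAN is installed once per machine (requires internet access):
# install.packages("ggplot2")
# ...and then attached in every R session where you want to use it:
# library(ggplot2)

# Some packages are distributed with R itself; "splines" is one of them,
# so it can be attached straight away.
library(splines)

# Attached packages appear on the search path as "package:<name>"
print(search())
"package:splines" %in% search()
```

Separating installation from attachment keeps each session lean: only the packages you ask for with library() are loaded.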
Today, the main source code archives are maintained by a dedicated group known as the R Core Team, and R is a collaborative effort. You can find the names of the most prominent contributors at http://www.r-project.org/; these individuals deserve thanks for their ongoing efforts, which keep R alive and at the forefront of statistical computing. The team issues updated versions of R relatively frequently. There have been substantial changes to the software over time, though neighboring versions are typically similar to one another. In this book, I've employed versions 3.0.1–3.2.2. R provides a wonderfully flexible programming environment favored by the many researchers who do some form of data analysis as part of their work. In this chapter, I'll lay the groundwork for learning and using R, and I'll cover the basics of installing R and certain other things useful to know before you begin.
1.2 Obtaining and Installing R from CRAN
R is available for Windows, OS X, and Linux/Unix platforms. You can find the main collection of R resources online at the Comprehensive R Archive Network (CRAN). If you go to the R project website at http://www.r-project.org/, you can navigate to your local CRAN mirror and download the installer relevant to your operating system. Section A.1 provides step-by-step instructions for installing the base distribution of R.
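Because the book was written against R versions 3.0.1 to 3.2.2, and neighboring versions differ only slightly, it can be useful to confirm which version you installed. The sketch below uses base R's R.version.string and getRversion(); the threshold "3.0.1" is simply the lowest version mentioned above, not a hard requirement.

```r
# Human-readable description of the running R installation
R.version.string

# getRversion() returns a version object that can be compared with a string,
# which makes it easy to check against the versions used in this book
current <- getRversion()
print(current)
current >= "3.0.1"
```

Comparing version objects rather than plain strings matters: as strings, "3.10.0" would sort before "3.2.2", whereas getRversion() compares component by component.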
1.3 Opening R for the First Time
R is an interpreted language, which means that you enter instructions that follow the specific syntactic rules of the language into a console or command-line interface; the software then interprets and executes your code and returns any results. R is also strictly case- and character-sensitive, so names that differ only in capitalization or by a single character refer to different things.
When you open the base R application, you're presented with the R console; Figure 1-1 shows a Windows instance, and the left image of Figure 1-2 shows an example in OS X. This is R's built-in graphical user interface (GUI) and is the typical way base R is used.
Figure 1-1: The R GUI application (default configuration) in Windows
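A short illustration of both properties described above, assuming a fresh session in which no object called Sum has been defined: each line is executed as soon as it is entered, and names that differ only in capitalization are treated as different objects.

```r
# Each instruction is interpreted and executed immediately,
# and the result is printed back to the console.
2 + 3

# R is case-sensitive: these are two distinct objects.
value <- 10
Value <- 20
value
Value

# The same holds for function names: sum() is built in, Sum() is not.
exists("sum")
exists("Sum")

# Calling the wrongly capitalized name is an error; try() captures it
# so that the rest of the script keeps running.
result <- try(Sum(1, 2), silent = TRUE)
class(result)
```

In practice, a "could not find function" error is very often a capitalization slip of exactly this kind.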
The functional, “no-frills” appearance of the interpreter, which in my experience has struck fear into the heart of many an undergraduate, stays true to the very nature of the software: a blank statistical canvas that can be used for any number of tasks. Note that OS X versions use separate windows for the console and editor, though the default behavior in Windows is to contain these panes in one overall R window.
Figure 1-2: The base R GUI console pane (left) and a newly opened instance of the built-in editor (right) in OS X
Console and Editor Panes
There are two main window types used for programming R code and viewing output. The console, or command-line interpreter, that you've just seen is where all execution takes place and where all textual and numeric output is provided. You may use the R console directly for calculations or plotting, though you would typically do so only for short, one-line commands. By default, the prompt indicating that R is ready and awaiting a command is a > symbol, after which a text cursor appears. To avoid confusion with the mathematical symbol for “greater than,” >, some authors (including me) prefer to modify this. A typical choice is R>, which you can set as follows: