The Comprehensive R Archive Network (CRAN) continues to grow rapidly. As of 2025 it hosts over 20,000 packages for statistics, data science, and related applications, making R one of the most comprehensive statistical computing environments.
Recent trends in R package development include:
Enhanced performance: Many packages now leverage C++ integration through Rcpp
Improved user experience: Better documentation, vignettes, and error messages
Cloud integration: Packages for working with cloud platforms and big data
Machine learning focus: Continued expansion of ML and AI-related packages
1.2 The Tidyverse Approach
The tidyverse is a collection of R packages that share a modern approach to data science, built on a common set of design principles.
1.2.1 Core Principles
Tidy data: Each variable forms a column, each observation forms a row
Functional programming: Functions should be predictable and free of side effects
Human-centered design: APIs designed for humans, not computers
Consistency: Similar functions work in similar ways
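The first two principles can be seen in a small sketch. The data are made-up toy values, and `growth()` is a hypothetical helper written for this illustration, not a function from any package. `pivot_longer()` reshapes the table so each variable gets its own column, and `map_dbl()` applies a pure function to each country's values without modifying anything outside it.

```r
library(tibble)
library(tidyr)
library(purrr)

# Illustrative toy data: the "year" variable is hidden in the column names,
# so this table is not tidy
wide <- tibble(
  country = c("A", "B"),
  `2023`  = c(10, 20),
  `2024`  = c(12, 25)
)

# Tidy form: one column per variable (country, year, cases), one row per observation
tidy <- pivot_longer(wide, cols = c(`2023`, `2024`),
                     names_to = "year", values_to = "cases")

# Functional style: a pure function (same input, same output, no global changes)
growth <- function(cases) (cases[2] - cases[1]) / cases[1]

# Apply it to each country's cases; map_dbl() returns a named numeric vector
growth_by_country <- map_dbl(split(tidy$cases, tidy$country), growth)
growth_by_country
```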
1.2.2 Key Tidyverse Packages
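```r
# Install the complete tidyverse (run once)
# install.packages("tidyverse")

# library(tidyverse) attaches all core packages at once;
# they can also be attached individually:
library(dplyr)     # Data manipulation
library(ggplot2)   # Data visualization
library(tidyr)     # Data tidying
library(readr)     # Data import
library(purrr)     # Functional programming
library(tibble)    # Modern data frames
library(stringr)   # String manipulation
library(forcats)   # Factor handling
library(lubridate) # Dates and times (core since tidyverse 2.0.0)

# Not part of the tidyverse: a data package with 2013 NYC flight records
library(nycflights13)
```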
1.2.3 Modern Data Manipulation with dplyr
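The dplyr package provides a grammar of data manipulation built around five key verbs: filter() picks rows, select() picks columns, mutate() adds or modifies columns, arrange() reorders rows, and summarise() reduces each group to summary values. The following example chains four of them with the pipe operator (%>%):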
```r
# Example using the starwars dataset
library(dplyr)

starwars %>%
  filter(species == "Human") %>%
  select(name, height, mass, homeworld) %>%
  mutate(bmi = mass / (height / 100)^2) %>%
  arrange(desc(bmi)) %>%
  slice_head(n = 5)
```

```
# A tibble: 5 × 5
  name               height  mass homeworld   bmi
  <chr>               <int> <dbl> <chr>     <dbl>
1 Owen Lars             178   120 Tatooine   37.9
2 Darth Vader           202   136 Tatooine   33.3
3 Beru Whitesun Lars    165    75 Tatooine   27.5
4 Wedge Antilles        170    77 Corellia   26.6
5 Luke Skywalker        172    77 Tatooine   26.0
```
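The pipeline above does not use the fifth verb, summarise(). The following sketch pairs it with group_by() on the same starwars data. Grouping by sex is an arbitrary choice for illustration, and no output is shown because the exact values depend on the installed dplyr version's copy of the dataset.

```r
library(dplyr)

# group_by() defines the groups; summarise() collapses each group to one row
height_by_sex <- starwars %>%
  filter(!is.na(height)) %>%   # drop characters with unknown height
  group_by(sex) %>%
  summarise(
    n = n(),                   # number of characters in the group
    mean_height = mean(height) # average height in centimeters
  )

height_by_sex
```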
1.2.4 Advanced dplyr Features
1.2.4.1 Row-wise Operations
```r
# Computing row-wise statistics
library(dplyr)

df <- tibble(x = 1:3, y = 3:5, z = 5:7)

df %>%
  rowwise() %>%
  mutate(row_mean = mean(c(x, y, z)))
```

```
# A tibble: 3 × 4
# Rowwise: 
      x     y     z row_mean
  <int> <int> <int>    <dbl>
1     1     3     5        3
2     2     4     6        4
3     3     5     7        5
```
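Listing columns by hand inside mean() does not scale to many columns. The sketch below shows two alternatives on the same toy data: c_across() inside rowwise() selects columns with tidyselect syntax, and a vectorized rowMeans() over pick() computes every row at once, which is typically faster on large data. It assumes dplyr 1.1.0 or later, where pick() is available.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

df <- tibble(x = 1:3, y = 3:5, z = 5:7)

# Row-wise with c_across(): a column range instead of a hand-written list
rowwise_result <- df %>%
  rowwise() %>%
  mutate(row_mean = mean(c_across(x:z))) %>%
  ungroup()  # drop the rowwise grouping once it is no longer needed

# Vectorized: rowMeans() processes all rows in a single call
vectorized_result <- df %>%
  mutate(row_mean = rowMeans(pick(x, y, z)))
```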
CRAN Task Views are curated collections of packages organized by subject area and maintained by domain experts. They help users navigate the large R package ecosystem by recommending packages for specific analytical domains, such as econometrics, machine learning, or time series analysis.
A function in R is a set of statements organized together to perform a specific task. Functions are fundamental building blocks that allow you to:
Avoid code repetition
Make code more readable and maintainable
Create reusable components
Organize complex analyses
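To make the first benefit concrete, this sketch contrasts copy-pasted code with a reusable function. The data frame holds made-up values, and `rescale01()` is a hypothetical helper that maps a numeric vector onto the range 0 to 1.

```r
# Illustrative data (made-up values)
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6))

# Repetitive version: the formula must be copied and edited for every column,
# which invites copy-paste mistakes
df$a_scaled <- (df$a - min(df$a)) / (max(df$a) - min(df$a))

# Function version: the logic lives in one place and is reused
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)  # c(minimum, maximum)
  (x - rng[1]) / (rng[2] - rng[1])
}

df$a_scaled <- rescale01(df$a)
df$b_scaled <- rescale01(df$b)
df
```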
3.2 Function Structure
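```r
# Template: function_name, some_computation, and default_value are placeholders
function_name <- function(argument1, argument2 = default_value) {
  # Function body
  result <- some_computation(argument1, argument2)
  return(result)  # Optional: R returns the value of the last expression
}
```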
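A concrete instance of the template helps. In the sketch below, `power()` is a hypothetical example function with one required and one defaulted argument. The last lines inspect the three parts that every closure like this one has: its formal arguments, its body, and its enclosing environment.

```r
# Hypothetical example following the template
power <- function(base, exponent = 2) {
  result <- base^exponent
  result  # no return() needed: the last evaluated expression is returned
}

power(3)      # 9: exponent falls back to its default of 2
power(2, 10)  # 1024

# The three parts of a closure
formals(power)      # the argument list, including defaults
body(power)         # the code inside the braces
environment(power)  # the environment where the function was defined
```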
3.3 Arguments and Parameters
3.3.1 Types of Arguments
Required arguments: Must be provided by the user
Optional arguments: Have default values
... (dots): Accept a variable number of arguments
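```r
# Example function with different argument types
calculate_stats <- function(x, na.rm = FALSE, ...) {
  # Drop missing values up front when requested
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  # Anything in ... is forwarded to all three summary functions.
  # Caution: sd() only accepts x and na.rm, so an extra argument such as
  # trim = 0.1 (understood by mean()) makes sd() fail with "unused argument".
  list(
    mean   = mean(x, ...),
    median = median(x, ...),
    sd     = sd(x, ...)
  )
}

# Usage
data <- c(1, 2, 3, NA, 5)
calculate_stats(data, na.rm = TRUE)
```

```
$mean
[1] 2.75

$median
[1] 2.5

$sd
[1] 1.707825
```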
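The dots are most often used to forward arguments to another function or to accept an open-ended list of inputs. In the sketch below, `join_words()` and `inspect_dots()` are hypothetical helpers. One consequence worth knowing is that an argument defined after `...`, such as `sep` here, must be written out in full when the function is called, because R will not partially match it.

```r
# Hypothetical helper: forwards everything in ... to paste()
join_words <- function(..., sep = " ") {
  paste(..., sep = sep)
}

join_words("tidy", "data")             # "tidy data"
join_words("tidy", "data", sep = "_")  # "tidy_data"

# Hypothetical helper: inspects what was passed through ...
inspect_dots <- function(...) {
  list(
    n     = ...length(),      # number of arguments captured by ...
    names = names(list(...))  # "" marks unnamed arguments
  )
}

info <- inspect_dots(a = 1, b = "two", 3)
info
```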
3.3.2 Parameter Matching
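R matches arguments in three passes, in this order:
Exact matching: Named arguments whose names match a formal argument exactly
Partial matching: Abbreviated names that match the beginning of exactly one remaining formal argument (formals that come after ... in the definition can only be matched exactly)
Positional matching: Remaining unnamed arguments, assigned in order to the remaining formals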
```r
# All equivalent calls
mean(x = c(1, 2, 3), na.rm = TRUE)  # exact matching by name
mean(c(1, 2, 3), na.rm = TRUE)      # positional matching for x
mean(c(1, 2, 3), na = TRUE)         # partial matching: na -> na.rm
```

```
[1] 2
[1] 2
[1] 2
```
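The calls above return 2 even if `na` were ignored, because the vector has no missing values. A vector containing NA shows that partial matching really takes effect. The sketch also uses a hypothetical function, `pick_first()`, whose argument names share a prefix, to show that an ambiguous abbreviation is an error. Because partial matching is fragile in scripts, R's `warnPartialMatchArgs` option can be turned on to flag it.

```r
x <- c(1, 2, NA)

mean(x)                # NA: na.rm defaults to FALSE
mean(x, na.rm = TRUE)  # 1.5 (exact match)
mean(x, na = TRUE)     # 1.5 ("na" partially matches "na.rm")

# Hypothetical function whose argument names share the prefix "alp"
pick_first <- function(alpha, alpine) alpha

pick_first(2, alpha = 1)  # 1: alpha is matched by name first, then 2 fills alpine

# "alp" could mean either argument, so R refuses to guess
ambiguous <- tryCatch(pick_first(alp = 1),
                      error = function(e) conditionMessage(e))
ambiguous
```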
3.4 Getting Help: Finding Packages and Functions
3.4.1 Built-in Help System
```r
# Get help for a function
?mean
help(mean)

# Search for functions
??regression               # Fuzzy search across installed packages
help.search("regression")

# View package documentation
help(package = "dplyr")

# View vignettes
vignette("dplyr")
browseVignettes("dplyr")
```
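A few base R functions complement the help pages when you only remember part of a name or need a quick look at a signature. The sketch uses only functions from base R and the utils package. The result of find() depends on which packages are attached, and here it assumes a fresh session.

```r
# Functions on the search path whose names contain "mean"
apropos("mean")

# Which attached package provides a function?
find("median")  # "package:stats" in a fresh session

# Show a function's arguments without opening its help page
args(sd)

# Run the examples from a help page
example(mean)
```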
3.4.2 Finding Functions for Specific Tasks
CRAN Task Views: Organized by domain
RSeek.org: Specialized R search engine
Stack Overflow: Community-driven solutions
R Documentation sites: rdocumentation.org, rdrr.io
Package websites: Often hosted on GitHub or pkgdown sites
4 Writing R Scripts
4.1 Tips for Writing Good Functions
4.1.1 Follow the Single Responsibility Principle
```r
# Good: function does one thing well
calculate_bmi <- function(weight_kg, height_m) {
  bmi <- weight_kg / (height_m^2)
  return(bmi)
}

# Bad: function tries to do too many things
calculate_everything <- function(weight_kg, height_m) {
  bmi <- weight_kg / (height_m^2)
  category <- if (bmi < 18.5) "Underweight" else "Normal"
  plot(weight_kg, height_m)  # Side effect!
  return(list(bmi = bmi, category = category))
}
```
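One way to repair `calculate_everything()` is to split it into single-purpose functions and leave any plotting to the caller. In the sketch below, `classify_bmi()` is a hypothetical helper. Its cut-offs are assumed to be the common WHO adult categories, not values taken from this text. Unlike the `if ()` in the bad example, `cut()` also works on whole vectors.

```r
# One job: compute BMI (vectorized over its inputs)
calculate_bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# One job: map BMI values to categories
# (assumed WHO adult cut-offs: 18.5, 25, 30)
classify_bmi <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("Underweight", "Normal", "Overweight", "Obese"),
      right  = FALSE)  # intervals are [lower, upper)
}

# Any plotting is a separate, explicit step left to the caller
bmi <- calculate_bmi(c(50, 70, 90), c(1.75, 1.75, 1.75))
classify_bmi(bmi)
```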
4.1.2 Use Descriptive Names
```r
# Good
calculate_confidence_interval <- function(data, confidence_level = 0.95) {
  # Function implementation
}

# Bad
ci_calc <- function(d, cl = 0.95) {
  # Function implementation
}
```
4.1.3 Include Input Validation
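```r
safe_divide <- function(x, y) {
  # Input validation: fail fast on non-numeric input
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("Both x and y must be numeric")
  }
  # any() keeps the check valid when y is a vector
  # (if () requires a single TRUE/FALSE value)
  if (any(y == 0, na.rm = TRUE)) {
    warning("Division by zero: result contains Inf, -Inf, or NaN")
  }
  # R's division already yields Inf, -Inf, or NaN for zero divisors
  x / y
}
```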
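When a function has several preconditions, stopifnot() can express them compactly. Since R 4.0.0, naming a condition turns the name into the error message. In the sketch below, `to_percent()` is a hypothetical helper, and tryCatch() captures the error so the message can be inspected.

```r
# Hypothetical helper: each named condition supplies its own error message
# (requires R >= 4.0.0)
to_percent <- function(part, total) {
  stopifnot(
    "part must be numeric" = is.numeric(part),
    "total must be a single positive number" =
      is.numeric(total) && length(total) == 1 && total > 0
  )
  100 * part / total
}

to_percent(c(25, 50), 200)  # 12.5 25

msg <- tryCatch(to_percent("a", 10), error = function(e) conditionMessage(e))
msg
```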
4.1.4 Document Your Functions
```r
#' Calculate Body Mass Index
#'
#' This function calculates BMI from weight and height measurements.
#'
#' @param weight_kg Numeric vector of weights in kilograms
#' @param height_m Numeric vector of heights in meters
#' @return Numeric vector of BMI values
#' @examples
#' calculate_bmi(70, 1.75)
#' calculate_bmi(c(70, 80), c(1.75, 1.80))
#' @export
calculate_bmi <- function(weight_kg, height_m) {
  if (length(weight_kg) != length(height_m)) {
    stop("weight_kg and height_m must have the same length")
  }
  bmi <- weight_kg / (height_m^2)
  return(bmi)
}
```
4.2 Tips for Writing R Scripts
4.2.1 Script Organization
```r
# Header with script information
# Title: Data Analysis Pipeline
# Author: Your Name
# Date: 2025-01-01
# Purpose: Analyze survey data and generate report

# Load required packages
library(tidyverse)
library(here)
library(scales)

# Set global options here if needed.
# Note: options(stringsAsFactors = FALSE) is no longer necessary;
# FALSE has been the default since R 4.0.0.

# Define constants
SIGNIFICANCE_LEVEL <- 0.05
OUTPUT_DIR <- here("output")

# Source custom functions
# source(here("R", "helper_functions.R"))

# Main analysis code...
```
4.2.2 Error Handling
```r
library(readr)  # provides read_csv()

# Robust data reading
read_data_safely <- function(file_path) {
  tryCatch({
    data <- read_csv(file_path)
    message("Successfully loaded ", nrow(data), " rows")
    data
  }, error = function(e) {
    # Re-raise the error with the file path added for context
    stop("Failed to read file: ", file_path, "\nError: ", conditionMessage(e),
         call. = FALSE)
  })
}
```
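tryCatch() can also handle warnings and run clean-up code. The following sketch uses a hypothetical helper, `to_number()`, that catches the coercion warning from as.numeric(), reports it, and still returns a usable result. The `finally` expression runs whether or not a condition occurred.

```r
# Hypothetical helper: recover from a warning instead of failing
to_number <- function(x) {
  tryCatch(
    as.numeric(x),
    warning = function(w) {
      # Report the problem, then return the result with NAs
      message("Some values could not be converted: ", conditionMessage(w))
      suppressWarnings(as.numeric(x))
    },
    finally = message("to_number() finished")
  )
}

to_number(c("1", "2.5"))  # 1.0 2.5
to_number(c("1", "abc"))  # 1 NA, plus a message about the failed conversion
```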
Following these principles will help you write more maintainable, readable, and efficient R code. Remember:
Functions should do one thing well
Use descriptive names
Document your code
Avoid repetition through functions and vectorization
Use the tidyverse approach for consistent and readable data manipulation
Leverage R’s functional programming capabilities
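The advice on vectorization and functional programming can be compared side by side. The following sketch converts the heights of three characters from the starwars table above from centimeters to meters in three ways. All three give the same result, but the vectorized and purrr versions leave no index bookkeeping to get wrong.

```r
heights_cm <- c(172, 202, 165)

# Loop: explicit indexing into a preallocated result
heights_m_loop <- numeric(length(heights_cm))
for (i in seq_along(heights_cm)) {
  heights_m_loop[i] <- heights_cm[i] / 100
}

# Vectorized: arithmetic applies element-wise to the whole vector
heights_m <- heights_cm / 100

# Functional: apply an anonymous function to each element with purrr
heights_m_map <- purrr::map_dbl(heights_cm, function(h) h / 100)
```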
The combination of good function design, proper script organization, and the DRY ("Don't Repeat Yourself") principle will make your R code more professional and easier to maintain over time.