"drop_na_if" provides a simple way to drop columns with missing values if they meet certain criteria/conditions.
drop_na_if(
df,
sign = "gteq",
percent_na = 50,
keep_columns = NULL,
grouping_cols = NULL,
target_columns = NULL,
...
)
df: A data.frame object.
sign: Character. One of gteq, lteq, gt, lt, or eq, which stand for greater than or equal to, less than or equal to, greater than, less than, and equal to, respectively.
percent_na: The percentage of missing values to use as the threshold when dropping columns.
keep_columns: Columns that should be kept even if they meet the percent_na criterion.
grouping_cols: Columns to group by, for dropping groups that meet a target criterion of percent missingness.
target_columns: When working on grouped data, choose whether to drop on all columns that meet the target or only on a specific column.
...: Other arguments to "percent_missing".
A data.frame object with columns that meet the target criteria dropped.
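The rule described by sign, percent_na, and keep_columns can be sketched in base R. This is only an illustration of the documented behaviour for ungrouped data, not the package's implementation; the helper name sketch_drop_na_if is hypothetical, and it assumes that the percentage of missing values is computed per column.

```r
# Hypothetical helper, not part of the package: illustrates the documented rule
# for ungrouped data only.
sketch_drop_na_if <- function(df, sign = "gteq", percent_na = 50,
                              keep_columns = NULL) {
  # Map each documented `sign` value to a comparison operator.
  compare <- switch(sign,
    gteq = `>=`,
    lteq = `<=`,
    gt   = `>`,
    lt   = `<`,
    eq   = `==`,
    stop("Unknown sign: ", sign)
  )
  # Percentage of missing values in each column (assumed definition).
  pct_missing <- colMeans(is.na(df)) * 100
  # Columns that meet the criterion, minus those the caller wants to keep.
  to_drop <- names(pct_missing)[compare(pct_missing, percent_na)]
  to_drop <- setdiff(to_drop, keep_columns)
  df[, setdiff(names(df), to_drop), drop = FALSE]
}

# Ozone has about 24.2% missing values in airquality, so it is dropped here.
names(sketch_drop_na_if(airquality, percent_na = 24))
```

Under these assumptions the sketch keeps the same columns as the airquality examples on this page, which may help when choosing a sign and threshold.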
head(drop_na_if(airquality, percent_na = 24))
#> Solar.R Wind Temp Month Day
#> 1 190 7.4 67 5 1
#> 2 118 8.0 72 5 2
#> 3 149 12.6 74 5 3
#> 4 313 11.5 62 5 4
#> 5 NA 14.3 56 5 5
#> 6 NA 14.9 66 5 6
# Drop columns that have less than or equal to 4% missing values
head(drop_na_if(airquality, sign = "lteq", percent_na = 4))
#> Ozone Solar.R
#> 1 41 190
#> 2 36 118
#> 3 12 149
#> 4 18 313
#> 5 NA NA
#> 6 28 NA
# Drop all columns with greater than or equal to 4% missing values, but keep Ozone
head(drop_na_if(airquality, sign = "gteq", percent_na = 4,
                keep_columns = "Ozone"))
#> Ozone Wind Temp Month Day
#> 1 41 7.4 67 5 1
#> 2 36 8.0 72 5 2
#> 3 12 12.6 74 5 3
#> 4 18 11.5 62 5 4
#> 5 NA 14.3 56 5 5
#> 6 28 14.9 66 5 6
# Drop groups that meet a given criterion
grouped_drop <- data.frame(
  ID = c("A", "A", "B", "A", "B"),
  Vals = c(4, NA, NA, NA, NA),
  Values = c(5, 6, 7, 8, NA),
  stringsAsFactors = FALSE
)
drop_na_if(grouped_drop, percent_na = 67, grouping_cols = "ID")
#> # A tibble: 3 x 3
#> ID Vals Values
#> <chr> <dbl> <dbl>
#> 1 A 4 5
#> 2 A NA 6
#> 3 A NA 8
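To see why only group A survives in the grouped example, it may help to compute the percentage of missing values per group by hand. The following base R sketch does that; it is an illustration only and assumes that missingness is measured per column within each group. The name pct_missing_by_group is introduced here for illustration.

```r
# Same data as in the grouped example above.
grouped_drop <- data.frame(
  ID = c("A", "A", "B", "A", "B"),
  Vals = c(4, NA, NA, NA, NA),
  Values = c(5, 6, 7, 8, NA),
  stringsAsFactors = FALSE
)

# Split the value columns by ID, then compute percent missing per column
# within each group. The result is a matrix: one row per column, one
# column per group.
pct_missing_by_group <- sapply(
  split(grouped_drop[c("Vals", "Values")], grouped_drop$ID),
  function(g) colMeans(is.na(g)) * 100
)

round(pct_missing_by_group, 1)
```

In group A, Vals is missing in 2 of 3 rows (about 66.7%), which is just under the threshold of 67. In group B, Vals is missing in both rows (100%), so group B meets the default gteq criterion, consistent with the output shown above.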