"drop_na_if" drops columns whose percentage of missing values meets a given condition.

drop_na_if(
  df,
  sign = "gteq",
  percent_na = 50,
  keep_columns = NULL,
  grouping_cols = NULL,
  target_columns = NULL,
  ...
)

Arguments

df

A data.frame object

sign

Character. One of "gteq", "lteq", "lt", "gt" or "eq", meaning greater than or equal to, less than or equal to, less than, greater than, or equal to, respectively.

percent_na

Numeric. The percentage of missing values used as the threshold for dropping columns, compared according to sign. Defaults to 50.

keep_columns

Columns that should be kept even if they meet the percent_na criterion.

grouping_cols

Columns to group by. Used to drop groups, rather than columns, whose percentage of missing values meets the criterion.

target_columns

When working on grouped data, the specific column(s) to evaluate against the criterion. With the default NULL, all columns are considered.

...

Other arguments passed on to "percent_missing".
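
As a rough illustration of how the sign keywords relate to ordinary comparisons, the sketch below maps each keyword to a base R operator. The name sign_ops is hypothetical and is not part of the package; the package's internal implementation may differ.

```r
# Hypothetical lookup table, not part of the package: maps each `sign`
# keyword to the base R comparison operator it is described as meaning.
sign_ops <- list(gteq = `>=`, lteq = `<=`, gt = `>`, lt = `<`, eq = `==`)

# Example: a column with 24.2% missing values against a threshold of 24.
pct_missing <- 24.2
sign_ops[["gteq"]](pct_missing, 24)  # TRUE: the condition is met
sign_ops[["lt"]](pct_missing, 24)    # FALSE: the condition is not met
```

Under this reading, a column is dropped when the comparison between its percentage of missing values and percent_na returns TRUE.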

Value

A data.frame object with the columns that meet the target criterion dropped. If grouping_cols is supplied, the groups that meet the criterion are dropped instead.
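
To anticipate which columns a given threshold would affect, it can help to inspect the percentage of missing values per column first. The following is a minimal base R sketch using the built-in airquality data; it does not call the package, and the variable name pct_na is only illustrative.

```r
# Percentage of missing values in each column of the built-in airquality data.
pct_na <- colMeans(is.na(airquality)) * 100
round(pct_na, 2)
# Ozone is about 24.18 and Solar.R about 4.58; the other columns are 0.

# Columns that a "gteq" comparison against 24 would select.
names(pct_na)[pct_na >= 24]
```

With these values, a threshold of 24 with sign = "gteq" selects only Ozone, while a threshold of 4 selects both Ozone and Solar.R.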

Examples

head(drop_na_if(airquality, percent_na = 24))
#>   Solar.R Wind Temp Month Day
#> 1     190  7.4   67     5   1
#> 2     118  8.0   72     5   2
#> 3     149 12.6   74     5   3
#> 4     313 11.5   62     5   4
#> 5      NA 14.3   56     5   5
#> 6      NA 14.9   66     5   6
# Drop columns that have less than or equal to 4% missing values
head(drop_na_if(airquality, sign = "lteq", percent_na = 4))
#>   Ozone Solar.R
#> 1    41     190
#> 2    36     118
#> 3    12     149
#> 4    18     313
#> 5    NA      NA
#> 6    28      NA
# Drop columns with greater than or equal to 4% missing values, but keep Ozone
head(drop_na_if(airquality, sign = "gteq", percent_na = 4,
                keep_columns = "Ozone"))
#>   Ozone Wind Temp Month Day
#> 1    41  7.4   67     5   1
#> 2    36  8.0   72     5   2
#> 3    12 12.6   74     5   3
#> 4    18 11.5   62     5   4
#> 5    NA 14.3   56     5   5
#> 6    28 14.9   66     5   6
# Drop groups that meet a given criterion
grouped_drop <- structure(list(ID = c("A", "A", "B", "A", "B"),
                               Vals = c(4, NA, NA, NA, NA),
                               Values = c(5, 6, 7, 8, NA)),
                          row.names = c(NA, -5L), class = "data.frame")
drop_na_if(grouped_drop, percent_na = 67, grouping_cols = "ID")
#> # A tibble: 3 x 3
#>   ID     Vals Values
#>   <chr> <dbl>  <dbl>
#> 1 A         4      5
#> 2 A        NA      6
#> 3 A        NA      8
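
The grouped result above can be checked by computing the percentage of missing values per column within each group. This is a base R sketch that does not use the package; the data frame repeats the example data, and the name by_group is only illustrative.

```r
# Same data as in the grouped example above.
grouped_drop <- data.frame(ID = c("A", "A", "B", "A", "B"),
                           Vals = c(4, NA, NA, NA, NA),
                           Values = c(5, 6, 7, 8, NA))

# Split the value columns by ID, then compute percent missing per column.
by_group <- split(grouped_drop[c("Vals", "Values")], grouped_drop$ID)
pct_by_group <- sapply(by_group, function(g) colMeans(is.na(g)) * 100)
round(pct_by_group, 2)
# Group A: Vals about 66.67, Values 0. Group B: Vals 100, Values 50.
```

Group A stays just under the 67% threshold in every column, whereas group B reaches 100% in Vals, which is consistent with only the rows of group A remaining in the output.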