Reshaping data

National Data Management Center for Health (NDMC) at EPHI



Converting data from wide to long or long to wide

Reshaping data using tidyr package

  • The following four tidyr functions are the key tools for reshaping (tidying) data:
  • gather() transforms data from wide to long: gather(data, key, value, na.rm = FALSE)
  • spread() transforms data from long to wide: spread(data, key, value)
  • separate() splits one variable into two: separate(data, col, into, sep = "_", remove = TRUE)
  • unite() unites two or more variables into one: unite(data, col, ..., sep = "_", remove = TRUE)

gather()

  • The objective of the gather() function is to transform the data from wide to long.

Syntax

Code
gather(data, key, value, na.rm = FALSE)

Arguments:

  • data: The data frame used to reshape the dataset

  • key: Name of the new column that will hold the old column names

  • value: Name of the new column that will hold the cell values

  • na.rm: Remove missing values. FALSE by default

Example

  • Below, we can visualize the concept of reshaping wide to long.

  • We want to create a single column named growth, filled with the values of the quarter columns.

Code
library(tidyr)
data <- data.frame(   # Create a dataset
  Treatment = c("A", "B", "C"),
  q1_2021 = c(0.03, 0.05, 0.01),
  q2_2021 = c(0.05, 0.07, 0.02),
  q3_2021 = c(0.04, 0.05, 0.01),
  q4_2021 = c(0.03, 0.02, 0.04),
  q1_2022 = c(0.06, 0.07, 0.04),
  q2_2022 = c(0.08, 0.07, 0.08),
  q3_2022 = c(0.10, 0.08, 0.08),
  q4_2022 = c(0.09, 0.09, 0.10))
data
  Treatment q1_2021 q2_2021 q3_2021 q4_2021 q1_2022 q2_2022 q3_2022 q4_2022
1         A    0.03    0.05    0.04    0.03    0.06    0.08    0.10    0.09
2         B    0.05    0.07    0.05    0.02    0.07    0.07    0.08    0.09
3         C    0.01    0.02    0.01    0.04    0.04    0.08    0.08    0.10

Reshape the data

  • In the gather() call, we create two new variables, quarter and growth, because our original dataset has one grouping variable (Treatment) plus the key-value pairs.
Code
long <- data |>
  gather(quarter, growth, q1_2021:q4_2022)
head(long, 12)
   Treatment quarter growth
1          A q1_2021   0.03
2          B q1_2021   0.05
3          C q1_2021   0.01
4          A q2_2021   0.05
5          B q2_2021   0.07
6          C q2_2021   0.02
7          A q3_2021   0.04
8          B q3_2021   0.05
9          C q3_2021   0.01
10         A q4_2021   0.03
11         B q4_2021   0.02
12         C q4_2021   0.04

spread()

  • The spread() function does the opposite of gather(): it transforms the data from long to wide.

Syntax

Code
spread(data, key, value)

Arguments:

  • data: The data frame used to reshape the dataset

  • key: The column whose values become the new column names

  • value: The column whose values fill the new columns

Example

  • We can reshape the long dataset back to wide format with spread().

Reshape the data

Code
wide <- long |>
  spread(quarter, growth) 
wide
  Treatment q1_2021 q1_2022 q2_2021 q2_2022 q3_2021 q3_2022 q4_2021 q4_2022
1         A    0.03    0.06    0.05    0.08    0.04    0.10    0.03    0.09
2         B    0.05    0.07    0.07    0.07    0.05    0.08    0.02    0.09
3         C    0.01    0.04    0.02    0.08    0.01    0.08    0.04    0.10
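Note that gather() and spread() still work but have been superseded in tidyr 1.0.0 by pivot_longer() and pivot_wider(). A minimal sketch of the same reshaping with the newer functions, using a small toy data frame rather than the full dataset:

```r
library(tidyr)

# a small toy data frame mirroring the structure used above
df <- data.frame(Treatment = c("A", "B"),
                 q1_2021   = c(0.03, 0.05),
                 q2_2021   = c(0.05, 0.07))

# wide -> long: the modern equivalent of gather()
long2 <- df |>
  pivot_longer(q1_2021:q2_2021, names_to = "quarter", values_to = "growth")

# long -> wide: the modern equivalent of spread()
wide2 <- long2 |>
  pivot_wider(names_from = quarter, values_from = growth)
```

The argument names (names_to/values_to instead of key/value) are more explicit, which is the main reason the pivot functions replaced the older pair.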

separate()

  • The separate() function splits a column into two according to a separator.

  • This function is helpful in situations where a variable stores a date.

  • If our analysis requires focusing on month and year separately, we can split the column into two new variables.

Syntax

Code
separate(data, col, into, sep= "", remove = TRUE)

Arguments:

  • data: The data frame used to reshape the dataset
  • col: The column to split
  • into: The names of the new variables
  • sep: The symbol that separates the values, e.g. "-", "_", "&"
  • remove: Remove the old column. TRUE by default.

Example

  • We can split the quarter from the year in the tidier dataset by applying the separate() function.
Code
separate_tidier <- long |>
  separate(quarter, c("Qrt", "year"), sep = "_")
head(separate_tidier, 12)
   Treatment Qrt year growth
1          A  q1 2021   0.03
2          B  q1 2021   0.05
3          C  q1 2021   0.01
4          A  q2 2021   0.05
5          B  q2 2021   0.07
6          C  q2 2021   0.02
7          A  q3 2021   0.04
8          B  q3 2021   0.05
9          C  q3 2021   0.01
10         A  q4 2021   0.03
11         B  q4 2021   0.02
12         C  q4 2021   0.04

unite()

  • The unite() function concatenates two or more columns into one.

Syntax

Code
unite(data, col, ..., sep = "_", remove = TRUE)

Arguments:

  • data: The data frame used to reshape the dataset
  • col: Name of the new column
  • ...: Names of the columns to concatenate
  • sep: The symbol used to join the values, e.g. "-", "_", "&"
  • remove: Remove the old columns. TRUE by default.

Example

  • In the above example, we separated quarter from year.

  • What if we want to merge them back together?

  • We use the following code:

Code
unite_tidier <- separate_tidier |>
  unite(Quarter, Qrt, year, sep = "_")
head(unite_tidier)
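If you want to keep the original columns alongside the combined one, unite() accepts remove = FALSE. A small sketch on a toy data frame:

```r
library(tidyr)

df <- data.frame(Qrt = c("q1", "q2"), year = c("2021", "2021"))

# remove = FALSE keeps Qrt and year next to the new Quarter column
df |> unite(Quarter, Qrt, year, sep = "_", remove = FALSE)
```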

Dealing with Missing Data

Missing data types

  • One of the most important aspects of data cleaning is missing values.

Types of “missing” data:

  • NA - general missing data
  • NaN - stands for “Not a Number”, happens when you do 0/0.
  • Inf and -Inf - Infinity, happens when you divide a positive number (or negative number) by 0.

Finding Missing data

  • is.na() - looks for NA and NaN
  • is.nan() - looks for NaN
  • is.infinite() - looks for Inf or -Inf
Code
test <- c(0, NA, -1)
test/0
[1]  NaN   NA -Inf
Code
test <- test/0
is.na(test)
[1]  TRUE  TRUE FALSE
Code
is.nan(test)
[1]  TRUE FALSE FALSE
Code
is.infinite(test)
[1] FALSE FALSE  TRUE

Useful checking functions

  • any() returns TRUE if ANY element is TRUE
    • any(is.na(x)) - do we have any NAs in x?
Code
A = c(1, 2, 4, NA)
B = c(1, 2, 3, 4)
any(is.na(A)) # are there any NAs - YES/TRUE
[1] TRUE
Code
any(is.na(B)) # are there any NAs- NO/FALSE
[1] FALSE
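Beyond a yes/no answer, is.na() combined with sum() or colSums() counts missing values, since TRUE is treated as 1:

```r
A <- c(1, 2, 4, NA)

sum(is.na(A))    # how many NAs? [1] 1
mean(is.na(A))   # proportion missing: [1] 0.25

# per-column NA counts for a data frame (airquality is built into R)
colSums(is.na(airquality))
```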

naniar

  • Sometimes you need to examine missing values across lots of data, though; the naniar package is a good option for this.
  • The pct_complete() function shows the percentage of a given data object (vector or data frame) that is complete.
Code
#install.packages("naniar")
library(naniar)
pct_complete(airquality)
[1] 95.20697
Code
library(janitor)
library(dplyr)
airquality |> select(Ozone) |>
pct_complete()
[1] 75.81699

naniar plots

  • The gg_miss_var() function creates a plot of the number of missing values for each variable (it requires a data frame).
Code
gg_miss_var(airquality)

Missing Data Issues

Recall that mathematical operations with NA often result in NAs.

Code
sum(c(1,2,3,NA))
[1] NA
Code
mean(c(2,4,NA))
[1] NA
Code
median(c(1,2,3,NA))
[1] NA
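Most summary functions take an na.rm argument that drops NAs before computing, which is usually what you want once you have decided the missingness is ignorable:

```r
sum(c(1, 2, 3, NA), na.rm = TRUE)     # [1] 6
mean(c(2, 4, NA), na.rm = TRUE)       # [1] 3
median(c(1, 2, 3, NA), na.rm = TRUE)  # [1] 2
```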

filter() and missing data

  • Be careful with missing data using subsetting!

  • filter() removes rows with missing values by default, because R can't tell for sure whether an NA value meets the condition.

  • To keep them, add an is.na() condition with |.

Code
df <- tibble(Dog = c(0, NA, 2, 3, 1, 1), 
            Cat = c(NA, 8, 6, NA, 2, NA))
Code
df 
# A tibble: 6 × 2
    Dog   Cat
  <dbl> <dbl>
1     0    NA
2    NA     8
3     2     6
4     3    NA
5     1     2
6     1    NA
Code
df |> filter(Dog < 3)
# A tibble: 4 × 2
    Dog   Cat
  <dbl> <dbl>
1     0    NA
2     2     6
3     1     2
4     1    NA

filter() and missing data

Code
df |> filter(Dog < 3 | is.na(Dog))
# A tibble: 5 × 2
    Dog   Cat
  <dbl> <dbl>
1     0    NA
2    NA     8
3     2     6
4     1     2
5     1    NA

To remove rows with NA values for a variable, use drop_na()

  • A function from the tidyr package (it needs a data frame to start).

  • Don't do this unless you have thought about whether dropping NA values makes sense, based on what those values mean in your data.

Code
df |> drop_na(Dog)
# A tibble: 5 × 2
    Dog   Cat
  <dbl> <dbl>
1     0    NA
2     2     6
3     3    NA
4     1     2
5     1    NA
Code
df |> drop_na()
# A tibble: 2 × 2
    Dog   Cat
  <dbl> <dbl>
1     2     6
2     1     2

Change a value to be NA

  • The na_if() function of dplyr can be helpful for this. Let’s say we think that all 0 values should be NA.
Code
df 
# A tibble: 6 × 2
    Dog   Cat
  <dbl> <dbl>
1     0    NA
2    NA     8
3     2     6
4     3    NA
5     1     2
6     1    NA
Code
df |> 
  mutate(Dog = na_if(x = Dog, y = 0))
# A tibble: 6 × 2
    Dog   Cat
  <dbl> <dbl>
1    NA    NA
2    NA     8
3     2     6
4     3    NA
5     1     2
6     1    NA
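The reverse operation, replacing NA with a value, is handled by tidyr's replace_na(). A minimal sketch on a small tibble:

```r
library(dplyr)
library(tidyr)

df <- tibble(Dog = c(0, NA, 2), Cat = c(NA, 8, 6))

# replace every NA in Cat with 0
df |> mutate(Cat = replace_na(Cat, 0))
```

As with drop_na(), think about whether filling NAs with a constant makes sense for your data before doing this.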

Summary

  • is.na(), any(is.na()), count(), and naniar functions like gg_miss_var() can help determine whether we have NA values

  • filter() automatically removes NA values, since it can't confirm whether the condition is met (add | is.na() to keep them)

  • drop_na() can help you remove NA values from a variable or an entire data frame

  • NA values can change your calculation results

  • think about what NA values represent - don’t drop them if you shouldn’t

Data cleaning

  • Cleaning data is one of the most essential parts of data analysis.

  • Data cleaning is the process of converting messy data into reliable data that can be analyzed in R.

  • Data cleaning improves data quality and your productivity in R.

    • Format ugly data frame column names in R

    • Delete all blank rows in R

    • Remove duplicate rows in R

    • Converting numeric to date format

    • Checking consistency using the tabyl() function

clean_names()

  • This function is used to change and clean up names of columns in data frames.

  • It can be used to ensure consistency.

  • You can change all names to lowercase separated by underscores, camel case variants, title case, or other styles.

  • It can also be used to remove parts of names and any special characters, including replacing % symbols with the word percent.

  • The recommended style for variable names in R is a single lowercase word without special characters.

  • The clean_names() function, available in the janitor package, will be used to clean the column names.

Code
library(janitor); library(report)
data<-clean_names(iris)
data |> head()
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
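janitor also exports make_clean_names(), which applies the same cleaning rules to an arbitrary character vector, useful when you want cleaned names outside a data frame:

```r
library(janitor)

# the same rules clean_names() applies to data frame columns
make_clean_names(c("Sepal.Length", "first NAME"))  # "sepal_length" "first_name"

# special characters are handled too; % is spelled out as "percent"
make_clean_names("% Missing")
```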

remove_empty() & remove_constant()

  • If you want to remove rows and/or columns that are completely empty, you can use the remove_empty() function from the janitor package.
  • remove_empty() removes both empty rows and empty columns.
Code
data3<- remove_empty(data, which = c("rows","cols"), quiet = F)
data3 |> head()
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
  • Let's add two constant columns to the dataset and see how we can remove this junk with janitor.
Code
# add two constant columns
data4 <- data3 |> 
  mutate(constant_column   = 42, constant_column_2 = "text") |> head()
Code
library(kableExtra)
data4 |> 
  remove_constant() |> remove_empty() |> kbl() |>
  kable_classic_2(full_width = F) 
sepal_length sepal_width petal_length petal_width
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4

get_dupes()

  • This function retrieves any duplicates in the dataset so that they can be examined during data clean-up operations.

  • One challenge is determining whether a duplicate is truly a duplicate.

  • The function returns a data frame that includes a dupe_count column giving the number of copies of each duplicated value.

Code
data3 |> get_dupes() 
  sepal_length sepal_width petal_length petal_width   species dupe_count
1          5.8         2.7          5.1         1.9 virginica          2
2          5.8         2.7          5.1         1.9 virginica          2
  • We can also get duplicate values by column names
Code
data3 |> get_dupes(sepal_length, sepal_width) |> head()
  sepal_length sepal_width dupe_count petal_length petal_width    species
1          5.8         2.7          4          4.1         1.0 versicolor
2          5.8         2.7          4          3.9         1.2 versicolor
3          5.8         2.7          4          5.1         1.9  virginica
4          5.8         2.7          4          5.1         1.9  virginica
5          5.1         3.8          3          1.5         0.3     setosa
6          5.1         3.8          3          1.9         0.4     setosa
  • We can also use the distinct() function from the dplyr package to remove duplicate rows.
Code
library(dplyr)
data_cleaned <- data3 |> distinct()
data_cleaned |> head()
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

If we want to remove duplicate rows with respect to a specific variable, we can use distinct() function again. For example, we remove duplicate rows with respect to petal_length.

Code
data_cleaned2 <- data_cleaned |>  distinct(petal_length, .keep_all = TRUE)
data_cleaned2 |> head()
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.7         3.2          1.3         0.2  setosa
3          4.6         3.1          1.5         0.2  setosa
4          5.4         3.9          1.7         0.4  setosa
5          4.8         3.4          1.6         0.2  setosa
6          4.3         3.0          1.1         0.1  setosa

tabyl()

  • This function is used to produce frequency tables and contingency tables, i.e. counts of each category or combination of categories of data.

  • Unlike the base R table() function, tabyl() returns a data frame which makes results easier to work with.

  • The code below creates a data frame showing the number of rows of data (n) for each species in the dataset.

Code
data_cleaned2 |> tabyl(species)
    species  n   percent
     setosa  9 0.2093023
 versicolor 19 0.4418605
  virginica 15 0.3488372
  • Also returned is a percent column, showing the percentage of rows in each species.
Code
data_cleaned2 |> tabyl(species)|>
  adorn_pct_formatting(digits = 2)
    species  n percent
     setosa  9  20.93%
 versicolor 19  44.19%
  virginica 15  34.88%
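tabyl results can be decorated further with janitor's adorn_* helpers; for example, adorn_totals() appends a totals row. A sketch on the built-in iris data (rather than data_cleaned2, to stay self-contained):

```r
library(janitor)

iris |>
  tabyl(Species) |>
  adorn_totals("row") |>
  adorn_pct_formatting(digits = 1)
```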

Explore missing values

Code
# plot missing data (using raw data)
DataExplorer::plot_missing(
  title   = "% of Missing Data (filtered to cols w/missing data)",
  data    = airquality,
  ggtheme = tidyquant::theme_tq(), 
  missing_only = F) 

convert_to_date()

  • Modern Excel always tries to automate things, and I hate it!

  • For instance, you write a number into a cell and Excel sometimes immediately converts it to a date. Then you try to put a date in a cell, and it returns a number.

  • Moreover, Excel has some strange date encoding systems, which can be confused with normal numeric columns.

  • Luckily, our dirty dataset has the word “date” in the column name “hire_date”; otherwise we wouldn’t know that it is a date:

Code
convert_to_date(d$hire_date)

excel_numeric_to_date()

  • Ever load data from Excel and see a value like 42223 where a date should be?

  • This function converts those serial numbers to class Date, with options for different Excel date encoding systems, preserving fractions of a date as time (in which case the returned value is of class POSIXlt), and specifying a time zone.

Code
excel_numeric_to_date(41103)
[1] "2012-07-13"
Code
excel_numeric_to_date(41103.01) # ignores decimal places, returns Date object
[1] "2012-07-13"
Code
excel_numeric_to_date(41103.01, include_time = TRUE) # returns POSIXlt object
[1] "2012-07-13 00:14:24 EAT"

Convert a mix of date and datetime formats to date

  • Building on excel_numeric_to_date(), the new functions convert_to_date() and convert_to_datetime() are more robust to a mix of inputs.

  • Handy when reading many spreadsheets that should have the same column formats, but don’t.

For instance, here a vector with a date and an Excel datetime sees both values successfully converted to Date class:

Code
convert_to_date(c("2020-02-29", "40000.1"))
[1] "2020-02-29" "2009-07-06"

Dates

  • Before dates in a dataset are loaded into R, typically they are stored as a column of character (string) values.
  • However, dates are inherently numeric and we lose this information when they are stored as strings.
Code
# string dates have no numeric value, so this errors
"2018-03-05" + 1
  • In R, we want to convert our string dates to R class Date, which preserves the dates’ numeric values and allows us to take advantage of the many date-related functions in R.

  • Once converted to class Date, the numeric value for the date represents the number of days since January 1, 1970 (1970-01-01).

as.Date()

  • Base R provides as.Date() to convert strings to dates, but it can be intimidating and unwieldy to use, particularly if your dates are not stored in one of two default formats (see ?strptime for more about date formats).
Code
# as.Date only accepts a couple of formats by default
# good
as.Date("2015-02-14")
# bad
as.Date("02/14/2014")
# specify a format to fix
as.Date("02/14/2014", format="%m/%d/%Y")
  • Once our dates are class Date, we can perform date arithmetic.
Code
a <- as.Date("1971-01-01")
class(a)
[1] "Date"
Code
# days since 1970-01-01
as.numeric(a)
[1] 365
Code
# date arithmetic
a - as.Date("1970/12/31")
Time difference of 1 days
Code
a + 2
[1] "1971-01-03"
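Class Date also works with base helpers such as seq() and difftime():

```r
a <- as.Date("1971-01-01")

# regular sequences of dates
seq(a, by = "month", length.out = 3)  # "1971-01-01" "1971-02-01" "1971-03-01"

# explicit differences, with units spelled out
difftime(a, as.Date("1970-01-01"), units = "days")  # Time difference of 365 days
```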
  • We just need something easier to use to convert strings of various formats to class Date!

Package lubridate

  • tidyverse provides the amusingly named package lubridate to help R users convert their string dates to R Date format more easily as well as functions to process those dates.

  • The Date conversion functions in lubridate accept a wide variety of date formats, removing the need for us to remember all of those format specifications.

  • Instead, just take the letters y, m, and d and place them in the order of the year, month, and day as they appear in the date column. That ordering gives the name of the function that converts the column to Date (e.g. ymd(), mdy(), dmy()).

Code
library(lubridate)

Using lubridate

  • We’ll first load a dataset with various date formats to demonstrate the flexibility of lubridate functions.
Code
library(readr)
d <- read_csv("data/dates.csv")
d
# A tibble: 3 × 5
  fmt1     fmt2             fmt3           fmt4 decision_time             
  <chr>    <chr>            <date>        <dbl> <chr>                     
1 01/15/89 December 8, 2015 2015-02-27 20090101 Approved 10-10-15 07:15:55
2 02/13/92 January 22, 2012 2016-11-15 20080819 Denied 09-27-11 14:57:23  
3 03/15/84 March 3, 2010    2017-12-25 20071011 Approved 04-24-15 02:03:03
  • Currently, the first four columns are stored as chr, chr, date, and dbl.

  • The third column is read in as Date by read_csv() because its particular formatting signals a date.

  • The first pair of columns use the order month-day-year, while the second pair of columns use year-month-day.

  • So, we want mdy() and ymd().

Code
# no format specifications needed
# just ordering y,m, and d
dates <- data.frame(f1=mdy(d$fmt1), f2=mdy(d$fmt2),
                    f3=ymd(d$fmt3), f4=ymd(d$fmt4))
dates
          f1         f2         f3         f4
1 1989-01-15 2015-12-08 2015-02-27 2009-01-01
2 1992-02-13 2012-01-22 2016-11-15 2008-08-19
3 1984-03-15 2010-03-03 2017-12-25 2007-10-11

Date-time variables

  • If your date column additionally contains time information (i.e. is date-time), you can add one or more of h, m, and s to y, m, and d to form the function name to convert the string to class POSIXct.

  • Class POSIXct stores date-time variables as a number representing the number of seconds since the beginning of 1970.

  • The fifth column of our dates dataset, decision_time, is a date-time variable.

  • (It also contains an “Approved/Denied” string at the beginning, which the lubridate parsing function will ignore!)

  • Specifically, the date-time is recorded as month, day, year, hour, minute, second, so we want mdy_hms():

Code
# month day year hour minute second
mdy_hms(d$decision_time)
[1] "2015-10-10 07:15:55 UTC" "2011-09-27 14:57:23 UTC"
[3] "2015-04-24 02:03:03 UTC"
Code
# POSIXct is a standard way of representing calendar time
class(mdy_hms(d$decision_time))
[1] "POSIXct" "POSIXt" 
  • The ‘time zone’ standard UTC (Coordinated Universal Time) is used as the default zone. Use the argument, tz= to assign a zone.
  • To see a list of valid time zone specifications, run OlsonNames().
Code
# we'll use this for our dates variable
dates$decision_time <- mdy_hms(d$decision_time, tz="US/Pacific")
dates$decision_time
[1] "2015-10-10 07:15:55 PDT" "2011-09-27 14:57:23 PDT"
[3] "2015-04-24 02:03:03 PDT"
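Two related lubridate helpers are worth knowing: with_tz() displays the same instant in another zone, while force_tz() keeps the clock time and reassigns the zone (producing a different instant). A sketch:

```r
library(lubridate)

t <- mdy_hms("10-10-15 07:15:55", tz = "US/Pacific")

with_tz(t, "UTC")    # same instant shown in UTC (07:15 PDT is 14:15 UTC)
force_tz(t, "UTC")   # same clock time, now labelled UTC (a different instant)
```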

Extracting information from Date variables

  • lubridate provides several functions to extract specific information from Date variables including:
    • day(): day of the month
    • wday(): weekday
    • yday(): day of the year
    • month(): month of the year
    • year(): year
  • Some examples of extracting information from Date variables.
Code
# we'll use the first column of our dates dataset
dates$f1
[1] "1989-01-15" "1992-02-13" "1984-03-15"
Code
# day of the month
day(dates$f1)
[1] 15 13 15
Code
# day of the year
yday(dates$f1)
[1] 15 44 75
  • Some more date information extraction:
Code
# weekday as numbers
wday(dates$f1)
[1] 1 5 5
Code
# weekday with labels
wday(dates$f1, label=TRUE)
[1] Sun Thu Thu
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Code
# month of the year
month(dates$f1)
[1] 1 2 3

Extracting information from POSIXct variables

  • lubridate also has functions to extract time information from POSIXct date-time variables.

    • hour()
    • minute()
    • second()
Code
with(dates,  ## with() tells R to look for variables in object "dates"
     data.frame(time=decision_time, h=hour(decision_time),
                m=minute(decision_time), s=second(decision_time)))
                 time  h  m  s
1 2015-10-10 07:15:55  7 15 55
2 2011-09-27 14:57:23 14 57 23
3 2015-04-24 02:03:03  2  3  3
Two sets of functions for date-time arithmetic

  • If you need to add or subtract time from your date variables, lubridate provides two sets of functions.

  • One set gives “intuitive” results, and will ignore conventions like leap year.

  • These include seconds(), minutes(), hours(), days(), weeks(), years().

  • The other set will adhere to those conventions, and are named by adding d to the names of the previous functions, dseconds(), dminutes(), dhours(), ddays(), dweeks(), dyears().

Code
#2016 is a leap year
# the intuitive result
ymd("2015-02-14") + years(2) 
[1] "2017-02-14"
Code
# the exact result
ymd("2015-02-14") + dyears(2)
[1] "2017-02-13 12:00:00 UTC"
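One pitfall with the “intuitive” functions: adding months() can land on a day that doesn't exist, which yields NA. lubridate's %m+% operator rolls back to the last valid day of the month instead:

```r
library(lubridate)

ymd("2015-01-31") + months(1)     # NA: 2015-02-31 does not exist
ymd("2015-01-31") %m+% months(1)  # "2015-02-28": rolled back to month end
```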

Easy data cleaning with the janitor package

Clean column names

  • add some points about
    • count()
    • tabyl()
    • percent (for making quick graphs)

https://www.r-bloggers.com/2024/05/easy-data-cleaning-with-the-janitor-package/