Data Analysis and Visualisation for Cereal (2024)

Kimberley Hutson

5/13/2020

This document gives an overview of data analysis and visualisation for different types of cereal.

Things to know:

Type:

  • C = Cold
  • H = Hot

Manufacturer:

  • A = American Home Food Products
  • G = General Mills
  • K = Kellogg
  • N = Nabisco
  • P = Post
  • Q = Quaker Oats
  • R = Ralston Purina

The data set used in this overview was taken from: https://www.kaggle.com/crawford/80-cereals/data

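Data set (Cereal)

Name | Manufacturer | Type | Calories | Protein | Fat | Sodium | Fibre | Carbohydrates | Sugar | Potassium | Vitamins | Shelf | Weight | Cups | Rating
100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1 | 0.33 | 68.40297
100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1 | 1.00 | 33.98368
All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1 | 0.33 | 59.42551
All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1 | 0.50 | 93.70491
Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1 | 0.75 | 34.38484
Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1 | 0.75 | 29.50954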

Summary of Data set (Cereal)

##                         Name    Manufacturer Type      Calories
##  100% Bran                : 1   A: 1         C:74   Min.   : 50.0
##  100% Natural Bran        : 1   G:22         H: 3   1st Qu.:100.0
##  All-Bran                 : 1   K:23                Median :110.0
##  All-Bran with Extra Fiber: 1   N: 6                Mean   :106.9
##  Almond Delight           : 1   P: 9                3rd Qu.:110.0
##  Apple Cinnamon Cheerios  : 1   Q: 8                Max.   :160.0
##  (Other)                  :71   R: 8
##     Protein           Fat            Sodium          Fibre
##  Min.   :1.000   Min.   :0.000   Min.   :  0.0   Min.   : 0.000
##  1st Qu.:2.000   1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 1.000
##  Median :3.000   Median :1.000   Median :180.0   Median : 2.000
##  Mean   :2.545   Mean   :1.013   Mean   :159.7   Mean   : 2.152
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:210.0   3rd Qu.: 3.000
##  Max.   :6.000   Max.   :5.000   Max.   :320.0   Max.   :14.000
##
##  Carbohydrates      Sugar          Potassium         Vitamins
##  Min.   :-1.0   Min.   :-1.000   Min.   : -1.00   Min.   :  0.00
##  1st Qu.:12.0   1st Qu.: 3.000   1st Qu.: 40.00   1st Qu.: 25.00
##  Median :14.0   Median : 7.000   Median : 90.00   Median : 25.00
##  Mean   :14.6   Mean   : 6.922   Mean   : 96.08   Mean   : 28.25
##  3rd Qu.:17.0   3rd Qu.:11.000   3rd Qu.:120.00   3rd Qu.: 25.00
##  Max.   :23.0   Max.   :15.000   Max.   :330.00   Max.   :100.00
##
##      Shelf           Weight          Cups           Rating
##  Min.   :1.000   Min.   :0.50   Min.   :0.250   Min.   :18.04
##  1st Qu.:1.000   1st Qu.:1.00   1st Qu.:0.670   1st Qu.:33.17
##  Median :2.000   Median :1.00   Median :0.750   Median :40.40
##  Mean   :2.208   Mean   :1.03   Mean   :0.821   Mean   :42.67
##  3rd Qu.:3.000   3rd Qu.:1.00   3rd Qu.:1.000   3rd Qu.:50.83
##  Max.   :3.000   Max.   :1.50   Max.   :1.500   Max.   :93.70
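The summary shows a minimum of -1 for Carbohydrates, Sugar and Potassium. A negative amount is not physically possible, so these values are presumably placeholders for missing measurements (an assumption; the source does not say so). The Python sketch below (the report's own output comes from R) shows how such a placeholder pulls a mean down, using the Potassium values of the six cereals listed in the data set table. The function name `mean_ignoring_sentinel` is hypothetical.

```python
from statistics import mean

# Potassium for the six cereals shown in the data set table;
# Almond Delight carries the -1 placeholder.
potassium = [280, 135, 320, 330, -1, 70]


def mean_ignoring_sentinel(values, sentinel=-1):
    """Average the values after dropping the assumed missing-value marker."""
    kept = [v for v in values if v != sentinel]
    return mean(kept)


naive_mean = mean(potassium)                      # treats -1 as a real amount
cleaned_mean = mean_ignoring_sentinel(potassium)  # skips the placeholder

print("mean with placeholder:   ", naive_mean)
print("mean without placeholder:", cleaned_mean)
```

The same check would apply to any statistic computed on those three columns, including the regression inputs further down.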

Question 1: Which manufacturer has cereals with the most fat?

The histogram shows that Manufacturer K (Kellogg) has the highest fat content.

[Figure 1: fat content by manufacturer]

Question 2: What types of cereal do the different manufacturers produce?

The histogram shows that Type C (Cold) cereals are manufactured the most. It also shows that Manufacturers N (Nabisco) and Q (Quaker Oats) produce both hot and cold cereals.

[Figure 2: cereal types by manufacturer]

Question 3: Which type of cereal do people prefer?

The box plot compares the rating of cereals by type. Hot cereals have a minimum rating of 51, Q1 = 53, median = 55, Q3 = 60 and a maximum of 65, with a right skew. Cold cereals have a minimum rating of 18, Q1 = 33, median = 40, Q3 = 50 and a maximum of about 95 (the summary reports 93.70; this includes the 1 outlier), with a right skew.
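Whether a point is drawn as an outlier depends on the box plot's fence rule. The sketch below assumes the common 1.5 x IQR rule (the default in R's `boxplot`, though the report does not state which setting was used) and applies it to the quartiles read off the plot. It is written in Python for illustration, and `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; points outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles quoted in the text for each cereal type.
cold_lower, cold_upper = tukey_fences(33, 50)
hot_lower, hot_upper = tukey_fences(53, 60)

max_cold_rating = 93.70  # maximum Rating from the summary table
max_hot_rating = 65      # maximum quoted for hot cereals

print("cold fences:", cold_lower, cold_upper)
print("hot fences: ", hot_lower, hot_upper)
print("cold maximum is an outlier:", max_cold_rating > cold_upper)
print("hot maximum is an outlier: ", max_hot_rating > hot_upper)
```

Under this assumption the highest-rated cold cereal lies beyond the upper fence, which is consistent with the single outlier mentioned in the text.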

[Figure 3: box plot of rating by cereal type]

Question 4: How many calories can one get per serving?

The scatter plot shows the amount of calories you can get from one cup, by manufacturer.

[Figure 4: scatter plot of calories per cup by manufacturer]

Question 5: Which cereal is the unhealthiest?

The scatter plot shows no clear relationship between hot and cold cereals. It also shows that cold cereals have the highest fat and sugar content.

[Figure 5: scatter plot of fat and sugar for hot and cold cereals]

Question 6: Which type of cereal will give you more energy (protein)?

The histograms show that Manufacturer K (Kellogg) cold cereals give you the most energy (protein).

[Figure 6: histograms of protein by manufacturer and cereal type]

Question 7: How much potassium is in the different types of cereal?

The box plot compares the amount of potassium in the different types of cereal. Hot cereals have a minimum of 0, Q1 = 49, median = 98, Q3 = 101 and a maximum of 110, with a left skew. Cold cereals have a minimum of 0, Q1 = 30, median = 80, Q3 = 110 and a maximum of 330 (including the 4 outliers), with a right skew.

[Figure 7: box plot of potassium by cereal type]

Question 8: What is the average amount of carbohydrates?

The histogram shows that the average amount of carbohydrates in a cereal, hot or cold, is 14.5974026 (about 14.6, matching the mean in the summary).

[Figure 8: histogram of carbohydrates]

Question 9: What is the total amount of fibre you can get from eating your cereal cold or hot?

The histogram gives 74 for cold cereals and 3 for hot cereals. These values equal the number of cold and hot cereals in the summary (C: 74, H: 3), so they appear to be counts of cereals rather than totals of fibre.
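A count of cereals per type and a total of fibre per type are different aggregations. The Python sketch below (an illustration only; the report's own output comes from R) computes both for the seven cereals whose Type and Fibre values appear in this document: the six rows of the data set table plus Quaker Oatmeal from the predictions table. The results describe only these seven rows, not the full data set.

```python
from collections import defaultdict

# (name, type, fibre) taken from the rows shown in this document.
rows = [
    ("100% Bran", "C", 10.0),
    ("100% Natural Bran", "C", 2.0),
    ("All-Bran", "C", 9.0),
    ("All-Bran with Extra Fiber", "C", 14.0),
    ("Almond Delight", "C", 1.0),
    ("Apple Cinnamon Cheerios", "C", 1.5),
    ("Quaker Oatmeal", "H", 2.7),
]

counts = defaultdict(int)    # number of cereals per type
totals = defaultdict(float)  # summed fibre per type

for _name, cereal_type, fibre in rows:
    counts[cereal_type] += 1
    totals[cereal_type] += fibre

for cereal_type in sorted(counts):
    print(cereal_type, "count:", counts[cereal_type],
          "total fibre:", round(totals[cereal_type], 1))
```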

[Figure 9: histogram for fibre by cereal type]

Question 10: Which manufacturer has cereals with the most vitamins?

The histogram shows that Manufacturer G (General Mills) cereals are the richest in vitamins.

[Figure 10: vitamins by manufacturer]

  • train is rows 1 to 50
  • test is rows 51 to 77

Simple Linear Regression 1

##
## Call:
## lm(formula = Rating ~ Fat, data = train)
##
## Coefficients:
## (Intercept)          Fat
##      47.725       -5.248
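The output above reports an intercept and a slope fitted by least squares. As a hedged illustration of what that fit computes, the Python sketch below applies the closed-form formulas for simple linear regression to the Fat and Rating values of the six cereals listed in the data set table. It is not the author's code (the report used R's `lm` on 50 training rows), so the coefficients it prints will differ from 47.725 and -5.248. `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Fat and Rating for the six cereals shown in the data set table.
fat = [1, 5, 1, 0, 2, 2]
rating = [68.40297, 33.98368, 59.42551, 93.70491, 34.38484, 29.50954]

intercept, slope = fit_line(fat, rating)
print("intercept:", round(intercept, 3))
print("slope:    ", round(slope, 3))
```

Even on this small sample the slope is negative, in line with the sign of the Fat coefficient in the report.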

[Figure 11: plot for the Rating ~ Fat model]

Summary for first Simple Linear Regression

##
## Call:
## lm(formula = Rating ~ Fat, data = train)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -20.081  -7.102  -2.116   7.976  25.926
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   47.725      2.560  18.643   <2e-16 ***
## Fat           -5.248      1.963  -2.673   0.0102 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.94 on 48 degrees of freedom
## Multiple R-squared:  0.1295, Adjusted R-squared:  0.1114
## F-statistic: 7.143 on 1 and 48 DF,  p-value: 0.01025

Correlation

## [1] -0.3599192

Anova

## Analysis of Variance Table
##
## Response: Rating
##           Df Sum Sq Mean Sq F value  Pr(>F)
## Fat        1 1018.3 1018.35  7.1434 0.01025 *
## Residuals 48 6842.8  142.56
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
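The ANOVA table, the R-squared value, the F-statistic, the residual standard error and the correlation for Model 1 are all tied together by a few lines of arithmetic. The Python sketch below recomputes them from the two sums of squares printed above. It is an illustrative check, not part of the original analysis, and small differences in the last digit come from the rounded inputs.

```python
import math

# Values copied from the ANOVA table for Rating ~ Fat.
ss_model = 1018.3   # "Sum Sq" for Fat
ss_resid = 6842.8   # "Sum Sq" for Residuals
df_model = 1
df_resid = 48

r_squared = ss_model / (ss_model + ss_resid)
mean_sq_resid = ss_resid / df_resid
f_stat = (ss_model / df_model) / mean_sq_resid
resid_std_error = math.sqrt(mean_sq_resid)

# With one predictor, |correlation| = sqrt(R-squared); the sign follows
# the slope, which is negative for Fat.
correlation = -math.sqrt(r_squared)

print("R-squared:              ", round(r_squared, 4))
print("Residual mean square:   ", round(mean_sq_resid, 2))
print("F statistic:            ", round(f_stat, 3))
print("Residual standard error:", round(resid_std_error, 2))
print("Correlation:            ", round(correlation, 4))
```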

AIC (Akaike’s information criterion)

## [1] 393.8402

BIC (Bayesian information criterion)

## [1] 399.5763
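AIC and BIC both start from the model's log-likelihood and add a penalty for the number of estimated parameters. The Python sketch below assumes the Gaussian log-likelihood of a linear model and counts the residual variance as one extra parameter, which reproduces the Model 1 values from the residual sum of squares (6842.8) and the 50 training rows. `aic_bic` is a hypothetical helper name, and the report itself obtained these numbers in R.

```python
import math


def aic_bic(rss, n, n_coef):
    """AIC and BIC for a Gaussian linear model.

    rss    -- residual sum of squares
    n      -- number of observations
    n_coef -- number of regression coefficients (intercept included)
    """
    k = n_coef + 1  # coefficients plus the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * math.log(n)
    return aic, bic


# Model 1 (Rating ~ Fat): residual sum of squares from the ANOVA table,
# 50 training rows, 2 coefficients.
aic, bic = aic_bic(rss=6842.8, n=50, n_coef=2)
print("AIC:", round(aic, 2))
print("BIC:", round(bic, 2))
```

Because all three models here have the same number of parameters and observations, ranking them by AIC or BIC gives the same order as ranking them by residual sum of squares.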

Simple Linear Regression 2

##
## Call:
## lm(formula = Rating ~ Sugar, data = train)
##
## Coefficients:
## (Intercept)        Sugar
##      58.616       -2.324

[Figure 12: plot for the Rating ~ Sugar model]

Summary for second Simple Linear Regression

##
## Call:
## lm(formula = Rating ~ Sugar, data = train)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.8051  -5.3921  -0.7764   4.7406  23.7296
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  58.6163     2.1726  26.980  < 2e-16 ***
## Sugar        -2.3238     0.2688  -8.646 2.37e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.002 on 48 degrees of freedom
## Multiple R-squared:  0.609,  Adjusted R-squared:  0.6008
## F-statistic: 74.75 on 1 and 48 DF,  p-value: 2.367e-11

Correlation

## [1] -0.7803697

Anova

## Analysis of Variance Table
##
## Response: Rating
##           Df Sum Sq Mean Sq F value    Pr(>F)
## Sugar      1 4787.2  4787.2  74.755 2.367e-11 ***
## Residuals 48 3073.9    64.0
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC (Akaike’s information criterion)

## [1] 353.8275

BIC (Bayesian information criterion)

## [1] 359.5636

Simple Linear Regression 3

##
## Call:
## lm(formula = Rating ~ Calories, data = train)
##
## Coefficients:
## (Intercept)     Calories
##     86.5206      -0.4161

[Figure 13: plot for the Rating ~ Calories model]

Summary for third Simple Linear Regression

##
## Call:
## lm(formula = Rating ~ Calories, data = train)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18.3546  -5.1485  -0.0718   6.5752  23.7289
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 86.52055    6.95497  12.440  < 2e-16 ***
## Calories    -0.41609    0.06465  -6.436  5.4e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.376 on 48 degrees of freedom
## Multiple R-squared:  0.4632, Adjusted R-squared:  0.452
## F-statistic: 41.42 on 1 and 48 DF,  p-value: 5.399e-08

Correlation

## [1] -0.6805819

Anova

## Analysis of Variance Table
##
## Response: Rating
##           Df Sum Sq Mean Sq F value    Pr(>F)
## Calories   1 3641.2  3641.2  41.417 5.399e-08 ***
## Residuals 48 4219.9    87.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC (Akaike’s information criterion)

## [1] 369.6713

BIC (Bayesian information criterion)

## [1] 375.4073

Residual mean square from each ANOVA table (the lower the value, the better the fit):

  • Model 1 = 142.56
  • Model 2 = 64.0
  • Model 3 = 87.9

Adjusted R-squared values (the higher the value, the better the fit of the model):

  • Model 1 = 0.1114
  • Model 2 = 0.6008
  • Model 3 = 0.452

Correlation (the closer to 1 or -1, the stronger the correlation):

  • Model 1= -0.3599192
  • Model 2= -0.7803697
  • Model 3= -0.6805819

AIC (the model with the lowest AIC score is preferred):

  • Model 1 = 393.8402
  • Model 2 = 353.8275
  • Model 3 = 369.6713

BIC (the model with the lowest BIC score is preferred):

  • Model 1 = 399.5763
  • Model 2 = 359.5636
  • Model 3 = 375.4073
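
Row | actuals.Name | actuals.Manufacturer | actuals.Type | actuals.Calories | actuals.Protein | actuals.Fat | actuals.Sodium | actuals.Fibre | actuals.Carbohydrates | actuals.Sugar | actuals.Potassium | actuals.Vitamins | actuals.Shelf | actuals.Weight | actuals.Cups | actuals.Rating | predicteds
13 | Cinnamon Toast Crunch | G | C | 120 | 1 | 3 | 210 | 0.0 | 13 | 9 | 45 | 25 | 2 | 1 | 0.75 | 19.82357 | 37.70189
33 | Grape Nuts Flakes | P | C | 100 | 3 | 1 | 140 | 3.0 | 15 | 5 | 85 | 25 | 3 | 1 | 0.88 | 52.07690 | 46.99720
22 | Crispix | K | C | 110 | 2 | 0 | 220 | 1.0 | 21 | 3 | 30 | 25 | 3 | 1 | 1.00 | 46.89564 | 51.64485
26 | Frosted Flakes | K | C | 110 | 1 | 0 | 200 | 1.0 | 14 | 11 | 25 | 25 | 1 | 1 | 0.75 | 31.43597 | 33.05424
73 | Triples | G | C | 110 | 2 | 1 | 250 | 0.0 | 21 | 3 | 60 | 25 | 3 | 1 | 0.75 | 39.10617 | 51.64485
58 | Quaker Oatmeal | Q | H | 100 | 5 | 2 | 0 | 2.7 | -1 | -1 | 110 | 0 | 1 | 1 | 0.67 | 50.82839 | 60.94015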
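The predicteds column can be reproduced from the Model 2 coefficients (Rating = 58.6163 - 2.3238 x Sugar), which suggests that the predictions shown come from the sugar model. The Python sketch below recomputes them for the six rows above and reports the mean absolute error on those rows. It is an illustrative check rather than the author's code, and the error figure describes only these six cereals.

```python
# Coefficients from the summary of Rating ~ Sugar.
INTERCEPT = 58.6163
SLOPE = -2.3238

# (name, sugar, actual rating) from the table above.
rows = [
    ("Cinnamon Toast Crunch", 9, 19.82357),
    ("Grape Nuts Flakes", 5, 52.07690),
    ("Crispix", 3, 46.89564),
    ("Frosted Flakes", 11, 31.43597),
    ("Triples", 3, 39.10617),
    ("Quaker Oatmeal", -1, 50.82839),  # Sugar is recorded as -1 here
]


def predict(sugar):
    """Predicted rating from the fitted line."""
    return INTERCEPT + SLOPE * sugar


abs_errors = []
for name, sugar, actual in rows:
    predicted = predict(sugar)
    abs_errors.append(abs(actual - predicted))
    print(f"{name:22s} actual={actual:8.3f} predicted={predicted:8.3f}")

mae = sum(abs_errors) / len(abs_errors)
print("mean absolute error:", round(mae, 2))
```

The Quaker Oatmeal prediction is computed from a Sugar value of -1, so it is higher than the intercept; if -1 marks a missing measurement, that row would be better excluded.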

From these comparisons, Model 2 (Rating ~ Sugar) is the best fit and the most accurate model of the data set: it has the strongest correlation and the highest R-squared value. All three fitted slopes are negative, so the predicted rating goes down as fat, sugar or calories go up.


