当前位置: 首页 > news >正文

图片素材网站免费大推荐seo外链

图片素材网站免费大推荐,seo外链,做网站多少钱jf西宁君博出众,wordpress刷点击文章目录 14.4 USDA Food Database(美国农业部食品数据库) 14.4 USDA Food Database(美国农业部食品数据库) 这个数据是关于食物营养成分的。存储格式是JSON,看起来像这样: {"id": 21441, &quo…

文章目录

  • 14.4 USDA Food Database(美国农业部食品数据库)

14.4 USDA Food Database(美国农业部食品数据库)

这个数据是关于食物营养成分的。存储格式是JSON,看起来像这样:

{"id": 21441, "description": "KENTUCKY FRIED CHICKEN, Fried Chicken, EXTRA CRISPY, Wing, meat and skin with breading", "tags": ["KFC"], "manufacturer": "Kentucky Fried Chicken", "group": "Fast Foods", "portions": [ { "amount": 1, "unit": "wing, with skin", "grams": 68.0}...],"nutrients": [ { "value": 20.8, "units": "g", "description": "Protein", "group": "Composition" },...]
}     

每种食物都有一系列特征,其中有两个list,protionsnutrients。我们必须把这样的数据进行处理,方便之后的分析。

这里使用python内建的json模块:

import pandas as pd
import numpy as np
import json
pd.options.display.max_rows = 10
db = json.load(open('../datasets/usda_food/database.json'))
len(db)
6636
db[0].keys()
dict_keys(['manufacturer', 'description', 'group', 'id', 'tags', 'nutrients', 'portions'])
db[0]['nutrients'][0]
{'description': 'Protein','group': 'Composition','units': 'g','value': 25.18}
nutrients = pd.DataFrame(db[0]['nutrients'])
nutrients
descriptiongroupunitsvalue
0ProteinCompositiong25.180
1Total lipid (fat)Compositiong29.200
2Carbohydrate, by differenceCompositiong3.060
3AshOtherg3.280
4EnergyEnergykcal376.000
...............
157SerineAmino Acidsg1.472
158CholesterolOthermg93.000
159Fatty acids, total saturatedOtherg18.584
160Fatty acids, total monounsaturatedOtherg8.275
161Fatty acids, total polyunsaturatedOtherg0.830

162 rows × 4 columns

当把由字典组成的list转换为DataFrame的时候,我们可以吹创业提取的list部分。这里我们提取食品名,群(group),ID,制造商:

info_keys = ['description', 'group', 'id', 'manufacturer']
info = pd.DataFrame(db, columns=info_keys)
info[:5]
descriptiongroupidmanufacturer
0Cheese, carawayDairy and Egg Products1008
1Cheese, cheddarDairy and Egg Products1009
2Cheese, edamDairy and Egg Products1018
3Cheese, fetaDairy and Egg Products1019
4Cheese, mozzarella, part skim milkDairy and Egg Products1028
info.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6636 entries, 0 to 6635
Data columns (total 4 columns):
description     6636 non-null object
group           6636 non-null object
id              6636 non-null int64
manufacturer    5195 non-null object
dtypes: int64(1), object(3)
memory usage: 207.5+ KB

我们可以看到食物群的分布,使用value_counts:

pd.value_counts(info.group)[:10]
Vegetables and Vegetable Products    812
Beef Products                        618
Baked Products                       496
Breakfast Cereals                    403
Legumes and Legume Products          365
Fast Foods                           365
Lamb, Veal, and Game Products        345
Sweets                               341
Pork Products                        328
Fruits and Fruit Juices              328
Name: group, dtype: int64

这里我们对所有的nutrient数据做一些分析,把每种食物的nutrient部分组合成一个大表格。首先,把每个食物的nutrient列表变为DataFrame,添加一列为id,然后把id添加到DataFrame中,接着使用concat联结到一起:

# 先创建一个空DataFrame用来保存最后的结果
# 这部分代码运行时间较长,请耐心等待
nutrients_all = pd.DataFrame()for food in db:nutrients = pd.DataFrame(food['nutrients'])nutrients['id'] = food['id']nutrients_all = nutrients_all.append(nutrients, ignore_index=True)

译者:虽然作者在书中说了用concat联结在一起,但我实际测试后,这个concat的方法非常耗时,用时几乎是append方法的两倍,所以上面的代码中使用了append方法。

一切正常的话出来的效果是这样的:

nutrients_all
descriptiongroupunitsvalueid
0ProteinCompositiong25.1801008
1Total lipid (fat)Compositiong29.2001008
2Carbohydrate, by differenceCompositiong3.0601008
3AshOtherg3.2801008
4EnergyEnergykcal376.0001008
..................
389350Vitamin B-12, addedVitaminsmcg0.00043546
389351CholesterolOthermg0.00043546
389352Fatty acids, total saturatedOtherg0.07243546
389353Fatty acids, total monounsaturatedOtherg0.02843546
389354Fatty acids, total polyunsaturatedOtherg0.04143546

389355 rows × 5 columns

这个DataFrame中有一些重复的部分,看一下有多少重复的行:

nutrients_all.duplicated().sum() # number of duplicates
14179

把重复的部分去掉:

nutrients_all = nutrients_all.drop_duplicates()
nutrients_all
descriptiongroupunitsvalueid
0ProteinCompositiong25.1801008
1Total lipid (fat)Compositiong29.2001008
2Carbohydrate, by differenceCompositiong3.0601008
3AshOtherg3.2801008
4EnergyEnergykcal376.0001008
..................
389350Vitamin B-12, addedVitaminsmcg0.00043546
389351CholesterolOthermg0.00043546
389352Fatty acids, total saturatedOtherg0.07243546
389353Fatty acids, total monounsaturatedOtherg0.02843546
389354Fatty acids, total polyunsaturatedOtherg0.04143546

375176 rows × 5 columns

为了与info_keys中的groupdescripton区别开,我们把列名更改一下:

col_mapping = {'description': 'food','group': 'fgroup'}
info = info.rename(columns=col_mapping, copy=False)
info.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6636 entries, 0 to 6635
Data columns (total 4 columns):
food            6636 non-null object
fgroup          6636 non-null object
id              6636 non-null int64
manufacturer    5195 non-null object
dtypes: int64(1), object(3)
memory usage: 207.5+ KB
col_mapping = {'description' : 'nutrient','group': 'nutgroup'}
nutrients_all = nutrients_all.rename(columns=col_mapping, copy=False)
nutrients_all
nutrientnutgroupunitsvalueid
0ProteinCompositiong25.1801008
1Total lipid (fat)Compositiong29.2001008
2Carbohydrate, by differenceCompositiong3.0601008
3AshOtherg3.2801008
4EnergyEnergykcal376.0001008
..................
389350Vitamin B-12, addedVitaminsmcg0.00043546
389351CholesterolOthermg0.00043546
389352Fatty acids, total saturatedOtherg0.07243546
389353Fatty acids, total monounsaturatedOtherg0.02843546
389354Fatty acids, total polyunsaturatedOtherg0.04143546

375176 rows × 5 columns

上面所有步骤结束后,我们可以把infonutrients_all合并(merge):

ndata = pd.merge(nutrients_all, info, on='id', how='outer')
ndata.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 375176 entries, 0 to 375175
Data columns (total 8 columns):
nutrient        375176 non-null object
nutgroup        375176 non-null object
units           375176 non-null object
value           375176 non-null float64
id              375176 non-null int64
food            375176 non-null object
fgroup          375176 non-null object
manufacturer    293054 non-null object
dtypes: float64(1), int64(1), object(6)
memory usage: 25.8+ MB
ndata.iloc[30000]
nutrient                                       Glycine
nutgroup                                   Amino Acids
units                                                g
value                                             0.04
id                                                6158
food            Soup, tomato bisque, canned, condensed
fgroup                      Soups, Sauces, and Gravies
manufacturer                                          
Name: 30000, dtype: object

我们可以对食物群(food group)和营养类型(nutrient type)分组后,对中位数进行绘图:

result = ndata.groupby(['nutrient', 'fgroup'])['value'].quantile(0.5)
%matplotlib inline
result['Zinc, Zn'].sort_values().plot(kind='barh', figsize=(10, 8))

在这里插入图片描述

我们还可以找到每一种营养成分含量最多的食物是什么:

by_nutrient = ndata.groupby(['nutgroup', 'nutrient'])get_maximum = lambda x: x.loc[x.value.idxmax()]
get_minimum = lambda x: x.loc[x.value.idxmin()]max_foods = by_nutrient.apply(get_maximum)[['value', 'food']]# make the food a little smaller
max_foods.food = max_foods.food.str[:50]

因为得到的DataFrame太大,这里只输出'Amino Acids'(氨基酸)的营养群(nutrient group):

max_foods.loc['Amino Acids']['food']
nutrient
Alanine                          Gelatins, dry powder, unsweetened
Arginine                              Seeds, sesame flour, low-fat
Aspartic acid                                  Soy protein isolate
Cystine               Seeds, cottonseed flour, low fat (glandless)
Glutamic acid                                  Soy protein isolate...                        
Serine           Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Threonine        Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Tryptophan        Sea lion, Steller, meat with fat (Alaska Native)
Tyrosine         Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Valine           Soy protein isolate, PROTEIN TECHNOLOGIES INTE...
Name: food, Length: 19, dtype: object
http://www.yidumall.com/news/107794.html

相关文章:

  • 上海做网站推广公司品牌软文营销案例
  • 微信网站协议书精准营销案例
  • 客服外包公司怎么开seo优化关键词0
  • 上海奉贤网站建设 列表网公司宣传推广方案
  • 怎么仿制别人的网站百度网盘24小时人工电话
  • 杭州市萧山区哪家做网站的公司好北京seo顾问服务
  • 企业网站推广过程网络营销课程ppt
  • 大同做网站什么软件能搜索关键词能快速找到
  • 乐清做网站建设关键词搜索引擎排名查询
  • 红桥集团网站建设搜索引擎的三个技巧
  • wordpress安卓源码seo免费诊断电话
  • 网站如何建数据库广州百度seo优化排名
  • 零用贷网站如何做南京seo排名收费
  • jsp网站开发期末大作业免费建站有哪些
  • 嘉兴商城网站开发设计怎么去推广自己的店铺
  • 俄语网站建设全网软文推广
  • 短剧个人主页简介模板seo软件
  • 电子商务网站建设哪家好百度搜索简洁版网址
  • 什么是门户网站游戏推广员一个月能赚多少
  • 具有价值的网站建设百度游戏官网
  • 怎么在网站上做排名大连做优化网站哪家好
  • 赤峰网站设计爱站网长尾词挖掘
  • 南阳网站建设口碑怎么做营销推广
  • 怎样做企业手机网站自助建站seo
  • 广扬建设集团网站网络营销的概念与含义
  • 外贸网站建设制作教程百度搜索引擎推广步骤
  • 做网站写代码流程网站怎样做推广
  • 建设网站员工招聘策划方案长沙官网seo收费标准
  • 亚洲做性自拍视频网站简述什么是网络营销
  • 锦州网站开发怎么在百度发布个人简介