Food Banks Analytics - Improving Chronic Disease Control · 2019. 1. 1.



Food Banks Analytics - Improving Chronic Disease Control

Yi-Hsuan Hsieh, Georgia Institute of Technology
Lufina Huang, Georgia Institute of Technology
Lei Jiang, Georgia Institute of Technology
Mario Wijaya, Georgia Institute of Technology
Keith Woh, Georgia Institute of Technology

1 INTRODUCTION
Food insecurity is the state of not having reliable access to an adequate and affordable quantity of nutritious food for household members to lead an active and healthy lifestyle. Multiple studies [3][14][9][16] have shown a strong correlation between food insecurity status and chronic health conditions, and have suggested that food banks could be a promising avenue for health promotion by tackling food insecurity and chronic disease together. Our goal is to develop analysis and visualization tools that help organizations and government agencies allocate resources more efficiently, and ultimately help reduce food insecurity and disease risk. For this purpose we

• Collected, cleaned, and scraped several datasets: demographic and population data, food insecurity rates, disease prevalence, and food pantry locations,

• Applied analysis to study the relationship between chronic disease prevalence and food insecurity,

• Applied clustering algorithms to predict potential food-insecure areas,

• Created a comprehensive visualization of the data and analysis results, and provided a user-friendly interface.

2 PROBLEM DEFINITION
More specifically, our visualization tools and analyses help answer the following questions:

(1) What are the relations between food insecurity and the prevalence of obesity and diabetes?

(2) How do geographic and demographic characteristics of the population affect their susceptibility to the risk of chronic diseases induced by food insecurity?

(3) Can we predict areas with a high risk of chronic diseases due to lack of food?

(4) How many people can a supermarket/food pantry impact if built at a specific location?

3 SURVEY
3.1 Concept
Food insecurity is a widely discussed topic in the literature. For example, Berry et al. [4] offered a conceptual analysis of food security and related terms (e.g. accessibility). Their theoretical work helps ensure that we understand and use our key terms correctly. Buczynski et al. [6] defined the concept of a food desert, which provides a guide on how we can visualize food insecurity. Their work only explores data in Baltimore, and our project can scale it up to a national level.

3.2 Visualization
There is an absence of comprehensive visualizations that integrate multiple categories of information. Feeding America has a choropleth map (Figure 1) showing food insecurity at the county level (http://map.feedingamerica.org/). Figure 2 shows an interactive choropleth map provided by the USDA (https://www.ers.usda.gov/data-products/food-environment-atlas/go-to-the-atlas/), where users can visualize various indexes, such as access to grocery stores and socioeconomic characteristics.

Figure 1: Feeding America Choropleth Map

Figure 2: USDA Food Environment Atlas

However, these two visualizations, as well as the other methodologies we surveyed, do not allow users to view information from multiple categories at the same time. Users are limited to visualizing one factor at a time, making it difficult to spot relationships between factors visually.

Shenson and Joshi [17] showed a visualization using disease, economic, and educational factors. It gave us ideas on how to visualize our data with metrics that make the visualization meaningful. However, it did not incorporate information regarding food insecurity.

Figure 3: Visualization using disease, economic, and educational factors

We also surveyed works on theoretical principles for visualizing food-related and geographic information. Kronenfeld and Wong [13] stated that a choropleth map might lead to biased visual perception because of area size alone, without regard for density; we can avoid this by using cartograms. Goldsberry and Duvall [8] built a geospatial visualization of food accessibility at the city level, which we can use as a guide when scaling up to the national level. Jung and Anderson [12] introduced a methodology that increases the power of GIS and geovisual techniques to visualize socio-spatial inequality by incorporating social dimensions. Though the data needed to implement their visualization is hard to access, we can adapt this method for our visualization.


3.3 Analysis
For data analysis, Amarasinghe et al. [2] applied ordinary least squares to county-based data to analyze the relationship between food insecurity and various factors, such as water source and poverty. This serves as a reference for analyzing the relationship between disease and food insecurity. Hyman et al. [11] explored six cases of spatial analysis of poverty and food insecurity. The small-area analysis and geographically weighted regression they mention are tentative models for our project, which uses county-based datasets.

Seligman et al. [15] provided evidence that food insecurity contributes to poor diabetes self-management. They suggested potential interventions that our project can improve on. Gundersen and Ziliak [10] inferred that food insecurity is related to chronic metabolic diseases and food-sensitive diseases. This tells us which disease types to investigate for the effect of food insecurity, and the potential intervention for each disease. The data analysis and visualization in that paper still require nationwide data to reach concrete conclusions.

Algert et al. [1] utilized clustering to find areas with severe food insecurity. However, their data granularity is at the individual level, and we might not be able to access such detailed data. Brock and Davis [5] used a neural network to estimate the amount of food available for collection at supermarkets. The approach expands our modeling techniques given limited information, but it is narrowly focused and uses a single model to approach the problem.

4 PROPOSED METHOD
This section details the approach we took: Section 4.1 explains our data gathering process, and our list of innovations is described in Sections 4.2 - 4.4.

4.1 Data Gathering and Pre-Processing
All datasets are gathered from public sources: disease prevalence datasets are obtained from the CDC, geographic and demographic information is obtained from the US Census Bureau, food-insecurity rates are obtained from Feeding America, and locations of food pantries are scraped from the public web page of foodpantries.org. After collecting all datasets, we cleaned and merged them to improve ease of use for algorithm analysis and visualization. The Python libraries Pandas and NumPy are used for data preparation.
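The merging step can be illustrated with a minimal sketch. It assumes each source table is keyed by a county FIPS code, which the text does not state; the column names and values are hypothetical, and the standard library stands in for Pandas only to keep the example dependency-free.

```python
# Hypothetical per-county records keyed by 5-digit FIPS code (values are made up).
disease = {
    "13121": {"diabetes_rate": 10.2, "obesity_rate": 27.5},
    "13089": {"diabetes_rate": 11.0, "obesity_rate": 29.1},
}
demographics = {
    "13121": {"population": 1000000, "median_income": 60000},
    "13089": {"population": 750000, "median_income": 55000},
    "13067": {"population": 740000, "median_income": 70000},
}
food_insecurity = {
    "13121": {"food_insecurity_rate": 17.5},
    "13089": {"food_insecurity_rate": 18.9},
}


def merge_by_fips(*tables):
    """Inner-join several {fips: {column: value}} tables on the FIPS key."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)  # keep only counties present in every source
    merged = {}
    for fips in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[fips])
        merged[fips] = row
    return merged


merged = merge_by_fips(disease, demographics, food_insecurity)
for fips, row in merged.items():
    print(fips, row)
```

County 13067 is dropped because it is missing from two of the three sources. In Pandas, the same inner join would typically be a chain of `DataFrame.merge` calls on the key column.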

4.2 Multiple Layers in a Map
We want to create an interactive and comprehensive geospatial visualization with toggling capabilities using these layers:

(1) Food insecurity rates
(2) Demographics and disease prevalence
(3) Food pantry locations
(4) Algorithm results (see Section 4.4)

To provide a user-friendly interface, our map allows users to visualize different layers on the same map. In Figure 4, the user has selected two layers: the base layer is a choropleth map of our clustering results (see Section 4.4), and the red dots show diabetes prevalence, with the radius of each circle reflecting the prevalence rate. Refer to Figures 5 - 8 for various map layers. Users have full flexibility to show or hide layers by toggling the buttons on the left panel. We have a total of 6 layers, including algorithm results (such as clustering) and various information layers (such as disease rates).


Figure 4: Integrated view

Figure 5: Zoomed-in view, with tooltip

Figure 6: Food Insecurity Layer

4.3 Comparison Capability between Counties
To provide additional analysis, we added a sidebar that allows users to compare disease prevalence rates between two counties over time. Users can toggle this sidebar via a button in the top-left corner of the map. Figure 9 shows a screenshot of our sidebar. Once the user selects a county, the d3 line chart is updated with the relevant information.

Figure 7: Obesity Layer

Figure 8: Diabetes Layer

Figure 9: Comparison between Georgia Counties


4.4 Algorithms
4.4.1 Clustering. To decide the optimal number of clusters, we use the elbow method: we pick the number of clusters beyond which the decrease in the sum of squared errors becomes insignificant.

We used K-Means clustering to group counties into 6 clusters based on food insecurity rate, obesity rate, and diabetes rate, to better understand the condition of food insecurity and chronic disease in the U.S.
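The elbow procedure described above can be sketched as follows. This is an illustrative pure-Python version of Lloyd's K-Means algorithm run on synthetic data, not the project's actual code or data; the three feature columns, the random seeds, and the range of k values are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    sse = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, sse


# Synthetic stand-in for county rows: (food insecurity %, obesity %, diabetes %).
rng = random.Random(42)
group_centers = [(10.0, 25.0, 8.0), (16.0, 31.0, 11.0), (23.0, 37.0, 15.0)]
counties = [tuple(rng.gauss(mu, 0.8) for mu in center)
            for center in group_centers for _ in range(30)]

# Elbow method: compute the SSE for each k and look for where the drop flattens.
sse_by_k = {k: kmeans(counties, k)[2] for k in range(1, 9)}
for k in range(2, 9):
    print(f"k={k}  SSE={sse_by_k[k]:8.1f}  drop={sse_by_k[k - 1] - sse_by_k[k]:8.1f}")
```

The printed drops are read by eye: the chosen k is the last one that still gives a large drop. Because the three rates share a unit here, the sketch skips feature scaling, which real data may need.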

4.4.2 Find High-Risk Areas. A high-risk area in our project is defined as a county that has a relatively high food-insecurity rate compared to other counties with similar living conditions, such as food access distance, income, disease rate, and poverty rate. The goal is to analyze how similar one county is to another in severity. From [9][14], we also know that diabetes and obesity are associated with food insecurity. Therefore, the team decided to add two more features: diabetes and obesity prevalence.

A radius-based nearest neighbor algorithm is used to find counties that have similar living conditions. Unlike the K-nearest neighbor algorithm, this approach sets a distance threshold, i.e. the radius, between counties and their neighbors. A county whose would-be neighbors are all far away therefore gets no neighbors, which removes outliers. The food insecurity rate of each county is transformed into a food insecurity level, according to the thresholds obtained from Feeding America, and used as the label for our dataset.

The neighbors of each county are identified by measuring the similarity of living conditions between counties. First, we compute the statistical distance between the living conditions of each pair of counties. Based on these distances, we consider two counties to be neighbors if the distance between them is within the first percentile of all the distances, which is the radius for our nearest neighbor algorithm.

We found that the features in the dataset are dependent to some degree. To take this into consideration, instead of Euclidean distance we chose Mahalanobis distance, because it is scale-invariant and takes the correlations in the dataset into account.

Once all of the similar neighbors of a county are determined, a new food insecurity level for each county is generated by a vote among its neighbors. This is the expected food insecurity level under the given living conditions. If a county's existing food insecurity level is higher than its expected level, we can claim that this county has a more severe situation with regard to food access and food stability. This could act as an indicator for organizations to pay more attention to the county and adjust related policies.

Figure 10 shows our risk layer. Gray areas indicate high-risk counties, whereas green areas indicate low-risk counties.
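The pipeline in this subsection (levels from rates, Mahalanobis distances, a percentile radius, and a neighbor vote) can be sketched as below. This is a simplified illustration rather than the project's implementation: the level cut points, the two features, the county values, and the nearest-rank percentile rule are all assumptions, and a small hand-written matrix inverse replaces NumPy to keep the block self-contained.

```python
import bisect
import math
from collections import Counter


def to_level(rate, thresholds=(10.0, 15.0, 20.0)):
    """Map a food insecurity rate to a level 0..3 (cut points are made up)."""
    return bisect.bisect_right(thresholds, rate)


def covariance(rows):
    """Sample covariance matrix of the feature columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; assumes a non-singular matrix."""
    d = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(d)] for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis_matrix(rows):
    """Pairwise Mahalanobis distances using the covariance of all rows."""
    inv_cov = invert(covariance(rows))
    d = len(rows[0])

    def dist(x, y):
        diff = [a - b for a, b in zip(x, y)]
        q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
        return math.sqrt(max(q, 0.0))

    return [[dist(x, y) for y in rows] for x in rows]


def percentile_radius(dist, pct):
    """Nearest-rank percentile over the distances of all distinct county pairs."""
    n = len(dist)
    flat = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return flat[max(1, math.ceil(pct / 100 * len(flat))) - 1]


def expected_levels(dist, levels, radius):
    """Majority vote among neighbors within the radius; None if there are none."""
    expected = []
    for i in range(len(levels)):
        votes = [levels[j] for j in range(len(levels)) if j != i and dist[i][j] <= radius]
        expected.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return expected


def high_risk(levels, expected):
    """A county is flagged when its actual level exceeds the neighbors' vote."""
    return [e is not None and actual > e for actual, e in zip(levels, expected)]


# Hypothetical counties: (median income in $1000s, miles to nearest store), rate in %.
features = [(30.0, 10.0), (31.0, 11.5), (29.0, 10.5), (60.0, 2.0), (61.0, 2.5), (59.0, 1.0)]
rates = [17.0, 16.5, 22.0, 9.0, 8.5, 12.0]
levels = [to_level(r) for r in rates]

dist = mahalanobis_matrix(features)
# The paper uses the first percentile; 30 is used here only because the sample is tiny.
radius = percentile_radius(dist, 30)
expected = expected_levels(dist, levels, radius)
print(list(zip(levels, expected, high_risk(levels, expected))))
```

Counties with no neighbor inside the radius get no expected level and are never flagged, which mirrors the outlier handling described above. Ties in the vote are resolved by whichever level appears first among the neighbors, a detail the text leaves open.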

Figure 10: Risk Layer

5 EXPERIMENTS
5.1 Visualization Comparison
As mentioned in Section 3.2, all existing visualizations focus either on food insecurity rate, as shown in Figure 1, or on disease prevalence, as shown in Figures 2 and 3. The team experimented with these layers of visualization using d3.js and the Google Maps API:

• Risky area
• Food insecurity
• Diabetes prevalence
• Obesity prevalence
• Food pantry locations
• Disease and food insecurity
• Supermarket locations

Unlike the existing tools we surveyed, our tool lets users pick the layers they are interested in and analyze disease prevalence and food insecurity at once. It serves as a one-stop shop for those interested in food insecurity and disease prevalence.

5.2 Optimized Visualization Performance
Our map visualization used to have slight lagging issues. When users zoom and pan freely, the map has to be re-rendered and recolored for states and counties. The team experimented with plotting the map using the Google Maps API instead of a native d3-based map. However, the difference was not noticeable, so we decided to keep the native d3.js-based visualization.

To reduce the lag, the team created a drop-down menu that contains all of the states in the U.S. When a user selects a state from the drop-down list, the map zooms to that state.

5.3 Food Insecurity Implication
By visualizing the clusters obtained from chronic disease prevalence and food insecurity rate, as specified in Section 4.4, one can answer the following questions:

• Which county/state has higher food insecurity rates and chronic disease prevalence?

• Which county/state has worse access to healthy food and less stable healthy food resources compared to other counties/states with similar living conditions, such as income, disease rate, and ratio of car ownership?

Based on the algorithm and visualization, we found that Mississippi appears to be the state with the highest number of red clusters. Red clusters tend to have higher food insecurity rates and chronic disease prevalence, indicating a severe condition. To further validate the result, we visualized the food insecurity layer as black dots, as shown in Figure 11, to check whether our result matches our intuition. The visualization showed what we expected.

Figure 11: Result Validation

5.4 Relationship Between Food Pantries and Food Insecurity
The team is interested in understanding whether there is any connection between food pantries and food insecurity. Our initial hypothesis was that the fewer food pantries are available in a region, the riskier the region is. To verify this, the team visualized food pantry locations together with the clusters based on similar living conditions. Gray cluster areas are those that have living conditions similar to their neighbors (food access, income, disease rate, poverty rate, etc.) but tend to have a relatively higher food insecurity rate. White cluster areas are those that have no difference in food insecurity rate, while green cluster areas are those that have a relatively lower food insecurity rate.

In Figure 12, gray cluster areas appear to have fewer food pantries. This supports our hypothesis, indicating that areas with fewer food pantries tend to be riskier than those with more food pantries.

Figure 12: Georgia high-risk areas and food pantries location

5.5 Locating Closest FoodPantries

�e team has added an additional interaction onthe map by providing users with the closest foodpantries given their approximate location. �ereare multiple food banks in a state, which raisethe question on which location is closest to theuser. Users can click on the ”Find Food Pantries”bu�on and the system will show 3 closest food

pantries. Top 3 closest locations are determinedby calculating the distance of where the user islocated compared to the location of food pantries.Figure 13 shows an example, closest 3 food pantriesare shown as markers on the map, and the foodpantry’s speci�c information is shown when userclicks on the markers.
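A minimal sketch of the top-3 lookup is given below. The text does not say which distance measure the tool uses, so the great-circle (haversine) distance is an assumption here, as are the pantry names, coordinates, and record layout.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest_pantries(user_lat, user_lon, pantries, k=3):
    """Return the k pantries nearest to the user, closest first."""
    return heapq.nsmallest(
        k, pantries,
        key=lambda p: haversine_km(user_lat, user_lon, p["lat"], p["lon"]))


# Hypothetical pantry records; names and coordinates are illustrative only.
pantries = [
    {"name": "Pantry A", "lat": 33.7490, "lon": -84.3880},
    {"name": "Pantry B", "lat": 33.9519, "lon": -83.3576},
    {"name": "Pantry C", "lat": 33.7748, "lon": -84.2963},
    {"name": "Pantry D", "lat": 32.0809, "lon": -81.0912},
    {"name": "Pantry E", "lat": 33.9526, "lon": -84.5499},
]

nearest = closest_pantries(33.7756, -84.3963, pantries)
print([p["name"] for p in nearest])
```

Straight-line distance ignores roads, so a production version might rank by travel time instead, at the cost of one routing query per candidate.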

Figure 13: Locate user’s closest 3 food pantries

5.6 Supermarket Locations
The team developed another visual analytics module to analyze accessibility to supermarkets in Georgia, as shown in Figure 14. Accessibility is measured both in terms of distance and travel time to the nearest supermarket. We were unable to pinpoint the location of every individual or household in the population, so we used census-tract level data. For each tract, we use a representative location and query the Google Maps API to get the travel time and distance from that location to the nearest supermarket, and use those data to describe the typical situation of the population in the tract. Then, we combine these data with demographic data and aggregate them for each county. Users can query the average distance or travel time to supermarkets, and apply multiple filters by demographic group, income, and the distance threshold beyond which accessibility is considered low. The results under certain conditions were compared against the USDA data and were shown to be consistent with the latter. Compared to the USDA food insecurity data, this tool offers a more flexible and interactive way to explore and analyze food accessibility in the state.

Furthermore, this module enables a novel analysis: recommending tracts in which to build a new supermarket so that it affects the most people in the group of interest, or results in the greatest reduction of travel time. For each census tract, we query Google's Distance Matrix API to compute which nearby tracts would benefit from a new supermarket in the current tract. Based on data from about 20,000 API calls, the system can compute the recommended place to build a new supermarket given the population and distance filters the user provides. This provides users with an interactive way to analyze the impact of interventions on food insecurity.
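The recommendation step can be sketched as a scoring loop over candidate tracts. Everything below is a hypothetical illustration: the tract IDs, populations, travel times, and the 15-minute low-access threshold are invented, and the precomputed `travel_minutes` table stands in for the Distance Matrix results the paper describes.

```python
# Hypothetical inputs; in the paper these come from Google Maps API queries.
population = {"T1": 4000, "T2": 2500, "T3": 6000, "T4": 1500}
current_minutes = {"T1": 25, "T2": 30, "T3": 8, "T4": 40}  # to nearest existing store
travel_minutes = {  # between tract representative locations (0 within a tract)
    "T1": {"T1": 0, "T2": 12, "T3": 20, "T4": 35},
    "T2": {"T1": 12, "T2": 0, "T3": 28, "T4": 18},
    "T3": {"T1": 20, "T2": 28, "T3": 0, "T4": 45},
    "T4": {"T1": 35, "T2": 18, "T3": 45, "T4": 0},
}


def score_candidate(candidate, low_access_minutes):
    """Return (people helped, person-minutes saved) for a store in `candidate`.

    Only tracts whose current travel time exceeds the low-access threshold
    are counted, and a tract benefits only if the new store is closer.
    """
    people, saved = 0, 0
    for tract, now in current_minutes.items():
        new = travel_minutes[candidate][tract]
        if now > low_access_minutes and new < now:
            people += population[tract]
            saved += population[tract] * (now - new)
    return people, saved


def recommend(low_access_minutes=15):
    """Pick the tract that saves the most person-minutes of travel."""
    scores = {t: score_candidate(t, low_access_minutes) for t in population}
    best = max(scores, key=lambda t: scores[t][1])
    return best, scores


best, scores = recommend()
print(best, scores[best])
```

Ranking by person-minutes saved corresponds to the "greatest reduction of travel time" criterion; ranking by the first element of each score would instead favor the tract that reaches the most people.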

Figure 14: Visualizing Food Inaccessibility

5.7 Children Health Issue and Food Insecurity
Child food insecurity is a pressing issue, as children are particularly vulnerable to inadequate food intake. In 2015, nearly 13.1 million U.S. children, or 16.6 percent, lived in food-insecure households. Relatively little research has been conducted to find out what causes food insecurity among children [7].

Using state-level data, our team's preliminary analysis showed that there is a positive correlation between child food insecurity rates and child obesity and death rates, as shown in Figure 15. The linear regression model used child obesity and child death rate as covariates. The p-values of the coefficients for child death rate and child obesity are 0.029 and 0.001 respectively, both of which are significant at the α = 0.05 level.

Figure 15: Child Food Insecurity

6 CONCLUSIONS AND DISCUSSION
6.1 Better Allocation of Resources
Our study results shed light on improving food distribution efficiency and enhancing health concurrently. Our analysis provides food pantries with more detailed information on the distribution of the population with chronic disease. They can take advantage of this information to pinpoint areas with high disease prevalence and allocate food resources more appropriately. For example, a food bank can allocate more low-sugar food to areas with a higher prevalence of diabetes, or more low-salt food to areas with a higher prevalence of hypertension.

6.2 Future Work
6.2.1 County Level Data for Children Health Issue and Food Insecurity. Preventing chronic disease as early as possible is better than curing it. The team explored the data to find whether there is any relationship between children's health condition and food insecurity rate. We found that child food insecurity correlated with child obesity and the death rate of children and teenagers at the state level. Since detailed data for counties is not available, we are not able to draw conclusions about the relationship at the county level. If county-level data becomes available in the future, the team would like to do a comprehensive analysis to gain more insight.

7 DISTRIBUTION OF TEAM EFFORT
All members contributed a similar amount of time and effort. The tasks performed by each member are listed below.

All members: Research, writing documents, brainstorming visualization.
Yi-Hsuan Hsieh: Problem formulation for algorithms surveyed. Validation of data to use for analysis.
Lufina Huang: Found dataset for the project, surveyed algorithms. Proposed visualization design.
Lei Jiang: Optimization of the code. Algorithm exploration and formulation.
Mario Wijaya: Merged and preprocessed dataset. Visualization design and implementation.
Keith Woh: Proposed samples for map visualization. Created API for visualization in d3.js.

REFERENCES
[1] Susan J Algert, Aditya Agrawal, and Douglas S Lewis. 2006. Disparities in access to fresh produce in low-income neighborhoods in Los Angeles. American journal of preventive medicine 30, 5 (2006), 365–370.

[2] Upali Amarasinghe, Madar Samad, and Markandu Anputhas. 2005. Spatial clustering of rural poverty and food insecurity in Sri Lanka. Food Policy 30, 5 (2005), 493–509.

[3] Seth A Berkowitz, Theodore SZ Berkowitz, James B Meigs, and Deborah J Wexler. 2017. Trends in food insecurity for adults with cardiometabolic disease in the United States: 2005-2012. PloS one 12, 6 (2017), e0179172.

[4] Elliot M Berry, Sandro Dernini, Barbara Burlingame, Alexandre Meybeck, and Piero Conforti. 2015. Food security and sustainability: can one exist without the other? Public health nutrition 18, 13 (2015), 2293–2302.

[5] Luther G Brock and Lauren B Davis. 2015. Estimating available supermarket commodities for food bank collection in the absence of information. Expert Systems with Applications 42, 7 (2015), 3450–3461.

[6] A Buczynski, Holly Freishtat, and Sarah Buzogany. 2015. Mapping Baltimore City’s food environment: 2015 report. Baltimore Food Policy Initiative and Johns Hopkins Center for a Livable Future (2015).

[7] Alisha Coleman-Jensen, Matthew Rabbitt, Christian Gregory, and Anita Singh. 2016. Household food security in the United States in 2015. (2016).

[8] Kirk Goldsberry and Chris Duvall. 2009. Visualizing nutritional terrain: an atlas of produce accessibility in Lansing, Michigan, USA. American Journal of Preventive Medicine 36 (2009), 485–499.

[9] Christian Gregory and Alisha Coleman-Jensen. 2017. Food Insecurity, Chronic Disease, and Health Among Working-Age Adults. (2017).

[10] Craig Gundersen and James P Ziliak. 2015. Food insecurity and health outcomes. Health affairs 34, 11 (2015), 1830–1839.

[11] Glenn Hyman, Carlos Larrea, and Andrew Farrow. 2005. Methods, results and policy implications of poverty and food security mapping assessments. Food Policy 30, 5 (2005), 453–460.

[12] Jin-Kyu Jung and Christian Anderson. 2017. Extending the conversation on socially engaged geographic visualization: representing spatial inequality in Buffalo, New York. Urban Geography 38, 6 (2017), 903–926.

[13] Barry J Kronenfeld and David WS Wong. 2017. Visualizing statistical significance of disease clusters using cartograms. International journal of health geographics 16, 1 (2017), 19.

[14] Barbara A Laraia. 2013. Food insecurity and chronic disease. Advances in Nutrition: An International Review Journal 4, 2 (2013), 203–212.

[15] Hilary K Seligman, Terry C Davis, Dean Schillinger, and Michael S Wolf. 2010. Food insecurity is associated with hypoglycemia and poor diabetes self-management in a low-income sample with diabetes. Journal of health care for the poor and underserved 21, 4 (2010), 1227.

[16] Hilary K Seligman, Courtney Lyles, Michelle B Marshall, Kimberly Prendergast, Morgan C Smith, Amy Headings, Georgiana Bradshaw, Sophie Rosenmoss, and Elaine Waxman. 2015. A pilot food bank intervention featuring diabetes-appropriate food improved glycemic control among clients in three states. Health affairs 34, 11 (2015), 1956–1963.

[17] Jared Shenson and Alark Joshi. 2012. Visualizing disease incidence in the context of socioeconomic factors. In Proceedings of the 5th International Symposium on Visual Information Communication and Interaction. ACM, 29–38.