UNIVERSITY OF VAASA
SCHOOL OF TECHNOLOGY AND INNOVATIONS
COMMUNICATION AND SYSTEMS ENGINEERING
Hilda Amo Henaku
USING AN ANDROID MOBILE APPLICATION FOR DATA MINING AND DECISION SUPPORT PURPOSES
Master’s thesis for the degree of Master of Science in Technology submitted for assessment.
Vaasa, April 27, 2018.
Supervisor: Professor Mohammed Elmusrati
ACKNOWLEDGEMENTS
I am very grateful to the Lord God Almighty for His blessings and grace upon me and for His guidance through my studies.
My profound appreciation goes to my supervisor, Professor Mohammed Elmusrati, for his guidance and support throughout my master’s programme and this thesis work. I would also like to express my deepest gratitude to my instructor, Shaima Abdelmageed, for her guidance, assistance and support throughout this thesis.
Special thanks go to my mum, Miss Patience Addo and my siblings for their love and support throughout this period. I am very grateful for all the love and care you have shown me. I would also like to thank Yayra Asare, Emmanuel Tettey, Kwabena Kan Dapaah and all those who contributed to making my studies a success. May God bless you all and guide you in all you do.
TABLE OF CONTENTS
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Abstract
1 INTRODUCTION
1.1 Background
1.2 Thesis Statement
1.3 Motivation
1.4 Methodology
1.5 Expectation
2 PRINCIPLES OF WEIGHT MANAGEMENT AND DATA MINING
2.1 The significance of calorie counting in weight management
2.2 Interpreting Nutrition Fact Labels
2.3 Wearables for Activity tracking
2.4 Diabetes and Physical Activity
2.5 A brief description of the android mobile operating system
2.6 System design and architecture
2.7 Data mining and machine learning
2.7.1 Introduction to Logistic Regression
3 MOBILE APP DEVELOPMENT
3.1 Graphic User Interface and functionalities of the mobile app
3.1.2 User authentication (Firebase Authentication)
3.1.3 Database for app (Firebase Realtime database)
3.2 Calorie Counting
3.3 Per hundred gram labelling against per serving
3.4 Daily required calories calculation
3.5 Blood glucose interpretations
3.5.2 Wireless syncing of glucometers
3.6 Physical activity monitoring with Fitbit
4 LOGISTIC REGRESSION MODEL AND IMPLEMENTATION
4.1 What is classification in machine learning?
4.2 Building the predictive model with Python
4.2.2 Data preparation
4.2.3 Data exploration
4.3 Model implementation
4.3.2 Determining coefficients
4.3.3 Predicting test results and calculating accuracy
4.3.4 Performing cross validation
4.3.5 Confusion matrix
4.3.6 Precision, recall, F-measure and Support
4.3.7 Receiver Operating Characteristic (ROC) curve
4.4 Python – Firebase Connection
4.4.2 Conversion of JSON into Dataframe
4.4.3 Prediction of cardiac arrest
4.5 Building the machine learning model with MATLAB
4.5.2 Description of the MATLAB toolbox
4.6 Challenges with data acquisition
5 DISCUSSION OF FUTURE WORKS
6 CONCLUSION
References
LIST OF FIGURES
Figure 1. A sample NFL (National Health Service UK website, 2015).
Figure 2. An illustration of the network architecture of wearable devices (Ching & Singh, 2016).
Figure 3. A simple representation of the system architecture of this project.
Figure 4. An example of a wearable data collection architecture.
Figure 5. The S-shaped logistic curve (Simonoff, 2018).
Figure 6. A flowchart of the algorithm of the mobile app – my Calbuddy.
Figure 7. Splash screen for NFLs and fitness tracking.
Figure 9. Splash screen for daily calorie needs.
Figure 10. A screenshot of the Firebase authentication platform.
Figure 11. User interface for login options and email entry.
Figure 13. User interface for creating an account.
Figure 14. A screenshot of my Calbuddy food label interpretation screen.
Figure 15. NFL with labelling showing amount of ingredients per 100 g and per serving.
Figure 16. Screenshot showing calorie calculations per serving.
Figure 17. Screenshot showing calorie calculations per 100 g.
Figure 18. Screenshots showing user interfaces for selecting option for calorie calculations.
Figure 19. Image showing results for Scenario 3.
Figure 20. Other images showing interfaces for other weight goals, which include datepickers for the selection of the time duration for the goal.
Figure 21. Display of a case of hypoglycaemia from Scenario 4.
Figure 22. This also shows an incident of hyperglycemia where the user’s blood sugar is too high.
Figure 23. Login screens for Fitbit account and selection of needed information.
Figure 25. Display of physical activity information and account settings.
Figure 27. Commands for importing Excel file and converting into dataframe.
Figure 28. Command window showing model building and testing.
Figure 29. Bar chart of ages versus frequency of cardiac arrest.
Figure 30. Bar chart of gender versus frequency of cardiac arrest.
Figure 31. Bar chart showing diabetes status against frequency of cardiac arrests.
Figure 32. Command window for implementation of logistic regression model.
Figure 33. Syntax for obtaining logistic regression coefficients.
Figure 34. Display of coefficients in console.
Figure 35. Display of model accuracy.
Figure 36. Implementation of cross validation and display of accuracy of cross validation.
Figure 37. Confusion matrix of model.
Figure 38. Console showing precision, recall, F-measure and support values.
Figure 39. ROC curve implementation syntax and display.
Figure 40. Python codes for Firebase database connection and imported JSON file from database after connection.
Figure 41. Syntax of retrieving user data and display of results.
Figure 42. This shows user details for the given scenario in the Firebase database.
Figure 43. Converting data from JSON format into dataframes.
Figure 44. Creating new data frames with needed information.
Figure 45. Creating new columns and replacing missing data.
Figure 46. Syntax for display prediction.
Figure 47. Output and display of user’s chances of getting a cardiac arrest.
Figure 48. MATLAB codes for assigning predictors and the target of the model.
Figure 49. Model training MATLAB codes.
Figure 50. Codes for implementing cross-validation.
Figure 51. Output showing the estimated coefficients obtained from the model.
Figure 52. Prediction using dummy user with information in Table 4.
Figure 53. Confusion matrix of the model.
Figure 54. The ROC curve of the model using the MATLAB classifier app.
Figure 55. Screenshot showing accuracy of model using the machine learning classifier app.
LIST OF TABLES
Table 1. Activity levels and their corresponding activity multipliers.
Table 2. Recommended BG levels pre prandial and post prandial.
Table 3. Dataset for training and testing of model.
Table 4. Dummy user information for testing of logistic regression model in MATLAB.
ABBREVIATIONS
ADL Activities of Daily Living
API Application Programming Interface
AUC Area Under Curve
BG Blood Glucose
BMR Basal Metabolic Rate
CSS3 Cascading Style Sheets 3
EHR Electronic Health Record
HTML HyperText Markup Language
IDE Integrated Development Environment
KDD Knowledge Discovery in Databases
NFC Near Field Communication
NFL Nutrition Fact Label
OCR Optical Character Recognition
PA Physical activity
PAL Physical activity level
PAN Personal Area Network
ROC Receiver Operating Characteristic
SCA Sudden Cardiac Arrest
SDK Software Development Kit
SQL Structured Query Language
UI User Interface
Faculty of technology
Author: Hilda Amo Henaku
Topic of the Thesis: Using an android mobile application for data mining and decision support purposes
Supervisor: Professor Mohammed Elmusrati
Instructor: Shaima Abdelmageed
Degree: Master of Science in Technology
Major of Subject: Communication and Systems Engineering
Year of Entering the University: 2015
Year of Completing the Thesis: 2018
Pages: 84
ABSTRACT
Every day, across many industries and fields of study, data is collected into data stores from sources such as Electronic Health Records (EHRs), smart wearables and mobile devices. There is therefore a need for computational techniques that make this stored data more useful to humanity, and this need has driven the recent popularity of data mining. Data mining is the process of using computer-based techniques to study large amounts of sampled data in order to identify the patterns and trends that explain the outcomes observed in the data. This thesis, ‘Using an android mobile application for data mining and decision support purposes’, aimed at using an android mobile application as a data sampling tool for data mining purposes. In this work, a predictive machine learning model was used to predict the probability of the occurrence of cardiac arrest in individuals over a ten-year span, using data from users of a mobile application designed for this work. To establish a complete connection from the mobile app to the prediction, a network was formed in which the mobile app was connected to a real-time database and to a machine learning model. A detailed account of the implementation processes and principles is given throughout this work.
KEYWORDS: BG, my CalBuddy, data mining, machine learning, SCA, calories
INTRODUCTION
Background
Due to the rapid growth of information technology and the daily use of systems, applications and software, data is being collected on a daily basis across various fields and institutions. Besides collecting and storing data, novel techniques have been introduced to make the data held in databases more useful for human development; this is the concept of data mining. Data mining involves the use of techniques to study large amounts of sampled data in order to identify the patterns and trends that explain the outcomes observed in the data (Morias et al., 2017). The idea behind this project is to use an android application as a data mining tool and a decision support tool for promoting fitness and wellbeing. A simple mobile application was developed for this project to demonstrate how effectively android applications can be used for data mining purposes, with the aim of future enhancement and modification for public use. This mobile application is named “my Calbuddy” and is referred to by that name throughout this report. My Calbuddy is an android application that provides a platform for its users to determine the total amount of calories on the food label of an item they are about to consume or purchase, to calculate the daily required calories for losing, gaining or maintaining weight and, for diabetes patients, to interpret the blood sugar (glucose) level readings from their glucometers. A real-time database (Firebase Database) is used to store the information entered by the users of the application for data mining purposes.
Losing, maintaining or gaining weight may not just involve cutting down or increasing one’s food intake but may involve physical activity as well. Dieting to achieve any of these goals should, however, be done healthily so that complications do not arise afterwards. Because keeping blood glucose (BG) concentrations in check while exercising can be very strenuous for people with diabetes who are on insulin therapy (Zaharieva & Riddell, 2017), my Calbuddy provides a glucose interpretation interface to help reduce the stress involved.
Smart watches and wristbands have in recent times become popular for weight management and healthy living because they track the physical activities of the people who wear them. Smartwatch brands include Fitbit, Polar, Jawbone, Garmin, Samsung and many others. For activity level measurement and accurate classification of individuals’ activity levels, my Calbuddy provides a feature that allows its users to retrieve information on their daily physical activities by connecting their Fitbit watches or wristbands via Bluetooth to their mobile devices. With this function, users can tell how physically active they are, which helps them choose the correct activity level and so obtain accurate estimates of their daily required calories for weight gain, weight loss or weight maintenance.
Another important aspect of weight loss that is taken into consideration in this work is the calculation of calories on food labels, as mentioned earlier. One of the methods that has been introduced to help prevent obesity is the labelling of calories on food items (Nikolaou et al., 2017). The reason for this is to help users make the right choices when shopping and planning their meals.
As an engineer, one should have an attitude of problem solving and innovation, and should consider everything as an area of interest and a potential solution, or a step towards solving a problem. It is for this reason that this project focuses extensively on healthcare and wellbeing support, not just by calculating calories and tracking physical activity but also by using machine learning, an aspect of data mining and artificial intelligence, to predict the likelihood of cardiac arrest in an individual over a ten (10) year period using data from my Calbuddy users. Weight, height, gender, age and diabetes status are the variables for this prediction. The machine learning programming is explored with two programming languages, Python and MATLAB, which are discussed in detail later.
The remaining chapters of this work are organised as follows. Chapter two is the literature review, which introduces the ideas, theories and background of the various aspects of the project. It includes topics such as the significance of calorie counting in weight management, interpreting Nutrition Fact Labels, a brief description of the android software, logistic regression and others. Chapter three provides the details and procedures involved in achieving the goals of this project. Chapter four discusses the experimentation using Python and MATLAB, the challenges that were faced during the project and alternative approaches that could make things easier. Chapter five discusses future works and Chapter six is the conclusion.
Thesis Statement
This thesis uses an android mobile application named my Calbuddy for data mining purposes (predicting the chance of getting a cardiac arrest over a ten-year span using machine learning) and also for calculating daily required calories, calculating the calories on food labels and interpreting the blood glucose (BG) levels of diabetes patients.
Motivation
The quest of engineers to make life better and easier has led to the invention of varying forms of technology which tackle issues related to health, wellbeing, energy, power, finances and all other relevant areas of our day-to-day lives. Problem solving and innovation are traits that every inventor possesses, and they make inventors the unique and creative people we see them to be. With the dream of being an inventor of a novel technology someday, I have realised that gaining these traits would go a long way in helping me to achieve my dream. During my bachelor’s degree (2007 – 2011), I had a roommate who had diabetes and I saw how she battled with some issues, especially those related to weight management and sustaining her blood glucose level. This is what led to the concept of ‘my CalBuddy’, an android application which focuses not just on the general public but on people with diabetes as well. As machine learning has become a very important subject of interest in recent times, the focus broadened from just making an application to also using the data sampled from it to help solve another problem, and to serve as a decision support system for healthcare facilities in the near future.
Methodology
My Calbuddy is installed on an android smartphone. A user enters information such as his or her weight, height, age, gender, activity level and diabetes status (this information is stored in a Firebase Realtime Database), and the Harris Benedict formula is used to calculate his or her Basal Metabolic Rate (BMR). The BMR value is then multiplied by the Physical Activity Level (PAL) factor of the user to obtain the total daily required calories he or she needs. Daily calories for weight loss and gain are also calculated based on standard recommended guidelines. Calorie calculations on food labels are done using the Atwater general factor system. A diabetes patient uses a portal to enter the blood glucose readings from his or her glucometer, which are interpreted based on the type of diabetes he or she has and the prandial button chosen from the available options. The application tells the user whether his or her blood sugar level at that moment is too high (hyperglycemia), too low (hypoglycemia) or normal. Activity level is tracked with a Fitbit wristband which is connected via Bluetooth to the user’s mobile device. A model for predicting a user’s likelihood of getting a cardiac arrest in the next 10 years is built using Python and is also demonstrated using MATLAB’s machine learning toolbox.
Expectation
What we seek to achieve by the end of this project is to predict the probability of a person getting a cardiac arrest in the next ten years using logistic regression (a machine learning technique). The predictors used for building this model are based on data that my Calbuddy users enter when using the app. This is explained further in Chapter 4.
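The idea of turning user-entered predictors into a probability can be sketched as follows. This is a minimal illustration of the logistic function only: the feature names, the coefficient values and the intercept are hypothetical placeholders and are not the fitted model reported in Chapter 4.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function, mapping any real z to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, coefficients, intercept):
    """Return P(outcome = 1) for one user under a logistic regression model."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return logistic(z)


# Hypothetical values for illustration only; NOT the coefficients from Chapter 4.
example_coefficients = {"age": 0.05, "weight_kg": 0.01, "height_cm": -0.01,
                        "is_male": 0.4, "has_diabetes": 0.6}
example_intercept = -5.0
example_user = {"age": 45, "weight_kg": 80, "height_cm": 170,
                "is_male": 1, "has_diabetes": 0}

probability = predict_probability(example_user, example_coefficients,
                                  example_intercept)
```

Whatever the coefficients are, the output always lies between 0 and 1, which is what allows it to be read as a probability of the event occurring.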
PRINCIPLES OF WEIGHT MANAGEMENT AND DATA MINING
This chapter gives insight into the various principles and theories that are used for developing this project and discusses previous and current works that have been done in relation to this topic. It is also intended to help the reader get a better understanding of all the principles and criteria used for implementing this work.
The significance of calorie counting in weight management
Keeping one’s weight in check is essential for a healthy lifestyle and for building confidence. In doing so, one may want to lose, gain or maintain weight, which must be done cautiously to get the right results. The principle of ‘calories in, calories out’ states that weight is lost or gained based on how many kilocalories (also referred to as calories) are consumed and how much energy is expended (Riera-Crichton & Tefft, 2014). If more calories are consumed than are expended, weight is gained; if fewer are consumed, weight is lost. Various formulas are used to calculate the daily caloric needs of individuals. In this study, the Harris Benedict equation is used to find the Basal Metabolic Rate (BMR), which is then multiplied by the Physical Activity Level multiplier to get the total daily energy expenditure in calories. Below is the Harris Benedict formula for both men and women.
Women: BMR = 655 + (9.6 x weight in kg) + (1.8 x height in cm) – (4.7 x age in years)
Men: BMR = 66 + (13.7 x weight in kg) + (5 x height in cm) – (6.8 x age in years)
(Hsu et al., 2018)
where BMR represents the Basal Metabolic Rate.
Table 1. Activity levels and their corresponding activity multipliers
Multiplier    Activity level
1.2           Sedentary; little or no exercise
1.375         Lightly Active; light exercise 1-3 days per week
1.55          Moderately Active; moderate exercise 3-5 days per week
1.725         Very Active; hard exercise 6-7 days per week
1.9           Extremely Active; hard daily exercise
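The calculation described above can be sketched in a few lines of Python. The coefficients and multipliers are copied from the equations and from Table 1; the function names and the dictionary keys are assumptions made for this illustration and are not taken from the app’s source code.

```python
# Activity multipliers from Table 1 (dictionary keys are illustrative names).
ACTIVITY_MULTIPLIERS = {
    "sedentary": 1.2,
    "lightly_active": 1.375,
    "moderately_active": 1.55,
    "very_active": 1.725,
    "extremely_active": 1.9,
}


def harris_benedict_bmr(sex, weight_kg, height_cm, age_years):
    """Basal Metabolic Rate (kcal/day) using the equations quoted above."""
    if sex == "female":
        return 655 + 9.6 * weight_kg + 1.8 * height_cm - 4.7 * age_years
    if sex == "male":
        return 66 + 13.7 * weight_kg + 5 * height_cm - 6.8 * age_years
    raise ValueError("sex must be 'female' or 'male'")


def daily_calories(sex, weight_kg, height_cm, age_years, activity_level):
    """Total daily energy expenditure: BMR multiplied by the activity multiplier."""
    bmr = harris_benedict_bmr(sex, weight_kg, height_cm, age_years)
    return bmr * ACTIVITY_MULTIPLIERS[activity_level]


# Worked example: a sedentary 30-year-old woman, 60 kg, 165 cm.
# BMR = 655 + 576 + 297 - 141 = 1387 kcal/day, and 1387 x 1.2 = 1664.4 kcal/day.
example_bmr = harris_benedict_bmr("female", 60, 165, 30)
example_total = daily_calories("female", 60, 165, 30, "sedentary")
```

Keeping the multipliers in a lookup table mirrors Table 1 directly, so a user’s chosen activity level maps to exactly one factor.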
In research by Johnson et al. (2006), it was proposed that restricting calories by reducing the quantity of food consumed on some days and eating more than required on other days increased the length of life and the quality of health in several species. Results from this experiment were observed as early as two weeks after it began. They observed significant health benefits in cases of asthma, bacterial, fungal and viral infections, and Insulin Resistance (I.R.). Several factors, such as our emotions, influence the way we control and monitor our diet. Studies have shown that cognitive restraint is associated with a higher desire for low-calorie food than for high-calorie food. It is also said that people with binge-eating disorder and people on a strict diet are more likely to rely on food when they are unhappy (Racine, 2018). Several other factors that influence dietary control, such as memory of food quality and the repercussions of concentrating on the sensory properties of food, have also been investigated by Seguias & Tapper (2018).
Nutrition Fact Labels (NFLs) furnish consumers with details about the nutrients and calories in the food items they purchase from grocery stores. NFLs have been made compulsory by various food regulatory institutions in most countries and on some continents. Their benefits for consumers include helping them to select healthy foods and so eat healthily (Wolfson et al., 2017). The visibility of labels is hence very important in aiding decision making. In a 2017 study, Nikolaou et al. proposed that, since about 60 to 70% of adults in Europe and the United States were overweight, increasing the font size on food labels to about ten times the current size could be helpful in combating obesity. Research done in Canada (McCrory et al., 2016) observed that although most participants used the calorie information on NFLs when purchasing food items, very few knew the exact amount of calories they needed to maintain their weight. Several calorie counting apps currently on the market calculate calories and help with weight management and fitness. These include MyFitnessPal, Lose It, FatSecret, Cron-o-meter and others, which are based on similar formulas and concepts to those discussed earlier. There are also apps made specifically for diabetes patients to monitor their blood glucose levels and manage their weight, such as Glucosio, BG Monitor, Glooko, Fooducate and others.
Interpreting Nutrition Fact Labels
The soft tissue framework of the body is made up mainly of protein, carbohydrates and fats, which are referred to as macronutrients (Campbell, 2017). Interpreting Nutrition Fact Labels (NFLs) can be very stressful and confusing, but there are key items to look out for on these labels which can help with proper decision making. These include the serving size, the amount of calories and the Percent (%) Daily Value.
Serving Size: This provides information on the number of servings the package contains and is sometimes referred to as cups or sizes. It is important to note that the nutrition information on labels is provided per serving of food.
Amount of calories: Calories are stated per serving of food.
Percent (%) Daily Value: This gives information on how the nutrients in one serving of food contribute to one’s entire daily diet. Based on this, one can select items according to which nutrients he or she needs more of and which he or she needs less of (US Food and Drug Administration, 2015).
The Atwater general factor system, created by W. Atwater, an American chemist, is used to assign energy values to food items. It uses a factor for each of the major energy nutrients, which are protein, carbohydrates and fat (The Oxford Dictionary of Sport Science and Medicine). Most food label calorie calculations are based on this system. The formula used in this project to calculate calories from macronutrients is stated below.
Total Calories (Macronutrients) = (grams of fat x 9) + (grams of carbohydrates x 4) + (grams of protein x 4)
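As a short illustration, the formula above can be written as a function and applied to a label. The factors 9, 4 and 4 come from the formula; the function names and the example label values are hypothetical and are used only to show the arithmetic.

```python
def atwater_calories(fat_g, carbohydrate_g, protein_g):
    """Calories from macronutrients using the general factors 9, 4 and 4 kcal/g."""
    return fat_g * 9 + carbohydrate_g * 4 + protein_g * 4


def package_calories(fat_g, carbohydrate_g, protein_g, servings):
    """Calories for a whole package when the label values are given per serving."""
    return atwater_calories(fat_g, carbohydrate_g, protein_g) * servings


# Hypothetical label: 10 g fat, 30 g carbohydrates, 5 g protein per serving.
# (10 x 9) + (30 x 4) + (5 x 4) = 90 + 120 + 20 = 230 kcal per serving.
per_serving = atwater_calories(10, 30, 5)
whole_package = package_calories(10, 30, 5, servings=2)  # 460 kcal
```

Because label values are quoted per serving, the per-serving result has to be scaled by the number of servings actually eaten.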
Figure 1. A sample NFL (National Health Service UK website, 2015).
Wearables for Activity tracking
Engaging in more physical activity (P.A.) helps promote good health and reduces the risk of acquiring diabetes, cancer and heart disease. It also improves mental health and reduces cognitive dysfunction (Philips et al., 2018). This implies that quality of life improves when more physical activity is done. Factors such as the intensity, regularity and duration of exercise or physical activity usually cause us to overestimate how active we are. Modern improvements to wearable activity monitors help us to estimate our physical activity correctly through readings of the number of steps we take per day, our heart rate and other parameters (Gresham et al., 2018). Wristband wearables such as smart watches and fitness trackers are being used for physical activity measurement in various research fields and are currently in high demand on the market. Annual sales of wearables in the United States were recorded at 84 million in 2013, which implies high demand (Philips et al., 2018). Wearables have been recognised by the European Parliament Scientific and Technology Options Assessment Panel (S.T.O.A.) as one of the life-changing inventions of recent times. Aside from health, well-being and fitness purposes, there are future prospects of diagnosis based on data collected with these wearables, which would help reduce the cost of accessing health facilities for treatment (European Commission, Directorate-General for Communications Networks, Content and Technology, 2016). There are currently several brands of wristband wearables and mobile applications for tracking heart rate, sleep, movement and other factors associated with health and well-being.
Some of these wristbands and applications include Fitbit, Apple Health, Google Fit, S-Health, Jawbone, Polar, Microsoft Health, etc. De Arriba-Pérez et al. (2016) investigated how data from all manufacturers of these wearables can be gathered and integrated. They identified key issues related to harmonising the data models and transfer modes of these wearables and proposed that all vendors should standardise them. Hwang & Lee (2017) conducted research to determine the physical demands on construction workers using wristband wearable devices. Measures of the percentage heart rate reserve (%HRR) were taken, and it was observed that physical demands changed with the work patterns of the construction workers and with other factors such as age and heart stress. Pevnick et al. (2018) discussed the current and future status of wearables in the field of cardiology. They highlighted the significance of wearables in improving healthcare and as a good source of data for various forms of health studies. Glyn et al. (2014) determined how efficiently a smartphone app could be used in primary care to encourage physical activity.
Pedometers are known as tools for measuring and encouraging physical activity (Åkerberg et al., 2012). Aside from pedometers, other devices for activity monitoring and tracking are accelerometers and integrated multisensory systems. Pedometers count how many steps are taken along the vertical plane via either digital or mechanical readings. Accelerometers detect acceleration in one, two or three directions so that the regularity, amount and magnitude of motion can be obtained. Integrated multisensory systems use accelerometry alongside other sensors to measure the body’s response during exercise, such as heart rate, to help improve the evaluation of physical activity (Gresham et al., 2018). These activity tracking tools are what are incorporated into the wearables and physical activity tracking mobile applications that are currently available.
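To make the accelerometer description more concrete, the sketch below combines three-axis readings into a single magnitude and counts upward threshold crossings as steps. This is a deliberately naive illustration and is not how Fitbit or any other commercial tracker works; the threshold of 11.0 m/s² and the sample values are hypothetical.

```python
import math


def magnitude(sample):
    """Magnitude of a three-axis acceleration sample (x, y, z)."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def count_steps(samples, threshold=11.0):
    """Count a step each time the magnitude rises above the threshold.

    The threshold (in m/s^2) is a hypothetical value chosen for illustration.
    """
    steps = 0
    above = False
    for sample in samples:
        is_above = magnitude(sample) > threshold
        if is_above and not above:
            steps += 1  # an upward crossing is treated as one step
        above = is_above
    return steps


# Hypothetical readings: resting (about 9.8 m/s^2) with two short peaks.
readings = [(0, 0, 9.8), (0, 0, 12.0), (0, 0, 12.5),
            (0, 0, 9.8), (0, 0, 13.0), (0, 0, 9.8)]
step_count = count_steps(readings)  # two upward crossings
```

Counting only upward crossings means that consecutive samples within the same peak are not counted twice.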
Figure 2. An illustration of the network architecture of wearable devices (Ching & Singh, 2016).
Diabetes and Physical Activity
Diabetes is diagnosed when BG is greater than or equal to 126 mg/dL (7 mmol/L) when fasting, when BG is greater than 200 mg/dL (11 mmol/L) post prandial or at any time, or when haemoglobin A1C (HbA1c) is greater than 6.5 percent (Jain et al., 2017). When the blood sugar level (referred to as BG in this work) is too low for the body to produce sufficient energy for its functions, hypoglycaemia occurs. People with BG levels below 70 mg/dL (3.9 mmol/L) are usually said to have low blood sugar. Hypoglycaemia is a critical condition which can affect the quality of life of patients and hence requires regular monitoring of BG (Khunti et al., 2017). Symptoms of hypoglycaemia include difficulty in speaking, dizziness, nervousness, sweating, shakiness and confusion, and it can even lead to death in severe cases (NorthWest Memorial Hospital, 2013). Measures taken to control it are insulin dose reduction, performing less physical activity and increasing calorie consumption. A doctor should, however, be consulted before taking these steps (Khunti et al., 2017). Hyperglycemia, on the other hand, refers to the situation where BG is high and is predominant in people with Type 2 diabetes. Measures to control hyperglycemia include increasing physical activity and regulating calories (Northwestern Memorial Healthcare, 2016).
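The cut-offs quoted in this paragraph can be expressed as a small function. This sketch uses only the 70, 126 and 200 mg/dL values stated above; it is not the interpretation logic of my Calbuddy, and the function names, the return labels and the approximate conversion factor of 18 between mg/dL and mmol/L are assumptions made for the illustration.

```python
MGDL_PER_MMOLL = 18.0  # approximate conversion factor for glucose (assumption)


def mgdl_to_mmoll(value_mgdl):
    """Convert a blood glucose reading from mg/dL to mmol/L (approximate)."""
    return value_mgdl / MGDL_PER_MMOLL


def classify_bg(value_mgdl, fasting):
    """Compare a reading with the cut-offs quoted in the text.

    Returns 'low' below 70 mg/dL, 'high' at or above 126 mg/dL when fasting
    or above 200 mg/dL otherwise, and 'in between' in all other cases.
    """
    if value_mgdl < 70:
        return "low"
    if fasting and value_mgdl >= 126:
        return "high"
    if not fasting and value_mgdl > 200:
        return "high"
    return "in between"


fasting_example = classify_bg(130, fasting=True)       # at or above 126 mg/dL
post_meal_example = classify_bg(150, fasting=False)    # not above 200 mg/dL
```

The `fasting` flag matters because the same reading can fall on different sides of the cut-off depending on when it was taken.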
Though exercising regularly comes with several health benefits, people with diabetes face challenges in managing their blood glucose levels during exercise. This is even more challenging for patients on insulin therapy or on hypoglycaemic agents. Hypoglycaemia is the most threatening effect of exercise for diabetes patients. It can, however, be controlled with the right insulin dosage and adjustments and, in some scenarios, with carbohydrate supplements. Hyperglycaemia may also be experienced after vigorous exercise, since insulin sensitivity could be heightened due to a severe reaction to stress (Zaharieva & Ridell, 2017).
People with diabetes must take the above-mentioned conditions (hyperglycaemia and hypoglycaemia) and other factors into consideration before engaging in physical activity. Diabetes patients who take regular injections and insulin infusions should therefore be very careful when exercising (Zaharieva & Ridell, 2017). The other factors to be considered may include the reasons for exercise, the timing of the physical activity, insulin sensitivity, the duration of the exercise and others (Siomos, et al., 2017). Due to these challenges, people with diabetes are usually not motivated to exercise. Jenkins and Jenks proposed that medical practitioners should play a part in motivating diabetes patients to engage in physical activity. They also suggested that exercise prescription and other subjects related to diabetes be added to the medical school curriculum. Guidance on how physical activity could be prescribed for diabetes patients was also discussed by Siomos, et al. (2017), who suggested that aside from taking notes and observing tailored prescriptions for patients, nurse practitioners should also engage patients in motivational interviews and give them enough attention during sessions. They also suggested that nurse practitioners should try to understand the limitations and fears of individual patients, and how willing they are to take up physical activity to improve their health.
A brief description of the android mobile operating system
Android is a Linux-based, open source operating system for mobile devices and other platforms (Android TV and Android Auto). Android is evolving fast, as new functionalities and capabilities are introduced to its users and developers from time to time. It equips developers with what is needed for creating applications that use the hardware capabilities of the devices they run on. Android user interfaces adjust to suit each device and can be managed across various devices.
A software development company called Android Inc. was formed in October 2003 in California in the United States of America. The founders were Andy Rubin, Nick Sears, Rich Miner and Chris White. Their initial plan was to develop a Linux-based operating system for digital cameras that could connect to computers. Two years later, in August 2005, Android Inc. was bought by Google, and in November 2007 Google unveiled an alliance of mobile technology companies and investors called the Open Handset Alliance Consortium (OHAC). There was, however, no handset release under OHAC until HTC released its first Android smartphone in October 2008. That was when the first version of Android, which had just 35 apps, was made public. Android has released many versions over time, some of which are Cupcake, Donut, Eclair, Froyo, Ice Cream Sandwich, Lollipop and others (Perera, et al., 2017).
The Android Software Development Kit (SDK) contains libraries that are used for creating Android applications, together with tools for developing, executing and testing them. Integrated development environments (IDEs) are user interfaces that make software development easier. Android Studio, the official IDE for the development of Android applications, can be downloaded and installed for Android software development (Barry, 2015). Android Studio was the platform used for coding and developing my Calbuddy.
Android has some very important features and components, which are discussed below:
User interface (UI): Android provides pre-built UI components that are used to develop GUIs (Graphical User Interfaces) for apps.
Connectivity: Aside from supporting WAN (Wide Area Network) technologies such as GSM, CDMA, 3G and 4G, LAN technologies such as Ethernet and Wi-Fi, and PAN technologies such as Bluetooth, current versions of Android also support Near Field Communication (NFC).
Storage: There are many alternatives for saving data, which can be selected based on whether the data should be accessible to other applications and on the space needed for storage. These storage options include the SQLite database, network connection, cloud storage, shared preferences, and internal and external storage.
Media support: Supported media include MP3 and MIDI for audio, MPEG-4 SP for video, and GIF, PNG, BMP and JPEG for images.
Messaging: Aside from the most common forms, which are SMS (Short Message Service) and MMS (Multimedia Messaging Service), GCM (Google Cloud Messaging) is also a form of push messaging for Android.
Web browser: The web browser for Android supports Cascading Style Sheets (CSS3) and HyperText Markup Language (HTML).
Android Beam: It is used to transfer photos, URLs and other forms of data from one device to another via Bluetooth or NFC.
Other features of Android include multitasking, multi-touch and multi-language support (Perera, et al., 2017).
System design and architecture
The system is made up of connections that form a complete chain from the app to the prediction. The app is connected to the real-time database and then to the machine learning system (or neural network). The connections are: the connection of the user's device to the internet (via a cellular network), the connection of my Calbuddy to the Firebase Realtime Database and the Firebase Authentication console (via the internet), the connection of the Fitbit wristband to the Android device through Bluetooth (a Personal Area Network, PAN), and the connection of the Realtime Database to the machine learning prediction algorithm (Python environment).
Figure 3. A simple representation of the system architecture of this project.
Telecommunication companies and Internet Service Providers (ISPs) are usually the main suppliers of mobile internet services for devices. There are, however, factors that can interrupt the smooth transfer of data packets between mobile devices and the transmission nodes of these mobile networks. Problems during transmission may be due to poor network coverage, network congestion or security issues (Abdelmageed, 2012). These problems can delay the synchronisation of the user's information to Firebase, cause issues with authentication, and delay the transfer of information to the Firebase Realtime Database.
Data transfer from wearables to databases is done through either direct or indirect access. In direct access, third parties use an app as an intermediary either to collect data from a wearable or to send data to a data warehouse, while in indirect access, data is retrieved from a data warehouse through its REST API and passed to a third-party server. Fitbit provides such a warehouse REST API, which allows third-party systems to obtain data from Fitbit wristbands. Developers must, however, be authorized before they can get access to the data. Successful transfer of data from wearables to data warehouses also depends on the quality of the mobile network and may face the same challenges pointed out above.
Figure 4. An example of a wearable data collection architecture.
Data mining and machine learning
Data mining is a key aspect of Knowledge Discovery in Databases (KDD), which is being greatly explored lately. In recent times, several data mining techniques for the extraction and discovery of patterns have been introduced, since a large amount of information is available for use in various sectors and industries (Neesha, et al., 2015). There is therefore a need to use these collected data to solve problems and help make life better. With the use of computer-based information systems, hidden patterns and trends can be identified from bulk amounts of data using techniques in a process called data mining (Morais, et al., 2017).
Data mining draws on several areas, namely statistics, probability, artificial intelligence and machine learning. Data mining models can be categorised into two main groups: predictive and descriptive models. In the health sector, predictive models are usually utilised for research and decision support (Neesha, et al., 2015). Amutha et al. (2018) used a mobile application to predict the outcomes of a treadmill test using data mining algorithms. Age, sex, BMI, diabetes, dyslipidemia and systemic hypertension were used as the predictors for the machine learning algorithms. During the research they found that their model yielded low accuracy for ages above 60, which implied that other parameters should be taken into consideration in future work on the subject. Mansoor et al. (2017) designed a model for predicting the chances of death in females with ST-elevation myocardial infarction. In their work, logistic regression was compared with another data mining model, Random Forest, and the former outperformed the latter. Kavakiotis et al. (2017) reviewed the machine learning and data mining methods used in diabetes research. They discussed major papers and findings on the subject, covering healthcare, diagnosis and environmental factors, amongst others. Morais et al. (2017) investigated an appropriate data mining model for predicting the need for resuscitation of newborn babies by trying several models, and obtained the best model from a case study that used the K-Nearest Neighbour technique, cross validation and data with oversampling. All these examples show that a lot of work is being done with data mining in the health sector.
The discipline of data mining that is of most interest as far as this project is concerned is machine learning. Machine learning is a field of study that enables machines to “learn” from the historic data they are fed, so that their performance improves with experience. Machine learning algorithms can be classified into three main groups, namely supervised learning, unsupervised learning and reinforcement learning. Machine learning is an aspect of artificial intelligence, but the two terms are sometimes used interchangeably (Kavakiotis, et al., 2017).
In supervised learning, predictors are used to estimate an outcome. Examples are Decision Tree, Random Forest, Logistic Regression and KNN (K-Nearest Neighbour). In contrast, in unsupervised learning there is no outcome to be predicted; it is used for grouping a population into clusters. Examples of unsupervised learning are K-means and the Apriori algorithm. In reinforcement learning, the machine learns through trial and error to make specific decisions based on previous experience (Ray, 2017). Machine learning is being used for prediction in the healthcare sector, the business sector and even in criminal investigations. Amrit et al. found that it was feasible to use machine learning classification methods to detect child abuse using free-text or structured data, and therefore proposed a decision support system API and its implementation for this purpose. Support Vector Machine and Random Forest models were both used for this study. Forsyth et al. used machine learning to investigate symptoms that occur in breast cancer patients who undergo chemotherapy. The data used for this research was retrieved from an electronic health record and used to train a model to identify words that indicated the presence of a symptom, the absence of a symptom or no identified symptom at all. 10,000 sentences were labelled manually and divided into training, validation and test datasets. The performance of the model was finally tested using data of which about two percent did not exist in the training dataset. The most prevalent symptoms were nausea, fatigue and pain. Bzdok & Meyer-Lindenberg (2017), in their article, discussed the pros and cons of incorporating machine learning into precision psychiatry.
The conventional way of making decisions in psychiatry has for a while been classical psychiatry, which is based on few datasets. Bzdok & Meyer-Lindenberg (2017) suggested that machine learning could be used in psychiatry, since large amounts of data have successfully been used to identify trends and patterns in machine learning. Lynch et al. (2017) also investigated how the survival of people suffering from lung cancer could be predicted using supervised machine learning classification algorithms. According to their findings, predicting the survival of lung cancer patients was possible with the various types of supervised classification methods, except for decision tree, which was not feasible. Based on data collected when stroke patients were being admitted, the Barthel index status was correctly classified using Logistic Regression, Support Vector Machine (SVM) and Random Forest. The Barthel scores of these patients were also correctly classified using Logistic Regression and SVM. The model built with machine learning also correctly predicted the Activities of Daily Living (ADL) for patients recovering from stroke. It was therefore concluded that using machine learning for ADL prediction was pragmatic and applicable (Lin, et al., 2018). Psychological ways of suppressing pain in the treatment of sickle cell disease were also explored by Yang, et al. (2018). The classification methods used in this approach included Multinomial Logistic Regression, K-Nearest Neighbors, Support Vector Machine and Random Forest.
Introduction to Logistic Regression
Logistic regression is one of the major multivariable analysis methods used in the healthcare sector. Multivariable analysis involves the prediction of a single outcome variable from more than one variable. It also investigates the relationship between the outcome variable, which is the dependent variable, and the predictors, which are the independent variables. This relationship is described by a model in which the outcome is expressed as a combination of the independent variables, each multiplied by its coefficient. Each coefficient represents how the corresponding independent variable affects the outcome variable while adjusting for the effect of the other independent variables. The outcome of logistic regression is usually binary. Aside from logistic regression, other multivariable methods used in the healthcare sector are linear regression, discriminant analysis and proportional hazard regression (Park, 2013).
Logistic regression is one of the generalized linear models, where $y_i \sim \text{Binomial}(1, p_i)$ and the probability $p_i$ is related to the predictors through the logit link. Compared to other methods, logistic regression offers several benefits, including efficiency on small datasets, and the logit link it uses is referred to as the canonical link function (Simonoff, 2018). Studies have been done using logistic regression for prediction in various fields. Huang et al. (2017) used logistic regression to examine the loss of grain during harvest. Jabeur (2017) used Partial Least Squares Logistic Regression to create a model to forecast bankruptcy in a French company. Elio et al. (2017) did a study on areas liable to be affected by radon in Ireland. They used indoor radon readings sampled by the Environmental Protection Agency and four additional datasets, which yielded a better spatial resolution than the indoor radon map that already existed. Wu et al. (2018) used data mining techniques, including logistic regression, to build a model to predict Type 2 diabetes mellitus using 8 predictors from a dataset and an outcome, which was the diabetes status of the patient. The categories were classified as positive and negative classes. Using logistic regression, Eström, et al. (2018) utilised data from a program that detected certain relevant environmental issues to investigate climate change and the factors that cause it. Tripoliti, et al. (2017) also discussed machine learning techniques, including logistic regression, for detecting heart failure, the severity of heart failure and the likely occurrence of catastrophic events.
It is very important to understand the coefficients that are estimated in any fitted model. This can be achieved by identifying the relationship between the dependent and independent variables and by suitably determining the unit in which the independent variable changes. The slope coefficient in logistic regression represents the change in the logit for a change of one unit in the independent variable (Hosmer, Lemeshow & Sturdivant, 2013). The logistic curve is fitted to data to predict the probability that an event will occur. The logistic model is best known for its S-shaped curve and the logistic function. Logistic regression can be subcategorised into binomial and multinomial logistic regression. Binomial logistic regression is used when the independent variables are either categorical or continuous and the dependent variable has two categories, while multinomial logistic regression is used when the dependent variable has more than two categories (Park, 2013).
Figure 5. The S-shaped logistic curve (Simonoff, 2018).
In logistic regression, the response variable has a mean which could be expressed in terms of x as $p = \alpha + \beta x$. This yields values that can fall outside the 0 to 1 range, and for this reason the natural log of the odds is used instead.
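$$\operatorname{logit}(y) = \ln(\text{odds}) = \ln\left(\frac{p}{1-p}\right) = \alpha + \beta x$$
where $p$ is the probability of the outcome of interest, $x$ is the explanatory variable, and $\alpha$ and $\beta$ are the parameters of the logistic regression.
Taking the antilog of the predictive equation gives:
$$p = P(Y = \text{outcome of interest} \mid X = x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$
where $x$ is a specific value of the predictor. In cases where there are more predictors, we get
$$\operatorname{logit}(y) = \ln(\text{odds}) = \ln\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \dots + \beta_k x_k$$
Hence
$$p = P(Y = \text{outcome of interest} \mid X_1 = x_1, \dots, X_k = x_k) = \frac{e^{\alpha + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\alpha + \beta_1 x_1 + \dots + \beta_k x_k}} = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \dots + \beta_k x_k)}}$$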
The odds ratio (OR) is the ratio of one odds to another. The OR for an increase of one unit in x can be expressed as $e^{\beta}$. If OR = 1, the exposure does not affect the odds of the outcome; if OR > 1, the exposure is associated with higher odds of the outcome; and if OR < 1, it is associated with lower odds.
In this project, the occurrence of cardiac arrest is predicted, which yields a categorical outcome coded as 1 for ‘Yes’ and 0 for ‘No’. Details of this are discussed in Chapter 4.
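The relationship between the logit, the predicted probability and the odds ratio can be checked numerically. The following is a minimal sketch in Python; the intercept and slope values are hypothetical and chosen only for illustration, and they are not coefficients fitted in this project.

```python
import math


def logistic(intercept, slopes, values):
    """Return P(Y = 1) for a logistic model with the given coefficients."""
    z = intercept + sum(b * x for b, x in zip(slopes, values))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Hypothetical coefficients (assumption, not taken from the thesis).
intercept, slope = -3.0, 0.05

p_at_40 = logistic(intercept, [slope], [40])
p_at_41 = logistic(intercept, [slope], [41])

# The odds ratio for a one-unit increase in x equals exp(slope).
odds_ratio = odds(p_at_41) / odds(p_at_40)

print(round(p_at_40, 4), round(odds_ratio, 4), round(math.exp(slope), 4))
```

Because the slope in this sketch is positive, the odds ratio is greater than 1, so each additional unit of x is associated with higher odds of the outcome.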
MOBILE APP DEVELOPMENT
This chapter gives a detailed description of my Calbuddy, the functions it performs and the features it presents to its users. It can be said that how well a software product does depends on the people that use it (Blackwell, A., 2010). It is based on this notion that the GUI (Graphical User Interface) of my Calbuddy was created to be user friendly and simple for all its users. The main functions of my Calbuddy are: calculating the right amount of calories needed for individuals to lose, gain or maintain weight; calculating the total amount of calories from the labels of food items users want to buy or consume; receiving information on daily physical activities through a Fitbit wearable device; and interpreting BG readings for diabetes patients so that they can take the necessary steps to keep their BG at the right levels during physical activity sessions or at any other time.
Figure 6. A flowchart of the algorithm of the mobile app, my Calbuddy.
Graphic User Interface and functionalities of the mobile app
My Calbuddy first introduces its users to its functionalities using a set of splash screens. After these introductory screens (which can be skipped), the user creates an account using either an email address or a phone number, which is authenticated using Firebase.
Figure 7. Splash screen for NFLs and fitness tracking.
Figure 9. Splash screen for daily calorie needs.
User authentication (Firebase Authentication)
Firebase provides SDKs and libraries that enable Android app developers to identify their users and also save their data in a cloud. Authentication with Firebase can be done using one's email, phone number, or social media accounts such as Twitter, Facebook and others (Firebase website, 2018). The authentication methods the app developer prefers can be chosen from the Firebase console and enabled using the required procedures. The app authenticates its users using email and phone number. These are shown in figures 11, 12 and 13.
Figure 10. A screenshot of the Firebase authentication platform.
Figure 11. User interface for login options and email entry.
Figure 13. User interface for creating an account.
Database for app (Firebase Realtime database)
Firebase Realtime Database is a database hosted in the cloud. It does not require SQL queries or a complicated set-up process such as building a database from scratch. User data is synchronised in real time via the internet and is stored as JSON. Data can be accessed at all times, not only when the user is online. Security policies, however, apply to the use of Firebase and must be adhered to by the database administrator, who in this context is the app owner or developer. Databases are very important for all applications that store bulk data (Lee, 2012). For this reason, and for the purposes of data mining, integrating this application with a database was critical in achieving our goal.
Calorie Counting
The application calculates the calories of the macronutrients on the food label using the Atwater formula quoted above. This is to help users choose which products to pick in the grocery store based on their preferences and to let them know how much they may be consuming. The initial aim was to create a way for non-Finnish speakers to interpret food labels. This is a functionality that may be developed further for users in the near future. Reducing the intake of carbohydrates has been known to help control diabetes, although there have been several debates on low carbohydrate diets (Feinman, et al., 2015). Since the total carbohydrates printed on food labels include all types of carbohydrates and not just sugars, it is very important for diabetes patients to take note of the amount of carbohydrates they are consuming. Colours are known to affect our emotions, intentions and behavioural patterns (Bonnardel, et al., 2011). It is for this reason that my Calbuddy uses a different text colour for the entered total amount of carbohydrate on food labels, so that it catches the attention of diabetes patients. This is illustrated in Figure 14. In order to assist non-Finnish speakers in identifying ingredients, the Finnish names of ingredients are also given by the app.
Figure 14. A screenshot of my Calbuddy food label interpretation screen.
Per hundred gram labelling against per serving
Provision is made in the app for the two labelling types that are found on food items. Items labelled per hundred grams give the grams of ingredients in every hundred grams of the food item, whereas items labelled per serving directly give the amount of ingredients per serving. Consumers should note, however, that if one eats more than the portion or serving size, the amount per serving on the food label must be multiplied by the number of portions consumed. For instance, if 5 pieces of biscuit make up one serving of the food item and an individual eats 10 pieces, he or she has taken 2 portions and should therefore multiply the total number of calories per serving by 2.
Figure 15. NFL with labelling showing amount of ingredients per 100 g and per serving.
Scenario 1
A straightforward calculation of the calories in a food item is done in this scenario. The total calories (from macronutrients) in an item which has 5 g of fat, 33.7 g of carbohydrates and 7.8 g of protein per serving are calculated using the app. The total obtained is 211 kCal, which is shown in Figure 16.
Figure 16. Screenshot showing calorie calculations per serving.
Scenario 2
The aim of this scenario is to show the accuracy of the food labelling calculations done by my Calbuddy. The information used is from the same food label used in Scenario 1. To get the total number of calories when labelling is done per 100 g, the total calories per 100 g are divided by 100 and multiplied by the grams of the food item per serving or portion. The food label contained 10.0 g of fat, 67.3 g of carbohydrates and 15.5 g of protein, all per 100 g, and the portion size was 50 g. The total number of calories calculated is 211 kCal, establishing the fact that both labelling types yield the same result. The code used for this is shown in the appendix.
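The arithmetic behind Scenarios 1 and 2 can be reproduced with a short sketch. It is written in Python for illustration and is not the Android code given in the appendix. It assumes the general Atwater factors of 9 kcal per gram of fat and 4 kcal per gram of carbohydrate and protein, and the function names are hypothetical.

```python
# General Atwater factors in kcal per gram (assumption for this sketch).
KCAL_PER_GRAM = {"fat": 9.0, "carbohydrate": 4.0, "protein": 4.0}


def calories(fat_g, carbohydrate_g, protein_g):
    """Total kcal for the given grams of macronutrients."""
    return (fat_g * KCAL_PER_GRAM["fat"]
            + carbohydrate_g * KCAL_PER_GRAM["carbohydrate"]
            + protein_g * KCAL_PER_GRAM["protein"])


def calories_from_100g_label(fat_g, carbohydrate_g, protein_g, portion_g):
    """Total kcal per portion when the label states grams per 100 g."""
    return calories(fat_g, carbohydrate_g, protein_g) / 100.0 * portion_g


def calories_eaten(kcal_per_serving, pieces_eaten, pieces_per_serving):
    """Scale the per-serving calories by the number of portions eaten."""
    return kcal_per_serving * pieces_eaten / pieces_per_serving


# Scenario 1: values stated per serving.
per_serving = calories(5.0, 33.7, 7.8)

# Scenario 2: values stated per 100 g, with a 50 g portion.
per_100g_label = calories_from_100g_label(10.0, 67.3, 15.5, 50.0)

# Eating 10 biscuits when 5 biscuits make one serving doubles the calories.
double_portion = calories_eaten(per_serving, 10, 5)

print(round(per_serving), round(per_100g_label), round(double_portion))
```

Under these assumed factors both labelling types round to the same 211 kCal reported in the two scenarios.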
Figure 17. Screenshot showing calorie calculations per 100g.
Daily required calories calculation
The daily required calories for an individual are obtained using the BMR and the activity multiplier, as explained in Chapter 2. Estimates of the calories needed to gain and maintain weight follow guidelines from the National Institute for Health and Care Excellence (NICE) in the UK, which are stated on the website of the UK National Health Service. A total of 2500 kCal per day is estimated for average men and 2000 kCal for average women. These, however, vary based on the height, weight, age and activity level of individuals. The ideal calorie allowance for weight gain per day is 250 calories for men and 125 calories for women. Medical advice should be sought first before one gains weight, that is, when he or she is underweight and has a BMI (Body Mass Index) of less than 18.5. The recommended safe range for weight loss is between 0.5 kg and 1 kg per week. It is based on all these principles that the app makes its calculations for weight maintenance, loss and gain.
In Scenario 3, a male user who is 19 years old, 172 cm tall and moderately active tries to find out how much to consume on a daily basis to keep his weight as it is.
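A calculation of this kind can be sketched as follows. The BMR formula from Chapter 2 is not repeated in this chapter, so the sketch assumes the Mifflin-St Jeor equation and a commonly used set of activity multipliers, which may differ from those used by the app. The weight of 70 kg is a hypothetical value, since the scenario does not state one, and the result is therefore not the figure displayed by my Calbuddy.

```python
# Commonly used activity multipliers (assumption for this sketch).
ACTIVITY_MULTIPLIERS = {
    "sedentary": 1.2,
    "lightly active": 1.375,
    "moderately active": 1.55,
    "very active": 1.725,
}


def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Basal metabolic rate in kcal per day (Mifflin-St Jeor equation)."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    return base + 5.0 if sex == "male" else base - 161.0


def maintenance_calories(weight_kg, height_cm, age_years, sex, activity):
    """Daily kcal needed to keep the current weight."""
    bmr = bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex)
    return bmr * ACTIVITY_MULTIPLIERS[activity]


# 19-year-old male, 172 cm, moderately active; 70 kg is hypothetical.
bmr = bmr_mifflin_st_jeor(70.0, 172.0, 19, "male")
daily = maintenance_calories(70.0, 172.0, 19, "male", "moderately active")

print(bmr, round(daily))
```

For weight gain or loss, a daily allowance or deficit would be added to or subtracted from the maintenance value.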
Figure 18. Screenshots showing user interfaces for selecting options for calorie calculations.
Figure 19. Image showing results for Scenario 3.
Figure 20. Other images showing interfaces for other weight goals, which include date pickers for selecting the time duration for the goal.
Blood glucose interpretations
This function, which is built for users with diabetes, was also implemented using NICE UK guidelines published on the diabetes.co.uk website. The ranges of glucose readings for people with diabetes provided on this site are shown in Table 2. The function helps people with diabetes keep their blood glucose in check during their daily activities and during exercise. It informs users of hypoglycaemia, hyperglycaemia and normal BG levels.
Table 2. Recommended BG levels pre prandial and post prandial.
Target levels by type | Upon waking | Before meals (pre prandial) | At least 90 minutes after meals (post prandial)
Non-diabetic* | - | 4.0 to 5.9 mmol/L | Under 7.8 mmol/L
Type 2 diabetes | - | 4 to 7 mmol/L | Under 8.5 mmol/L
Type 1 diabetes | 5 to 7 mmol/L | 4 to 7 mmol/L | 5 to 9 mmol/L
Children w/ type 1 diabetes | 4 to 7 mmol/L | 4 to 7 mmol/L | 5 to 9 mmol/L
Scenario 4 demonstrates how BG level interpretations are displayed to users. A person with Type 2 diabetes whose glucometer reading is 2 mmol/L and whose last meal was at least 90 minutes ago (post prandial) gets feedback that his blood sugar level is low and that he would have to take the needed steps to normalise it.
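The interpretation logic for a user with Type 2 diabetes can be sketched as below. This is an illustrative Python sketch and not the code of the app. The upper limits come from Table 2, while the lower limit of 4.0 mmol/L for post prandial readings is an assumption, because the table only gives an upper limit for that case. The function name and the returned labels are hypothetical.

```python
# Type 2 diabetes targets in mmol/L.
PRE_PRANDIAL_RANGE = (4.0, 7.0)   # "4 to 7 mmol/L" in Table 2
POST_PRANDIAL_UPPER = 8.5         # "Under 8.5 mmol/L" in Table 2
ASSUMED_LOWER_LIMIT = 4.0         # assumption: same lower limit after meals


def interpret_type2_bg(reading_mmol_l, timing):
    """Classify a glucometer reading as 'low', 'normal' or 'high'."""
    if timing == "pre prandial":
        low, high = PRE_PRANDIAL_RANGE
        too_high = reading_mmol_l > high
    elif timing == "post prandial":
        low = ASSUMED_LOWER_LIMIT
        # "Under 8.5" means a reading of exactly 8.5 is already too high.
        too_high = reading_mmol_l >= POST_PRANDIAL_UPPER
    else:
        raise ValueError("timing must be 'pre prandial' or 'post prandial'")

    if reading_mmol_l < low:
        return "low"
    if too_high:
        return "high"
    return "normal"


# The reading from the scenario: 2 mmol/L, post prandial.
print(interpret_type2_bg(2.0, "post prandial"))
```

A fuller version would hold one set of limits per user type from Table 2 and select it from the user's profile.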
Figure 21. Display of a case of hypoglycaemia from Scenario 4.
Figure 22. An incident of hyperglycaemia, where the user's blood sugar is too high.
Wireless syncing of glucometers
Syncing of one's glucometer is currently disabled and left for future development, so that users will be able to sync their glucometers wirelessly for automatic interpretation rather than entering readings manually as is currently done.
Physical activity monitoring with Fitbit
Fitbit is one of the best known smart band brands worldwide. Fitbit provides APIs for developers who want to integrate their apps with Fitbit. The developer must, however, register the app before it can be integrated. My Calbuddy is integrated with Fitbit to retrieve information on the physical activities and heart rate of its users who have Fitbit wearables. The user must have a Fitbit account before he or she can access his or her Fitbit information. The user first logs in to his or her Fitbit account, after which the information that has been retrieved from the Fitbit server and stored in the database is fetched for him or her. The Fitbit Charge wristband was used in the implementation and testing of this functionality. This feature was added to encourage fitness and to help users know their physical activity levels, so that they can choose the right level of physical activity for their daily required calorie calculations, since more often than not people estimate how active they are wrongly.
Figure 23. Login screens for Fitbit account and selection of needed information.
Figure 25. Display of physical activity information and account settings.
LOGISTIC REGRESSION MODEL AND IMPLEMENTATION
Sudden cardiac arrest (SCA) is a critical condition that leads to mortality and has been an issue of concern in the health sector. SCA occurs when someone collapses because the heart has stopped pumping blood around his or her body, mostly due to a problem with the heart's electrical signals. It has a high risk of mortality and occurs without the foreknowledge of the patient (British Heart Foundation). Recently, a European Union funded project (ESCAPE-NET) was set up to determine the risks and occurrence of SCA through the study of 857,790 SCA cases (Empana et al., 2018). Nanayakkara, et al. (2018) also studied machine learning techniques that could improve the forecasting of mortality in patients who had already had a cardiac arrest. Though cardiac arrest usually occurs abruptly, there are risk factors associated with it, which include diabetes, obesity, a sedentary lifestyle, age, gender, previous history of heart attack and others. The occurrence of heart attack increases as a person grows older, and males are also more likely to suffer cardiac arrest than females (Mayo Clinic website).
As data is gathered from day to day in various fields, there is a need for novel concepts and implementations to retrieve important information from these huge data storehouses for mankind to use to make life better (Neesha, et al., 2015). Mobile devices such as mobile phones have become a part of our daily lives and can be very good tools for collecting data for analysis. That is why the mobile app is used as a data collection tool in this work. As discussed in Chapter 2, machine learning techniques have been used in recent times to make predictions on health-related issues and diseases such as lung cancer, stroke, diabetes mellitus and many others.
This project aims at predicting cardiac arrest before its occurrence and is to be used as a decision support system for cardiac arrest predictions over a 10-year period. When individuals get a fair idea of their chances of suffering a cardiac arrest, they can be more careful and take steps to help prevent its occurrence. Since my Calbuddy collects data on the age, gender, diabetes status and physical activity of its users, these parameters are used as predictors for the logistic regression model built for this project. A 10-year span is chosen based on the assumption that, since age, weight and diabetes status are variable and can change with time (age increases annually), 10 years could be a reasonable duration for a person to change his or her lifestyle and create a lasting change in diabetes status and weight. The machine learning algorithm was first explored and implemented using Matlab but, in order to build a complete chain from the app to the database and on to the machine learning algorithm, a more flexible programming language (Python) was used later for the implementation. Both methods are described below, with more insight into the implementation process using Python.
What is classification in machine learning?
Classification is the process of finding a function that learns to assign data items to classes that have already been designated (Neesha et al., 2015). The model used for the classification process is called the classifier (Miskuf et al., 2017). In this work, logistic regression is used as the classifier: the predictors are used to classify the outcome as an occurrence of cardiac arrest (1) or a non-occurrence (0).
Building the predictive model with Python
Python is a simple, high-level, object-oriented programming language that can also be used as a scripting language for binding components together in application development. It was developed by Guido van Rossum and is quite similar in nature to Fortran, which is one of the oldest programming languages. Unlike in many other programming languages, variables can be used in Python without being declared, and classes are used only when needed. Python uses indentation to define and check the structure of a program (Doty, 2008). The language was chosen for implementing the machine learning algorithm and the database connection because of its simplicity and flexibility.
Data preparation
A dataset is needed in order to build a logistic regression model and make predictions. Obtaining information from nearby health facilities was not possible within the timeframe of this project; hence, the data used was not real data gathered from a system but synthetic data created for this work. The dataset has 6 independent variables (predictors): age, gender, height, weight, activity level and the diabetes status of the user. The outcome (dependent variable) indicates whether or not the hospital records of a patient show an occurrence of cardiac arrest. The data, which was in Excel format, was imported into the Python IDE and used. Table 3 shows the dataset used for training and testing the model.
Table 3. Dataset for training and testing of the model.
Independent Variables (Predictors)
The predictors (X) used are age (numeric), gender (categorical), weight (numeric), height (numeric), activity level (categorical: ‘Extremely active’, ‘Lightly active’, ‘Moderate activity’, ‘Sedentary’, ‘Very Active’) and diabetes status (categorical: non-diabetic: 0, diabetic: 1).
Dependent Variable (Desired target)
The predicted variable (y) is classified according to whether there is an incident of cardiac arrest in the patient’s hospital records. This classification was done with “1” as Yes and “0” as No.
The Python implementation of these concepts is shown in Figure 27. Dummy variables were created for the various categories of activity level to indicate their presence as 1 and absence as 0.
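The creation of dummy variables can be sketched as follows. This is a minimal illustration and not the code from the thesis figures: the rows and the column names (`age`, `activity_level`, `diabetes`) are hypothetical, and pandas is assumed as the dataframe library.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions, not the Excel headers.
df = pd.DataFrame({
    "age": [28, 54, 61],
    "activity_level": ["Sedentary", "Lightly active", "Very Active"],
    "diabetes": [0, 1, 1],
})

# One indicator column per activity category: 1 if present, 0 if absent.
dummies = pd.get_dummies(df["activity_level"], prefix="activity", dtype=int)

# Replace the text column with its indicator columns.
encoded = pd.concat([df.drop(columns="activity_level"), dummies], axis=1)
print(encoded)
```

Each category becomes its own 0/1 column, so the logistic regression model receives only numeric inputs.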
Figure 27. Commands for importing the Excel file and converting it into a dataframe.
A test size of 33% (0.33) of the original data was used for testing the model, as indicated below.
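A 33% hold-out split of this kind can be sketched with scikit-learn, assuming that library is used. The 30 records below are placeholder values, and `random_state=0` is an assumption added only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 hypothetical records with 2 features each, and alternating 0/1 outcomes.
X = np.arange(60).reshape(30, 2)
y = np.array([0, 1] * 15)

# Keep 33% of the records aside for testing; the rest are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(len(X_train), len(X_test))
```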
Figure 28. Command window showing model building and testing.
Data exploration
Age: Although age is one of the contributing factors to cardiac arrest, the dataset did not show a significant relationship between cardiac arrest and age. This is shown in the bar chart of age against frequency of cardiac arrest, in which no particular pattern appears.
Figure 29. Bar chart of ages versus frequency of cardiac arrest.
Gender: The dataset shows that more males than females had cardiac arrests, which agrees with the existing literature. In this dataset, gender therefore appears to be clearly associated with the occurrence of cardiac arrest.
Figure 30. Bar chart of gender versus frequency of cardiac arrest.
Diabetes: Since diabetes is one of the contributing factors to cardiac arrest, diabetes status was also plotted against the frequency of cardiac arrest. Diabetic patients had a higher frequency of cardiac arrest than non-diabetic patients, although the difference observed in the plot in Figure 31 was not large. We can nevertheless say that diabetes has an effect on the occurrence of cardiac arrest in this dataset.
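The frequency counts behind bar charts of this kind can be sketched with a cross-tabulation. The six rows and the column names below are hypothetical and serve only to show the technique; pandas is assumed.

```python
import pandas as pd

# Hypothetical records; 1 marks an occurrence of cardiac arrest, 0 none.
df = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "cardiac_arrest": [1, 1, 0, 1, 0, 0],
})

# Rows are the predictor categories, columns are the outcome classes.
counts = pd.crosstab(df["gender"], df["cardiac_arrest"])
print(counts)

# With matplotlib installed, counts.plot(kind="bar") would draw the bar chart.
```

The same call can be repeated with the age or diabetes column in place of `gender`.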
Figure 31. Bar chart showing diabetes status against frequency of cardiac arrest.
Model implementation
The implementation of logistic regression using Python commands is shown in Figure 32.
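A minimal sketch of fitting such a model is given here, assuming scikit-learn's `LogisticRegression` is the implementation used. The single feature and the eight records are hypothetical, and `max_iter=1000` is an assumption chosen so that the solver converges on unscaled data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one predictor (age) and a binary outcome.
X_train = np.array([[20], [25], [30], [35], [60], [65], [70], [75]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit the classifier on the training records.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the class (0 or 1) for two unseen records.
predictions = model.predict(np.array([[22], [72]]))
print(predictions)
```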
Figure 32. Command window for implementation of the logistic regression model.
Determining coefficients
The individual coefficients of the logistic regression model help in determining the probability of the occurrence of an event, as elaborated in chapter 2. When working with logistic regression, the null hypothesis is that the X variables are not associated with the Y variable, which means that the predicted Y values are no closer to the actual Y values than would be expected by chance (McDonald, 2014). As previously mentioned, these coefficients can be used to determine the likelihood of an outcome.
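How coefficients translate into a probability can be illustrated with a short worked example. The intercept and coefficients below are hypothetical values chosen for illustration; they are not the coefficients estimated in this work.

```python
import math

# Hypothetical intercept and coefficients (not the fitted values).
intercept = -4.0
coefficients = {"age": 0.05, "diabetes": 1.2}

# A hypothetical user: 40 years old, diabetic.
user = {"age": 40, "diabetes": 1}

# Linear predictor (log-odds): intercept + sum of coefficient * value.
log_odds = intercept + sum(coefficients[k] * user[k] for k in coefficients)

# The logistic function maps the log-odds to a probability between 0 and 1.
probability = 1.0 / (1.0 + math.exp(-log_odds))

# exp(coefficient) is the odds ratio for a one-unit increase in a predictor.
odds_ratio_diabetes = math.exp(coefficients["diabetes"])
print(round(log_odds, 2), round(probability, 3), round(odds_ratio_diabetes, 2))
```

A positive coefficient raises the log-odds and therefore the predicted probability, while a negative one lowers it.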
Figure 33. Syntax for obtaining logistic regression coefficients.
Figure 34. Display of coefficients in console.
Predicting test results and calculating accuracy
The prediction accuracy of the model was 87% which is good. The aim of every model is to achieve an accuracy close to 100%.
Figure 35. Display of model accuracy.
Performing cross validation
Cross validation, which gives a more reliable performance estimate than a single hold-out set, is used for performance evaluation in this project. K-fold cross validation is used to determine how well the model performs (Wu et al., 2018). In K-fold cross validation the dataset is divided into k blocks (folds); each block is used once as the hold-out set while the model is trained on the remaining k − 1 blocks, so that training and evaluation are repeated k times. Hold-out sets are small sample sets kept aside for the purpose of tuning the model (Thornton, 2011). In this project, 5-fold cross validation is used. Figure 36 shows the syntax used for cross validation. Cross validation is important because it helps to detect overfitting. The cross validation accuracy obtained was 65%.
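A 5-fold cross validation can be sketched as follows, assuming scikit-learn's `cross_val_score` is used. The data is randomly generated for illustration only, so the resulting accuracy has no relation to the 65% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: 50 records, 3 features, outcome driven by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=50) > 0).astype(int)

# cv=5 trains and evaluates the model five times, once per hold-out fold.
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(scores, mean_accuracy)
```

The mean of the five fold scores is the figure usually reported as the cross validation accuracy.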
Figure 36. Implementation of cross validation and display of the cross validation accuracy.
Confusion matrix
The counts of the correct and incorrect classifications made by a classifier are usually shown in a confusion matrix. The confusion matrix obtained from our model is shown in Figure 37 below.
Figure 37. Confusion matrix of the model.
This result tells us that we have 7 + 1 correct predictions and 1 + 1 incorrect predictions.
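Reading a confusion matrix can be sketched with scikit-learn's `confusion_matrix`, assuming that function is used. The labels below are hypothetical and do not reproduce the matrix in Figure 37.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted classes for eight test records.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[true negatives, false positives], [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of records on the main diagonal.
accuracy = (tp + tn) / cm.sum()
print(cm, accuracy)
```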
Precision, recall, F-measure and support
\[ \text{Precision} = \frac{T_p}{T_p + F_p}, \qquad \text{Recall} = \frac{T_p}{T_p + F_n} \]
where $T_p$ refers to true positives, $F_p$ to false positives and $F_n$ to false negatives.
The F-beta score weights recall more than precision by a factor of beta; with beta = 1.0, recall is as important as precision.
Support represents the number of times each class occurs in y_test (Li, 2017).
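These measures can be computed directly from the confusion matrix counts, as in the short worked example below. The counts are hypothetical and are not those of the model in this work.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # F-beta with beta = 1
    return precision, recall, f1


# Hypothetical counts: 6 true positives, 2 false positives, 1 false negative.
tp, fp, fn = 6, 2, 1
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Support of the positive class: how often it actually occurs in the test set.
support_positive = tp + fn
print(precision, recall, f1, support_positive)
```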
Figure 38. Console showing precision, recall, F-measure and support values.
Interpretation: Of the cases that the model labels as cardiac arrest, 87% are actual occurrences (precision), and the model identifies 87% of all actual occurrences of cardiac arrest (recall).
Receiver Operating Characteristic (ROC) curve
ROC curves illustrate how well a logistic regression model distinguishes between two classes. ROC analysis is particularly useful when the quality of the classification needs to be assessed carefully. If the ROC curve within the 1×1 square passes through the top left corner, so that the Area Under the Curve (AUC) is 100%, the model is said to distinguish perfectly; if the ROC curve falls on the diagonal line, so that the AUC is about 50%, the model does no better than chance (Sainani, 2014). For this model, the AUC was 75%, which implies a moderate ability to discriminate.
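The AUC and the points of the ROC curve can be obtained as sketched below, assuming scikit-learn's `roc_auc_score` and `roc_curve`. The outcomes and predicted probabilities are hypothetical.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical actual classes and predicted probabilities of class 1.
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]

# AUC: the chance that a random positive is scored above a random negative.
auc = roc_auc_score(y_true, y_score)

# False positive rate and true positive rate at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc)
```

Plotting `tpr` against `fpr` gives the ROC curve; the closer it bends towards the top left corner, the larger the AUC.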
Figure 39. ROC curve implementation syntax and display.
Python – Firebase connection
This connection is made so that the data of Calorie Buddy users can be fetched for predictions. When the real-time database is connected to Python, the user information is obtained in JSON format and then converted into dataframes for predictions with the model. The implementation of this Firebase-Python connection and the fetching of data from Firebase are shown in Figure 40. The information of a particular user is obtained using the user ID, which is unique for every user.
Figure 40. Python code for the Firebase database connection and the JSON file imported from the database after connection.
Scenario 5
In the scenario used, a dummy user was created with age 28, height 170 cm, weight 68 kg, gender female and a non-diabetic status. The user’s ID was ‘-L9L5P7iXUpsGacmREXr’.
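One possible way of addressing a single user record is sketched below. The text does not state which client library was used, so the sketch assumes the Realtime Database REST interface, in which appending `.json` to a node path returns that node as JSON. The database URL, the `users` path and the field names are hypothetical.

```python
import json
from urllib.parse import quote


def user_url(database_url: str, user_id: str) -> str:
    """Build the REST URL of one user's record (the 'users' path is assumed)."""
    return f"{database_url.rstrip('/')}/users/{quote(user_id)}.json"


# Hypothetical project URL; the real database URL is not given in the text.
url = user_url("https://example-project.firebaseio.com", "-L9L5P7iXUpsGacmREXr")

# A response body of the kind such an endpoint returns, using the scenario's
# values; the field names are assumptions. A real call would fetch it with
# urllib.request.urlopen(url) and would normally need authentication.
sample_body = '{"age": 28, "height": 170, "weight": 68, "gender": "Female", "diabetes": 0}'
record = json.loads(sample_body)
print(url, record["age"])
```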
Figure 41. Syntax for retrieving user data and display of results.
Figure 42. User details for the given scenario in the Firebase database.
Conversion of JSON into dataframe
Dataframes are used in Python to represent data in rows and columns, that is, in tabular form, which makes data manipulation easier. In order to work with the data exported from the database, it is converted from JSON format into dataframes as shown in Figure 43.
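The conversion can be sketched with pandas, assuming the exported JSON is a dictionary keyed by user ID. The second user and all field names are hypothetical.

```python
import pandas as pd

# Parsed JSON of the assumed shape: one nested record per user ID.
users = {
    "-L9L5P7iXUpsGacmREXr": {"age": 28, "height": 170, "weight": 68,
                             "gender": "Female", "diabetes": 0},
    "hypothetical-user-2": {"age": 45, "height": 180, "weight": 90,
                            "gender": "Male", "diabetes": 1},
}

# orient="index" makes each user ID a row and each field a column.
df = pd.DataFrame.from_dict(users, orient="index")

# Move the user ID from the index into an ordinary column.
df.index.name = "userID"
df = df.reset_index()
print(df)
```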
Figure 43. Converting data from JSON format into dataframes.
A new dataframe is created afterwards with only the needed information. This included all the columns except the column for the user ID. The commands used are shown in Figure 44.
Figure 44. Creating new dataframes with the needed information.
Relevant information was kept and missing values were replaced for the final dataframe, as shown in Figure 45.
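Dropping the user ID column and replacing missing values can be sketched as follows. The text does not say how missing values were replaced, so filling with the column mean is an assumption made only for illustration, as are the rows and column names.

```python
import pandas as pd

# Hypothetical user records with two missing values (None).
df = pd.DataFrame({
    "userID": ["u1", "u2", "u3"],
    "age": [28, 45, None],
    "weight": [68.0, None, 80.0],
    "diabetes": [0, 1, 0],
})

# Keep every column except the identifier.
features = df.drop(columns=["userID"])

# Assumed strategy: replace each missing number with its column mean.
features = features.fillna(features.mean(numeric_only=True))
print(features)
```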
Figure 45. Creating new columns and replacing missing data.
Prediction of cardiac arrest
The output of the prediction for the dummy user using the model is shown below (Figure 46).
Figure 46. Syntax for displaying the prediction.
The final answer to the question “How likely are you (the user) to get a cardiac arrest in the next 10 years?” is given below.
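Producing such an answer from a fitted model can be sketched with `predict_proba`, assuming scikit-learn. The training records, the two features and the wording of the message are hypothetical, so the printed percentage is not the result obtained in this work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data; columns are [age, diabetes status].
X_train = np.array([[25, 0], [30, 0], [35, 0], [40, 1],
                    [55, 1], [60, 0], [65, 1], [70, 1]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The scenario's user: 28 years old, non-diabetic.
new_user = np.array([[28, 0]])

# Column 1 of predict_proba holds the probability of class 1 (occurrence).
probability = model.predict_proba(new_user)[0, 1]
label = int(model.predict(new_user)[0])

message = f"Estimated chance of cardiac arrest in the next 10 years: {probability:.0%}"
print(message)
```

Reporting the probability rather than only the 0/1 label gives the user a graded answer to the question.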
Figure 47. Output and display of the user’s chances of getting a cardiac arrest.
Building the machine learning model with MATLAB
This section gives a brief description of how the model was initially implemented with MATLAB.
Description of the MATLAB toolbox
Many modern software packages, including MATLAB, are available either freely or commercially for analysing data. MATLAB is designed to execute tasks quickly. It is user friendly and has pre-written libraries and functions that can be used to carry out various tasks easily. MATLAB uses matrices as its main way of representing variables for data mining and other purposes. Among its toolboxes, the Statistics and Machine Learning Toolbox and the Curve Fitting Toolbox are the ones commonly used for data mining. MATLAB can be installed on computers running Linux, Windows and other operating systems (Miskuf & Zolotova, 2017).
The training dataset had 6 independent variables (predictors): age, gender, height, weight, activity level and the diabetes status of patients. The outcome (dependent variable) was categorised into ‘yes’ and ‘no’ according to whether or not patients had had a cardiac arrest. The data, which was in Excel format, was imported into MATLAB using the import tool. Table 3 is the training dataset used. The syntax for the implementation is shown in Figure 48.
Figure 48. MATLAB code for assigning the predictors and the target of the model.
Training the model
The syntax for training the model is shown in the code below.
Figure 49. MATLAB code for training the model.
Cross validation
5-fold cross validation is used. The syntax in Figure 50 shows how it is implemented.
Figure 50. Code for implementing cross validation.
Interpreting logistic regression coefficients
The p values indicate whether or not there is a relationship between two variables. If the p value is small, the null hypothesis is rejected, and vice versa. However, the p values of a multiple logistic regression should not be used to test biological null hypotheses (McDonald, 2014).
Figure 51. Output showing the estimated coefficients obtained from the model.
Predicting outcome for a new user
After modelling, the model was tested with a dummy user with the information in Table 4. The model predicted that there was a chance of the person having a cardiac arrest. This is shown in Figure 52 below.
Figure 52. Prediction using the dummy user with the information in Table 4.
Table 4. Dummy user information for testing of the logistic regression model in MATLAB.
Confusion matrix and ROC (Receiver Operating Characteristic) curve
Below is a figure of the confusion matrix produced with the MATLAB Machine learning toolbox. It shows 17 correctly predicted ‘No’ responses and 16 correctly predicted ‘Yes’ responses.
Figure 53. Confusion matrix of the model.
ROC curve for the model
The AUC of the ROC curve was 75%, which implies a moderate power to discriminate.
Figure 54. The ROC curve of the model using the MATLAB classifier app.
Accuracy of the model
The validation accuracy of the model is estimated as 75.0%, as shown in the figure below. This implies a fairly good model accuracy as well.
Figure 55. Screenshot showing accuracy of the model using the machine learning classifier app.
Challenges with data acquisition
The main challenge in building the model was data acquisition. Acquiring real data for training and testing was not possible, as stated earlier. Hence the data used is not real and is only a dummy dataset created so that the modelling could be done. Because of privacy concerns, health care facilities are very cautious about giving out patient data. Many procedures and approvals were required, which could not be completed owing to time constraints. In order to obtain a reliable prediction system, the procedures for acquiring data should be started early in future work.
DISCUSSION OF FUTURE WORKS
The purpose of this project was to determine how effectively health issues can be addressed with software such as mobile apps. Android was chosen for developing the application in this work because it is a simple platform for application development. Collecting data with smartphones and wearables can go a long way towards solving health-related issues and could help save lives in the near future. These devices could be used to send alerts to emergency numbers when individuals are faced with sudden incidents such as a heart attack, so that they receive immediate attention from ambulances. The alert could take the form of a text message to an emergency number set in advance by the user; this could be a carer, an emergency unit or a close relation. With timely intervention and response, some deaths could be avoided. This is a feature that could be incorporated into the my Calbuddy app in future work.
With the aim of making my Calbuddy an easy-to-use and well-known application, certain functions could be added to it in future: barcode scanning of food labels for calorie interpretation; synchronisation of the app with glucose meters so that diabetic patients receive automated glucose level alerts and interpretations; and a portal where users who are on insulin can customise their details to receive alerts on when to take their insulin doses.
There are existing barcode databases that could be used to make calorie checks more efficient and less stressful. Though this feature is already available in some apps, adding it to Calorie Buddy would be of great help to its users. It would save users the trouble and hassle of having to look for information on food labels and enter it manually into the system.
Another feature that could be incorporated into this mobile app is the translation of food labels from Finnish to English for non-Finnish speakers. This would go a long way towards making grocery shopping less stressful for people who do not speak and understand Finnish but can speak English. It could be implemented using Optical Character Recognition (OCR), whereby text is scanned and translated into English, or by using scanners that fetch information with English translations from a database.
Some food allergies may cause swelling of cells in the body, leading to insulin resistance in diabetes patients (Derrell, 2011). Because of this reported connection between diabetes and food allergies, allergy checks that provide information on likely allergens among the ingredients on food labels would be a very useful feature of the mobile application for people with diabetes. To do this, available allergen databases could be used so that, when barcodes are scanned, information on allergy risks is fetched for the user. Prior knowledge of allergens can help people choose grocery products without fear of the after-effects of consuming them.
Owing to the inability to obtain real data for building the predictive model, I would recommend that the process of data acquisition be started early in future work. This would help to build a more accurate system for predictions. Records on cardiac arrest in-patients and out-patients of health facilities can be obtained from EHRs.
CONCLUSION
Mobile devices are not just fancy tools for calls and social media but can be used for a wide range of activities. By installing a simple mobile app on a mobile device, certain goals and targets can be achieved, some of which are demonstrated in this thesis. This thesis did not focus only on mobile software for decision support, but also addressed the issue of SCA (Sudden Cardiac Arrest), which is an area of concern in the healthcare sector. The Android platform was chosen for the app development because of its wide usage and flexibility.
The use of wearables for health care purposes has great potential for solving bigger problems in the health sector in the future. With the aim of promoting fitness and exercise for a healthy life, the Fitbit SDK was incorporated into the mobile application so that users would be motivated to achieve their daily weight goals not just by counting calories but by exercising as well, since exercise can help with obesity, heart disease and other health conditions.
At the end of this thesis, we were able to perform pre-diagnosis of SCA for users of a mobile software application. Though the dataset used was not from a real-life system or database, the steps and processes involved can be implemented using accurate data from real systems for predictive purposes in the health care sector in the future. Achieving this result has made me glad because different issues were tackled and addressed at various stages in one project, i.e. from weight management to physical activity tracking to cardiac arrest prediction, which implies that mobile applications can be used for problem solving in diverse ways.
References
Äkerberg, A., Lindén, M., & Folke, M. (2012). How accurate are pedometer cell phone applications? Procedia Technology, 5, 787-792.
Atwater general factor system. In Kent, M., The Oxford Dictionary of Sports Science and Medicine, Third Edition. Published in print January 2006 | ISBN: 9780198568506. Published online January 2007 | e-ISBN: 9780191727788. http://oxfordindex.oup.com/view/10.1093/acref/9780198568506.013.0679
Amrit, C., Paauw, T., Aly, R., & Lavric, M. (2017). Identifying child abuse through text mining and machine learning. Expert systems with applications, 88, 402-418.
Amutha, A. J., Padmajavalli, R., & Prabhakar, D. (2018). A Novel Approach for the Prediction of Treadmill Test in Cardiology using Data Mining Algorithms Implemented as a Mobile Application. Indian Heart Journal.
Blackwell, A. (2009). Human Computer Interaction: Lecture Notes, Cambridge Computer Science Tripos, Part II, 2. Available: https://www.cl.cam.ac.uk/teaching/1011/HCI/HCI2010.pdf
Barry, B. (2015). Android application development all-in-one for dummies. Second Edition, 31. New Jersey, NJ: John Wiley & Sons, Inc.
Bonnardel, N., Piolat, A., & Le Bigot, L. (2011). The impact of colour on Website appeal and users’ cognitive processes. Displays, 32(2), 69-80.
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.
British Heart Foundation. Cardiac Arrest. Available: https://www.bhf.org.uk/heart-health/conditions/cardiac-arrest
Bzdok, D., & Meyer-Lindenberg, A. (2017). Machine learning for precision psychiatry: Opportunities and challenges. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging.
Campbell, I. (2017). Macronutrients, minerals, vitamins and energy. Anaesthesia & Intensive Care Medicine, 18(3), 141-146.
Ching, K. W., & Singh, M. M. (2016). Wearable technology devices security and privacy vulnerability analysis. Int. J. Netw. Secur. Appl, 8(3), 19-30. https://www.researchgate.net/publication/303870892_Wearable_Technology_Devices_Security_and_Privacy_Vulnerability_Analysis
De Arriba-Pérez, F., Caeiro-Rodríguez, M., & Santos-Gago, J. M. (2016). Collection and processing of data from wrist wearable devices in heterogeneous and multiple-user scenarios. Sensors, 16(9), 1558.
Derrell J. (2011). Understanding the connection between food allergies and diabetes. Natural News. Available: https://www.naturalnews.com/033125_food_allergies_diabetes.html
Diabetes.co.uk. Blood sugar level ranges. The global diabetes community. Available: https://www.diabetes.co.uk/diabetes_care/blood-sugar-level-ranges.html
Doty, S.R. (2008). Lecture notes- Python Basics. Loyola University of Chicago. Available: https://anh.cs.luc.edu/331/notes/PythonBasics.pdf
Ekström, M., Esseen, P. A., Westerlund, B., Grafström, A., Jonsson, B. G., & Ståhl, G. (2018). Logistic regression for clustered data from environmental monitoring programs. Ecological Informatics, 43, 165-173.
Elío, J., Crowley, Q., Scanlon, R., Hodgson, J., & Long, S. (2017). Logistic regression model for detecting radon prone areas in Ireland. Science of The Total Environment, 599, 1317-1329.
Empana, J. P., Blom, M. T., Böttiger, B. W., Dagres, N., Dekker, J. M., Gislason, G., … & Jonsson, M. (2018). Determinants of occurrence and survival after sudden cardiac arrest-A European perspective: The ESCAPE-NET project. Resuscitation, 124, 7-13.
Feinman, R. D., Pogozelski, W. K., Astrup, A., Bernstein, R. K., Fine, E. J., Westman, E. C., … & Nielsen, J. V. (2015). Dietary carbohydrate restriction as the first approach in diabetes management: critical review and evidence base. Nutrition, 31(1), 1-13.
Get physical. Calculating your daily calorie expenditure. Available: http://getphysical.co.uk/index.php/training-resources/calculating-your-calorie-expenditure/
Glynn, L. G., Hayes, P. S., Casey, M., Glynn, F., Alvarez-Iglesias, A., Newell, J., … & Murphy, A. W. (2014). Effectiveness of a smartphone application to promote physical activity in primary care: the SMART MOVE randomised controlled trial. Br J Gen Pract, 64(624), e384-e391.
Gresham, G., Schrack, J., Gresham, L. M., Shinde, A. M., Hendifar, A. E., Tuli, R., … & Piantadosi, S. (2018). Wearable activity monitors in oncology trials: Current use of an emerging technology. Contemporary clinical trials, 64, 13-21.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.
Huang, T., Li, B., Shen, D., Cao, J., & Mao, B. (2017). Analysis of the grain loss in harvest based on logistic regression. Procedia Computer Science, 122, 698-705.
Hwang, S., & Lee, S. (2017). Wristband-type wearable health devices to measure construction workers’ physical demands. Automation in Construction, 83, 330-340.
Jabeur, S. B. (2017). Bankruptcy prediction using Partial Least Squares Logistic Regression. Journal of Retailing and Consumer Services, 36, 197-202.
Jain, V., Patel, R. K., Kapadia, Z., Galiveeti, S., Banerji, M., & Hope, L. (2017). Drugs and hyperglycemia: A practical guide. Maturitas, 104, 80-83.
Jenkins, D. W., & Jenks, A. (2017). Exercise and diabetes: A narrative review. The Journal of Foot and Ankle Surgery, 56(5), 968-974.
Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., & Chouvarda, I. (2017). Machine learning and data mining methods in diabetes research. Computational and structural biotechnology journal, 15, 104-116.
Khunti, K., Alsifri, S., Aronson, R., Berković, M. C., Enters-Weijnen, C., Forsén, T., … & Kapur, R. (2017). Impact of hypoglycaemia on patient-reported outcomes from a global, 24-country study of 27,585 people with type 1 and insulin-treated type 2 diabetes. Diabetes Research and Clinical Practice, 130, 121-129.
Lee, S. (2012). Creating and using databases for android applications. International Journal of Database Theory and Application, 5(2), 99-106.
Li, S (2017). Building a logistic regression model with python step-by-step. Towards Data Science. Available: https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
Lin, W. Y., Chen, C. H., Tseng, Y. J., Tsai, Y. T., Chang, C. Y., Wang, H. Y., & Chen, C. K. (2018). Predicting Post-Stroke Activities of Daily Living Through Machine Learning-Based Approach on Initiating Rehabilitation. International Journal of Medical Informatics.
Lynch, C. M., Abdollahi, B., Fuqua, J. D., Alexandra, R., Bartholomai, J. A., Balgemann, R. N., … & Frieboes, H. B. (2017). Prediction of lung cancer patient survival via supervised machine learning classification techniques. International Journal of Medical Informatics, 108, 1-8.
Mansoor, H., Elgendy, I. Y., Segal, R., Bavry, A. A., & Bian, J. (2017). Risk prediction model for in-hospital mortality in women with ST -elevation myocardial infarction: A machine learning approach. Heart & Lung: The Journal of Acute and Critical Care, 46(6), 405-411.
Mayo Clinic. Sudden Cardiac Arrest. Available: https://www.mayoclinic.org/diseases-conditions/sudden-cardiac-arrest/symptoms-causes/syc-20350634
McCrory, C., Vanderlee, L., White, C. M., Reid, J. L., & Hammond, D. (2016). Knowledge of recommended calorie intake and influence of calories on food selection among Canadians. Journal of Nutrition Education and Behavior, 48(3), 199-207.
McDonald, J. H. (2014). Handbook of Biological Statistics, Third Edition. Sparky House Publishing, Baltimore, Maryland. http://www.biostathandbook.com/multiplelogistic.html
Morais, A., Peixoto, H., Coimbra, C., Abelha, A., & Machado, J. (2017). Predicting the need of Neonatal Resuscitation using Data Mining. Procedia Computer Science, 113, 571-576.
Nanayakkara, S., Fogarty, S., Ross, K., Milosevic, Z., Richards, B., Liew, D., … & Kaye, D. (2018). Machine learning models significantly improve outcome prediction after cardiac arrest. Journal of the American College of Cardiology, 71(11 Supplement), A775.
Neesha, J., Nur’Aini, A. R., & Wahidah, H. (2015). Data mining in healthcare: A review. Procedia Computer Science, 72, 306-313.
Nikolaou, C. K., McPartland, M., Demkova, L., & Lean, M. E. (2017). Supersize the label: The effect of prominent calorie labeling on sales. Nutrition, 25, 112-113.
Park, H. (2013). An introduction to logistic regression: from basic concepts to interpretation with particular attention to nursing domain. Journal of Korean Academy of Nursing, 43(2), 154-164.
Perera, W. A. S. N., Nananyakkara, J., Werapitiya, B. K., Rajasingham, S., Premaratne, U. S., Erandaka, W. U., … & Ratnayake, H. U. W. (2017). Introduction to Android.
Pevnick, J. M., Birkeland, K., Zimmer, R., Elad, Y., & Kedan, I. (2018). Wearable technology for cardiology: An update and framework for the future. Trends in cardiovascular medicine, 28(2), 144-150.
Phillips, S. M., Cadmus-Bertram, L., Rosenberg, D., Buman, M. P., & Lynch, B. M. (2018). Wearable Technology and Physical Activity in Chronic Disease: Opportunities and Challenges. American journal of preventive medicine, 54(1), 144-150.
Racine, S. E. (2018). Emotional ratings of high-and low-calorie food are differentially associated with cognitive restraint and dietary restriction. Appetite, 121, 302-308
Riera-Crichton, D., & Tefft, N. (2014). Macronutrients and obesity: revisiting the calories in, calories out framework. Economics and Human Biology, 14, 33-49.
Sainani, K. L. (2014). Logistic regression. PM&R, 6(12), 1157-1162.
Seguias, L., & Tapper, K. (2018). The effect of mindful eating on subsequent intake of a high calorie snack. Appetite, 121, 93-100.
Simonoff, J. S. (2012). Logistic regression- modelling the probability of success. Teaching paper of Leonard N. Stern School of Business, New York University.
Siomos, M. Z., Andreoni, M., Buchholz, S. W., & Dickins, K. (2017). A guide to physical activity for individuals with diabetes. The Journal for Nurse Practitioners, 13(1), 82-88.
Thornton, C. (2011). Machine Learning: Lecture 5- Cross validation. University of Sussex, 15-18. Available: http://users.sussex.ac.uk/~christ/crs/ml/lec03a.pdf
Tripepi, G., Jager, K. J., Dekker, F. W., & Zoccali, C. (2008). Linear and logistic regression analysis. Kidney international, 73(7), 806-810.
Tripoliti, E. E., Papadopoulos, T. G., Karanasiou, G. S., Naka, K. K., & Fotiadis, D. I. (2017). Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques. Computational and Structural Biotechnology Journal, 15, 26-47.
US Food and Drug Administration. (2015). Using nutrition fact labels: A how -to guide for older adults. Available: https://www.fda.gov/Food/ResourcesForYou/Consumers/ucm267499.htm
Wolfson, J. A., Graham, D. J., & Bleich, S. N. (2017). Attention to physical activity-equivalent calorie information on nutrition facts labels: An eye-tracking investigation. Journal of Nutrition Education and Behavior, 49(1), 35-42.
Wu, H., Yang, S., Huang, Z., He, J., & Wang, X. (2018). Type 2 diabetes mellitus prediction model based on data mining. Informatics in Medicine Unlocked, 10, 100-107.
Yang, F., Banerjee, T., Narine, K., & Shah, N. (2018). Improving pain management in patients with sickle cell disease from physiological measures using machine learning techniques. Smart Health.
Zaharieva, D. P., & Riddell, M. C. (2017). Insulin Management Strategies for Exercise in Diabetes. Canadian journal of diabetes, 41(5), 507-516.