13 Weeks of Data Science for Social Good with the University of Chicago

Data Science for Social Good (DSSG) is a leading fellowship program that helps data science enthusiasts develop interdisciplinary skills aimed at solving social good problems. The DSSG fellowship from the University of Chicago has involved students, mentors, and project managers since 2013, with the aim of giving meaning to data generated by NGOs (e.g. Ushahidi) and healthcare organizations (e.g. Jose De Mello Saude, Lisbon, Portugal). As a DSSG fellow in 2017, I sought to apply my understanding of machine learning to a healthcare problem in Portugal, working with a hybrid team of statisticians and software engineers. Beyond that, I built corporate data science skills, teamwork, and social skills, which proved even more important to the success of the project. In short, DSSG is a 13-week, time-bound workplace defined by intensive communication, training, and target-oriented work.
Before selection, I envisioned DSSG as a program where students (undergraduate or graduate) interested in data mining or machine (deep) learning could use their skills to solve problems in various multi-modal projects. The selection procedure for the DSSG fellowship differs from that of other fellowships or internships and is largely influenced by the applicant's penchant for social good. After the selection, I felt that I would be applying my knowledge of the data science domain and would end my internship with a publication. Though what I expected was achieved through the combined effort of my team, it was truly a nail-biting experience until the end of the 12th week. I will try to be as brief as I can in explaining the cycle that transforms a university-trained data science student into a real-world-trained data science researcher.
After the selection, we were stationed at the Nova School of Business and Economics in Lisbon, Portugal, waiting for our project mentors and managers. We had some idea about the project but not about the data. This was the beginning of the first week. From the second to the fourth week, we struggled to make sense of the data received from our project partners (mine was Jose De Mello Saude). During this period, we had weekly updates and bi-weekly deep-dive presentations. Though they interrupted the work, they were necessary to keep track of our progress. Until the fourth week, the work was mostly data analysis, number crunching, playing with Tableau, identifying features, and understanding the data. From the fifth week, the clock started ticking, as only 7 weeks were left. During the fifth week, we wrapped up the data analysis work because we needed to progress further. One thing we had learned by the end of the fourth week is that, in a corporate data science pipeline, the majority of the work lies in understanding the data and deriving meaningful features through data analysis.
From the fifth to the seventh week, we turned to feature engineering, model selection, model training, and model evaluation. This was needed to see what the model learns from the data and to answer the question: do we need more data of the same type, or a different type of data? During the sixth week, we started holding a weekly meeting with the project partner, presenting our work and receiving their feedback. By then, what we knew for sure was that only 5% of this enormous dataset was gold. The question now was how to extract that gold from the data and re-train our models to obtain something meaningful. This question marked the beginning of the eighth week.
From the eighth to the tenth week, we prototyped the project, assessed model performance, and gave an even more comprehensive presentation to the project partners, with specific questions for them. At the end of the tenth week and the beginning of the eleventh, we were directed towards laying out a strategy to improve over the baseline. When working with NGOs and healthcare organizations that do not invest in data science, it is hard to establish a baseline, and hard to beat it if it is a good one. So, in our case, the eleventh and twelfth weeks went into tuning the model and trying to beat the baseline. At the end of the twelfth week, we got some of the improvement we had hypothesized at the beginning, but the remaining work was mostly cleaning the code and creating the deliverables.
Though 13 weeks fell short for us, we understood the importance of each phase of the data science pipeline, and it was the teamwork that really propelled the success of the project. The kind of environment that DSSG creates is hard to find in an organization, and it is what carves a data scientist out of an applicant. What I learned from the internship was beyond my expectations, and I feel great to be a part of the DSSG University of Chicago community.
My Team: Leid Zejnilovic, Qiwei Han, Laura Szczuczak, Mengxin Ji, Iñigo MDR
Follow the exciting DSSG Blogs, DSSG Twitter, and DSSG Europe.

