Data science in practice (real world) vs. data science online (Kaggle etc.): what does the real data look like?
Data science in the real world differs from data science on Kaggle in several ways:
1. Level of complexity of data: The number of tables an organization uses for decision-making is far higher than the number used for machine learning in a Kaggle competition.
2. Lack of a clear objective in the real world: In a Kaggle competition, the programmer knows what to do and how to do it. In the real world, most of the time is spent on data analysis and cleansing. This work helps us define a goal that can be achieved in the stipulated time.
3. Sparsity of forums for help: Real-world data problems cannot be solved simply by searching the internet. They require a careful thought process over weeks so that you can develop a plausible strategy.
4. Expectations: Kaggle competitions lay out their expectations clearly, and performance is measured against the strength of the other participants. In the real world, the organization's expectations change every day, week, or month, and the data scientist must adapt to the new expectations. This is essential for keeping the organization competitive.
5. Pipeline formation and descriptive statistics: Kaggle datasets generally involve no pipeline building or descriptive statistics. This is because the problem posed on Kaggle is usually fairly comprehensible, whereas that is not the case in the real world. Data science in the real world requires descriptive statistics to reach the non-technical people in the organization and a pipeline for the technical ones.
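To make points 1, 2 and 5 a little more concrete, here is a minimal sketch of what a first pass over real-world data might look like: joining more than one table, cleansing the result, and producing descriptive statistics for a non-technical audience. The tables (customers, orders), their columns, and every value are hypothetical and invented only for illustration. It uses Python's standard sqlite3 and statistics modules, and it is one possible shape for such a pipeline, not a recommended design.

```python
import sqlite3
import statistics

# Hypothetical tables and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES
        (10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0), (13, 3, NULL), (14, 4, 50.0);
""")

# Step 1: join the tables. A LEFT JOIN keeps orders whose customer is missing,
# so data-quality problems stay visible instead of silently disappearing.
rows = conn.execute("""
    SELECT o.order_id, c.region, o.amount
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# Step 2: cleansing. Drop rows with a missing region or amount and count them.
clean = [(order_id, region, amount) for order_id, region, amount in rows
         if region is not None and amount is not None]
dropped = len(rows) - len(clean)

# Step 3: descriptive statistics that can be reported to non-technical readers.
amounts = [amount for _, _, amount in clean]
summary = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "min": min(amounts),
    "max": max(amounts),
}

print(f"dropped {dropped} of {len(rows)} rows during cleansing")
print(summary)
```

In a Kaggle competition the join and most of the cleansing have usually been done already. Here, the number of dropped rows is itself a finding worth reporting, because it shows how much of the raw data is usable for the goal.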