Data science is a multidisciplinary area dedicated to transforming data into information, which will be translated into useful knowledge for business.
It can be divided into six fields: data capture, storage, preprocessing, visualization, processing, and data analysis.
1️⃣ Data capture: Data can be collected in different ways, such as web scraping, IoT sensors, APIs…
2️⃣ Storage: Once captured, these data can be stored in relational databases, non-relational databases, or, in Big Data environments, options like the Hadoop Distributed File System (HDFS).
3️⃣ Preprocessing: To extract knowledge from the data, it is necessary to remove noise, null values… and any data that provide no information or could worsen the results.
4️⃣ Visualization: Data visualization ranges from a well-designed dashboard showing sensor inputs from an IoT environment to charts for visually analyzing a company's productivity.
5️⃣ Processing: Data processing focuses on discovering patterns and trends and on predicting customer behavior, among other functions.
6️⃣ Data analysis: Once knowledge about the company has been obtained from the data, it is checked for coherence and for opportunities for improvement.
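
To make the preprocessing step (3️⃣) more concrete, here is a minimal Python sketch. It is only one possible approach, under simple assumptions: a record with any null value is discarded, and a column whose value never changes is treated as providing no information. The function names and the sensor readings are hypothetical, invented for illustration.

```python
from typing import Any

Row = dict[str, Any]


def drop_rows_with_nulls(rows: list[Row]) -> list[Row]:
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_constant_columns(rows: list[Row]) -> list[Row]:
    """Remove columns whose value is identical in every row."""
    if not rows:
        return []
    # A column with a single distinct value cannot help tell rows apart.
    constant = {key for key in rows[0] if len({row[key] for row in rows}) == 1}
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


# Hypothetical sensor readings, made up for this example.
readings = [
    {"sensor": "A", "unit": "C", "temperature": 21.5},
    {"sensor": "B", "unit": "C", "temperature": None},
    {"sensor": "C", "unit": "C", "temperature": 19.0},
]

clean = drop_constant_columns(drop_rows_with_nulls(readings))
print(clean)
# [{'sensor': 'A', 'temperature': 21.5}, {'sensor': 'C', 'temperature': 19.0}]
```

In a real project, whether to drop a null value or fill it in depends on the data and the goal, so the rules above are a starting point and not a general recipe.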