Note that this approach is only useful for solving homogeneous problems. However, you can also use it to find an initial solution to a problem for which no data is available. Start by defining the task and, if necessary, the features, data and variables. Then define the classification or regression model you would use to solve the problem, and provide any data required to run it. Run the model with those variables and data. Find the models that perform best and look more closely at them to understand why they work. You can then use this knowledge to build your own model, which may require building new variables and features from scratch. Submitted by: Amsha Amir.
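The "run several models and keep the best" step could look roughly like the sketch below. It assumes Scikit-Learn is installed. The synthetic data set, the two candidate models and the 5-fold cross-validation are illustrative choices, not part of the original tip.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, since no real data is available yet.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Assumption: two simple candidate classifiers chosen only for illustration.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: float(cross_val_score(model, X, y, cv=5).mean())
    for name, model in candidates.items()
}

# The best performer is the one to inspect more closely.
best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```

Once a winner emerges, inspecting its coefficients or tree splits is one way to work out why it performs well before building your own model.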
7. Data Cleaning | Prepare data for analysis in Python using the Pandas and NumPy libraries. You may need to transform or clean the data to make it ready for analysis. To clean data, delete incorrect values and check for inconsistencies. You can clean data using Pandas, NumPy and Scikit-Learn. Submitted by: Tonia Ali.
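As a minimal sketch of deleting incorrect values and fixing inconsistencies with Pandas and NumPy: the column names, the sample rows and the 0 to 120 age range below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with inconsistent spelling, an impossible age,
# a missing city and a duplicate row.
raw = pd.DataFrame({
    "city": [" London", "london", "Paris", "Paris ", None],
    "age": [34, -1, 27, 27, 45],
})

def clean(df):
    out = df.copy()
    # Fix inconsistencies: trim whitespace and normalise capitalisation.
    out["city"] = out["city"].str.strip().str.title()
    # Mark incorrect values (ages outside an assumed 0-120 range) as missing.
    out["age"] = out["age"].where(out["age"].between(0, 120), np.nan)
    # Delete rows with missing values, then delete exact duplicates.
    return out.dropna().drop_duplicates().reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Whether to drop rows with bad values or fill them in instead depends on the data set. Dropping is simply the most direct reading of "delete incorrect values".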
8. Data Cleaning | Using the Pandas and NumPy libraries, find keywords in text files. For example, find text that is tagged as spam and remove it from the data. If you are analysing unstructured data, this may be the most efficient method. The Pandas library provides many functions that can help you clean your data. In this case, we use Scikit-Learn and Pandas to find the keywords. One approach is to store all the text files in a dictionary, then load the dictionary and check whether each file contains a keyword. If it does, remove that file from the dictionary. There are many variations of this approach. Submitted by: Raghav Shah.
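The dictionary approach can be sketched with the Python standard library alone, used here in place of Pandas to keep the example short. The file names, texts, keywords and the `remove_matching` helper are hypothetical.

```python
import re

def remove_matching(texts, keywords):
    """Return a copy of texts without the entries that contain any keyword."""
    if not keywords:
        return dict(texts)
    # Whole-word, case-insensitive match on any of the keywords.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return {name: body for name, body in texts.items()
            if not pattern.search(body)}

# Hypothetical file contents, keyed by file name.
texts = {
    "a.txt": "Meeting moved to Tuesday.",
    "b.txt": "WIN a FREE prize now!",
    "c.txt": "Quarterly report attached.",
}

kept = remove_matching(texts, ["free", "prize"])
print(sorted(kept))
```

Building a new dictionary avoids deleting entries from the original while iterating over it, which would raise an error in Python.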
9. Data Cleaning | Merge files, then sort and clean the data using the Pandas library. You may need to merge several data files together. You may need to sort the data, or delete it if you have an odd number of files. You may also need to clean it if there are inconsistencies. You can use Pandas for this; it provides many functions that are useful for cleaning the data. Submitted by: Apoorva Singh.
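One possible way to merge, sort and clean with Pandas is sketched below. The two in-memory CSV "files" and their `id` and `score` columns are invented for the example. With real files you would pass their paths to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Assumption: two small CSV files with the same columns, held in memory here.
file_a = io.StringIO("id,score\n3,0.7\n1,0.9\n")
file_b = io.StringIO("id,score\n2,0.4\n1,0.9\n")

frames = [pd.read_csv(f) for f in (file_a, file_b)]

combined = (
    pd.concat(frames, ignore_index=True)  # merge: stack the files' rows
    .drop_duplicates()                    # clean: drop rows repeated across files
    .sort_values("id")                    # sort by the id column
    .reset_index(drop=True)
)
print(combined)
```

If the files hold different columns about the same records, `DataFrame.merge` on a shared key would be the closer fit than `pd.concat`.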
10. Data Visualization | Use R to create a number of graphs, maps and charts to explore the data. This approach is useful if you have a large volume of data.