Hi all,
When you receive data as a CSV file or in any other format, try these 10 basic pandas commands to get to know it. The better you understand the dataset, the better the model you can build for it.
Before jumping into the code, here are the basic packages and software you will want to have:
- Python
- Jupyter Notebook (Highly recommend this one) or Spyder
- pandas (it does not ship with Python itself; install it with pip install pandas, or use a distribution such as Anaconda that bundles it)
Import the necessary packages and load the data like this:
import pandas as pd
df = pd.read_csv("path to csv file")
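If you do not have a CSV file at hand, here is a small sketch you can run as-is. It assumes a made-up three-row dataset held in a string (the names, ages, and cities are invented for illustration) and reads it through io.StringIO, which pd.read_csv accepts in place of a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented only so the example runs without a file.
csv_text = """name,age,city
Asha,29,Pune
Ravi,,Delhi
Meera,41,Pune
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# The empty age field for Ravi is read as a missing value (NaN).
print(df)
```

With a real file you would pass its path to pd.read_csv instead of the StringIO object.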
Now let's go through the most common commands:
1. df.head(): Shows the first rows of the dataset (5 by default).
2. df.info(): Lists the type of each column (float, int, object and so on), along with the number of non-null values in each.
3. df.describe(): Summarizes each numeric column with its count, mean, standard deviation, min, quartiles, and max. Use it to get a feel for the range and spread of each column.
4. df.shape: Gives the number of rows and columns as a (rows, columns) tuple. Note that it is an attribute, so there are no parentheses.
5. df["col"].value_counts(): The command I use most of the time. It counts how many times each unique value occurs in a column ("col" here stands for any column name).
6. df["col"].unique(): Returns the unique values in a column. It is called on a single column, not on the whole data frame.
7. df.groupby(): A somewhat bigger command than the others. It groups rows by the values in one or more columns so that you can aggregate each group (sum, mean, count and so on) into another data frame.
8. df.sort_values(): Sorts the rows by the values of one or more columns, in ascending or descending order.
9. df.isnull().sum(): An important one. It gives the number of null values in each column.
10. df.fillna(): As the name suggests, this fills the null values with a value you choose, such as a number or a string.
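To see the ten commands side by side, here is a minimal sketch on a toy data frame. The data and the column names (city, product, units, price) are invented for illustration, and the choices of what to group by, sort on, and fill with are assumptions you would adapt to your own dataset.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
    "product": ["A", "B", "A", "C", "A", "B"],
    "units": [10, 5, None, 8, 7, 3],
    "price": [2.5, 4.0, 2.5, None, 2.5, 4.0],
})

print(df.head(3))        # 1. first three rows
df.info()                # 2. column types and non-null counts (prints directly)
print(df.describe())     # 3. summary statistics of the numeric columns
print(df.shape)          # 4. (rows, columns)

# 5. how often each city occurs
city_counts = df["city"].value_counts()
print(city_counts)

# 6. distinct products, in order of first appearance
unique_products = df["product"].unique()
print(unique_products)

# 7. total units per city (missing values are skipped by sum)
units_per_city = df.groupby("city")["units"].sum()
print(units_per_city)

# 8. rows ordered by units, largest first (missing values go last)
sorted_df = df.sort_values("units", ascending=False)
print(sorted_df)

# 9. number of nulls in each column
null_counts = df.isnull().sum()
print(null_counts)

# 10. fill nulls: 0 for units, the column mean for price (one possible choice)
filled = df.fillna({"units": 0, "price": df["price"].mean()})
print(filled)
```

Filling with 0 or with the mean is only one option; which value makes sense depends on what the column represents.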
I feel these are some of the pandas commands we use again and again in our day-to-day work to get to know the data. If you have any other commands or suggestions, feel free to post them in the comments section, guys.
Happy Learning!!!!