Testing pandas_profiling

In [2]:
# install the package if needed
#!pip install pandas-profiling

import pandas as pd
import pandas_profiling as pp

Loading the data:

In [3]:
df = pd.read_csv(r"D:\OneDrive\GitHub\slanad_blog\static\data\2019-05-04_listings.csv") 
In [4]:
df.describe()
Out[4]:
id host_id neighbourhood_group latitude longitude price minimum_nights number_of_reviews reviews_per_month calculated_host_listings_count availability_365
count 1.127600e+04 1.127600e+04 0.0 11276.000000 11276.000000 11276.000000 11276.000000 11276.000000 9394.000000 11276.000000 11276.000000
mean 1.791508e+07 6.831232e+07 NaN 48.206394 16.360991 77.850656 4.237052 28.989447 1.757868 5.768358 142.674885
std 9.416062e+06 6.912654e+07 NaN 0.019361 0.034110 134.397362 17.971762 51.325959 1.917504 12.645909 135.123864
min 2.309000e+03 2.522000e+03 NaN 48.125848 16.190898 9.000000 1.000000 0.000000 0.010000 1.000000 0.000000
25% 1.010131e+07 1.073092e+07 NaN 48.192638 16.340987 38.000000 1.000000 2.000000 0.350000 1.000000 0.000000
50% 1.929118e+07 3.845697e+07 NaN 48.206156 16.358773 56.000000 2.000000 9.000000 1.000000 1.000000 99.000000
75% 2.596927e+07 1.132267e+08 NaN 48.219206 16.379510 85.000000 3.000000 32.000000 2.590000 4.000000 281.000000
max 3.224153e+07 2.419409e+08 NaN 48.298573 16.546787 9270.000000 1000.000000 514.000000 14.230000 85.000000 365.000000
In [5]:
pp.ProfileReport(df)
Out[5]:

Overview

Dataset info

Number of variables 16
Number of observations 11276
Total Missing (%) 8.3%
Total size in memory 1.4 MiB
Average record size in memory 128.0 B
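
The 8.3% total can be reproduced from the per-variable missing counts listed further down in this report. The sketch below is plain arithmetic on those counts. It assumes the total is the number of missing cells divided by all cells (rows × columns), which matches the reported figure; the name `missing_per_column` is only for illustration.

```python
# Missing-value counts copied from the per-variable sections of the report.
# neighbourhood_group is entirely empty (df.describe() shows a count of 0).
n_rows, n_cols = 11276, 16
missing_per_column = {
    "neighbourhood_group": 11276,
    "last_review": 1882,
    "reviews_per_month": 1882,
    "name": 18,
    "host_name": 3,
}

total_missing = sum(missing_per_column.values())
total_cells = n_rows * n_cols
missing_pct = 100 * total_missing / total_cells

print(f"{total_missing} of {total_cells} cells missing = {missing_pct:.1f}%")
```

About three quarters of the missing cells come from neighbourhood_group alone, the column the report flags as constant.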

Variables types

Numeric 10
Categorical 5
Boolean 0
Date 0
Text (Unique) 0
Rejected 1
Unsupported 0

Warnings

Variables

availability_365
Numeric

Distinct count 366
Unique (%) 3.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 142.67
Minimum 0
Maximum 365
Zeros (%) 26.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 99
Q3 281
95-th percentile 362
Maximum 365
Range 365
Interquartile range 281

Descriptive statistics

Standard deviation 135.12
Coef of variation 0.94708
Kurtosis -1.4436
Mean 142.67
MAD 122.54
Skewness 0.36892
Sum 1608802
Variance 18258
Memory size 88.2 KiB
Value Count Frequency (%)
0 2977 26.4%
365 291 2.6%
364 162 1.4%
90 104 0.9%
363 85 0.8%
362 85 0.8%
39 81 0.7%
89 79 0.7%
180 74 0.7%
345 70 0.6%
Other values (356) 7268 64.5%

Minimum 5 values

Value Count Frequency (%)
0 2977 26.4%
1 63 0.6%
2 50 0.4%
3 36 0.3%
4 43 0.4%

Maximum 5 values

Value Count Frequency (%)
361 32 0.3%
362 85 0.8%
363 85 0.8%
364 162 1.4%
365 291 2.6%
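
Several figures in the availability_365 block are derived from a few basic ones. The sketch below recomputes them from the values shown in the `df.describe()` output and in the report, assuming the report uses the usual definitions (range = max − min, IQR = Q3 − Q1, coefficient of variation = standard deviation / mean, variance = standard deviation squared).

```python
# Inputs copied from df.describe() and the report for availability_365.
count = 11276
mean = 142.674885
std = 135.123864
minimum, q1, q3, maximum = 0.0, 0.0, 281.0, 365.0
total = 1608802   # "Sum" in the report
zeros = 2977      # rows where the value is 0

value_range = maximum - minimum     # Range
iqr = q3 - q1                       # Interquartile range
coef_of_variation = std / mean      # Coef of variation
variance = std ** 2                 # Variance
mean_from_sum = total / count       # Mean
zeros_pct = 100 * zeros / count     # Zeros (%)

print(value_range, iqr, round(coef_of_variation, 5),
      round(variance), round(mean_from_sum, 2), round(zeros_pct, 1))
```

The same relationships can be used to sanity-check any of the other numeric variables in the report.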
 

calculated_host_listings_count
Numeric

Distinct count 37
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 5.7684
Minimum 1
Maximum 85
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 1
Q3 4
95-th percentile 29
Maximum 85
Range 84
Interquartile range 3

Descriptive statistics

Standard deviation 12.646
Coef of variation 2.1923
Kurtosis 19.289
Mean 5.7684
MAD 6.7004
Skewness 4.1898
Sum 65044
Variance 159.92
Memory size 88.2 KiB
Value Count Frequency (%)
1 6119 54.3%
2 1470 13.0%
3 774 6.9%
4 384 3.4%
5 310 2.7%
7 231 2.0%
6 210 1.9%
8 192 1.7%
9 126 1.1%
12 108 1.0%
Other values (27) 1352 12.0%

Minimum 5 values

Value Count Frequency (%)
1 6119 54.3%
2 1470 13.0%
3 774 6.9%
4 384 3.4%
5 310 2.7%

Maximum 5 values

Value Count Frequency (%)
51 51 0.5%
53 53 0.5%
56 56 0.5%
77 77 0.7%
85 85 0.8%
 

host_id
Numeric

Distinct count 7448
Unique (%) 66.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 68312000
Minimum 2522
Maximum 241940896
Zeros (%) 0.0%

Quantile statistics

Minimum 2522
5-th percentile 1547100
Q1 10731000
Median 38457000
Q3 113230000
95-th percentile 214870000
Maximum 241940896
Range 241938374
Interquartile range 102500000

Descriptive statistics

Standard deviation 69127000
Coef of variation 1.0119
Kurtosis -0.29134
Mean 68312000
MAD 58264000
Skewness 0.96756
Sum 770289687845
Variance 4778500000000000
Memory size 88.2 KiB
Value Count Frequency (%)
8632750 85 0.8%
2816192 77 0.7%
54441651 56 0.5%
5874520 53 0.5%
518644 51 0.5%
4331202 46 0.4%
1547126 35 0.3%
37769736 34 0.3%
162761604 31 0.3%
17712311 31 0.3%
Other values (7438) 10777 95.6%

Minimum 5 values

Value Count Frequency (%)
2522 1 0.0%
5783 2 0.0%
19997 1 0.0%
22467 1 0.0%
45425 2 0.0%

Maximum 5 values

Value Count Frequency (%)
241836874 1 0.0%
241838343 1 0.0%
241840731 1 0.0%
241846485 1 0.0%
241940896 1 0.0%
 

host_name
Categorical

Distinct count 2978
Unique (%) 26.4%
Missing (%) 0.0%
Missing (n) 3
Michael 175
Martin 174
Andreas 162
Other values (2974) 10762

Value Count Frequency (%)
Michael 175 1.6%
Martin 174 1.5%
Andreas 162 1.4%
Anna 96 0.9%
Christian 88 0.8%
Florian 86 0.8%
Julia 80 0.7%
Stefan 80 0.7%
Peter 77 0.7%
Thomas 77 0.7%
Other values (2967) 10178 90.3%
 

id
Numeric

Distinct count 11276
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 17915000
Minimum 2309
Maximum 32241530
Zeros (%) 0.0%

Quantile statistics

Minimum 2309
5-th percentile 1443700
Q1 10101000
Median 19291000
Q3 25969000
95-th percentile 31198000
Maximum 32241530
Range 32239221
Interquartile range 15868000

Descriptive statistics

Standard deviation 9416100
Coef of variation 0.52559
Kurtosis -1.0707
Mean 17915000
MAD 8034500
Skewness -0.29325
Sum 202010479906
Variance 88662000000000
Memory size 88.2 KiB
Value Count Frequency (%)
27478015 1 0.0%
31605802 1 0.0%
222614 1 0.0%
11996563 1 0.0%
673170 1 0.0%
531857 1 0.0%
26547600 1 0.0%
15445391 1 0.0%
23285541 1 0.0%
1722869 1 0.0%
Other values (11266) 11266 99.9%

Minimum 5 values

Value Count Frequency (%)
2309 1 0.0%
15883 1 0.0%
38768 1 0.0%
40625 1 0.0%
51287 1 0.0%

Maximum 5 values

Value Count Frequency (%)
32238075 1 0.0%
32238907 1 0.0%
32241000 1 0.0%
32241254 1 0.0%
32241530 1 0.0%
 

last_review
Categorical

Distinct count 1017
Unique (%) 9.0%
Missing (%) 16.7%
Missing (n) 1882
2019-01-02 482
2019-01-01 370
2019-01-03 308
Other values (1013) 8234
(Missing) 1882

Value Count Frequency (%)
2019-01-02 482 4.3%
2019-01-01 370 3.3%
2019-01-03 308 2.7%
2019-02-03 288 2.6%
2019-01-04 243 2.2%
2019-01-06 223 2.0%
2019-01-20 222 2.0%
2019-01-27 208 1.8%
2019-01-05 183 1.6%
2019-02-04 137 1.2%
Other values (1006) 6730 59.7%
(Missing) 1882 16.7%
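
last_review is listed as Categorical and the overview counts 0 Date variables, which suggests the column was read as plain strings. Below is a minimal sketch of turning such strings into real dates, using a few values copied from the table above and only the standard library. With pandas, `pd.to_datetime` or the `parse_dates` argument of `pd.read_csv` would be the usual route; whether the report then classifies the column as Date is an assumption not verified here.

```python
from datetime import date

# A few last_review values copied from the table above; None stands in for
# a missing entry. This is an illustrative sample, not the full column.
raw_values = ["2019-01-02", "2019-01-01", "2019-02-03", None]

parsed = [date.fromisoformat(v) if v is not None else None for v in raw_values]

# Real dates support comparisons and arithmetic that strings do not.
known = [d for d in parsed if d is not None]
latest = max(known)
span_days = (latest - min(known)).days

print(latest, span_days)
```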
 

latitude
Numeric

Distinct count 11276
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 48.206
Minimum 48.126
Maximum 48.299
Zeros (%) 0.0%

Quantile statistics

Minimum 48.126
5-th percentile 48.177
Q1 48.193
Median 48.206
Q3 48.219
95-th percentile 48.237
Maximum 48.299
Range 0.17273
Interquartile range 0.026568

Descriptive statistics

Standard deviation 0.019361
Coef of variation 0.00040162
Kurtosis 0.89116
Mean 48.206
MAD 0.01537
Skewness 0.2306
Sum 543580
Variance 0.00037484
Memory size 88.2 KiB
Value Count Frequency (%)
48.2336551073448 1 0.0%
48.2064998938564 1 0.0%
48.20861780925839 1 0.0%
48.2001944308107 1 0.0%
48.194082532857536 1 0.0%
48.23273813763047 1 0.0%
48.22726068198479 1 0.0%
48.20625615668543 1 0.0%
48.21057288277489 1 0.0%
48.17898367055073 1 0.0%
Other values (11266) 11266 99.9%

Minimum 5 values

Value Count Frequency (%)
48.12584773082685 1 0.0%
48.127215541553575 1 0.0%
48.12802821076231 1 0.0%
48.12817621297169 1 0.0%
48.128312506896975 1 0.0%

Maximum 5 values

Value Count Frequency (%)
48.288447052755615 1 0.0%
48.28892879656033 1 0.0%
48.29041636614624 1 0.0%
48.294597566084086 1 0.0%
48.298572734125635 1 0.0%
 

longitude
Numeric

Distinct count 11276
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 16.361
Minimum 16.191
Maximum 16.547
Zeros (%) 0.0%

Quantile statistics

Minimum 16.191
5-th percentile 16.312
Q1 16.341
Median 16.359
Q3 16.38
95-th percentile 16.41
Maximum 16.547
Range 0.35589
Interquartile range 0.038523

Descriptive statistics

Standard deviation 0.03411
Coef of variation 0.0020849
Kurtosis 4.7539
Mean 16.361
MAD 0.024842
Skewness 0.79622
Sum 184490
Variance 0.0011635
Memory size 88.2 KiB
Value Count Frequency (%)
16.342352399882813 1 0.0%
16.352019635959465 1 0.0%
16.36066044451892 1 0.0%
16.397024752710358 1 0.0%
16.381250994038005 1 0.0%
16.323166397156534 1 0.0%
16.385078685702723 1 0.0%
16.349879422559475 1 0.0%
16.374395517305018 1 0.0%
16.38563390964956 1 0.0%
Other values (11266) 11266 99.9%

Minimum 5 values

Value Count Frequency (%)
16.190897740928744 1 0.0%
16.194535902710175 1 0.0%
16.196278362407188 1 0.0%
16.204514395161752 1 0.0%
16.208645989414812 1 0.0%

Maximum 5 values

Value Count Frequency (%)
16.54668250895395 1 0.0%
16.54672497637489 1 0.0%
16.546724979514845 1 0.0%
16.546741469300017 1 0.0%
16.546786535612462 1 0.0%
 

minimum_nights
Numeric

Distinct count 58
Unique (%) 0.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.2371
Minimum 1
Maximum 1000
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 2
Q3 3
95-th percentile 13
Maximum 1000
Range 999
Interquartile range 2

Descriptive statistics

Standard deviation 17.972
Coef of variation 4.2416
Kurtosis 1723.9
Mean 4.2371
MAD 4.0564
Skewness 34.34
Sum 47777
Variance 322.98
Memory size 88.2 KiB
Value Count Frequency (%)
1 3725 33.0%
2 3674 32.6%
3 2001 17.7%
4 497 4.4%
5 326 2.9%
7 257 2.3%
30 149 1.3%
6 120 1.1%
14 97 0.9%
10 69 0.6%
Other values (48) 361 3.2%

Minimum 5 values

Value Count Frequency (%)
1 3725 33.0%
2 3674 32.6%
3 2001 17.7%
4 497 4.4%
5 326 2.9%

Maximum 5 values

Value Count Frequency (%)
214 1 0.0%
300 2 0.0%
365 2 0.0%
999 1 0.0%
1000 1 0.0%
 

name
Categorical

Distinct count 11051
Unique (%) 98.0%
Missing (%) 0.2%
Missing (n) 18
Cozy apartment in the heart of Vienna 6
City Center Apartment 5
City Pension Stephansplatz 5
Other values (11047) 11242
(Missing) 18

Value Count Frequency (%)
Cozy apartment in the heart of Vienna 6 0.1%
City Center Apartment 5 0.0%
City Pension Stephansplatz 5 0.0%
Wien Wohnung 5 0.0%
Central Private Room 5 0.0%
Adelin Pension und Zimmervermietung 5 0.0%
1 Bedroom Apartment with Balcony 4 0.0%
Charming Studio 4 0.0%
Vienna Dream Apartments 4 0.0%
Family Apartment 4 0.0%
Other values (11040) 11211 99.4%
(Missing) 18 0.2%
 

neighbourhood
Categorical

Distinct count 23
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Leopoldstadt 1223
Landstraße 972
Alsergrund 772
Other values (20) 8309

Value Count Frequency (%)
Leopoldstadt 1223 10.8%
Landstraße 972 8.6%
Alsergrund 772 6.8%
Neubau 731 6.5%
Margareten 728 6.5%
Innere Stadt 714 6.3%
Rudolfsheim-Fünfhaus 675 6.0%
Mariahilf 576 5.1%
Favoriten 554 4.9%
Wieden 548 4.9%
Other values (13) 3783 33.5%
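
Neighbourhood names with German characters are prone to encoding damage in CSV files: `ß` can surface as `§` and `ü` as `Ÿ` when Mac Roman bytes are decoded as Windows-1252, so that Landstraße shows up as Landstra§e. Below is a minimal sketch of reversing that specific mix-up. It assumes this is what happened to the data, and `repair_mac_roman` is a hypothetical helper name.

```python
def repair_mac_roman(text: str) -> str:
    """Undo Mac Roman bytes that were wrongly decoded as Windows-1252.

    Hypothetical helper for illustration; strings that cannot be
    encoded as Windows-1252 are returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("mac_roman")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


for name in ["Landstra§e", "Rudolfsheim-FŸnfhaus", "Leopoldstadt"]:
    print(name, "->", repair_mac_roman(name))
```

The repair should only be applied to values known to be damaged. Text that was decoded correctly, such as a name containing a genuine `ö`, would be corrupted by the same round trip.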
 

neighbourhood_group
Constant

This variable is constant and should be ignored for analysis

Constant value

number_of_reviews
Numeric

Distinct count 324
Unique (%) 2.9%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 28.989
Minimum 0
Maximum 514
Zeros (%) 16.7%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
Median 9
Q3 32
95-th percentile 130
Maximum 514
Range 514
Interquartile range 30

Descriptive statistics

Standard deviation 51.326
Coef of variation 1.7705
Kurtosis 14.779
Mean 28.989
MAD 32.382
Skewness 3.3696
Sum 326885
Variance 2634.4
Memory size 88.2 KiB
Value Count Frequency (%)
0 1882 16.7%
1 925 8.2%
2 658 5.8%
3 531 4.7%
4 418 3.7%
5 368 3.3%
7 314 2.8%
6 300 2.7%
9 237 2.1%
8 227 2.0%
Other values (314) 5416 48.0%

Minimum 5 values

Value Count Frequency (%)
0 1882 16.7%
1 925 8.2%
2 658 5.8%
3 531 4.7%
4 418 3.7%

Maximum 5 values

Value Count Frequency (%)
460 1 0.0%
463 1 0.0%
471 1 0.0%
501 1 0.0%
514 1 0.0%
 

price
Numeric

Distinct count 282
Unique (%) 2.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 77.851
Minimum 9
Maximum 9270
Zeros (%) 0.0%

Quantile statistics

Minimum 9
5-th percentile 22
Q1 38
Median 56
Q3 85
95-th percentile 185
Maximum 9270
Range 9261
Interquartile range 47

Descriptive statistics

Standard deviation 134.4
Coef of variation 1.7263
Kurtosis 2193.3
Mean 77.851
MAD 45.459
Skewness 36.36
Sum 877844
Variance 18063
Memory size 88.2 KiB
Value Count Frequency (%)
50 459 4.1%
40 417 3.7%
60 403 3.6%
30 398 3.5%
45 396 3.5%
35 392 3.5%
55 335 3.0%
80 306 2.7%
25 306 2.7%
65 301 2.7%
Other values (272) 7563 67.1%

Minimum 5 values

Value Count Frequency (%)
9 1 0.0%
10 14 0.1%
11 6 0.1%
12 14 0.1%
13 11 0.1%

Maximum 5 values

Value Count Frequency (%)
1250 1 0.0%
1510 1 0.0%
2500 1 0.0%
5500 1 0.0%
9270 1 0.0%
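
The price summary shows a median of 56 but a maximum of 9270, with a skewness of 36.36, so a handful of listings dominate the upper tail. One common way to flag such values is the 1.5 × IQR rule. That rule is a convention chosen here for illustration, not something the report itself applies. The sketch uses the quartiles and the five largest prices from the tables above.

```python
# Quartiles copied from the "Quantile statistics" block for price.
q1, q3 = 38.0, 85.0
iqr = q3 - q1

# Upper fence of the 1.5 * IQR rule (an assumed convention, see lead-in).
upper_fence = q3 + 1.5 * iqr

# The five largest prices listed under "Maximum 5 values".
largest_prices = [1250, 1510, 2500, 5500, 9270]
flagged = [p for p in largest_prices if p > upper_fence]

print(upper_fence, flagged)
```

Because the 95th percentile (185) already lies above this fence, more than 5% of listings would be flagged, so the threshold is better treated as a starting point for inspection than as a hard cut-off.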
 

reviews_per_month
Numeric

Distinct count 834
Unique (%) 7.4%
Missing (%) 16.7%
Missing (n) 1882
Infinite (%) 0.0%
Infinite (n) 0
Mean 1.7579
Minimum 0.01
Maximum 14.23
Zeros (%) 0.0%

Quantile statistics

Minimum 0.01
5-th percentile 0.06
Q1 0.35
Median 1
Q3 2.59
95-th percentile 5.81
Maximum 14.23
Range 14.22
Interquartile range 2.24

Descriptive statistics

Standard deviation 1.9175
Coef of variation 1.0908
Kurtosis 2.8395
Mean 1.7579
MAD 1.4816
Skewness 1.6364
Sum 16513
Variance 3.6768
Memory size 88.2 KiB
Value Count Frequency (%)
0.03 126 1.1%
1.0 120 1.1%
0.07 118 1.0%
0.05 115 1.0%
0.15 98 0.9%
0.06 95 0.8%
0.1 90 0.8%
0.16 87 0.8%
0.11 87 0.8%
0.02 79 0.7%
Other values (823) 8379 74.3%
(Missing) 1882 16.7%

Minimum 5 values

Value Count Frequency (%)
0.01 1 0.0%
0.02 79 0.7%
0.03 126 1.1%
0.04 68 0.6%
0.05 115 1.0%

Maximum 5 values

Value Count Frequency (%)
11.79 1 0.0%
12.15 1 0.0%
12.8 1 0.0%
14.0 1 0.0%
14.23 1 0.0%
 

room_type
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Entire home/apt 8173
Private room 3005
Shared room 98

Value Count Frequency (%)
Entire home/apt 8173 72.5%
Private room 3005 26.6%
Shared room 98 0.9%
 

Correlations

Sample

id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price minimum_nights number_of_reviews last_review reviews_per_month calculated_host_listings_count availability_365
0 2309 Greenview Design Apartment 2522 Chris NaN Leopoldstadt 48.202814 16.404353 Entire home/apt 80 4 66 2018-06-03 0.62 1 338
1 15883 b&b near Old Danube river 62142 Eva NaN Donaustadt 48.241436 16.428118 Private room 85 1 9 2018-01-03 0.19 5 287
2 38768 central cityapartement- wifi- nice neighbourhood 166283 Hannes NaN Leopoldstadt 48.218225 16.379255 Entire home/apt 65 3 271 2019-01-05 2.82 2 129
3 40625 Near Palace Schönbrunn, Apt. 1 175131 Ingela NaN Rudolfsheim-Fünfhaus 48.184862 16.327401 Entire home/apt 99 1 128 2019-01-07 1.23 13 291
4 51287 little studio- next to citycenter- wifi- nice ... 166283 Hannes NaN Leopoldstadt 48.218514 16.377810 Entire home/apt 60 3 243 2019-01-08 2.48 2 146