Overview

Dataset statistics

Number of variables: 15
Number of observations: 69586
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 10857
Duplicate rows (%): 15.6%
Total size in memory: 8.0 MiB
Average record size in memory: 120.0 B
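
The derived figures in this block follow from the raw counts. The sketch below is a minimal check, assuming that the duplicate percentage is duplicate rows over observations and that the total size is roughly observations times the average record size. The variable names are illustrative and do not come from the report.

```python
# Counts copied from the dataset statistics above.
n_observations = 69586
n_duplicates = 10857
avg_record_bytes = 120.0

# Share of rows flagged as duplicates.
duplicate_pct = round(100 * n_duplicates / n_observations, 1)

# Approximate in-memory size in MiB (1 MiB = 2**20 bytes).
total_mib = round(n_observations * avg_record_bytes / 2**20, 1)

print(duplicate_pct, total_mib)
```

Both values agree with the reported 15.6% and 8.0 MiB after rounding to one decimal place.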

Variable types

Categorical: 14
DateTime: 1

Alerts

project_name has constant value "scikit-mobility" [Constant]
Dataset has 10857 (15.6%) duplicate rows [Duplicates]
country_code has a high cardinality: 75 distinct values [High cardinality]
installer_version has a high cardinality: 103 distinct values [High cardinality]
python_implementation_version has a high cardinality: 65 distinct values [High cardinality]
setuptools_version has a high cardinality: 125 distinct values [High cardinality]
sys_distro_version has a high cardinality: 84 distinct values [High cardinality]
country_code is highly overall correlated with python_implementation_name and 1 other field [High correlation]
cpu is highly overall correlated with distribution_type and 6 other fields [High correlation]
distribution_type is highly overall correlated with cpu and 7 other fields [High correlation]
installer_name is highly overall correlated with distribution_type and 2 other fields [High correlation]
openssl_version is highly overall correlated with cpu and 6 other fields [High correlation]
package_version is highly overall correlated with openssl_version and 1 other field [High correlation]
python_implementation_name is highly overall correlated with country_code and 8 other fields [High correlation]
python_implementation_version is highly overall correlated with cpu and 5 other fields [High correlation]
sys_distro_name is highly overall correlated with cpu and 4 other fields [High correlation]
sys_distro_version is highly overall correlated with cpu and 5 other fields [High correlation]
sys_name is highly overall correlated with country_code and 8 other fields [High correlation]
country_code is highly imbalanced (84.8%) [Imbalance]
cpu is highly imbalanced (82.1%) [Imbalance]
distribution_type is highly imbalanced (77.0%) [Imbalance]
installer_name is highly imbalanced (84.8%) [Imbalance]
installer_version is highly imbalanced (61.6%) [Imbalance]
openssl_version is highly imbalanced (55.3%) [Imbalance]
python_implementation_name is highly imbalanced (61.3%) [Imbalance]
python_implementation_version is highly imbalanced (56.7%) [Imbalance]
setuptools_version is highly imbalanced (61.6%) [Imbalance]
sys_distro_name is highly imbalanced (85.6%) [Imbalance]
sys_distro_version is highly imbalanced (70.5%) [Imbalance]
sys_name is highly imbalanced (75.1%) [Imbalance]
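
The imbalance percentages in the alerts appear consistent with one minus the normalized Shannon entropy of each variable's class counts. The sketch below rests on that assumption and is not taken from the profiler's source. The function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Class counts taken from two variables in this report.
distribution_type = [66994, 2592]            # bdist_wheel, sdist
python_implementation_name = [64320, 5266]   # CPython, None

print(round(100 * imbalance_score(distribution_type), 1))
print(round(100 * imbalance_score(python_implementation_name), 1))
```

A score near 100% means one class dominates almost completely, and a score of 0% means the classes are evenly spread.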

Reproduction

Analysis started: 2023-05-04 15:12:19.886254
Analysis finished: 2023-05-04 15:12:23.498501
Duration: 3.61 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
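
The reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the standard library. The format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-05-04 15:12:19.886254", FMT)
finished = datetime.strptime("2023-05-04 15:12:23.498501", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```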

Variables

country_code
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
US | 62399
NL | 1023
CN | 821
None | 537
JP | 477
Other values (70) | 4329

Length

Max length: 4
Median length: 2
Mean length: 2.0154341
Min length: 2

Characters and Unicode

Total characters: 140246
Distinct characters: 30
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
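
The length and character totals for country_code are linked: the mean length is the total character count divided by the number of observations. The sketch below assumes that every value other than the string "None" is exactly two characters long. That assumption is consistent with the reported minimum, median and maximum lengths but is not stated in the report.

```python
n_observations = 69586
none_count = 537             # rows holding the 4-character string "None"
code_length = 2              # assumed length of every other country code
none_length = len("None")

# Total characters across all rows, then the average per row.
total_characters = code_length * (n_observations - none_count) + none_length * none_count
mean_length = total_characters / n_observations

print(total_characters, round(mean_length, 7))
```

Under that assumption the result matches the reported total of 140246 characters and the mean length of 2.0154341.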

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: CN
2nd row: CN
3rd row: CA
4th row: US
5th row: US

Common Values

Value | Count | Frequency (%)
US | 62399 | 89.7%
NL | 1023 | 1.5%
CN | 821 | 1.2%
None | 537 | 0.8%
JP | 477 | 0.7%
TW | 394 | 0.6%
RU | 387 | 0.6%
FR | 354 | 0.5%
GB | 343 | 0.5%
DE | 290 | 0.4%
Other values (65) | 2561 | 3.7%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
us 62399
89.7%
nl 1023
 
1.5%
cn 821
 
1.2%
none 537
 
0.8%
jp 477
 
0.7%
tw 394
 
0.6%
ru 387
 
0.6%
fr 354
 
0.5%
gb 343
 
0.5%
de 290
 
0.4%
Other values (65) 2561
 
3.7%

Most occurring characters

ValueCountFrequency (%)
S 62927
44.9%
U 62890
44.8%
N 2581
 
1.8%
R 1174
 
0.8%
L 1153
 
0.8%
C 1110
 
0.8%
E 748
 
0.5%
T 647
 
0.5%
I 609
 
0.4%
K 599
 
0.4%
Other values (20) 5808
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 138633
98.8%
Lowercase Letter 1611
 
1.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 62927
45.4%
U 62890
45.4%
N 2581
 
1.9%
R 1174
 
0.8%
L 1153
 
0.8%
C 1110
 
0.8%
E 748
 
0.5%
T 647
 
0.5%
I 609
 
0.4%
K 599
 
0.4%
Other values (16) 4195
 
3.0%
Lowercase Letter
ValueCountFrequency (%)
o 537
33.3%
n 537
33.3%
e 537
33.3%
Decimal Number
ValueCountFrequency (%)
1 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 140244
> 99.9%
Common 2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 62927
44.9%
U 62890
44.8%
N 2581
 
1.8%
R 1174
 
0.8%
L 1153
 
0.8%
C 1110
 
0.8%
E 748
 
0.5%
T 647
 
0.5%
I 609
 
0.4%
K 599
 
0.4%
Other values (19) 5806
 
4.1%
Common
ValueCountFrequency (%)
1 2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 140246
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 62927
44.9%
U 62890
44.8%
N 2581
 
1.8%
R 1174
 
0.8%
L 1153
 
0.8%
C 1110
 
0.8%
E 748
 
0.5%
T 647
 
0.5%
I 609
 
0.4%
K 599
 
0.4%
Other values (20) 5808
 
4.1%

cpu
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
x86_64 | 63728
None | 5266
AMD64 | 527
arm64 | 47
aarch64 | 13

Length

Max length: 7
Median length: 6
Mean length: 5.8405858
Min length: 4

Characters and Unicode

Total characters: 406423
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
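
The per-category character totals reported for cpu can be rebuilt from its value counts with the standard `unicodedata` module. The sketch below weights each character's general category by the number of rows holding the value. The dictionary name `cpu_counts` is illustrative, and its counts are copied from the Common Values table for cpu.

```python
import unicodedata
from collections import Counter

# Value counts for cpu, copied from the Common Values table in this report.
cpu_counts = {
    "x86_64": 63728,
    "None": 5266,
    "AMD64": 527,
    "arm64": 47,
    "aarch64": 13,
    "armv7l": 5,
}

# Tally Unicode general categories ("Nd" = Decimal Number, "Ll" = Lowercase
# Letter, "Lu" = Uppercase Letter, "Pc" = Connector Punctuation), weighting
# each character by how many rows contain the value.
category_totals = Counter()
for value, count in cpu_counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += count

print(dict(category_totals))
```

The four totals sum to the 406423 characters reported for this variable.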

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None
2nd row: None
3rd row: arm64
4th row: x86_64
5th row: x86_64

Common Values

Value | Count | Frequency (%)
x86_64 | 63728 | 91.6%
None | 5266 | 7.6%
AMD64 | 527 | 0.8%
arm64 | 47 | 0.1%
aarch64 | 13 | < 0.1%
armv7l | 5 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
x86_64 63728
91.6%
none 5266
 
7.6%
amd64 527
 
0.8%
arm64 47
 
0.1%
aarch64 13
 
< 0.1%
armv7l 5
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
6 128043
31.5%
4 64315
15.8%
x 63728
15.7%
8 63728
15.7%
_ 63728
15.7%
N 5266
 
1.3%
o 5266
 
1.3%
n 5266
 
1.3%
e 5266
 
1.3%
D 527
 
0.1%
Other values (10) 1290
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 256091
63.0%
Lowercase Letter 79757
 
19.6%
Connector Punctuation 63728
 
15.7%
Uppercase Letter 6847
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
x 63728
79.9%
o 5266
 
6.6%
n 5266
 
6.6%
e 5266
 
6.6%
a 78
 
0.1%
r 65
 
0.1%
m 52
 
0.1%
c 13
 
< 0.1%
h 13
 
< 0.1%
v 5
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
6 128043
50.0%
4 64315
25.1%
8 63728
24.9%
7 5
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 5266
76.9%
D 527
 
7.7%
M 527
 
7.7%
A 527
 
7.7%
Connector Punctuation
ValueCountFrequency (%)
_ 63728
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 319819
78.7%
Latin 86604
 
21.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
x 63728
73.6%
N 5266
 
6.1%
o 5266
 
6.1%
n 5266
 
6.1%
e 5266
 
6.1%
D 527
 
0.6%
M 527
 
0.6%
A 527
 
0.6%
a 78
 
0.1%
r 65
 
0.1%
Other values (5) 88
 
0.1%
Common
ValueCountFrequency (%)
6 128043
40.0%
4 64315
20.1%
8 63728
19.9%
_ 63728
19.9%
7 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 406423
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6 128043
31.5%
4 64315
15.8%
x 63728
15.7%
8 63728
15.7%
_ 63728
15.7%
N 5266
 
1.3%
o 5266
 
1.3%
n 5266
 
1.3%
e 5266
 
1.3%
D 527
 
0.1%
Other values (10) 1290
 
0.3%

distribution_type
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
bdist_wheel | 66994
sdist | 2592

Length

Max length: 11
Median length: 11
Mean length: 10.776507
Min length: 5

Characters and Unicode

Total characters: 749894
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: sdist
2nd row: sdist
3rd row: sdist
4th row: bdist_wheel
5th row: bdist_wheel

Common Values

Value | Count | Frequency (%)
bdist_wheel | 66994 | 96.3%
sdist | 2592 | 3.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
bdist_wheel 66994
96.3%
sdist 2592
 
3.7%

Most occurring characters

ValueCountFrequency (%)
e 133988
17.9%
s 72178
9.6%
d 69586
9.3%
i 69586
9.3%
t 69586
9.3%
b 66994
8.9%
_ 66994
8.9%
w 66994
8.9%
h 66994
8.9%
l 66994
8.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 682900
91.1%
Connector Punctuation 66994
 
8.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 133988
19.6%
s 72178
10.6%
d 69586
10.2%
i 69586
10.2%
t 69586
10.2%
b 66994
9.8%
w 66994
9.8%
h 66994
9.8%
l 66994
9.8%
Connector Punctuation
ValueCountFrequency (%)
_ 66994
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 682900
91.1%
Common 66994
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 133988
19.6%
s 72178
10.6%
d 69586
10.2%
i 69586
10.2%
t 69586
10.2%
b 66994
9.8%
w 66994
9.8%
h 66994
9.8%
l 66994
9.8%
Common
ValueCountFrequency (%)
_ 66994
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 749894
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 133988
17.9%
s 72178
9.6%
d 69586
9.3%
i 69586
9.3%
t 69586
9.3%
b 66994
8.9%
_ 66994
8.9%
w 66994
8.9%
h 66994
8.9%
l 66994
8.9%

installer_name
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
pip | 64320
bandersnatch | 4067
Browser | 457
requests | 399
None | 213
Other values (4) | 130

Length

Max length: 12
Median length: 3
Mean length: 3.5909378
Min length: 3

Characters and Unicode

Total characters: 249879
Distinct characters: 23
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Browser
2nd row: Browser
3rd row: pip
4th row: pip
5th row: pip

Common Values

Value | Count | Frequency (%)
pip | 64320 | 92.4%
bandersnatch | 4067 | 5.8%
Browser | 457 | 0.7%
requests | 399 | 0.6%
None | 213 | 0.3%
conda | 54 | 0.1%
setuptools | 36 | 0.1%
Nexus | 33 | < 0.1%
Artifactory | 7 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
pip 64320
92.4%
bandersnatch 4067
 
5.8%
browser 457
 
0.7%
requests 399
 
0.6%
none 213
 
0.3%
conda 54
 
0.1%
setuptools 36
 
0.1%
nexus 33
 
< 0.1%
artifactory 7
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
p 128676
51.5%
i 64327
25.7%
n 8401
 
3.4%
a 8195
 
3.3%
e 5604
 
2.2%
s 5427
 
2.2%
r 5394
 
2.2%
t 4552
 
1.8%
c 4128
 
1.7%
d 4121
 
1.6%
Other values (13) 11054
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 249169
99.7%
Uppercase Letter 710
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
p 128676
51.6%
i 64327
25.8%
n 8401
 
3.4%
a 8195
 
3.3%
e 5604
 
2.2%
s 5427
 
2.2%
r 5394
 
2.2%
t 4552
 
1.8%
c 4128
 
1.7%
d 4121
 
1.7%
Other values (10) 10344
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
B 457
64.4%
N 246
34.6%
A 7
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 249879
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
p 128676
51.5%
i 64327
25.7%
n 8401
 
3.4%
a 8195
 
3.3%
e 5604
 
2.2%
s 5427
 
2.2%
r 5394
 
2.2%
t 4552
 
1.8%
c 4128
 
1.7%
d 4121
 
1.6%
Other values (13) 11054
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 249879
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
p 128676
51.5%
i 64327
25.7%
n 8401
 
3.4%
a 8195
 
3.3%
e 5604
 
2.2%
s 5427
 
2.2%
r 5394
 
2.2%
t 4552
 
1.8%
c 4128
 
1.7%
d 4121
 
1.6%
Other values (13) 11054
 
4.4%

installer_version
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 103
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
20.0.2 | 30678
19.0.3 | 23095
21.0.1 | 3036
21.1.3 | 1584
19.3.1 | 1189
Other values (98) | 10004

Length

Max length: 19
Median length: 6
Mean length: 5.91642
Min length: 3

Characters and Unicode

Total characters: 411700
Distinct characters: 21
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: 21.3.1
4th row: 20.1.1
5th row: 20.1.1

Common Values

Value | Count | Frequency (%)
20.0.2 | 30678 | 44.1%
19.0.3 | 23095 | 33.2%
21.0.1 | 3036 | 4.4%
21.1.3 | 1584 | 2.3%
19.3.1 | 1189 | 1.7%
4.4.0 | 971 | 1.4%
21.2.4 | 937 | 1.3%
20.1.1 | 894 | 1.3%
5.0.0 | 837 | 1.2%
21.3.1 | 683 | 1.0%
Other values (93) | 5682 | 8.2%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
20.0.2 30678
44.1%
19.0.3 23095
33.2%
21.0.1 3036
 
4.4%
21.1.3 1584
 
2.3%
19.3.1 1189
 
1.7%
4.4.0 971
 
1.4%
21.2.4 937
 
1.3%
20.1.1 894
 
1.3%
5.0.0 837
 
1.2%
21.3.1 683
 
1.0%
Other values (93) 5682
 
8.2%

Most occurring characters

ValueCountFrequency (%)
. 137678
33.4%
0 94165
22.9%
2 74188
18.0%
1 42722
 
10.4%
3 28913
 
7.0%
9 24604
 
6.0%
4 4200
 
1.0%
5 1844
 
0.4%
e 676
 
0.2%
o 672
 
0.2%
Other values (11) 2038
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 271283
65.9%
Other Punctuation 137678
33.4%
Lowercase Letter 2036
 
0.5%
Uppercase Letter 670
 
0.2%
Dash Punctuation 33
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 94165
34.7%
2 74188
27.3%
1 42722
15.7%
3 28913
 
10.7%
9 24604
 
9.1%
4 4200
 
1.5%
5 1844
 
0.7%
6 520
 
0.2%
7 79
 
< 0.1%
8 48
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
e 676
33.2%
o 672
33.0%
n 670
32.9%
d 6
 
0.3%
v 6
 
0.3%
p 2
 
0.1%
s 2
 
0.1%
t 2
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 137678
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 670
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 33
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 408994
99.3%
Latin 2706
 
0.7%

Most frequent character per script

Common
ValueCountFrequency (%)
. 137678
33.7%
0 94165
23.0%
2 74188
18.1%
1 42722
 
10.4%
3 28913
 
7.1%
9 24604
 
6.0%
4 4200
 
1.0%
5 1844
 
0.5%
6 520
 
0.1%
7 79
 
< 0.1%
Other values (2) 81
 
< 0.1%
Latin
ValueCountFrequency (%)
e 676
25.0%
o 672
24.8%
n 670
24.8%
N 670
24.8%
d 6
 
0.2%
v 6
 
0.2%
p 2
 
0.1%
s 2
 
0.1%
t 2
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 411700
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 137678
33.4%
0 94165
22.9%
2 74188
18.0%
1 42722
 
10.4%
3 28913
 
7.0%
9 24604
 
6.0%
4 4200
 
1.0%
5 1844
 
0.4%
e 676
 
0.2%
o 672
 
0.2%
Other values (11) 2038
 
0.5%

openssl_version
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 39
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
OpenSSL 1.1.1 11 Sep 2018 | 27793
OpenSSL 1.0.2g 1 Mar 2016 | 22393
None | 5266
OpenSSL 1.1.1g 21 Apr 2020 | 5071
OpenSSL 1.1.1f 31 Mar 2020 | 3779
Other values (34) | 5284

Length

Max length: 32
Median length: 26
Mean length: 24.545814
Min length: 4

Characters and Unicode

Total characters: 1708045
Distinct characters: 46
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: OpenSSL 1.1.1l 24 Aug 2021
4th row: OpenSSL 1.1.1l 24 Aug 2021
5th row: OpenSSL 1.1.1l 24 Aug 2021

Common Values

Value | Count | Frequency (%)
OpenSSL 1.1.1 11 Sep 2018 | 27793 | 39.9%
OpenSSL 1.0.2g 1 Mar 2016 | 22393 | 32.2%
None | 5266 | 7.6%
OpenSSL 1.1.1g 21 Apr 2020 | 5071 | 7.3%
OpenSSL 1.1.1f 31 Mar 2020 | 3779 | 5.4%
OpenSSL 1.1.1k 25 Mar 2021 | 1335 | 1.9%
OpenSSL 1.1.1l 24 Aug 2021 | 1075 | 1.5%
OpenSSL 1.1.1d 10 Sep 2019 | 808 | 1.2%
OpenSSL 1.1.1b 26 Feb 2019 | 691 | 1.0%
OpenSSL 1.0.2k-fips 26 Jan 2017 | 257 | 0.4%
Other values (29) | 1118 | 1.6%

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
openssl | 64281 | 19.7%
sep | 28844 | 8.8%
2018 | 27855 | 8.5%
1.1.1 | 27793 | 8.5%
11 | 27793 | 8.5%
mar | 27773 | 8.5%
2016 | 22395 | 6.9%
1.0.2g | 22393 | 6.9%
1 | 22393 | 6.9%
2020 | 9204 | 2.8%
Other values (65) | 46027 | 14.1%
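
The table above counts lowercase tokens rather than whole values. The sketch below shows one way such counts could be produced, assuming values are lowercased and split on whitespace. The profiler's exact tokenization is not shown in this report, and the helper `word_counts` is hypothetical. Only two of the values are included, so the totals are partial.

```python
from collections import Counter

def word_counts(value_counts):
    """Lowercase each value, split on whitespace, and count tokens weighted by row count."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            words[token] += count
    return words

# Two of the most common openssl_version values and their row counts.
sample = {
    "OpenSSL 1.1.1 11 Sep 2018": 27793,
    "OpenSSL 1.0.2g 1 Mar 2016": 22393,
}
counts = word_counts(sample)
print(counts.most_common(3))
```

Tokens that occur in a single value, such as 1.0.2g, keep that value's row count, while shared tokens such as openssl accumulate across values.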

Most occurring characters

Value | Count | Frequency (%)
(space) | 321431 | 18.8%
1 | 290163 | 17.0%
S | 157472 | 9.2%
. | 128640 | 7.5%
2 | 108430 | 6.3%
e | 99578 | 5.8%
p | 98456 | 5.8%
0 | 97221 | 5.7%
n | 70130 | 4.1%
L | 64359 | 3.8%
Other values (36) | 272165 | 15.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 556022
32.6%
Lowercase Letter 374874
21.9%
Uppercase Letter 326821
19.1%
Space Separator 321431
18.8%
Other Punctuation 128640
 
7.5%
Dash Punctuation 257
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 99578
26.6%
p 98456
26.3%
n 70130
18.7%
r 32885
 
8.8%
g 28551
 
7.6%
a 28100
 
7.5%
o 5307
 
1.4%
f 4068
 
1.1%
k 1602
 
0.4%
b 1584
 
0.4%
Other values (12) 4613
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
S 157472
48.2%
L 64359
19.7%
O 64281
19.7%
M 27830
 
8.5%
A 6157
 
1.9%
N 5302
 
1.6%
F 856
 
0.3%
D 294
 
0.1%
J 266
 
0.1%
I 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 290163
52.2%
2 108430
 
19.5%
0 97221
 
17.5%
8 28060
 
5.0%
6 23516
 
4.2%
3 3834
 
0.7%
5 1615
 
0.3%
9 1544
 
0.3%
4 1164
 
0.2%
7 475
 
0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 321431 | 100.0%
Other Punctuation
ValueCountFrequency (%)
. 128640
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 257
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1006350
58.9%
Latin 701695
41.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 157472
22.4%
e 99578
14.2%
p 98456
14.0%
n 70130
10.0%
L 64359
9.2%
O 64281
9.2%
r 32885
 
4.7%
g 28551
 
4.1%
a 28100
 
4.0%
M 27830
 
4.0%
Other values (23) 30053
 
4.3%
Common
ValueCountFrequency (%)
321431
31.9%
1 290163
28.8%
. 128640
12.8%
2 108430
 
10.8%
0 97221
 
9.7%
8 28060
 
2.8%
6 23516
 
2.3%
3 3834
 
0.4%
5 1615
 
0.2%
9 1544
 
0.2%
Other values (3) 1896
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1708045
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
321431
18.8%
1 290163
17.0%
S 157472
9.2%
. 128640
7.5%
2 108430
 
6.3%
e 99578
 
5.8%
p 98456
 
5.8%
0 97221
 
5.7%
n 70130
 
4.1%
L 64359
 
3.8%
Other values (36) 272165
15.9%

package_version
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
1.2.2 | 47174
1.2.3 | 15050
1.1.2 | 5585
1.2.1 | 1777

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 347930
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.2.2
2nd row: 1.2.2
3rd row: 1.2.2
4th row: 1.1.2
5th row: 1.1.2

Common Values

Value | Count | Frequency (%)
1.2.2 | 47174 | 67.8%
1.2.3 | 15050 | 21.6%
1.1.2 | 5585 | 8.0%
1.2.1 | 1777 | 2.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
1.2.2 47174
67.8%
1.2.3 15050
 
21.6%
1.1.2 5585
 
8.0%
1.2.1 1777
 
2.6%

Most occurring characters

ValueCountFrequency (%)
. 139172
40.0%
2 116760
33.6%
1 76948
22.1%
3 15050
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 208758
60.0%
Other Punctuation 139172
40.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 116760
55.9%
1 76948
36.9%
3 15050
 
7.2%
Other Punctuation
ValueCountFrequency (%)
. 139172
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 347930
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 139172
40.0%
2 116760
33.6%
1 76948
22.1%
3 15050
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 347930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 139172
40.0%
2 116760
33.6%
1 76948
22.1%
3 15050
 
4.3%

project_name
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
scikit-mobility | 69586

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 1043790
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: scikit-mobility
2nd row: scikit-mobility
3rd row: scikit-mobility
4th row: scikit-mobility
5th row: scikit-mobility

Common Values

Value | Count | Frequency (%)
scikit-mobility | 69586 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
scikit-mobility 69586
100.0%

Most occurring characters

ValueCountFrequency (%)
i 278344
26.7%
t 139172
13.3%
s 69586
 
6.7%
c 69586
 
6.7%
k 69586
 
6.7%
- 69586
 
6.7%
m 69586
 
6.7%
o 69586
 
6.7%
b 69586
 
6.7%
l 69586
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 974204
93.3%
Dash Punctuation 69586
 
6.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 278344
28.6%
t 139172
14.3%
s 69586
 
7.1%
c 69586
 
7.1%
k 69586
 
7.1%
m 69586
 
7.1%
o 69586
 
7.1%
b 69586
 
7.1%
l 69586
 
7.1%
y 69586
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 69586
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 974204
93.3%
Common 69586
 
6.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 278344
28.6%
t 139172
14.3%
s 69586
 
7.1%
c 69586
 
7.1%
k 69586
 
7.1%
m 69586
 
7.1%
o 69586
 
7.1%
b 69586
 
7.1%
l 69586
 
7.1%
y 69586
 
7.1%
Common
ValueCountFrequency (%)
- 69586
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1043790
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 278344
26.7%
t 139172
13.3%
s 69586
 
6.7%
c 69586
 
6.7%
k 69586
 
6.7%
- 69586
 
6.7%
m 69586
 
6.7%
o 69586
 
6.7%
b 69586
 
6.7%
l 69586
 
6.7%

python_implementation_name
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
CPython | 64320
None | 5266

Length

Max length: 7
Median length: 7
Mean length: 6.7729716
Min length: 4

Characters and Unicode

Total characters: 471304
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None
2nd row: None
3rd row: CPython
4th row: CPython
5th row: CPython

Common Values

Value | Count | Frequency (%)
CPython | 64320 | 92.4%
None | 5266 | 7.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
ValueCountFrequency (%)
cpython 64320
92.4%
none 5266
 
7.6%

Most occurring characters

ValueCountFrequency (%)
o 69586
14.8%
n 69586
14.8%
C 64320
13.6%
P 64320
13.6%
y 64320
13.6%
t 64320
13.6%
h 64320
13.6%
N 5266
 
1.1%
e 5266
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 337398
71.6%
Uppercase Letter 133906
 
28.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 69586
20.6%
n 69586
20.6%
y 64320
19.1%
t 64320
19.1%
h 64320
19.1%
e 5266
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
C 64320
48.0%
P 64320
48.0%
N 5266
 
3.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 471304
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 69586
14.8%
n 69586
14.8%
C 64320
13.6%
P 64320
13.6%
y 64320
13.6%
t 64320
13.6%
h 64320
13.6%
N 5266
 
1.1%
e 5266
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 471304
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 69586
14.8%
n 69586
14.8%
C 64320
13.6%
P 64320
13.6%
y 64320
13.6%
t 64320
13.6%
h 64320
13.6%
N 5266
 
1.1%
e 5266
 
1.1%

python_implementation_version
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
3.7.5 | 24954
3.7.3 | 23169
3.7.6 | 5532
None | 5266
3.8.10 | 3491
Other values (60) | 7174

Length

Max length: 8
Median length: 5
Mean length: 5.0405541
Min length: 4

Characters and Unicode

Total characters: 350752
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: 3.9.7
4th row: 3.7.10
5th row: 3.7.10

Common Values

Value | Count | Frequency (%)
3.7.5 | 24954 | 35.9%
3.7.3 | 23169 | 33.3%
3.7.6 | 5532 | 7.9%
None | 5266 | 7.6%
3.8.10 | 3491 | 5.0%
3.7.10 | 1788 | 2.6%
3.7.12 | 1013 | 1.5%
3.8.12 | 660 | 0.9%
3.7.11 | 438 | 0.6%
3.8.5 | 435 | 0.6%
Other values (55) | 2840 | 4.1%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
3.7.5 24954
35.9%
3.7.3 23169
33.3%
3.7.6 5532
 
7.9%
none 5266
 
7.6%
3.8.10 3491
 
5.0%
3.7.10 1788
 
2.6%
3.7.12 1013
 
1.5%
3.8.12 660
 
0.9%
3.7.11 438
 
0.6%
3.8.5 435
 
0.6%
Other values (54) 2840
 
4.1%

Most occurring characters

ValueCountFrequency (%)
. 128640
36.7%
3 87761
25.0%
7 58129
16.6%
5 25513
 
7.3%
1 8857
 
2.5%
6 6226
 
1.8%
8 5756
 
1.6%
0 5529
 
1.6%
n 5266
 
1.5%
e 5266
 
1.5%
Other values (9) 13809
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 201038
57.3%
Other Punctuation 128640
36.7%
Lowercase Letter 15807
 
4.5%
Uppercase Letter 5266
 
1.5%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 87761
43.7%
7 58129
28.9%
5 25513
 
12.7%
1 8857
 
4.4%
6 6226
 
3.1%
8 5756
 
2.9%
0 5529
 
2.8%
2 1965
 
1.0%
9 1122
 
0.6%
4 180
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
n 5266
33.3%
e 5266
33.3%
o 5266
33.3%
r 4
 
< 0.1%
c 4
 
< 0.1%
b 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 128640
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 5266
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 329679
94.0%
Latin 21073
 
6.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 128640
39.0%
3 87761
26.6%
7 58129
17.6%
5 25513
 
7.7%
1 8857
 
2.7%
6 6226
 
1.9%
8 5756
 
1.7%
0 5529
 
1.7%
2 1965
 
0.6%
9 1122
 
0.3%
Other values (2) 181
 
0.1%
Latin
ValueCountFrequency (%)
n 5266
25.0%
e 5266
25.0%
o 5266
25.0%
N 5266
25.0%
r 4
 
< 0.1%
c 4
 
< 0.1%
b 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 350752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 128640
36.7%
3 87761
25.0%
7 58129
16.6%
5 25513
 
7.3%
1 8857
 
2.5%
6 6226
 
1.8%
8 5756
 
1.6%
0 5529
 
1.6%
n 5266
 
1.5%
e 5266
 
1.5%
Other values (9) 13809
 
3.9%

setuptools_version
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 125
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
45.2.0 | 25440
40.8.0 | 23102
None | 5458
45.2.0.post20200210 | 5210
52.0.0 | 2701
Other values (120) | 7675

Length

Max length: 19
Median length: 6
Mean length: 7.0954215
Min length: 4

Characters and Unicode

Total characters: 493742
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: 57.4.0
4th row: 47.3.1.post20200616
5th row: 47.3.1.post20200616

Common Values

Value | Count | Frequency (%)
45.2.0 | 25440 | 36.6%
40.8.0 | 23102 | 33.2%
None | 5458 | 7.8%
45.2.0.post20200210 | 5210 | 7.5%
52.0.0 | 2701 | 3.9%
57.4.0 | 1780 | 2.6%
56.0.0 | 565 | 0.8%
47.3.1.post20200616 | 561 | 0.8%
57.0.0 | 458 | 0.7%
49.6.0.post20210108 | 431 | 0.6%
Other values (115) | 3880 | 5.6%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
45.2.0 25440
36.6%
40.8.0 23102
33.2%
none 5458
 
7.8%
45.2.0.post20200210 5210
 
7.5%
52.0.0 2701
 
3.9%
57.4.0 1780
 
2.6%
56.0.0 565
 
0.8%
47.3.1.post20200616 561
 
0.8%
57.0.0 458
 
0.7%
49.6.0.post20210108 431
 
0.6%
Other values (115) 3880
 
5.6%

Most occurring characters

ValueCountFrequency (%)
. 134959
27.3%
0 115569
23.4%
4 58457
11.8%
2 53689
 
10.9%
5 39273
 
8.0%
8 24159
 
4.9%
o 12161
 
2.5%
1 10018
 
2.0%
s 6703
 
1.4%
t 6703
 
1.4%
Other values (8) 32051
 
6.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 310139
62.8%
Other Punctuation 134959
27.3%
Lowercase Letter 43186
 
8.7%
Uppercase Letter 5458
 
1.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 115569
37.3%
4 58457
18.8%
2 53689
17.3%
5 39273
 
12.7%
8 24159
 
7.8%
1 10018
 
3.2%
7 3942
 
1.3%
6 2726
 
0.9%
3 1419
 
0.5%
9 887
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
o 12161
28.2%
s 6703
15.5%
t 6703
15.5%
p 6703
15.5%
e 5458
12.6%
n 5458
12.6%
Other Punctuation
ValueCountFrequency (%)
. 134959
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 5458
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 445098
90.1%
Latin 48644
 
9.9%

Most frequent character per script

Common
ValueCountFrequency (%)
. 134959
30.3%
0 115569
26.0%
4 58457
13.1%
2 53689
 
12.1%
5 39273
 
8.8%
8 24159
 
5.4%
1 10018
 
2.3%
7 3942
 
0.9%
6 2726
 
0.6%
3 1419
 
0.3%
Latin
ValueCountFrequency (%)
o 12161
25.0%
s 6703
13.8%
t 6703
13.8%
p 6703
13.8%
e 5458
11.2%
n 5458
11.2%
N 5458
11.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 493742
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 134959
27.3%
0 115569
23.4%
4 58457
11.8%
2 53689
 
10.9%
5 39273
 
8.0%
8 24159
 
4.9%
o 12161
 
2.5%
1 10018
 
2.0%
s 6703
 
1.4%
t 6703
 
1.4%
Other values (8) 32051
 
6.5%

sys_distro_name
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
Ubuntu | 61351
None | 5794
Debian GNU/Linux | 1460
macOS | 382
CentOS Linux | 177
Other values (24) | 422

Length

Max length: 31
Median length: 6
Mean length: 6.1055528
Min length: 4

Characters and Unicode

Total characters: 424861
Distinct characters: 48
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: macOS
4th row: Ubuntu
5th row: Ubuntu

Common Values

Value | Count | Frequency (%)
Ubuntu | 61351 | 88.2%
None | 5794 | 8.3%
Debian GNU/Linux | 1460 | 2.1%
macOS | 382 | 0.5%
CentOS Linux | 177 | 0.3%
Amazon Linux | 165 | 0.2%
Amazon Linux AMI | 113 | 0.2%
Red Hat Enterprise Linux | 44 | 0.1%
Manjaro Linux | 16 | < 0.1%
Arch Linux | 15 | < 0.1%
Other values (19) | 69 | 0.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
ubuntu | 61351 | 85.4%
none | 5794 | 8.1%
gnu/linux | 1474 | 2.1%
debian | 1460 | 2.0%
linux | 552 | 0.8%
macos | 382 | 0.5%
amazon | 278 | 0.4%
centos | 177 | 0.2%
ami | 113 | 0.2%
enterprise | 55 | 0.1%
Other values (27) | 224 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
u | 124733 | 29.4%
n | 71203 | 16.8%
U | 62828 | 14.8%
b | 62816 | 14.8%
t | 61670 | 14.5%
e | 7664 | 1.8%
N | 7268 | 1.7%
o | 6119 | 1.4%
i | 3573 | 0.8%
(space) | 2274 | 0.5%
Other values (38) | 14713 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 343824 | 80.9%
Uppercase Letter | 77276 | 18.2%
Space Separator | 2274 | 0.5%
Other Punctuation | 1481 | 0.3%
Connector Punctuation | 6 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
u | 124733 | 36.3%
n | 71203 | 20.7%
b | 62816 | 18.3%
t | 61670 | 17.9%
e | 7664 | 2.2%
o | 6119 | 1.8%
i | 3573 | 1.0%
a | 2233 | 0.6%
x | 2031 | 0.6%
m | 661 | 0.2%
Other values (13) | 1121 | 0.3%

Uppercase Letter
Value | Count | Frequency (%)
U | 62828 | 81.3%
N | 7268 | 9.4%
L | 2030 | 2.6%
G | 1484 | 1.9%
D | 1465 | 1.9%
S | 591 | 0.8%
O | 570 | 0.7%
A | 408 | 0.5%
C | 177 | 0.2%
M | 139 | 0.2%
Other values (10) | 316 | 0.4%

Other Punctuation
Value | Count | Frequency (%)
/ | 1474 | 99.5%
! | 6 | 0.4%
. | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 2274 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 421100 | 99.1%
Common | 3761 | 0.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
u | 124733 | 29.6%
n | 71203 | 16.9%
U | 62828 | 14.9%
b | 62816 | 14.9%
t | 61670 | 14.6%
e | 7664 | 1.8%
N | 7268 | 1.7%
o | 6119 | 1.5%
i | 3573 | 0.8%
a | 2233 | 0.5%
Other values (33) | 10993 | 2.6%

Common
Value | Count | Frequency (%)
(space) | 2274 | 60.5%
/ | 1474 | 39.2%
_ | 6 | 0.2%
! | 6 | 0.2%
. | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 424861 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
u | 124733 | 29.4%
n | 71203 | 16.8%
U | 62828 | 14.8%
b | 62816 | 14.8%
t | 61670 | 14.5%
e | 7664 | 1.8%
N | 7268 | 1.7%
o | 6119 | 1.4%
i | 3573 | 0.8%
(space) | 2274 | 0.5%
Other values (38) | 14713 | 3.5%

sys_distro_version
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct: 84
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
18.04 | 33415
16.04 | 23087
None | 5797
20.04 | 4834
10 | 1019
Other values (79) | 1434

Length

Max length: 8
Median length: 5
Mean length: 4.8383152
Min length: 1

Characters and Unicode

Total characters: 336679
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): < 0.1%

Sample

1st row: None
2nd row: None
3rd row: 12.0.1
4th row: 20.04
5th row: 20.04

Common Values

Value | Count | Frequency (%)
18.04 | 33415 | 48.0%
16.04 | 23087 | 33.2%
None | 5797 | 8.3%
20.04 | 4834 | 6.9%
10 | 1019 | 1.5%
11 | 395 | 0.6%
7 | 176 | 0.3%
2 | 165 | 0.2%
10.16 | 161 | 0.2%
2018.03 | 113 | 0.2%
Other values (74) | 424 | 0.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
18.04 | 33415 | 48.0%
16.04 | 23087 | 33.2%
none | 5797 | 8.3%
20.04 | 4834 | 6.9%
10 | 1019 | 1.5%
11 | 395 | 0.6%
7 | 176 | 0.3%
2 | 165 | 0.2%
10.16 | 161 | 0.2%
2018.03 | 113 | 0.2%
Other values (74) | 424 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 67774 | 20.1%
. | 62155 | 18.5%
4 | 61455 | 18.3%
1 | 59269 | 17.6%
8 | 33576 | 10.0%
6 | 23315 | 6.9%
n | 5810 | 1.7%
o | 5809 | 1.7%
e | 5798 | 1.7%
N | 5797 | 1.7%
Other values (11) | 5921 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 251245 | 74.6%
Other Punctuation | 62155 | 18.5%
Lowercase Letter | 17482 | 5.2%
Uppercase Letter | 5797 | 1.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 67774 | 27.0%
4 | 61455 | 24.5%
1 | 59269 | 23.6%
8 | 33576 | 13.4%
6 | 23315 | 9.3%
2 | 5294 | 2.1%
7 | 259 | 0.1%
3 | 144 | 0.1%
5 | 98 | < 0.1%
9 | 61 | < 0.1%

Lowercase Letter
Value | Count | Frequency (%)
n | 5810 | 33.2%
o | 5809 | 33.2%
e | 5798 | 33.2%
l | 24 | 0.1%
i | 13 | 0.1%
g | 13 | 0.1%
r | 12 | 0.1%
t | 2 | < 0.1%
s | 1 | < 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 62155 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
N | 5797 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 313400 | 93.1%
Latin | 23279 | 6.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 67774 | 21.6%
. | 62155 | 19.8%
4 | 61455 | 19.6%
1 | 59269 | 18.9%
8 | 33576 | 10.7%
6 | 23315 | 7.4%
2 | 5294 | 1.7%
7 | 259 | 0.1%
3 | 144 | < 0.1%
5 | 98 | < 0.1%

Latin
Value | Count | Frequency (%)
n | 5810 | 25.0%
o | 5809 | 25.0%
e | 5798 | 24.9%
N | 5797 | 24.9%
l | 24 | 0.1%
i | 13 | 0.1%
g | 13 | 0.1%
r | 12 | 0.1%
t | 2 | < 0.1%
s | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 336679 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 67774 | 20.1%
. | 62155 | 18.5%
4 | 61455 | 18.3%
1 | 59269 | 17.6%
8 | 33576 | 10.0%
6 | 23315 | 6.9%
n | 5810 | 1.7%
o | 5809 | 1.7%
e | 5798 | 1.7%
N | 5797 | 1.7%
Other values (11) | 5921 | 1.8%

sys_name
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB

Value | Count
Linux | 63411
None | 5266
Windows | 527
Darwin | 382

Length

Max length: 7
Median length: 5
Mean length: 4.9449602
Min length: 4
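
The length statistics for sys_name can be checked by hand from the value counts in the Common Values table for this variable. The snippet below is a minimal worked example of that arithmetic using only the standard library; it is a sketch, not the profiling tool's own code.

```python
from statistics import median

# Counts copied from the Common Values table for sys_name.
value_counts = {"Linux": 63411, "None": 5266, "Windows": 527, "Darwin": 382}

# Expand to one string length per observation.
lengths = []
for value, count in value_counts.items():
    lengths.extend([len(value)] * count)

total_chars = sum(lengths)                 # corresponds to "Total characters"
mean_length = total_chars / len(lengths)   # corresponds to "Mean length"

print(max(lengths), median(lengths), round(mean_length, 7), min(lengths))
print(total_chars)
```

Because every row contributes its string length, the mean is the total character count divided by the number of observations (69586).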

Characters and Unicode

Total characters: 344100
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None
2nd row: None
3rd row: Darwin
4th row: Linux
5th row: Linux

Common Values

Value | Count | Frequency (%)
Linux | 63411 | 91.1%
None | 5266 | 7.6%
Windows | 527 | 0.8%
Darwin | 382 | 0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
linux | 63411 | 91.1%
none | 5266 | 7.6%
windows | 527 | 0.8%
darwin | 382 | 0.5%

Most occurring characters

ValueCountFrequency (%)
Value | Count | Frequency (%)
n | 69586 | 20.2%
i | 64320 | 18.7%
L | 63411 | 18.4%
u | 63411 | 18.4%
x | 63411 | 18.4%
o | 5793 | 1.7%
N | 5266 | 1.5%
e | 5266 | 1.5%
w | 909 | 0.3%
W | 527 | 0.2%
Other values (5) | 2200 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 274514 | 79.8%
Uppercase Letter | 69586 | 20.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 69586 | 25.3%
i | 64320 | 23.4%
u | 63411 | 23.1%
x | 63411 | 23.1%
o | 5793 | 2.1%
e | 5266 | 1.9%
w | 909 | 0.3%
d | 527 | 0.2%
s | 527 | 0.2%
a | 382 | 0.1%

Uppercase Letter
Value | Count | Frequency (%)
L | 63411 | 91.1%
N | 5266 | 7.6%
W | 527 | 0.8%
D | 382 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 344100 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 69586 | 20.2%
i | 64320 | 18.7%
L | 63411 | 18.4%
u | 63411 | 18.4%
x | 63411 | 18.4%
o | 5793 | 1.7%
N | 5266 | 1.5%
e | 5266 | 1.5%
w | 909 | 0.3%
W | 527 | 0.2%
Other values (5) | 2200 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 344100 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 69586 | 20.2%
i | 64320 | 18.7%
L | 63411 | 18.4%
u | 63411 | 18.4%
x | 63411 | 18.4%
o | 5793 | 1.7%
N | 5266 | 1.5%
e | 5266 | 1.5%
w | 909 | 0.3%
W | 527 | 0.2%
Other values (5) | 2200 | 0.6%

timestamp
DateTime

Distinct: 46713
Distinct (%): 67.1%
Missing: 0
Missing (%): 0.0%
Memory size: 543.8 KiB
Minimum: 2020-11-18 19:00:03
Maximum: 2022-04-18 22:03:10
Histogram with fixed size bins (bins=50)
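
For readers who want to reproduce a fixed-width binning of the timestamp range, here is a minimal sketch using only the standard library. The function name `fixed_bin_counts` and the three sample timestamps are illustrative assumptions; only the minimum, the maximum and the bin count of 50 come from the report.

```python
from datetime import datetime


def fixed_bin_counts(timestamps, start, end, bins=50):
    """Count timestamps in `bins` equal-width intervals spanning [start, end]."""
    span = (end - start).total_seconds()
    if span <= 0:
        raise ValueError("end must be later than start")
    counts = [0] * bins
    for ts in timestamps:
        offset = (ts - start).total_seconds()
        # The maximum sits on the right edge, so clamp it into the last bin.
        index = min(int(offset / span * bins), bins - 1)
        counts[index] += 1
    return counts


start = datetime(2020, 11, 18, 19, 0, 3)   # Minimum from the report
end = datetime(2022, 4, 18, 22, 3, 10)     # Maximum from the report
sample = [start, datetime(2021, 11, 11, 3, 1, 34), end]  # illustrative only
counts = fixed_bin_counts(sample, start, end)
print(len(counts), sum(counts))
```

With about 516 days between the minimum and the maximum, each of the 50 bins covers a little over ten days.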

Correlations

Variable | country_code | cpu | distribution_type | installer_name | openssl_version | package_version | python_implementation_name | python_implementation_version | sys_distro_name | sys_distro_version | sys_name
country_code | 1.000 | 0.376 | 0.477 | 0.315 | 0.256 | 0.242 | 0.691 | 0.219 | 0.280 | 0.203 | 0.502
cpu | 0.376 | 1.000 | 0.681 | 0.447 | 0.533 | 0.289 | 1.000 | 0.558 | 0.651 | 0.583 | 0.841
distribution_type | 0.477 | 0.681 | 1.000 | 0.690 | 0.681 | 0.348 | 0.680 | 0.682 | 0.647 | 0.648 | 0.680
installer_name | 0.315 | 0.447 | 0.690 | 1.000 | 0.353 | 0.294 | 1.000 | 0.352 | 0.335 | 0.334 | 0.577
openssl_version | 0.256 | 0.533 | 0.681 | 0.353 | 1.000 | 0.503 | 1.000 | 0.598 | 0.382 | 0.500 | 0.682
package_version | 0.242 | 0.289 | 0.348 | 0.294 | 0.503 | 1.000 | 0.498 | 0.531 | 0.288 | 0.404 | 0.289
python_implementation_name | 0.691 | 1.000 | 0.680 | 1.000 | 1.000 | 0.498 | 1.000 | 1.000 | 0.949 | 0.949 | 1.000
python_implementation_version | 0.219 | 0.558 | 0.682 | 0.352 | 0.598 | 0.531 | 1.000 | 1.000 | 0.364 | 0.362 | 0.690
sys_distro_name | 0.280 | 0.651 | 0.647 | 0.335 | 0.382 | 0.288 | 0.949 | 0.364 | 1.000 | 0.913 | 0.816
sys_distro_version | 0.203 | 0.583 | 0.648 | 0.334 | 0.500 | 0.404 | 0.949 | 0.362 | 0.913 | 1.000 | 0.815
sys_name | 0.502 | 0.841 | 0.680 | 0.577 | 0.682 | 0.289 | 1.000 | 0.690 | 0.816 | 0.815 | 1.000
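
The report does not state which association measure fills this matrix. For pairs of categorical variables a common choice is Cramér's V, so the sketch below assumes that measure; it is the plain, uncorrected form, and a profiling tool may apply a bias correction that would give slightly different numbers. The function name `cramers_v` and the toy inputs are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of xs
    col = Counter(ys)              # marginal counts of ys
    chi2 = 0.0
    for x, row_total in row.items():
        for y, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# Toy data: the second column is fully determined by the first.
v_dependent = cramers_v(["Linux", "Linux", "Darwin", "Darwin"],
                        ["Ubuntu", "Ubuntu", "macOS", "macOS"])
# Toy data: the two columns are independent.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(v_dependent, v_independent)
```

Under this reading, a value of 1.000 off the diagonal (for example between python_implementation_name and sys_name) would mean that one variable fully determines the other in this dataset.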

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
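
The report lists zero missing cells, yet the literal string None appears as a value in several columns (5266 rows of sys_name, for instance). If those strings stand in for absent values, which is an assumption and not something the report confirms, a per-column nullity count could be sketched as follows. The helper `nullity_by_column` is hypothetical, and the three rows are abbreviated from the Sample table.

```python
def nullity_by_column(rows, placeholders=("None",)):
    """Return, per column, how many cells are None or a placeholder string."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            missing = value is None or value in placeholders
            counts[column] = counts.get(column, 0) + int(missing)
    return counts


# Three rows abbreviated to three columns (indices 0, 2 and 3 of the sample).
rows = [
    {"country_code": "CN", "cpu": "None", "sys_name": "None"},
    {"country_code": "CA", "cpu": "arm64", "sys_name": "Darwin"},
    {"country_code": "US", "cpu": "x86_64", "sys_name": "Linux"},
]
nullity = nullity_by_column(rows)
print(nullity)
```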

Sample

(index) | country_code | cpu | distribution_type | installer_name | installer_version | openssl_version | package_version | project_name | python_implementation_name | python_implementation_version | setuptools_version | sys_distro_name | sys_distro_version | sys_name | timestamp
0 | CN | None | sdist | Browser | None | None | 1.2.2 | scikit-mobility | None | None | None | None | None | None | 2021-11-11 03:01:34
1 | CN | None | sdist | Browser | None | None | 1.2.2 | scikit-mobility | None | None | None | None | None | None | 2021-11-11 08:25:11
2 | CA | arm64 | sdist | pip | 21.3.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.2.2 | scikit-mobility | CPython | 3.9.7 | 57.4.0 | macOS | 12.0.1 | Darwin | 2021-11-11 20:10:37
3 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 15:08:19
4 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 14:57:51
5 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 13:12:56
6 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 13:00:43
7 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 09:57:55
8 | US | x86_64 | bdist_wheel | pip | 20.1.1 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 47.3.1.post20200616 | Ubuntu | 20.04 | Linux | 2021-11-11 17:02:30
9 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1l 24 Aug 2021 | 1.1.2 | scikit-mobility | CPython | 3.7.10 | 45.2.0.post20200210 | Debian GNU/Linux | 10 | Linux | 2021-11-11 00:17:38

(index) | country_code | cpu | distribution_type | installer_name | installer_version | openssl_version | package_version | project_name | python_implementation_name | python_implementation_version | setuptools_version | sys_distro_name | sys_distro_version | sys_name | timestamp
69576 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:09:52
69577 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:27:24
69578 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:19:14
69579 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:27:01
69580 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:44:33
69581 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 18:21:47
69582 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 00:02:51
69583 | US | x86_64 | bdist_wheel | pip | 21.0.1 | OpenSSL 1.1.1f 31 Mar 2020 | 1.2.2 | scikit-mobility | CPython | 3.8.10 | 52.0.0 | Ubuntu | 20.04 | Linux | 2021-11-19 00:02:44
69584 | US | x86_64 | bdist_wheel | pip | 21.1.3 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.2 | scikit-mobility | CPython | 3.7.12 | 57.4.0 | Ubuntu | 18.04 | Linux | 2021-11-19 05:32:02
69585 | IT | x86_64 | bdist_wheel | pip | 21.3.1 | OpenSSL 1.1.1k 25 Mar 2021 | 1.2.2 | scikit-mobility | CPython | 3.8.12 | 57.5.0 | Debian GNU/Linux | 11 | Linux | 2021-11-19 10:27:53

Duplicate rows

Most frequently occurring

(index) | country_code | cpu | distribution_type | installer_name | installer_version | openssl_version | package_version | project_name | python_implementation_name | python_implementation_version | setuptools_version | sys_distro_name | sys_distro_version | sys_name | timestamp | # duplicates
433 | US | x86_64 | bdist_wheel | pip | 19.0.3 | OpenSSL 1.0.2g 1 Mar 2016 | 1.2.2 | scikit-mobility | CPython | 3.7.3 | 40.8.0 | Ubuntu | 16.04 | Linux | 2021-05-03 14:52:54 | 15
434 | US | x86_64 | bdist_wheel | pip | 19.0.3 | OpenSSL 1.0.2g 1 Mar 2016 | 1.2.2 | scikit-mobility | CPython | 3.7.3 | 40.8.0 | Ubuntu | 16.04 | Linux | 2021-05-03 15:02:10 | 15
4243 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.2 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2021-12-09 17:14:32 | 15
6350 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.2 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2022-01-10 20:56:33 | 15
8305 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.3 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2022-02-22 12:58:02 | 14
10443 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.3 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2022-04-14 18:27:55 | 13
505 | US | x86_64 | bdist_wheel | pip | 19.0.3 | OpenSSL 1.0.2g 1 Mar 2016 | 1.2.2 | scikit-mobility | CPython | 3.7.3 | 40.8.0 | Ubuntu | 16.04 | Linux | 2021-05-05 17:22:38 | 12
9124 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.3 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2022-03-14 14:16:09 | 12
10402 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.3 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2022-04-13 21:31:43 | 12
4771 | US | x86_64 | bdist_wheel | pip | 20.0.2 | OpenSSL 1.1.1 11 Sep 2018 | 1.2.2 | scikit-mobility | CPython | 3.7.5 | 45.2.0 | Ubuntu | 18.04 | Linux | 2021-12-22 14:21:02 | 11
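
A table of this kind can be approximated by counting identical rows. The sketch below does so with `collections.Counter`; the function name `most_duplicated` and the abbreviated three-field rows are illustrative assumptions, and the exact definition behind the report's "# duplicates" column is not stated in the report.

```python
from collections import Counter


def most_duplicated(rows, top=10):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    duplicated.sort(key=lambda item: item[1], reverse=True)
    return duplicated[:top]


# Abbreviated, made-up rows: (country_code, sys_distro_name, sys_distro_version).
rows = [
    ("US", "Ubuntu", "16.04"),
    ("US", "Ubuntu", "16.04"),
    ("US", "Ubuntu", "18.04"),
    ("IT", "Debian GNU/Linux", "11"),
    ("US", "Ubuntu", "16.04"),
    ("US", "Ubuntu", "18.04"),
]
result = most_duplicated(rows)
for row, occurrences in result:
    print(row, occurrences)
```

On the full dataset every column, including the timestamp, would be part of the tuple, so only rows that match in all 15 fields would be counted together.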