Overview

Dataset statistics

Number of variables: 2
Number of observations: 121234
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 MiB
Average record size in memory: 16.0 B

Variable types

Categorical: 1
Numeric: 1

Reproduction

Analysis started: 2023-05-04 15:10:58.531275
Analysis finished: 2023-05-04 15:10:59.180852
Duration: 0.65 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json

Variables

area
Categorical

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 947.3 KiB

Value                 Count
Computer Science      22023
Engineering           19496
Medicine              11664
Social Sciences       10761
Arts and Humanities   4844
Other values (27)     52446

Length

Max length: 36
Median length: 28
Mean length: 15.401727
Min length: 3

Characters and Unicode

Total characters: 1867213
Distinct characters: 37
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
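As a rough illustration of how such character properties can be read, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how ydata-profiling computes its figures. The codes `Lu`, `Ll` and `Zs` correspond to the Uppercase Letter, Lowercase Letter and Space Separator categories listed in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs').

    Hypothetical helper for illustration only.
    """
    return Counter(unicodedata.category(ch) for ch in text)


# "Computer Science" is the most frequent value of the `area` variable.
counts = category_counts("Computer Science")
for code, n in sorted(counts.items()):
    print(code, n)
```

Summing such counts over every value of the column would give category totals comparable to those in the "Most occurring categories" table.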

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Biochemistry
2nd row: Genetics and Molecular Biology
3rd row: Biochemistry
4th row: Genetics and Molecular Biology
5th row: Immunology and Microbiology

Common Values

Value                          Count   Frequency (%)
Computer Science               22023   18.2%
Engineering                    19496   16.1%
Medicine                       11664   9.6%
Social Sciences                10761   8.9%
Arts and Humanities            4844    4.0%
Mathematics                    4772    3.9%
Physics and Astronomy          3821    3.2%
Materials Science              3609    3.0%
Earth and Planetary Sciences   3311    2.7%
Environmental Science          3245    2.7%
Other values (22)              33688   27.8%
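The Frequency (%) column appears to be each count divided by the number of observations (121234). A minimal sketch of that arithmetic, using three counts copied from the table above; the helper `frequency_pct` is a hypothetical name, and rounding to one decimal place is an assumption about how the report displays the value.

```python
# Number of observations, taken from the dataset statistics in this report.
n_observations = 121234

# Counts copied from the common values table.
area_counts = {"Computer Science": 22023, "Engineering": 19496, "Medicine": 11664}


def frequency_pct(count: int, total: int) -> float:
    """Share of `total` represented by `count`, as a percentage with one decimal."""
    return round(100 * count / total, 1)


frequencies = {area: frequency_pct(c, n_observations) for area, c in area_counts.items()}
for area, pct in frequencies.items():
    print(f"{area}: {pct}%")
```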

Length

[Figure: Histogram of lengths of the category]

Value               Count   Frequency (%)
science             28877   12.9%
and                 24258   10.9%
computer            22023   9.9%
engineering         21338   9.6%
sciences            18845   8.4%
medicine            11664   5.2%
social              10761   4.8%
arts                4844    2.2%
humanities          4844    2.2%
mathematics         4772    2.1%
Other values (36)   71061   31.8%

Most occurring characters

Value               Count    Frequency (%)
e                   241083   12.9%
n                   201257   10.8%
i                   195186   10.5%
c                   166886   8.9%
(space)             102053   5.5%
a                   93779    5.0%
r                   90312    4.8%
o                   87871    4.7%
t                   79103    4.2%
s                   76413    4.1%
Other values (27)   533270   28.6%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   1566134   83.9%
Uppercase Letter   199026    10.7%
Space Separator    102053    5.5%

Most frequent character per category

Lowercase Letter
Value               Count    Frequency (%)
e                   241083   15.4%
n                   201257   12.9%
i                   195186   12.5%
c                   166886   10.7%
a                   93779    6.0%
r                   90312    5.8%
o                   87871    5.6%
t                   79103    5.1%
s                   76413    4.9%
g                   66894    4.3%
Other values (11)   267350   17.1%

Uppercase Letter
Value               Count    Frequency (%)
S                   58483    29.4%
E                   34017    17.1%
M                   26852    13.5%
C                   25183    12.7%
A                   14516    7.3%
P                   11943    6.0%
B                   11919    6.0%
H                   5783     2.9%
G                   3034     1.5%
D                   1973     1.0%
Other values (5)    5323     2.7%

Space Separator
Value               Count    Frequency (%)
(space)             102053   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Latin    1765160   94.5%
Common   102053    5.5%

Most frequent character per script

Latin
Value               Count    Frequency (%)
e                   241083   13.7%
n                   201257   11.4%
i                   195186   11.1%
c                   166886   9.5%
a                   93779    5.3%
r                   90312    5.1%
o                   87871    5.0%
t                   79103    4.5%
s                   76413    4.3%
g                   66894    3.8%
Other values (26)   466376   26.4%

Common
Value               Count    Frequency (%)
(space)             102053   100.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   1867213   100.0%

Most frequent character per block

ASCII
Value               Count    Frequency (%)
e                   241083   12.9%
n                   201257   10.8%
i                   195186   10.5%
c                   166886   8.9%
(space)             102053   5.5%
a                   93779    5.0%
r                   90312    4.8%
o                   87871    4.7%
t                   79103    4.2%
s                   76413    4.1%
Other values (27)   533270   28.6%

journal__sourceid
Real number (ℝ)

Distinct: 70227
Distinct (%): 57.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3797662 × 10^10
Minimum: 12000
Maximum: 2.110106 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 947.3 KiB

Quantile statistics

Minimum: 12000
5-th percentile: 16304
Q1: 95895.75
median: 2.1100196 × 10^10
Q3: 2.1100777 × 10^10
95-th percentile: 2.1100937 × 10^10
Maximum: 2.110106 × 10^10
Range: 2.1101048 × 10^10
Interquartile range (IQR): 2.1100681 × 10^10
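The range and the interquartile range follow from the other quantiles by subtraction (maximum minus minimum, and Q3 minus Q1). A small check using the rounded values printed above; because the inputs are already rounded in the report, agreement is only expected to the displayed precision.

```python
# Quantile values as printed in this report (already rounded for display).
minimum = 12000
q1 = 95895.75
q3 = 2.1100777e10
maximum = 2.110106e10

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"Range: {value_range:.7e}")
print(f"IQR:   {iqr:.7e}")
```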

Descriptive statistics

Standard deviation: 9.3393585 × 10^9
Coefficient of variation (CV): 0.67687981
Kurtosis: -1.4459207
Mean: 1.3797662 × 10^10
Median Absolute Deviation (MAD): 862052.5
Skewness: -0.66319873
Sum: 1.6727457 × 10^15
Variance: 8.7223617 × 10^19
Monotonicity: Not monotonic
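Two of these figures can be cross-checked from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below redoes that arithmetic from the rounded values shown in the report, so small differences in the last digits are expected.

```python
# Values as printed in this report (rounded for display).
mean = 1.3797662e10
std_dev = 9.3393585e9

cv = std_dev / mean       # coefficient of variation = std / mean
variance = std_dev ** 2   # variance = std squared

print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.6e}")
```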
[Figure: Histogram with fixed size bins (bins=50)]

Value                  Count    Frequency (%)
22476                  11       < 0.1%
13845                  10       < 0.1%
19383                  9        < 0.1%
16319                  9        < 0.1%
16088                  9        < 0.1%
2.11009441 × 10^10     9        < 0.1%
1.970020167 × 10^10    9        < 0.1%
24578                  9        < 0.1%
2.110026842 × 10^10    9        < 0.1%
2.110093188 × 10^10    9        < 0.1%
Other values (70217)   121141   99.9%

Value                  Count    Frequency (%)
12000                  2        < 0.1%
12001                  2        < 0.1%
12002                  2        < 0.1%
12004                  2        < 0.1%
12005                  2        < 0.1%
12006                  4        < 0.1%
12007                  1        < 0.1%
12008                  4        < 0.1%
12009                  2        < 0.1%
12010                  1        < 0.1%

Value                  Count    Frequency (%)
2.110105979 × 10^10    5        < 0.1%
2.110105978 × 10^10    1        < 0.1%
2.110105978 × 10^10    2        < 0.1%
2.110105949 × 10^10    2        < 0.1%
2.11010593 × 10^10     4        < 0.1%
2.11010593 × 10^10     1        < 0.1%
2.110105901 × 10^10    2        < 0.1%
2.110105901 × 10^10    2        < 0.1%
2.110105897 × 10^10    2        < 0.1%
2.110105896 × 10^10    1        < 0.1%
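For readers unfamiliar with the "fixed size bins" wording used for the histogram of journal__sourceid, the sketch below shows one plain-Python way to count values into equal-width bins. The function `fixed_width_histogram` is a hypothetical helper, not the routine ydata-profiling uses, and the ten IDs are copied from the Sample section of this report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max].

    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:
        # Degenerate case: all values are identical, so use the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum lies on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Ten journal__sourceid values copied from the Sample section.
sample_ids = [16801, 18434, 20651, 18395, 14181, 22126, 96394,
              21101055130, 21101047129, 21101043236]
counts = fixed_width_histogram(sample_ids, bins=50)
print(counts[0], counts[-1])
```

With these ten IDs, every value lands in either the first or the last bin, which matches the gap between Q1 (95895.75) and the median (2.1100196 × 10^10) in the quantile statistics.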

Interactions


Correlations

                    journal__sourceid   area
journal__sourceid   1.000               0.158
area                0.158               1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]
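The per-column nullity count and the nullity matrix behind these two views can be sketched in plain Python as below. The names `missing_per_column` and `nullity_matrix` are hypothetical and this is not ydata-profiling's implementation; the three rows are copied from the Sample section of this report.

```python
# Three rows copied from the Sample section (indices 0, 5 and 8).
rows = [
    {"area": "Biochemistry", "journal__sourceid": 16801},
    {"area": "Medicine", "journal__sourceid": 20651},
    {"area": "Neuroscience", "journal__sourceid": 14181},
]
columns = ["area", "journal__sourceid"]


def missing_per_column(records, cols):
    """Number of missing (None) cells in each column."""
    return {col: sum(1 for row in records if row.get(col) is None) for col in cols}


def nullity_matrix(records, cols):
    """One list per row; True where the cell is present, False where it is missing."""
    return [[row.get(col) is not None for col in cols] for row in records]


nullity = missing_per_column(rows, columns)
matrix = nullity_matrix(rows, columns)
print(nullity)
```

For these rows every cell is present, consistent with the zero missing cells reported in the dataset statistics.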

Sample

         area                             journal__sourceid
0        Biochemistry                     16801
1        Genetics and Molecular Biology   16801
2        Biochemistry                     18434
3        Genetics and Molecular Biology   18434
4        Immunology and Microbiology      20651
5        Medicine                         20651
6        Biochemistry                     18395
7        Genetics and Molecular Biology   18395
8        Neuroscience                     14181
9        Biochemistry                     22126

         area                             journal__sourceid
121224   Medicine                         96394
121225   Chemical Engineering             21101055130
121226   Environmental Science            21101055130
121227   Pharmacology                     21101047129
121228   Toxicology and Pharmaceutics     21101047129
121229   Medicine                         21101043236
121230   Medicine                         21101042998
121231   Medicine                         21101042490
121232   Arts and Humanities              21101046690
121233   Social Sciences                  21101046690