Dataset statistics
Number of variables | 2 |
---|---|
Number of observations | 121234 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1.9 MiB |
Average record size in memory | 16.0 B |
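The missing-cell and duplicate-row figures are plain counts over the table. Below is a minimal sketch in plain Python. It assumes each row is a tuple and that `None` marks a missing cell. The helpers `missing_cells` and `duplicate_rows` are hypothetical names. The duplicate rule counts every repeat after the first occurrence, which mirrors the default of pandas' `DataFrame.duplicated()`; this report does not state which rule it uses.

```python
from collections import Counter
from typing import Sequence


def missing_cells(rows: Sequence[tuple]) -> int:
    """Count cells that are None (assumed marker for a missing value)."""
    return sum(value is None for row in rows for value in row)


def duplicate_rows(rows: Sequence[tuple]) -> int:
    """Count rows that repeat an earlier row; the first occurrence is not counted."""
    return sum(count - 1 for count in Counter(rows).values())


# Hypothetical (area, journal__sourceid) records, not taken from the dataset.
rows = [
    ("Biochemistry", 16801),
    ("Genetics and Molecular Biology", 16801),
    ("Biochemistry", 16801),  # exact repeat of the first row
    (None, 18434),            # one missing cell
]

n_cells = len(rows) * len(rows[0])
print(f"Missing cells: {missing_cells(rows)} ({missing_cells(rows) / n_cells:.1%})")
print(f"Duplicate rows: {duplicate_rows(rows)} ({duplicate_rows(rows) / len(rows):.1%})")
```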
Variable types
Categorical | 1 |
---|---|
Numeric | 1 |
Reproduction
Analysis started | 2023-05-04 15:10:58.531275 |
---|---|
Analysis finished | 2023-05-04 15:10:59.180852 |
Duration | 0.65 seconds |
Software version | ydata-profiling v4.1.2 |
Download configuration | config.json |
area
Categorical
Distinct | 32 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 947.3 KiB |
Value | Count |
---|---|
Computer Science | 22023 |
Engineering | 19496 |
Medicine | 11664 |
Social Sciences | 10761 |
Arts and Humanities | 4844 |
Other values (27) | 52446 |
Length
Max length | 36 |
---|---|
Median length | 28 |
Mean length | 15.401727 |
Min length | 3 |
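The length statistics summarize `len()` over every value. The sketch below applies Python's `statistics` module to the five sample rows listed in this report. It also checks that the reported mean length equals total characters divided by observations (1867213 / 121234). The report does not say how its median length is computed, so the sketch makes no attempt to reproduce the value 28.

```python
from statistics import mean, median

# The first five sample values of `area` listed in this report.
values = [
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Immunology and Microbiology",
]

lengths = [len(v) for v in values]
summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Consistency check on the reported figures: mean length = total characters / observations.
print(round(1867213 / 121234, 6))  # 15.401727
```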
Characters and Unicode
Total characters | 1867213 |
---|---|
Distinct characters | 37 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
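These Unicode figures group characters by general category and by block. The sketch below uses only the standard `unicodedata` module on the five sample values. Python's standard library exposes general categories (`Ll`, `Lu`, `Zs`) but not script or block properties. The block check is therefore simplified to an ASCII range test (U+0000 to U+007F). That simplification is an assumption, though it happens to match this column's single ASCII block.

```python
import unicodedata
from collections import Counter

# The first five sample values of `area` listed in this report.
values = [
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Immunology and Microbiology",
]
text = "".join(values)

# Human-readable names for the general categories that occur in this column.
CATEGORY_NAMES = {"Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Zs": "Space Separator"}

categories = Counter(unicodedata.category(ch) for ch in text)
distinct_characters = len(set(text))
# Simplification: treat "ASCII block" as code points U+0000..U+007F.
only_ascii = all(ord(ch) < 0x80 for ch in text)

print("Total characters:", len(text))
print("Distinct characters:", distinct_characters)
for code, count in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
print("Single ASCII block:", only_ascii)
```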
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
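"Distinct" counts different values. "Unique" is taken here to count values that occur exactly once, which is the usual profiling definition; this report assumes it rather than stating it. With 32 categories spread over 121234 rows, no category occurs only once, so the count is 0. The following sketch uses a hypothetical `distinct_and_unique` helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique); unique counts values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# The first five sample values of `area` listed in this report.
sample = [
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Immunology and Microbiology",
]
print(distinct_and_unique(sample))  # (3, 1)
```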
Sample
1st row | Biochemistry |
---|---|
2nd row | Genetics and Molecular Biology |
3rd row | Biochemistry |
4th row | Genetics and Molecular Biology |
5th row | Immunology and Microbiology |
Common Values
Value | Count | Frequency (%) |
---|---|---|
Computer Science | 22023 | 18.2% |
Engineering | 19496 | 16.1% |
Medicine | 11664 | 9.6% |
Social Sciences | 10761 | 8.9% |
Arts and Humanities | 4844 | 4.0% |
Mathematics | 4772 | 3.9% |
Physics and Astronomy | 3821 | 3.2% |
Materials Science | 3609 | 3.0% |
Earth and Planetary Sciences | 3311 | 2.7% |
Environmental Science | 3245 | 2.7% |
Other values (22) | 33688 | 27.8% |
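Each percentage is a count divided by the 121234 observations. "Other values (k)" folds the remaining k categories into one row: 32 distinct values minus the 10 shown leaves 22. The sketch below uses `collections.Counter`. The function name `common_values` and its `top` parameter are hypothetical, and the one-decimal percent format is an assumption chosen to match the table.

```python
from collections import Counter


def common_values(values, top=10):
    """Top-`top` values with counts and percentages; the rest folded into one row."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()  # (value, count) pairs, highest count first
    rows = [(value, count, f"{count / total:.1%}") for value, count in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{other / total:.1%}"))
    return rows


print(common_values(["a", "a", "a", "b", "b", "c"], top=2))
print(f"{22023 / 121234:.1%}")  # 18.2%, the Computer Science share
```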
Most occurring words
Value | Count | Frequency (%) |
---|---|---|
science | 28877 | 12.9% |
and | 24258 | 10.9% |
computer | 22023 | 9.9% |
engineering | 21338 | 9.6% |
sciences | 18845 | 8.4% |
medicine | 11664 | 5.2% |
social | 10761 | 4.8% |
arts | 4844 | 2.2% |
humanities | 4844 | 2.2% |
mathematics | 4772 | 2.1% |
Other values (36) | 71061 | 31.8% |
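Word counts differ from category counts because one word can occur in several categories. For example, "engineering" (21338) exceeds the "Engineering" category (19496), presumably because labels such as "Chemical Engineering" also contribute. The report does not describe its tokenizer. This sketch assumes lowercase whitespace splitting, which is consistent with the table listing "and" and counting "science" and "sciences" separately. The `word_counts` name is hypothetical.

```python
from collections import Counter


def word_counts(values):
    # Assumed tokenizer: lowercase each value, then split on whitespace (no stemming).
    return Counter(word for value in values for word in value.lower().split())


# Category labels that appear in this report's sample rows.
sample = [
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Biochemistry",
    "Genetics and Molecular Biology",
    "Immunology and Microbiology",
    "Chemical Engineering",
    "Engineering",
]
print(word_counts(sample).most_common(3))
```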
Most occurring characters
Value | Count | Frequency (%) |
---|---|---|
e | 241083 | 12.9% |
n | 201257 | 10.8% |
i | 195186 | 10.5% |
c | 166886 | 8.9% |
(space) | 102053 | 5.5% |
a | 93779 | 5.0% |
r | 90312 | 4.8% |
o | 87871 | 4.7% |
t | 79103 | 4.2% |
s | 76413 | 4.1% |
Other values (27) | 533270 | 28.6% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Lowercase Letter | 1566134 | 83.9% |
Uppercase Letter | 199026 | 10.7% |
Space Separator | 102053 | 5.5% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
---|---|---|
e | 241083 | 15.4% |
n | 201257 | 12.9% |
i | 195186 | 12.5% |
c | 166886 | 10.7% |
a | 93779 | 6.0% |
r | 90312 | 5.8% |
o | 87871 | 5.6% |
t | 79103 | 5.1% |
s | 76413 | 4.9% |
g | 66894 | 4.3% |
Other values (11) | 267350 | 17.1% |
Uppercase Letter
Value | Count | Frequency (%) |
---|---|---|
S | 58483 | 29.4% |
E | 34017 | 17.1% |
M | 26852 | 13.5% |
C | 25183 | 12.7% |
A | 14516 | 7.3% |
P | 11943 | 6.0% |
B | 11919 | 6.0% |
H | 5783 | 2.9% |
G | 3034 | 1.5% |
D | 1973 | 1.0% |
Other values (5) | 5323 | 2.7% |
Space Separator
Value | Count | Frequency (%) |
---|---|---|
(space) | 102053 | 100.0% |
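Within each category, percentages are relative to that category's total rather than to all characters. That is why `e` is 12.9% of all characters (241083 / 1867213) but 15.4% of lowercase letters (241083 / 1566134). The sketch below groups character counts by Unicode general category with `unicodedata.category`. The helper names `characters_by_category` and `share` are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def characters_by_category(text):
    """Group character counts by Unicode general category (e.g. 'Ll', 'Lu', 'Zs')."""
    groups = defaultdict(Counter)
    for ch in text:
        groups[unicodedata.category(ch)][ch] += 1
    return groups


def share(count, total):
    """Format count / total as a one-decimal percentage."""
    return f"{count / total:.1%}"


groups = characters_by_category("Arts and Humanities")
print(dict(groups["Lu"]))       # {'A': 1, 'H': 1}
print(share(241083, 1867213))   # 12.9%, 'e' among all characters
print(share(241083, 1566134))   # 15.4%, 'e' among lowercase letters
```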
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Latin | 1765160 | 94.5% |
Common | 102053 | 5.5% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
---|---|---|
e | 241083 | 13.7% |
n | 201257 | 11.4% |
i | 195186 | 11.1% |
c | 166886 | 9.5% |
a | 93779 | 5.3% |
r | 90312 | 5.1% |
o | 87871 | 5.0% |
t | 79103 | 4.5% |
s | 76413 | 4.3% |
g | 66894 | 3.8% |
Other values (26) | 466376 | 26.4% |
Common
Value | Count | Frequency (%) |
---|---|---|
(space) | 102053 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 1867213 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
---|---|---|
e | 241083 | 12.9% |
n | 201257 | 10.8% |
i | 195186 | 10.5% |
c | 166886 | 8.9% |
(space) | 102053 | 5.5% |
a | 93779 | 5.0% |
r | 90312 | 4.8% |
o | 87871 | 4.7% |
t | 79103 | 4.2% |
s | 76413 | 4.1% |
Other values (27) | 533270 | 28.6% |
journal__sourceid
Real number (ℝ)
Distinct | 70227 |
---|---|
Distinct (%) | 57.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.3797662 × 10¹⁰ |
Minimum | 12000 |
---|---|
Maximum | 2.110106 × 10¹⁰ |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 947.3 KiB |
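Distinct (%) is the number of different IDs divided by the 121234 observations (70227 / 121234 ≈ 57.9%). The zero, negative and infinite counts are simple predicates over the values. The sketch below runs on the first ten `journal__sourceid` values listed in this report, and the `numeric_overview` name is hypothetical.

```python
import math


def numeric_overview(xs):
    """Counts shown in a numeric overview: distinct, infinite, zeros, negatives."""
    n = len(xs)
    distinct = len(set(xs))
    return {
        "distinct": distinct,
        "distinct_pct": f"{distinct / n:.1%}",
        "infinite": sum(1 for x in xs if math.isinf(x)),
        "zeros": sum(1 for x in xs if x == 0),
        "negative": sum(1 for x in xs if x < 0),
    }


# First ten journal__sourceid values listed in this report.
ids = [16801, 16801, 18434, 18434, 20651, 20651, 18395, 18395, 14181, 22126]
overview = numeric_overview(ids)
print(overview)
print(f"{70227 / 121234:.1%}")  # 57.9%
```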
Quantile statistics
Minimum | 12000 |
---|---|
5th percentile | 16304 |
Q1 | 95895.75 |
Median | 2.1100196 × 10¹⁰ |
Q3 | 2.1100777 × 10¹⁰ |
95th percentile | 2.1100937 × 10¹⁰ |
Maximum | 2.110106 × 10¹⁰ |
Range | 2.1101048 × 10¹⁰ |
Interquartile range (IQR) | 2.1100681 × 10¹⁰ |
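Quantiles on a sorted column need an interpolation rule. The sketch assumes linear interpolation at position (n - 1) × q, which is the default of pandas `Series.quantile` and NumPy `quantile`. ydata-profiling builds on pandas, but the rule it uses is an assumption here. The fractional Q1 of 95895.75 fits this rule, since (121234 - 1) × 0.25 = 30308.25 falls between two ranks. The large gap between Q1 and the median suggests the IDs occupy two separate numeric ranges. The `quantile` helper is a hypothetical name.

```python
import math
from statistics import quantiles


def quantile(sorted_xs, q):
    """Linear interpolation at position (n - 1) * q, the pandas/NumPy default rule."""
    pos = (len(sorted_xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


xs = sorted([7, 1, 8, 3, 5, 2, 6, 4])
q1, med, q3 = (quantile(xs, q) for q in (0.25, 0.5, 0.75))
print(q1, med, q3)                # 2.75 4.5 6.25
print("IQR:", q3 - q1)            # IQR: 3.5
print("Range:", xs[-1] - xs[0])   # Range: 7
# Cross-check with the standard library's "inclusive" method, which interpolates the same way.
print(quantiles(xs, n=4, method="inclusive"))  # [2.75, 4.5, 6.25]
# Fractional rank of Q1 among this report's 121234 sorted values.
print((121234 - 1) * 0.25)        # 30308.25
```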
Descriptive statistics
Standard deviation | 9.3393585 × 10⁹ |
---|---|
Coefficient of variation (CV) | 0.67687981 |
Kurtosis | -1.4459207 |
Mean | 1.3797662 × 10¹⁰ |
Median Absolute Deviation (MAD) | 862052.5 |
Skewness | -0.66319873 |
Sum | 1.6727457 × 10¹⁵ |
Variance | 8.7223617 × 10¹⁹ |
Monotonicity | Not monotonic |
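Standard deviation, CV, MAD, skewness and kurtosis follow from textbook formulas. The sketch makes four assumptions:

- the standard deviation is the sample version (n - 1 denominator);
- MAD is the median absolute deviation from the median;
- skewness is the bias-corrected estimator that pandas' `Series.skew()` computes;
- kurtosis is the bias-corrected excess kurtosis that pandas' `Series.kurt()` computes.

Whether ydata-profiling uses exactly these estimators is also an assumption. The reported CV is consistent with the other reported figures, since 9.3393585e9 / 1.3797662e10 ≈ 0.6769.

```python
import math
from statistics import mean, median, stdev


def describe(xs):
    """Descriptive statistics; skewness/kurtosis use pandas-style bias-corrected formulas."""
    n = len(xs)
    mu = mean(xs)
    sd = stdev(xs)  # sample standard deviation (n - 1 denominator)
    med = median(xs)
    # Biased central moments, used as inputs to the corrected estimators below.
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "std": sd,
        "variance": sd ** 2,
        "cv": sd / mu,
        "mad": median([abs(x - med) for x in xs]),
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "sum": sum(xs),
    }


print(describe([1, 2, 3, 4, 10]))
# CV recomputed from the reported standard deviation and mean.
print(round(9.3393585e9 / 1.3797662e10, 4))  # 0.6769
```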
Common Values
Value | Count | Frequency (%) |
---|---|---|
22476 | 11 | < 0.1% |
13845 | 10 | < 0.1% |
19383 | 9 | < 0.1% |
16319 | 9 | < 0.1% |
16088 | 9 | < 0.1% |
2.11009441 × 10¹⁰ | 9 | < 0.1% |
1.970020167 × 10¹⁰ | 9 | < 0.1% |
24578 | 9 | < 0.1% |
2.110026842 × 10¹⁰ | 9 | < 0.1% |
2.110093188 × 10¹⁰ | 9 | < 0.1% |
Other values (70217) | 121141 | 99.9% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
12000 | 2 | < 0.1% |
12001 | 2 | < 0.1% |
12002 | 2 | < 0.1% |
12004 | 2 | < 0.1% |
12005 | 2 | < 0.1% |
12006 | 4 | < 0.1% |
12007 | 1 | < 0.1% |
12008 | 4 | < 0.1% |
12009 | 2 | < 0.1% |
12010 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
2.110105979 × 10¹⁰ | 5 | < 0.1% |
2.110105978 × 10¹⁰ | 1 | < 0.1% |
2.110105978 × 10¹⁰ | 2 | < 0.1% |
2.110105949 × 10¹⁰ | 2 | < 0.1% |
2.11010593 × 10¹⁰ | 4 | < 0.1% |
2.11010593 × 10¹⁰ | 1 | < 0.1% |
2.110105901 × 10¹⁰ | 2 | < 0.1% |
2.110105901 × 10¹⁰ | 2 | < 0.1% |
2.110105897 × 10¹⁰ | 2 | < 0.1% |
2.110105896 × 10¹⁰ | 1 | < 0.1% |
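The extreme-value tables list the smallest and largest distinct values with their counts. The sketch below uses a hypothetical `extreme_values` helper. It also explains why the maximum table seems to repeat values such as 2.110105978 × 10¹⁰. The IDs have 11 digits and the display keeps 10 significant figures, so nearby IDs can print identically. The two IDs in the sketch are made up for illustration.

```python
from collections import Counter


def extreme_values(xs, k=10):
    """Return the k smallest and k largest distinct values as (value, count) pairs."""
    ordered = sorted(Counter(xs).items())  # sorted by value, ascending
    return ordered[:k], ordered[::-1][:k]


smallest, largest = extreme_values([5, 1, 1, 3, 9, 9, 9, 2], k=2)
print(smallest)  # [(1, 2), (2, 1)]
print(largest)   # [(9, 3), (5, 1)]

# Two hypothetical 11-digit IDs that differ, yet print identically at 10 significant figures.
print(f"{21101059779:.9e}", f"{21101059781:.9e}")
```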
Correlations
Variable | journal__sourceid | area |
---|---|---|
journal__sourceid | 1.000 | 0.158 |
area | 0.158 | 1.000 |
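The report does not state which measure produced the 0.158 between a numeric and a categorical column. One common choice for a pair of categorical variables is Cramér's V, V = sqrt(χ² / (n · (min(r, c) - 1))), computed from a contingency table. The sketch below is the uncorrected form on inputs that are already categorical. Applying it to `journal__sourceid` would first require binning the IDs, and a library implementation may also apply a bias correction. The sketch is therefore not expected to reproduce 0.158. The `cramers_v` name is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of categorical labels."""
    n = len(xs)
    table = Counter(zip(xs, ys))  # observed joint counts
    rows, cols = Counter(xs), Counter(ys)
    k = min(len(rows), len(cols))
    if k < 2:
        raise ValueError("need at least two categories in each variable")
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (table.get((r, c), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], [1, 1, 2, 2]))  # perfectly associated
print(cramers_v(["a", "a", "b", "b"], [1, 2, 1, 2]))  # independent
```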
First rows
Row | area | journal__sourceid |
---|---|---|
0 | Biochemistry | 16801 |
1 | Genetics and Molecular Biology | 16801 |
2 | Biochemistry | 18434 |
3 | Genetics and Molecular Biology | 18434 |
4 | Immunology and Microbiology | 20651 |
5 | Medicine | 20651 |
6 | Biochemistry | 18395 |
7 | Genetics and Molecular Biology | 18395 |
8 | Neuroscience | 14181 |
9 | Biochemistry | 22126 |
Last rows
Row | area | journal__sourceid |
---|---|---|
121224 | Medicine | 96394 |
121225 | Chemical Engineering | 21101055130 |
121226 | Environmental Science | 21101055130 |
121227 | Pharmacology | 21101047129 |
121228 | Toxicology and Pharmaceutics | 21101047129 |
121229 | Medicine | 21101043236 |
121230 | Medicine | 21101042998 |
121231 | Medicine | 21101042490 |
121232 | Arts and Humanities | 21101046690 |
121233 | Social Sciences | 21101046690 |