Overview

Dataset statistics

Number of variables: 8
Number of observations: 70227
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.3 MiB
Average record size in memory: 64.0 B
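The memory figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes 8 bytes per stored value and a 128-byte index, neither of which is stated in the report. The helper name `memory_summary` is hypothetical.

```python
# Figures copied from the report.
N_ROWS = 70227
N_COLS = 8

# Assumptions (not stated in the report): each value occupies 8 bytes
# and the row index adds a fixed 128 bytes.
BYTES_PER_VALUE = 8
INDEX_BYTES = 128


def memory_summary(n_rows, n_cols, bytes_per_value=BYTES_PER_VALUE, index_bytes=INDEX_BYTES):
    """Return per-column, total, and per-record memory estimates."""
    column_bytes = n_rows * bytes_per_value
    total_bytes = n_cols * column_bytes + index_bytes
    return {
        "column_kib": (column_bytes + index_bytes) / 1024,
        "total_mib": total_bytes / 1024 ** 2,
        "avg_record_bytes": total_bytes / n_rows,
    }


summary = memory_summary(N_ROWS, N_COLS)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the rounded results agree with the reported 4.3 MiB total, 64.0 B per record, and 548.8 KiB per variable.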

Variable types

Numeric: 1
Categorical: 7

Alerts

country has a high cardinality: 127 distinct values (High cardinality)
coverage has a high cardinality: 10044 distinct values (High cardinality)
issn has a high cardinality: 37076 distinct values (High cardinality)
publisher has a high cardinality: 11095 distinct values (High cardinality)
title has a high cardinality: 68293 distinct values (High cardinality)
country is highly imbalanced (55.1%) (Imbalance)
coverage is highly imbalanced (53.4%) (Imbalance)
publisher is highly imbalanced (52.0%) (Imbalance)
title is uniformly distributed (Uniform)
sourceid has unique values (Unique)
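One common way to quantify class imbalance is one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition; the report itself does not say how its imbalance percentages are computed. The function name `imbalance_score` is hypothetical. The counts are the four `type` categories listed later in this report, chosen because that variable's full distribution is shown.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for balanced classes, near 1 for one dominant class.

    Assumed definition for illustration, not confirmed by the report.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts for the `type` variable, copied from the report.
type_counts = [34331, 33645, 1462, 789]
score = imbalance_score(type_counts)
print(f"type imbalance: {score:.1%}")
```

Under this assumed definition `type` scores roughly 40%, below the 52% to 55% reported for the three flagged variables, which fits with `type` not appearing among the imbalance alerts.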

Reproduction

Analysis started: 2023-05-04 15:12:09.807657
Analysis finished: 2023-05-04 15:12:13.789986
Duration: 3.98 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
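The reported duration follows from the two timestamps. A minimal check using only the standard library, with both timestamps copied from the report:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the report.
started = datetime.strptime("2023-05-04 15:12:09.807657", FMT)
finished = datetime.strptime("2023-05-04 15:12:13.789986", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```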

Variables

sourceid
Real number (ℝ)

Distinct: 70227
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3699065 × 10^10
Minimum: 12000
Maximum: 2.110106 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 548.8 KiB

Quantile statistics

Minimum: 12000
5-th percentile: 16616.3
Q1: 98868
Median: 1.9900194 × 10^10
Q3: 2.1100457 × 10^10
95-th percentile: 2.1100932 × 10^10
Maximum: 2.110106 × 10^10
Range: 2.1101048 × 10^10
Interquartile range (IQR): 2.1100358 × 10^10
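The range and interquartile range are simple differences of the quantiles listed above. A short check with the values copied from the report (rounded as printed there, so agreement is only up to that rounding):

```python
import math

# Quantiles copied from the report (rounded as printed).
minimum = 12000
q1 = 98868
q3 = 2.1100457e10
maximum = 2.110106e10

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"range = {value_range:.7e}")
print(f"IQR   = {iqr:.7e}")
```

Because Q1 is tiny compared with Q3, the IQR is almost as large as the full range, which reflects the gap between the low identifiers (around 10^4 to 10^5) and the high ones (around 2 × 10^10).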

Descriptive statistics

Standard deviation: 9.3402116 × 10^9
Coefficient of variation (CV): 0.68181378
Kurtosis: -1.472561
Mean: 1.3699065 × 10^10
Median Absolute Deviation (MAD): 1.2006955 × 10^9
Skewness: -0.64060688
Sum: 9.6204427 × 10^14
Variance: 8.7239552 × 10^19
Monotonicity: Strictly increasing
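Several of these statistics are related: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks those relations using the rounded values printed in the report, so equality holds only approximately.

```python
import math

# Values copied from the report (rounded as printed).
std = 9.3402116e9
mean = 1.3699065e10
n_observations = 70227

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance
total = mean * n_observations   # sum of all values

print(f"CV       = {cv:.8f}")
print(f"variance = {variance:.7e}")
print(f"sum      = {total:.7e}")
```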
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
12000 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032773 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032772 × 10^10 | 1 | < 0.1%
2.110032823 × 10^10 | 1 | < 0.1%
Other values (70217) | 70217 | > 99.9%

Value | Count | Frequency (%)
12000 | 1 | < 0.1%
12001 | 1 | < 0.1%
12002 | 1 | < 0.1%
12004 | 1 | < 0.1%
12005 | 1 | < 0.1%
12006 | 1 | < 0.1%
12007 | 1 | < 0.1%
12008 | 1 | < 0.1%
12009 | 1 | < 0.1%
12010 | 1 | < 0.1%

Value | Count | Frequency (%)
2.110105979 × 10^10 | 1 | < 0.1%
2.110105978 × 10^10 | 1 | < 0.1%
2.110105978 × 10^10 | 1 | < 0.1%
2.110105949 × 10^10 | 1 | < 0.1%
2.11010593 × 10^10 | 1 | < 0.1%
2.11010593 × 10^10 | 1 | < 0.1%
2.110105901 × 10^10 | 1 | < 0.1%
2.110105901 × 10^10 | 1 | < 0.1%
2.110105897 × 10^10 | 1 | < 0.1%
2.110105896 × 10^10 | 1 | < 0.1%

country
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 127
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

United States: 37138
United Kingdom: 8812
Netherlands: 3481
Germany: 2712
Switzerland: 1167
Other values (122): 16917

Length

Max length: 22
Median length: 13
Mean length: 11.377476
Min length: 4
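The mean length equals the total number of characters divided by the number of observations. A quick check for `country`, using the total character count (799006) and observation count (70227) that appear elsewhere in this report:

```python
# Figures copied from the report for the `country` variable.
total_characters = 799006
n_observations = 70227

# Mean string length = total characters / number of rows.
mean_length = total_characters / n_observations
print(f"mean length = {mean_length:.6f}")
```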

Characters and Unicode

Total characters: 799006
Distinct characters: 50
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
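As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. This is not necessarily how the report was produced. The function name `category_counts` is hypothetical. The two-letter codes map to the names used in this report: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# One sample value taken from the report.
counts = category_counts(["United States"])
print(counts)
```

The standard library exposes general categories but not scripts or blocks, so those two properties would need another data source.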

Unique

Unique: 18
Unique (%): < 0.1%
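"Distinct" and "Unique" measure different things. The sketch below assumes that Distinct is the number of different values and Unique is the number of values occurring exactly once. The report does not define the terms, so treat this as an interpretation. The helper name and the sample list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy sample, not taken from the dataset.
sample = ["United States", "United States", "Germany", "Hungary"]
result = distinct_and_unique(sample)
print(result)
```

Read this way, `country` has 127 different values, 18 of which appear on a single row only.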

Sample

1st row: United States
2nd row: United States
3rd row: United States
4th row: United States
5th row: United States

Common Values

Value | Count | Frequency (%)
United States | 37138 | 52.9%
United Kingdom | 8812 | 12.5%
Netherlands | 3481 | 5.0%
Germany | 2712 | 3.9%
Switzerland | 1167 | 1.7%
China | 1150 | 1.6%
France | 1136 | 1.6%
Italy | 1058 | 1.5%
Spain | 967 | 1.4%
Japan | 862 | 1.2%
Other values (117) | 11744 | 16.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
united | 46079 | 38.9%
states | 37138 | 31.4%
kingdom | 8812 | 7.4%
netherlands | 3481 | 2.9%
germany | 2712 | 2.3%
switzerland | 1167 | 1.0%
china | 1150 | 1.0%
france | 1136 | 1.0%
italy | 1058 | 0.9%
spain | 967 | 0.8%
Other values (136) | 14679 | 12.4%

Most occurring characters

Value | Count | Frequency (%)
t | 129585 | 16.2%
e | 101648 | 12.7%
n | 73646 | 9.2%
i | 66270 | 8.3%
a | 65217 | 8.2%
d | 63365 | 7.9%
(space) | 48152 | 6.0%
U | 46202 | 5.8%
s | 43443 | 5.4%
S | 40698 | 5.1%
Other values (40) | 120780 | 15.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 632496 | 79.2%
Uppercase Letter | 118358 | 14.8%
Space Separator | 48152 | 6.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 129585 | 20.5%
e | 101648 | 16.1%
n | 73646 | 11.6%
i | 66270 | 10.5%
a | 65217 | 10.3%
d | 63365 | 10.0%
s | 43443 | 6.9%
r | 14470 | 2.3%
o | 13760 | 2.2%
m | 12645 | 2.0%
Other values (16) | 48447 | 7.7%

Uppercase Letter
Value | Count | Frequency (%)
U | 46202 | 39.0%
S | 40698 | 34.4%
K | 9274 | 7.8%
N | 3787 | 3.2%
C | 2931 | 2.5%
G | 2880 | 2.4%
I | 2386 | 2.0%
F | 1932 | 1.6%
P | 1474 | 1.2%
R | 1350 | 1.1%
Other values (13) | 5444 | 4.6%

Space Separator
Value | Count | Frequency (%)
(space) | 48152 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 750854 | 94.0%
Common | 48152 | 6.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 129585 | 17.3%
e | 101648 | 13.5%
n | 73646 | 9.8%
i | 66270 | 8.8%
a | 65217 | 8.7%
d | 63365 | 8.4%
U | 46202 | 6.2%
s | 43443 | 5.8%
S | 40698 | 5.4%
r | 14470 | 1.9%
Other values (39) | 106310 | 14.2%

Common
Value | Count | Frequency (%)
(space) | 48152 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 799006 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 129585 | 16.2%
e | 101648 | 12.7%
n | 73646 | 9.2%
i | 66270 | 8.3%
a | 65217 | 8.2%
d | 63365 | 7.9%
(space) | 48152 | 6.0%
U | 46202 | 5.8%
s | 43443 | 5.4%
S | 40698 | 5.1%
Other values (40) | 120780 | 15.1%

coverage
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 10044
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

None: 33355
2008-2021: 789
2009-2021: 767
2010-2021: 750
2018-2021: 739
Other values (10039): 33827

Length

Max length: 303
Median length: 213
Mean length: 9.1563786
Min length: 4

Characters and Unicode

Total characters: 643025
Distinct characters: 17
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8447
Unique (%): 12.0%

Sample

1st row: 1999-2003, 2005, 2008
2nd row: 1958-2021
3rd row: 1969-2021
4th row: 2000-2021
5th row: 1988-2021
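The sample rows suggest that `coverage` is a comma-separated list of single years and inclusive year ranges, with the literal string `None` when no coverage is recorded. A minimal parsing sketch under that assumption; the function name `parse_coverage` is hypothetical and the format is inferred from the samples only:

```python
def parse_coverage(text):
    """Expand a coverage string such as '1999-2003, 2005' into a list of years.

    Assumes comma-separated items that are either 'YYYY' or 'YYYY-YYYY',
    and treats the literal 'None' as no coverage.
    """
    text = text.strip()
    if text == "None":
        return []
    years = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            years.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            years.append(int(part))
    return years


print(parse_coverage("1999-2003, 2005, 2008"))
```

Expanding the strings this way would turn the high-cardinality text column into something that can be summarized numerically, for example first covered year or number of covered years.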

Common Values

Value | Count | Frequency (%)
None | 33355 | 47.5%
2008-2021 | 789 | 1.1%
2009-2021 | 767 | 1.1%
2010-2021 | 750 | 1.1%
2018-2021 | 739 | 1.1%
2019-2021 | 703 | 1.0%
1996-2021 | 694 | 1.0%
2011-2021 | 689 | 1.0%
2020-2021 | 657 | 0.9%
2017-2021 | 640 | 0.9%
Other values (10034) | 30444 | 43.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
none | 33355 | 36.2%
1996-2021 | 1049 | 1.1%
2008-2021 | 927 | 1.0%
2009-2021 | 912 | 1.0%
2010-2021 | 863 | 0.9%
2020-2021 | 850 | 0.9%
2018-2021 | 847 | 0.9%
2019-2021 | 813 | 0.9%
2011-2021 | 811 | 0.9%
2017-2021 | 778 | 0.8%
Other values (2621) | 50924 | 55.3%

Most occurring characters

Value | Count | Frequency (%)
2 | 102463 | 15.9%
0 | 96451 | 15.0%
1 | 89218 | 13.9%
9 | 62377 | 9.7%
- | 46141 | 7.2%
o | 33355 | 5.2%
N | 33355 | 5.2%
e | 33355 | 5.2%
n | 33355 | 5.2%
, | 21902 | 3.4%
Other values (7) | 91053 | 14.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 419660 | 65.3%
Lowercase Letter | 100065 | 15.6%
Dash Punctuation | 46141 | 7.2%
Uppercase Letter | 33355 | 5.2%
Other Punctuation | 21902 | 3.4%
Space Separator | 21902 | 3.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 102463 | 24.4%
0 | 96451 | 23.0%
1 | 89218 | 21.3%
9 | 62377 | 14.9%
8 | 18387 | 4.4%
7 | 14777 | 3.5%
6 | 11980 | 2.9%
5 | 8512 | 2.0%
4 | 8154 | 1.9%
3 | 7341 | 1.7%

Lowercase Letter
Value | Count | Frequency (%)
o | 33355 | 33.3%
e | 33355 | 33.3%
n | 33355 | 33.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 46141 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
N | 33355 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 21902 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 21902 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 509605 | 79.3%
Latin | 133420 | 20.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 102463 | 20.1%
0 | 96451 | 18.9%
1 | 89218 | 17.5%
9 | 62377 | 12.2%
- | 46141 | 9.1%
, | 21902 | 4.3%
(space) | 21902 | 4.3%
8 | 18387 | 3.6%
7 | 14777 | 2.9%
6 | 11980 | 2.4%
Other values (3) | 24007 | 4.7%

Latin
Value | Count | Frequency (%)
o | 33355 | 25.0%
N | 33355 | 25.0%
e | 33355 | 25.0%
n | 33355 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 643025 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 102463 | 15.9%
0 | 96451 | 15.0%
1 | 89218 | 13.9%
9 | 62377 | 9.7%
- | 46141 | 7.2%
o | 33355 | 5.2%
N | 33355 | 5.2%
e | 33355 | 5.2%
n | 33355 | 5.2%
, | 21902 | 3.4%
Other values (7) | 91053 | 14.2%

issn
Categorical

Distinct: 37076
Distinct (%): 52.8%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

-: 33072
10716947: 9
15417719: 7
09353224: 3
1038412X: 3
Other values (37071): 37133

Length

Max length: 38
Median length: 28
Mean length: 7.2422573
Min length: 1

Characters and Unicode

Total characters: 508602
Distinct characters: 14
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 37009
Unique (%): 52.7%

Sample

1st row: 15276228
2nd row: 00225002, 19383711
3rd row: 00225061, 15206696
4th row: 15299740, 15299732
5th row: 15736598, 08949867
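The samples show that one `issn` field may hold several eight-character ISSNs separated by commas, and the common values show `-` where none is recorded. As a sketch of how such a field could be split and validated, the code below applies the standard ISSN check-digit rule: weights 8 down to 2 on the first seven digits, modulo 11, with `X` standing for 10. The helper names are hypothetical, and treating `-` as "no ISSN" is an assumption.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn):
    """Validate an 8-character ISSN written without a hyphen, as in this dataset."""
    issn = issn.strip().upper()
    if len(issn) != 8 or not issn[:7].isdigit():
        return False
    return issn_check_digit(issn[:7]) == issn[7]


def split_issns(field):
    """Split a raw field into individual ISSNs; '-' is assumed to mean none."""
    parts = [p.strip() for p in field.split(",")]
    return [p for p in parts if p and p != "-"]


for raw in ["15276228", "00225002, 19383711", "1038412X", "-"]:
    print(raw, [(i, is_valid_issn(i)) for i in split_issns(raw)])
```

The `X` check character explains why the character statistics for this variable include a small share of uppercase letters.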

Common Values

Value | Count | Frequency (%)
- | 33072 | 47.1%
10716947 | 9 | < 0.1%
15417719 | 7 | < 0.1%
09353224 | 3 | < 0.1%
1038412X | 3 | < 0.1%
10503862 | 2 | < 0.1%
15938883, 11296569 | 2 | < 0.1%
10672478 | 2 | < 0.1%
07347464 | 2 | < 0.1%
10928138 | 2 | < 0.1%
Other values (37066) | 37123 | 52.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
(blank) | 33072 | 37.6%
10716947 | 9 | < 0.1%
15417719 | 7 | < 0.1%
16608151 | 3 | < 0.1%
13474065 | 3 | < 0.1%
00214922 | 3 | < 0.1%
1038412x | 3 | < 0.1%
09353224 | 3 | < 0.1%
0148396x | 2 | < 0.1%
14320681 | 2 | < 0.1%
Other values (54838) | 54949 | 62.4%

Most occurring characters

Value | Count | Frequency (%)
1 | 63861 | 12.6%
0 | 59077 | 11.6%
2 | 48805 | 9.6%
5 | 40128 | 7.9%
3 | 39654 | 7.8%
7 | 38534 | 7.6%
4 | 38129 | 7.5%
9 | 36273 | 7.1%
6 | 35512 | 7.0%
8 | 34913 | 6.9%
Other values (4) | 73716 | 14.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 434886 | 85.5%
Dash Punctuation | 33072 | 6.5%
Other Punctuation | 17829 | 3.5%
Space Separator | 17829 | 3.5%
Uppercase Letter | 4986 | 1.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 63861 | 14.7%
0 | 59077 | 13.6%
2 | 48805 | 11.2%
5 | 40128 | 9.2%
3 | 39654 | 9.1%
7 | 38534 | 8.9%
4 | 38129 | 8.8%
9 | 36273 | 8.3%
6 | 35512 | 8.2%
8 | 34913 | 8.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 33072 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 17829 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 17829 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
X | 4986 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 503616 | 99.0%
Latin | 4986 | 1.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 63861 | 12.7%
0 | 59077 | 11.7%
2 | 48805 | 9.7%
5 | 40128 | 8.0%
3 | 39654 | 7.9%
7 | 38534 | 7.7%
4 | 38129 | 7.6%
9 | 36273 | 7.2%
6 | 35512 | 7.1%
8 | 34913 | 6.9%
Other values (3) | 68730 | 13.6%

Latin
Value | Count | Frequency (%)
X | 4986 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 508602 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 63861 | 12.6%
0 | 59077 | 11.6%
2 | 48805 | 9.6%
5 | 40128 | 7.9%
3 | 39654 | 7.8%
7 | 38534 | 7.6%
4 | 38129 | 7.5%
9 | 36273 | 7.1%
6 | 35512 | 7.0%
8 | 34913 | 6.9%
Other values (4) | 73716 | 14.5%

publisher
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 11095
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

None: 33944
Taylor and Francis Ltd.: 1483
Elsevier BV: 771
Routledge: 731
Elsevier: 638
Other values (11090): 32660

Length

Max length: 158
Median length: 144
Mean length: 15.862574
Min length: 3

Characters and Unicode

Total characters: 1113981
Distinct characters: 77
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8493
Unique (%): 12.1%

Sample

1st row: Columbus State University
2nd row: Wiley-Blackwell
3rd row: John Wiley &amp; Sons Inc.
4th row: Routledge
5th row: Wiley-Blackwell

Common Values

Value | Count | Frequency (%)
None | 33944 | 48.3%
Taylor and Francis Ltd. | 1483 | 2.1%
Elsevier BV | 771 | 1.1%
Routledge | 731 | 1.0%
Elsevier | 638 | 0.9%
Wiley-Blackwell Publishing Ltd | 638 | 0.9%
SAGE Publications Inc. | 516 | 0.7%
Springer Verlag | 511 | 0.7%
Elsevier Ltd. | 473 | 0.7%
Emerald Group Publishing Ltd. | 447 | 0.6%
Other values (11085) | 30075 | 42.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
none | 33944 | 20.6%
ltd | 6130 | 3.7%
of | 5689 | 3.5%
and | 4038 | 2.4%
university | 3672 | 2.2%
publishing | 3496 | 2.1%
inc | 2902 | 1.8%
press | 2755 | 1.7%
elsevier | 2522 | 1.5%
de | 2212 | 1.3%
Other values (10581) | 97505 | 59.1%

Most occurring characters

Value | Count | Frequency (%)
e | 115032 | 10.3%
n | 97400 | 8.7%
(space) | 94648 | 8.5%
o | 82933 | 7.4%
i | 82401 | 7.4%
a | 59831 | 5.4%
r | 53928 | 4.8%
s | 50889 | 4.6%
t | 48634 | 4.4%
l | 43017 | 3.9%
Other values (67) | 385268 | 34.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 836450 | 75.1%
Uppercase Letter | 164894 | 14.8%
Space Separator | 94648 | 8.5%
Other Punctuation | 13912 | 1.2%
Dash Punctuation | 2155 | 0.2%
Open Punctuation | 853 | 0.1%
Close Punctuation | 849 | 0.1%
Math Symbol | 132 | < 0.1%
Decimal Number | 87 | < 0.1%
Connector Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 115032 | 13.8%
n | 97400 | 11.6%
o | 82933 | 9.9%
i | 82401 | 9.9%
a | 59831 | 7.2%
r | 53928 | 6.4%
s | 50889 | 6.1%
t | 48634 | 5.8%
l | 43017 | 5.1%
c | 36358 | 4.3%
Other values (16) | 166027 | 19.8%

Uppercase Letter
Value | Count | Frequency (%)
N | 37339 | 22.6%
S | 13843 | 8.4%
P | 13758 | 8.3%
A | 9717 | 5.9%
E | 9594 | 5.8%
I | 9494 | 5.8%
L | 8824 | 5.4%
C | 7841 | 4.8%
M | 6371 | 3.9%
B | 6000 | 3.6%
Other values (16) | 42113 | 25.5%

Other Punctuation
Value | Count | Frequency (%)
. | 10725 | 77.1%
, | 1419 | 10.2%
; | 537 | 3.9%
& | 535 | 3.8%
' | 460 | 3.3%
/ | 168 | 1.2%
" | 45 | 0.3%
: | 22 | 0.2%
* | 1 | < 0.1%

Decimal Number
Value | Count | Frequency (%)
8 | 29 | 33.3%
1 | 26 | 29.9%
5 | 12 | 13.8%
0 | 7 | 8.0%
3 | 5 | 5.7%
4 | 5 | 5.7%
2 | 2 | 2.3%
9 | 1 | 1.1%

Open Punctuation
Value | Count | Frequency (%)
( | 852 | 99.9%
[ | 1 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 848 | 99.9%
] | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 94648 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2155 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 132 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1001344 | 89.9%
Common | 112637 | 10.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 115032 | 11.5%
n | 97400 | 9.7%
o | 82933 | 8.3%
i | 82401 | 8.2%
a | 59831 | 6.0%
r | 53928 | 5.4%
s | 50889 | 5.1%
t | 48634 | 4.9%
l | 43017 | 4.3%
N | 37339 | 3.7%
Other values (42) | 329940 | 32.9%

Common
Value | Count | Frequency (%)
(space) | 94648 | 84.0%
. | 10725 | 9.5%
- | 2155 | 1.9%
, | 1419 | 1.3%
( | 852 | 0.8%
) | 848 | 0.8%
; | 537 | 0.5%
& | 535 | 0.5%
' | 460 | 0.4%
/ | 168 | 0.1%
Other values (15) | 290 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1113981 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 115032 | 10.3%
n | 97400 | 8.7%
(space) | 94648 | 8.5%
o | 82933 | 7.4%
i | 82401 | 7.4%
a | 59831 | 5.4%
r | 53928 | 4.8%
s | 50889 | 4.6%
t | 48634 | 4.4%
l | 43017 | 3.9%
Other values (67) | 385268 | 34.6%

region
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

Northern America: 37936
Western Europe: 21255
Asiatic Region: 4399
Eastern Europe: 3397
Latin America: 1176
Other values (4): 2064

Length

Max length: 18
Median length: 16
Mean length: 15.006166
Min length: 6

Characters and Unicode

Total characters: 1053838
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Northern America
2nd row: Northern America
3rd row: Northern America
4th row: Northern America
5th row: Northern America

Common Values

Value | Count | Frequency (%)
Northern America | 37936 | 54.0%
Western Europe | 21255 | 30.3%
Asiatic Region | 4399 | 6.3%
Eastern Europe | 3397 | 4.8%
Latin America | 1176 | 1.7%
Middle East | 892 | 1.3%
Pacific Region | 792 | 1.1%
Africa | 240 | 0.3%
Africa/Middle East | 140 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
america | 39112 | 27.9%
northern | 37936 | 27.1%
europe | 24652 | 17.6%
western | 21255 | 15.2%
region | 5191 | 3.7%
asiatic | 4399 | 3.1%
eastern | 3397 | 2.4%
latin | 1176 | 0.8%
east | 1032 | 0.7%
middle | 892 | 0.6%
Other values (3) | 1172 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
r | 164668 | 15.6%
e | 153830 | 14.6%
(space) | 69987 | 6.6%
t | 69195 | 6.6%
n | 68955 | 6.5%
o | 67779 | 6.4%
i | 57273 | 5.4%
a | 50288 | 4.8%
c | 45475 | 4.3%
A | 43891 | 4.2%
Other values (17) | 262497 | 24.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 843357 | 80.0%
Uppercase Letter | 140354 | 13.3%
Space Separator | 69987 | 6.6%
Other Punctuation | 140 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 164668 | 19.5%
e | 153830 | 18.2%
t | 69195 | 8.2%
n | 68955 | 8.2%
o | 67779 | 8.0%
i | 57273 | 6.8%
a | 50288 | 6.0%
c | 45475 | 5.4%
m | 39112 | 4.6%
h | 37936 | 4.5%
Other values (7) | 88846 | 10.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 43891 | 31.3%
N | 37936 | 27.0%
E | 29081 | 20.7%
W | 21255 | 15.1%
R | 5191 | 3.7%
L | 1176 | 0.8%
M | 1032 | 0.7%
P | 792 | 0.6%

Space Separator
Value | Count | Frequency (%)
(space) | 69987 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 140 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 983711 | 93.3%
Common | 70127 | 6.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 164668 | 16.7%
e | 153830 | 15.6%
t | 69195 | 7.0%
n | 68955 | 7.0%
o | 67779 | 6.9%
i | 57273 | 5.8%
a | 50288 | 5.1%
c | 45475 | 4.6%
A | 43891 | 4.5%
m | 39112 | 4.0%
Other values (15) | 223245 | 22.7%

Common
Value | Count | Frequency (%)
(space) | 69987 | 99.8%
/ | 140 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1053838 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 164668 | 15.6%
e | 153830 | 14.6%
(space) | 69987 | 6.6%
t | 69195 | 6.6%
n | 68955 | 6.5%
o | 67779 | 6.4%
i | 57273 | 5.4%
a | 50288 | 4.8%
c | 45475 | 4.3%
A | 43891 | 4.2%
Other values (17) | 262497 | 24.9%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 68293
Distinct (%): 97.2%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

Optics InfoBase Conference Papers: 51
22nd International Congress on Sound and Vibration, ICSV 2015: 11
41st EPS Conference on Plasma Physics, EPS 2014: 10
Proceedings of the ASME Turbo Expo: 9
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: 7
Other values (68288): 70139

Length

Max length: 444
Median length: 248
Mean length: 62.681775
Min length: 2

Characters and Unicode

Total characters: 4401953
Distinct characters: 87
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 66867
Unique (%): 95.2%

Sample

1st row: Journal of Technology in Counseling
2nd row: Journal of the Experimental Analysis of Behavior
3rd row: Journal of the History of the Behavioral Sciences
4th row: Journal of Trauma and Dissociation
5th row: Journal of Traumatic Stress

Common Values

Value | Count | Frequency (%)
Optics InfoBase Conference Papers | 51 | 0.1%
22nd International Congress on Sound and Vibration, ICSV 2015 | 11 | < 0.1%
41st EPS Conference on Plasma Physics, EPS 2014 | 10 | < 0.1%
Proceedings of the ASME Turbo Expo | 9 | < 0.1%
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 7 | < 0.1%
National Radio Science Conference, NRSC, Proceedings | 7 | < 0.1%
2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014 | 7 | < 0.1%
Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM | 7 | < 0.1%
DOLAP: Proceedings of the ACM International Workshop on Data Warehousing and OLAP | 6 | < 0.1%
Proceedings of the Electronic Packaging Technology Conference, EPTC | 6 | < 0.1%
Other values (68283) | 70106 | 99.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
and | 31436 | 5.4%
of | 27668 | 4.7%
on | 23799 | 4.1%
international | 22208 | 3.8%
conference | 21554 | 3.7%
(blank) | 18233 | 3.1%
proceedings | 18050 | 3.1%
the | 14273 | 2.4%
journal | 10937 | 1.9%
in | 7122 | 1.2%
Other values (31286) | 391802 | 66.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 516927 | 11.7%
n | 388063 | 8.8%
e | 365186 | 8.3%
o | 304133 | 6.9%
i | 260449 | 5.9%
a | 241835 | 5.5%
t | 223534 | 5.1%
r | 210511 | 4.8%
s | 148099 | 3.4%
c | 147729 | 3.4%
Other values (77) | 1595487 | 36.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3044695 | 69.2%
Uppercase Letter | 556104 | 12.6%
Space Separator | 516927 | 11.7%
Decimal Number | 208172 | 4.7%
Other Punctuation | 47795 | 1.1%
Dash Punctuation | 24581 | 0.6%
Open Punctuation | 1757 | < 0.1%
Close Punctuation | 1752 | < 0.1%
Math Symbol | 152 | < 0.1%
Connector Punctuation | 18 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 388063 | 12.7%
e | 365186 | 12.0%
o | 304133 | 10.0%
i | 260449 | 8.6%
a | 241835 | 7.9%
t | 223534 | 7.3%
r | 210511 | 6.9%
s | 148099 | 4.9%
c | 147729 | 4.9%
l | 129958 | 4.3%
Other values (16) | 625198 | 20.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 74882 | 13.5%
I | 67952 | 12.2%
S | 60891 | 10.9%
E | 59521 | 10.7%
P | 46901 | 8.4%
A | 43460 | 7.8%
M | 32939 | 5.9%
T | 24405 | 4.4%
R | 18317 | 3.3%
D | 15284 | 2.7%
Other values (16) | 111552 | 20.1%

Other Punctuation
Value | Count | Frequency (%)
, | 33321 | 69.7%
: | 5506 | 11.5%
' | 2817 | 5.9%
/ | 2603 | 5.4%
. | 1865 | 3.9%
; | 761 | 1.6%
& | 382 | 0.8%
" | 268 | 0.6%
# | 176 | 0.4%
? | 30 | 0.1%
Other values (4) | 66 | 0.1%

Decimal Number
Value | Count | Frequency (%)
0 | 60981 | 29.3%
2 | 52105 | 25.0%
1 | 45042 | 21.6%
5 | 7477 | 3.6%
9 | 7409 | 3.6%
4 | 7390 | 3.5%
8 | 7300 | 3.5%
3 | 7183 | 3.5%
6 | 6989 | 3.4%
7 | 6296 | 3.0%

Math Symbol
Value | Count | Frequency (%)
= | 103 | 67.8%
+ | 48 | 31.6%
(vertical bar) | 1 | 0.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 24580 | > 99.9%
(non-ASCII dash) | 1 | < 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 1735 | 98.7%
[ | 22 | 1.3%

Close Punctuation
Value | Count | Frequency (%)
) | 1730 | 98.7%
] | 22 | 1.3%

Space Separator
Value | Count | Frequency (%)
(space) | 516927 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 18 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3600799 | 81.8%
Common | 801154 | 18.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 388063 | 10.8%
e | 365186 | 10.1%
o | 304133 | 8.4%
i | 260449 | 7.2%
a | 241835 | 6.7%
t | 223534 | 6.2%
r | 210511 | 5.8%
s | 148099 | 4.1%
c | 147729 | 4.1%
l | 129958 | 3.6%
Other values (42) | 1181302 | 32.8%

Common
Value | Count | Frequency (%)
(space) | 516927 | 64.5%
0 | 60981 | 7.6%
2 | 52105 | 6.5%
1 | 45042 | 5.6%
, | 33321 | 4.2%
- | 24580 | 3.1%
5 | 7477 | 0.9%
9 | 7409 | 0.9%
4 | 7390 | 0.9%
8 | 7300 | 0.9%
Other values (25) | 38622 | 4.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4401952 | > 99.9%
Punctuation | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 516927 | 11.7%
n | 388063 | 8.8%
e | 365186 | 8.3%
o | 304133 | 6.9%
i | 260449 | 5.9%
a | 241835 | 5.5%
t | 223534 | 5.1%
r | 210511 | 4.8%
s | 148099 | 3.4%
c | 147729 | 3.4%
Other values (76) | 1595486 | 36.2%

Punctuation
Value | Count | Frequency (%)
(non-ASCII dash) | 1 | 100.0%

type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 548.8 KiB

journal: 34331
conference and proceedings: 33645
book series: 1462
trade journal: 789

Length

Max length: 26
Median length: 13
Mean length: 16.253378
Min length: 7

Characters and Unicode

Total characters: 1141426
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: journal
2nd row: journal
3rd row: journal
4th row: journal
5th row: journal

Common Values

Value | Count | Frequency (%)
journal | 34331 | 48.9%
conference and proceedings | 33645 | 47.9%
book series | 1462 | 2.1%
trade journal | 789 | 1.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
journal | 35120 | 25.1%
conference | 33645 | 24.1%
and | 33645 | 24.1%
proceedings | 33645 | 24.1%
book | 1462 | 1.0%
series | 1462 | 1.0%
trade | 789 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
e | 171938 | 15.1%
n | 169700 | 14.9%
o | 105334 | 9.2%
r | 104661 | 9.2%
c | 100935 | 8.8%
a | 69554 | 6.1%
(space) | 69541 | 6.1%
d | 68079 | 6.0%
s | 36569 | 3.2%
j | 35120 | 3.1%
Other values (9) | 209995 | 18.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1071885 | 93.9%
Space Separator | 69541 | 6.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 171938 | 16.0%
n | 169700 | 15.8%
o | 105334 | 9.8%
r | 104661 | 9.8%
c | 100935 | 9.4%
a | 69554 | 6.5%
d | 68079 | 6.4%
s | 36569 | 3.4%
j | 35120 | 3.3%
l | 35120 | 3.3%
Other values (8) | 174875 | 16.3%

Space Separator
Value | Count | Frequency (%)
(space) | 69541 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1071885 | 93.9%
Common | 69541 | 6.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 171938 | 16.0%
n | 169700 | 15.8%
o | 105334 | 9.8%
r | 104661 | 9.8%
c | 100935 | 9.4%
a | 69554 | 6.5%
d | 68079 | 6.4%
s | 36569 | 3.4%
j | 35120 | 3.3%
l | 35120 | 3.3%
Other values (8) | 174875 | 16.3%

Common
Value | Count | Frequency (%)
(space) | 69541 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1141426 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 171938 | 15.1%
n | 169700 | 14.9%
o | 105334 | 9.2%
r | 104661 | 9.2%
c | 100935 | 8.8%
a | 69554 | 6.1%
(space) | 69541 | 6.1%
d | 68079 | 6.0%
s | 36569 | 3.2%
j | 35120 | 3.1%
Other values (9) | 209995 | 18.4%

Interactions

[Figure: interactions plot]

Correlations

(variable) | sourceid | region | type
sourceid | 1.000 | 0.090 | 0.315
region | 0.090 | 1.000 | 0.337
type | 0.315 | 0.337 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | sourceid | country | coverage | issn | publisher | region | title | type
0 | 12000 | United States | 1999-2003, 2005, 2008 | 15276228 | Columbus State University | Northern America | Journal of Technology in Counseling | journal
1 | 12001 | United States | 1958-2021 | 00225002, 19383711 | Wiley-Blackwell | Northern America | Journal of the Experimental Analysis of Behavior | journal
2 | 12002 | United States | 1969-2021 | 00225061, 15206696 | John Wiley &amp; Sons Inc. | Northern America | Journal of the History of the Behavioral Sciences | journal
3 | 12004 | United States | 2000-2021 | 15299740, 15299732 | Routledge | Northern America | Journal of Trauma and Dissociation | journal
4 | 12005 | United States | 1988-2021 | 15736598, 08949867 | Wiley-Blackwell | Northern America | Journal of Traumatic Stress | journal
5 | 12006 | United States | 1971-2021 | 10959084, 00018791 | Academic Press Inc. | Northern America | Journal of Vocational Behavior | journal
6 | 12007 | Hungary | 1946, 1948, 1977-1999 | 00390690 | Kozponti Statisztikai Hivatal | Eastern Europe | Statisztikai Szemle | journal
7 | 12008 | Hungary | 1980, 1982-1983, 1985, 2016-2021 | 20648251, 00187828 | Hungarian Central Statistical Office | Eastern Europe | Teruleti Statisztika | journal
8 | 12009 | Germany | 2000-2018 | 09426051 | J.C. Cotta'sche Buchhandlung Nachvolger GmbH | Western Europe | Kinderanalyse (discontinued) | journal
9 | 12010 | United States | 1950-1958, 1960-1963, 1965-2021 | 00664308, 15452085 | Annual Reviews Inc. | Northern America | Annual Review of Psychology | journal

(index) | sourceid | country | coverage | issn | publisher | region | title | type
70217 | 21101058963 | United States | 2016-2021 | 20597991 | SAGE Publications Inc. | Northern America | Methodological Innovations | journal
70218 | 21101058966 | Denmark | 2021 | 22468498 | Aalborg University Press | Western Europe | Journal of Somaesthetics | journal
70219 | 21101059010 | Netherlands | 2021 | 25424246, 25424238 | Brill Academic Publishers | Western Europe | International Journal of Asian Christianity | journal
70220 | 21101059012 | Germany | 2021 | 25693263 | Walter de Gruyter GmbH | Western Europe | Chemistry Teacher International | journal
70221 | 21101059299 | United States | 2014-2021 | 23482451, 23220058 | SAGE Publications Inc. | Northern America | Asian Journal of Legal Education | journal
70222 | 21101059300 | Ukraine | 2021 | 20753829, 20753810 | V. N. Karazin Kharkiv National University | Eastern Europe | Biophysical Bulletin | journal
70223 | 21101059489 | United States | 2021 | 25735985 | EnPress Publisher, LLC | Northern America | Trends in Immunotherapy | journal
70224 | 21101059784 | China | 2020-2021 | 20961146 | Chinese Academy of Sciences | Asiatic Region | Journal of Cyber Security | journal
70225 | 21101059785 | Thailand | 2015-2021 | 24523151 | Kasetsart University Research and Development Institute | Asiatic Region | Kasetsart Journal of Social Sciences | journal
70226 | 21101059786 | United States | 2019-2020 | 15297470, 15336239 | Brookings Institution Press | Northern America | Economia | journal